* list corruption in deferred_split_scan() @ 2019-07-10 21:43 Qian Cai 2019-07-11 0:16 ` Yang Shi ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Qian Cai @ 2019-07-10 21:43 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel Running LTP oom01 test case with swap triggers a crash below. Revert the series "Make deferred split shrinker memcg aware" [1] seems fix the issue. aefde94195ca mm: thp: make deferred split shrinker memcg aware cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release() 4e050f2df876 mm: thp: extract split_queue_* into a struct [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@ linux.alibaba.com/ [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is LIST_POISON1 (dead000000000100) [ 1145.739763][ T5764] ------------[ cut here ]------------ [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G W 5.2.0-next-20190710+ #7 [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082 [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: ffffffffae95d318 [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888440bd380 [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: ffffed1108817a70 [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: dead000000000122 [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: dead000000000100 [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000) knlGS:0000000000000000 [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: 00000000001406a0 [ 1145.870664][ T5764] Call Trace: [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 [ 1145.900159][ T5764] shrink_slab+0x253/0x440 [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260 [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560 [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 [ 1145.957299][ T5764] ? 
do_try_to_free_pages+0x820/0x820 [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 [ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490 [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 [ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20 [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf [ 1146.075426][ T5764] ? page_fault+0x5/0x20 [ 1146.079553][ T5764] page_fault+0x1b/0x20 [ 1146.083594][ T5764] RIP: 0033:0x410be0 [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206 [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f98f2674497 [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: 0000000000000000 [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: 0000000000000000 [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][ T5764] Shutting down cpus with NMI [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]--- ^ permalink raw reply [flat|nested] 21+ messages in thread
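For context on the trigger: LTP's oom01 essentially keeps allocating and touching anonymous memory until the machine swaps and the OOM killer fires, which is what drives reclaim into the deferred-split shrinker shown in the trace above. Below is a minimal stand-in in C for that kind of pressure -- it is an illustration only, not the LTP test itself, and CHUNK/ROUNDS are arbitrary values that need tuning to overshoot RAM plus swap on a given box.

/*
 * Rough stand-in for the anonymous-THP pressure oom01 generates.
 * Not the LTP test; sizes are arbitrary and must be tuned per machine.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define CHUNK  (512UL << 20)          /* 512 MB per mapping */
#define ROUNDS 128                    /* tune to overshoot RAM + swap */

int main(void)
{
	for (unsigned long i = 0; i < ROUNDS; i++) {
		char *p = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			break;
		}
		/* Ask for THP so the (deferred) split paths get exercised. */
		madvise(p, CHUNK, MADV_HUGEPAGE);
		memset(p, 0x07, CHUNK);  /* fault everything in */
		printf("touched %lu MB\n", (i + 1) * (CHUNK >> 20));
	}
	pause();  /* hold the memory until the OOM killer intervenes */
	return 0;
}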
* Re: list corruption in deferred_split_scan() 2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai @ 2019-07-11 0:16 ` Yang Shi 2019-07-11 21:07 ` Qian Cai 2019-07-15 4:52 ` Yang Shi 2019-07-24 21:13 ` Qian Cai 2 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-07-11 0:16 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel Hi Qian, Thanks for reporting the issue. But, I can't reproduce it on my machine. Could you please share more details about your test? How often did you run into this problem? Regards, Yang On 7/10/19 2:43 PM, Qian Cai wrote: > Running LTP oom01 test case with swap triggers a crash below. Revert the series > "Make deferred split shrinker memcg aware" [1] seems fix the issue. > > aefde94195ca mm: thp: make deferred split shrinker memcg aware > cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix > ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 > 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix > c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem > 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release() > 4e050f2df876 mm: thp: extract split_queue_* into a struct > > [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@ > linux.alibaba.com/ > > [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is > LIST_POISON1 (dead000000000100) > [ 1145.739763][ T5764] ------------[ cut here ]------------ > [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! > [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI > [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: > G W 5.2.0-next-20190710+ #7 > [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 > Gen10, BIOS A40 01/25/2019 > [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a > [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e > a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f> > 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 > [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082 > [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: > ffffffffae95d318 > [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: > ffff8888440bd380 > [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: > ffffed1108817a70 > [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: > dead000000000122 > [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: > dead000000000100 > [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000) > knlGS:0000000000000000 > [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: > 00000000001406a0 > [ 1145.870664][ T5764] Call Trace: > [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 > [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 > [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 > [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 > [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 > [ 1145.900159][ T5764] shrink_slab+0x253/0x440 > [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 > [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 > [ 1145.914383][ T5764] ? 
mem_cgroup_protected+0x20f/0x260 > [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 > [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560 > [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 > [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 > [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 > [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 > [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 > [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 > [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820 > [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 > [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 > [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 > [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 > [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 > [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 > [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 > [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 > [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 > [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 > [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 > [ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490 > [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 > [ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20 > [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 > [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 > [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 > [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 > [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 > [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 > [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 > [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf > [ 1146.075426][ T5764] ? page_fault+0x5/0x20 > [ 1146.079553][ T5764] page_fault+0x1b/0x20 > [ 1146.083594][ T5764] RIP: 0033:0x410be0 > [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 > 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> > 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f > [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206 > [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: > 00007f98f2674497 > [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: > 0000000000000000 > [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: > 0000000000000000 > [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][ > T5764] Shutting down cpus with NMI > [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 > (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]--- ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-11 0:16 ` Yang Shi @ 2019-07-11 21:07 ` Qian Cai 2019-07-12 19:12 ` Yang Shi 0 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-07-11 21:07 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote: > Hi Qian, > > > Thanks for reporting the issue. But, I can't reproduce it on my machine. > Could you please share more details about your test? How often did you > run into this problem? I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here is some more information. # cat .config https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config # numactl -H available: 8 nodes (0-7) node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71 node 0 size: 19984 MB node 0 free: 7251 MB node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79 node 1 size: 0 MB node 1 free: 0 MB node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87 node 2 size: 0 MB node 2 free: 0 MB node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95 node 3 size: 0 MB node 3 free: 0 MB node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103 node 4 size: 31524 MB node 4 free: 25165 MB node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111 node 5 size: 0 MB node 5 free: 0 MB node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119 node 6 size: 0 MB node 6 free: 0 MB node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127 node 7 size: 0 MB node 7 free: 0 MB node distances: node 0 1 2 3 4 5 6 7 0: 10 16 16 16 32 32 32 32 1: 16 10 16 16 32 32 32 32 2: 16 16 10 16 32 32 32 32 3: 16 16 16 10 32 32 32 32 4: 32 32 32 32 10 16 16 16 5: 32 32 32 32 16 10 16 16 6: 32 32 32 32 16 16 10 16 7: 32 32 32 32 16 16 16 10 # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 2 Core(s) per socket: 32 Socket(s): 2 NUMA node(s): 8 Vendor ID: AuthenticAMD CPU family: 23 Model: 1 Model name: AMD EPYC 7601 32-Core Processor Stepping: 2 CPU MHz: 2713.551 BogoMIPS: 4391.39 Virtualization: AMD-V L1d cache: 32K L1i cache: 64K L2 cache: 512K L3 cache: 8192K NUMA node0 CPU(s): 0-7,64-71 NUMA node1 CPU(s): 8-15,72-79 NUMA node2 CPU(s): 16-23,80-87 NUMA node3 CPU(s): 24-31,88-95 NUMA node4 CPU(s): 32-39,96-103 NUMA node5 CPU(s): 40-47,104-111 NUMA node6 CPU(s): 48-55,112-119 NUMA node7 CPU(s): 56-63,120-127 Another possible lead is that without reverting the those commits below, kdump kernel would always also crash in shrink_slab_memcg() at this line, map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task swapper/0/1 [ 9.072036][ T1] [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- 20190711+ #10 [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [ 9.072036][ T1] Call Trace: [ 9.072036][ T1] dump_stack+0x62/0x9a [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 [ 9.072036][ T1] ? shrink_slab+0x111/0x440 [ 9.072036][ T1] kasan_report+0xc/0xe [ 9.072036][ T1] __asan_load8+0x71/0xa0 [ 9.072036][ T1] shrink_slab+0x111/0x440 [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 [ 9.072036][ T1] ? 
kasan_check_read+0x11/0x20 [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260 [ 9.072036][ T1] shrink_node+0x31e/0xa30 [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 [ 9.072036][ T1] ? ktime_get+0x93/0x110 [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 9.072036][ T1] ? unwind_dump+0x260/0x260 [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0 [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 [ 9.072036][ T1] allocate_slab+0x600/0x11f0 [ 9.072036][ T1] new_slab+0x46/0x70 [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 [ 9.072036][ T1] ? create_object+0x3a/0x3e0 [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 [ 9.072036][ T1] ? create_object+0x3a/0x3e0 [ 9.072036][ T1] __slab_alloc+0x12/0x20 [ 9.072036][ T1] ? __slab_alloc+0x12/0x20 [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 [ 9.072036][ T1] create_object+0x3a/0x3e0 [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb [ 9.072036][ T1] acpi_load_tables+0x61/0x80 [ 9.072036][ T1] acpi_init+0x10d/0x44b [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 [ 9.072036][ T1] ? kernfs_get+0x13/0x20 [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 [ 9.072036][ T1] ? kset_register+0x31/0x50 [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 [ 9.072036][ T1] do_one_initcall+0xfe/0x45a [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 [ 9.072036][ T1] ? up_write+0x6b/0x190 [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 [ 9.072036][ T1] ? rest_init+0x188/0x188 [ 9.072036][ T1] kernel_init+0x11/0x138 [ 9.072036][ T1] ? 
rest_init+0x188/0x188 [ 9.072036][ T1] ret_from_fork+0x22/0x40 [ 9.072036][ T1] ================================================================== [ 9.072036][ T1] Disabling lock debugging due to kernel taint [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: 0000000000000dc8 [ 9.152036][ T1] #PF: supervisor read access in kernel mode [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page [ 9.152036][ T1] PGD 0 P4D 0 [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 5.2.0-next-20190711+ #10 [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288 [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440 [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088 [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8 [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440 [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) knlGS:0000000000000000 [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: 00000000001406b0 [ 9.152036][ T1] Call Trace: [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 [ 9.152036][ T1] shrink_node+0x31e/0xa30 [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 [ 9.152036][ T1] ? ktime_get+0x93/0x110 [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 9.152036][ T1] ? unwind_dump+0x260/0x260 [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 [ 9.152036][ T1] allocate_slab+0x600/0x11f0 [ 9.152036][ T1] new_slab+0x46/0x70 [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 [ 9.152036][ T1] ? create_object+0x3a/0x3e0 [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 [ 9.152036][ T1] ? create_object+0x3a/0x3e0 [ 9.152036][ T1] __slab_alloc+0x12/0x20 [ 9.152036][ T1] ? __slab_alloc+0x12/0x20 [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 [ 9.152036][ T1] create_object+0x3a/0x3e0 [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 [ 9.152036][ T1] ? 
do_raw_spin_unlock+0xa8/0x140 [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb [ 9.152036][ T1] acpi_load_tables+0x61/0x80 [ 9.152036][ T1] acpi_init+0x10d/0x44b [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 [ 9.152036][ T1] ? kernfs_get+0x13/0x20 [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 [ 9.152036][ T1] ? kset_register+0x31/0x50 [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 [ 9.152036][ T1] do_one_initcall+0xfe/0x45a [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 [ 9.152036][ T1] ? up_write+0x6b/0x190 [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 [ 9.152036][ T1] ? rest_init+0x188/0x188 [ 9.152036][ T1] kernel_init+0x11/0x138 [ 9.152036][ T1] ? rest_init+0x188/0x188 [ 9.152036][ T1] ret_from_fork+0x22/0x40 [ 9.152036][ T1] Modules linked in: [ 9.152036][ T1] CR2: 0000000000000dc8 [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288 [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440 [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088 [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8 [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440 [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) knlGS:00000000 > > > Regards, > > Yang > > > > On 7/10/19 2:43 PM, Qian Cai wrote: > > Running LTP oom01 test case with swap triggers a crash below. Revert the > > series > > "Make deferred split shrinker memcg aware" [1] seems fix the issue. > > > > aefde94195ca mm: thp: make deferred split shrinker memcg aware > > cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix > > ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 > > 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix > > c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem > > 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release() > > 4e050f2df876 mm: thp: extract split_queue_* into a struct > > > > [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang. 
> > shi@ > > linux.alibaba.com/ > > > > [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is > > LIST_POISON1 (dead000000000100) > > [ 1145.739763][ T5764] ------------[ cut here ]------------ > > [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! > > [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN > > NOPTI > > [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: > > G W 5.2.0-next-20190710+ #7 > > [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant > > DL385 > > Gen10, BIOS A40 01/25/2019 > > [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a > > [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 > > 9e > > a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff > > <0f> > > 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 > > [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082 > > [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: > > ffffffffae95d318 > > [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: > > ffff8888440bd380 > > [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: > > ffffed1108817a70 > > [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: > > dead000000000122 > > [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: > > dead000000000100 > > [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000) > > knlGS:0000000000000000 > > [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: > > 00000000001406a0 > > [ 1145.870664][ T5764] Call Trace: > > [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 > > [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 > > [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 > > [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 > > [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 > > [ 1145.900159][ T5764] shrink_slab+0x253/0x440 > > [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 > > [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 > > [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260 > > [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 > > [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560 > > [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 > > [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 > > [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 > > [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 > > [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 > > [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 > > [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820 > > [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 > > [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 > > [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > > [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 > > [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 > > [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 > > [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 > > [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 > > [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 > > [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 > > [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 > > [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 > > [ 1146.021367][ T5764] ? 
__update_load_avg_cfs_rq+0x2c/0x490 > > [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 > > [ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20 > > [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 > > [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 > > [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 > > [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 > > [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 > > [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 > > [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 > > [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf > > [ 1146.075426][ T5764] ? page_fault+0x5/0x20 > > [ 1146.079553][ T5764] page_fault+0x1b/0x20 > > [ 1146.083594][ T5764] RIP: 0033:0x410be0 > > [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 > > 00 > > 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 > > <c6> > > 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f > > [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206 > > [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: > > 00007f98f2674497 > > [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: > > 0000000000000000 > > [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: > > 0000000000000000 > > [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][ > > T5764] Shutting down cpus with NMI > > [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception > > ]--- > > ^ permalink raw reply [flat|nested] 21+ messages in thread
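A note on the second splat above: the kdump kernel faults on a read at address 0xdc8 while evaluating memcg->nodeinfo[nid]->shrinker_map, which is what you would expect if nodeinfo[nid] were NULL and shrinker_map sat at a small offset inside the per-node structure. The sketch below models that failure mode and the kind of guard suggested later in the thread; the struct layouts here are invented for illustration and are not the real struct mem_cgroup definitions.

/*
 * Illustration only: field names echo the splat, but these layouts are
 * made up for this sketch and are not the kernel's mem_cgroup structures.
 */
#include <stddef.h>
#include <stdio.h>

struct shrinker_map { unsigned long bits[4]; };

struct mem_cgroup_per_node {
	char pad[0xdc8];                     /* stand-in for preceding fields */
	struct shrinker_map *shrinker_map;   /* lands at offset 0xdc8 here */
};

struct mem_cgroup {
	struct mem_cgroup_per_node *nodeinfo[8];
};

static struct shrinker_map *lookup_map(struct mem_cgroup *memcg, int nid)
{
	/*
	 * Without this guard, a NULL nodeinfo[nid] turns the dereference into
	 * a read at offsetof(..., shrinker_map), i.e. a near-NULL fault like
	 * "Read of size 8 at addr 0000000000000dc8" in the report above.
	 */
	if (!memcg->nodeinfo[nid])
		return NULL;
	return memcg->nodeinfo[nid]->shrinker_map;
}

int main(void)
{
	struct mem_cgroup memcg = { { NULL } };

	printf("shrinker_map offset: %#zx\n",
	       offsetof(struct mem_cgroup_per_node, shrinker_map));
	printf("nid 0 map: %p\n", (void *)lookup_map(&memcg, 0));
	return 0;
}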
* Re: list corruption in deferred_split_scan() 2019-07-11 21:07 ` Qian Cai @ 2019-07-12 19:12 ` Yang Shi 2019-07-13 4:41 ` Yang Shi ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Yang Shi @ 2019-07-12 19:12 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

On 7/11/19 2:07 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>> Hi Qian,
>>
>>
>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>> Could you please share more details about your test? How often did you
>> run into this problem?
> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here
> is some more information.
>
> # cat .config
>
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

I tried your kernel config, but I still can't reproduce it. My compiler
doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test,
but I don't think this would make any difference for this case.

According to the bug call trace in the earlier email, it looks like
deferred_split_scan() lost the race with put_compound_page().
put_compound_page() calls free_transhuge_page(), which deletes the page
from the deferred split queue, but the page may still appear on the
deferred list for some reason.

Would you please try the below patch?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..66bd9db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));
 		}
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(page_deferred_list(page))) {
 		ds_queue->split_queue_len--;
-		list_del(page_deferred_list(page));
+		list_del_init(page_deferred_list(page));
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);

>
> # numactl -H
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
> node 0 size: 19984 MB
> node 0 free: 7251 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
> node 3 size: 0 MB
> node 3 free: 0 MB
> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
> node 4 size: 31524 MB
> node 4 free: 25165 MB
> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
> node 5 size: 0 MB
> node 5 free: 0 MB
> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
> node 6 size: 0 MB
> node 6 free: 0 MB
> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
> node 7 size: 0 MB
> node 7 free: 0 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  2
> Core(s) per
socket: 32 > Socket(s): 2 > NUMA node(s): 8 > Vendor ID: AuthenticAMD > CPU family: 23 > Model: 1 > Model name: AMD EPYC 7601 32-Core Processor > Stepping: 2 > CPU MHz: 2713.551 > BogoMIPS: 4391.39 > Virtualization: AMD-V > L1d cache: 32K > L1i cache: 64K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-7,64-71 > NUMA node1 CPU(s): 8-15,72-79 > NUMA node2 CPU(s): 16-23,80-87 > NUMA node3 CPU(s): 24-31,88-95 > NUMA node4 CPU(s): 32-39,96-103 > NUMA node5 CPU(s): 40-47,104-111 > NUMA node6 CPU(s): 48-55,112-119 > NUMA node7 CPU(s): 56-63,120-127 > > Another possible lead is that without reverting the those commits below, kdump > kernel would always also crash in shrink_slab_memcg() at this line, > > map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't think of where nodeinfo was freed but memcg was still online. Maybe a check is needed: diff --git a/mm/vmscan.c b/mm/vmscan.c index a0301ed..bacda49 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, if (!mem_cgroup_online(memcg)) return 0; + if (!memcg->nodeinfo[nid]) + return 0; + if (!down_read_trylock(&shrinker_rwsem)) return 0; > > [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 > [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task > swapper/0/1 > [ 9.072036][ T1] > [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- > 20190711+ #10 > [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 > Gen10, BIOS A40 01/25/2019 > [ 9.072036][ T1] Call Trace: > [ 9.072036][ T1] dump_stack+0x62/0x9a > [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 > [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 > [ 9.072036][ T1] ? shrink_slab+0x111/0x440 > [ 9.072036][ T1] kasan_report+0xc/0xe > [ 9.072036][ T1] __asan_load8+0x71/0xa0 > [ 9.072036][ T1] shrink_slab+0x111/0x440 > [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 > [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 > [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260 > [ 9.072036][ T1] shrink_node+0x31e/0xa30 > [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 > [ 9.072036][ T1] ? ktime_get+0x93/0x110 > [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 > [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 > [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 > [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 > [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 > [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 > [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > [ 9.072036][ T1] ? unwind_dump+0x260/0x260 > [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 > [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0 > [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 > [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 > [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 > [ 9.072036][ T1] allocate_slab+0x600/0x11f0 > [ 9.072036][ T1] new_slab+0x46/0x70 > [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 > [ 9.072036][ T1] ? create_object+0x3a/0x3e0 > [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 > [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 > [ 9.072036][ T1] ? create_object+0x3a/0x3e0 > [ 9.072036][ T1] __slab_alloc+0x12/0x20 > [ 9.072036][ T1] ? 
__slab_alloc+0x12/0x20 > [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 > [ 9.072036][ T1] create_object+0x3a/0x3e0 > [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 > [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 > [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 > [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 > [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d > [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 > [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 > [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 > [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 > [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed > [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 > [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb > [ 9.072036][ T1] acpi_load_tables+0x61/0x80 > [ 9.072036][ T1] acpi_init+0x10d/0x44b > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 > [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 > [ 9.072036][ T1] ? kernfs_get+0x13/0x20 > [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 > [ 9.072036][ T1] ? kset_register+0x31/0x50 > [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > [ 9.072036][ T1] do_one_initcall+0xfe/0x45a > [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 > [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 > [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 > [ 9.072036][ T1] ? up_write+0x6b/0x190 > [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 > [ 9.072036][ T1] ? rest_init+0x188/0x188 > [ 9.072036][ T1] kernel_init+0x11/0x138 > [ 9.072036][ T1] ? 
rest_init+0x188/0x188 > [ 9.072036][ T1] ret_from_fork+0x22/0x40 > [ 9.072036][ T1] > ================================================================== > [ 9.072036][ T1] Disabling lock debugging due to kernel taint > [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: > 0000000000000dc8 > [ 9.152036][ T1] #PF: supervisor read access in kernel mode > [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page > [ 9.152036][ T1] PGD 0 P4D 0 > [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI > [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: > G B 5.2.0-next-20190711+ #10 > [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 > Gen10, BIOS A40 01/25/2019 > [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 > [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00 > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f> > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 > [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 > [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: > ffffffff8112f288 > [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: > ffffffff824e0440 > [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: > fffffbfff049c088 > [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: > 00000000000001b8 > [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: > ffff88905757f440 > [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) > knlGS:0000000000000000 > [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: > 00000000001406b0 > [ 9.152036][ T1] Call Trace: > [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 > [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 > [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 > [ 9.152036][ T1] shrink_node+0x31e/0xa30 > [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 > [ 9.152036][ T1] ? ktime_get+0x93/0x110 > [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 > [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 > [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 > [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 > [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 > [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 > [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > [ 9.152036][ T1] ? unwind_dump+0x260/0x260 > [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 > [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 > [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 > [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 > [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 > [ 9.152036][ T1] allocate_slab+0x600/0x11f0 > [ 9.152036][ T1] new_slab+0x46/0x70 > [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 > [ 9.152036][ T1] ? create_object+0x3a/0x3e0 > [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 > [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 > [ 9.152036][ T1] ? create_object+0x3a/0x3e0 > [ 9.152036][ T1] __slab_alloc+0x12/0x20 > [ 9.152036][ T1] ? __slab_alloc+0x12/0x20 > [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 > [ 9.152036][ T1] create_object+0x3a/0x3e0 > [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 > [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 > [ 9.152036][ T1] ? 
do_raw_spin_unlock+0xa8/0x140 > [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 > [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d > [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 > [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 > [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 > [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 > [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed > [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 > [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb > [ 9.152036][ T1] acpi_load_tables+0x61/0x80 > [ 9.152036][ T1] acpi_init+0x10d/0x44b > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 > [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 > [ 9.152036][ T1] ? kernfs_get+0x13/0x20 > [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 > [ 9.152036][ T1] ? kset_register+0x31/0x50 > [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > [ 9.152036][ T1] do_one_initcall+0xfe/0x45a > [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 > [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 > [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 > [ 9.152036][ T1] ? up_write+0x6b/0x190 > [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 > [ 9.152036][ T1] ? rest_init+0x188/0x188 > [ 9.152036][ T1] kernel_init+0x11/0x138 > [ 9.152036][ T1] ? rest_init+0x188/0x188 > [ 9.152036][ T1] ret_from_fork+0x22/0x40 > [ 9.152036][ T1] Modules linked in: > [ 9.152036][ T1] CR2: 0000000000000dc8 > [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- > [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 > [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00 > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f> > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 > [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 > [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: > ffffffff8112f288 > [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: > ffffffff824e0440 > [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: > fffffbfff049c088 > [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: > 00000000000001b8 > [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: > ffff88905757f440 > [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) > knlGS:00000000 > >> >> Regards, >> >> Yang >> >> >> >> On 7/10/19 2:43 PM, Qian Cai wrote: >>> Running LTP oom01 test case with swap triggers a crash below. Revert the >>> series >>> "Make deferred split shrinker memcg aware" [1] seems fix the issue. >>> >>> aefde94195ca mm: thp: make deferred split shrinker memcg aware >>> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix >>> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 >>> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix >>> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem >>> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release() >>> 4e050f2df876 mm: thp: extract split_queue_* into a struct >>> >>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang. 
>>> shi@ >>> linux.alibaba.com/ >>> >>> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is >>> LIST_POISON1 (dead000000000100) >>> [ 1145.739763][ T5764] ------------[ cut here ]------------ >>> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! >>> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN >>> NOPTI >>> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: >>> G W 5.2.0-next-20190710+ #7 >>> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>> DL385 >>> Gen10, BIOS A40 01/25/2019 >>> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a >>> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 >>> 9e >>> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff >>> <0f> >>> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 >>> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082 >>> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: >>> ffffffffae95d318 >>> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: >>> ffff8888440bd380 >>> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: >>> ffffed1108817a70 >>> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: >>> dead000000000122 >>> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: >>> dead000000000100 >>> [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000) >>> knlGS:0000000000000000 >>> [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: >>> 00000000001406a0 >>> [ 1145.870664][ T5764] Call Trace: >>> [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 >>> [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 >>> [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 >>> [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 >>> [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 >>> [ 1145.900159][ T5764] shrink_slab+0x253/0x440 >>> [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 >>> [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 >>> [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260 >>> [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 >>> [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560 >>> [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 >>> [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 >>> [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 >>> [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 >>> [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 >>> [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 >>> [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820 >>> [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 >>> [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 >>> [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>> [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 >>> [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 >>> [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 >>> [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 >>> [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 >>> [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 >>> [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 >>> [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 >>> [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 >>> [ 1146.021367][ T5764] ? 
__update_load_avg_cfs_rq+0x2c/0x490 >>> [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 >>> [ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20 >>> [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 >>> [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 >>> [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 >>> [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 >>> [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 >>> [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 >>> [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 >>> [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf >>> [ 1146.075426][ T5764] ? page_fault+0x5/0x20 >>> [ 1146.079553][ T5764] page_fault+0x1b/0x20 >>> [ 1146.083594][ T5764] RIP: 0033:0x410be0 >>> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 >>> 00 >>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 >>> <c6> >>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f >>> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206 >>> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: >>> 00007f98f2674497 >>> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: >>> 0000000000000000 >>> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: >>> 0000000000000000 >>> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][ >>> T5764] Shutting down cpus with NMI >>> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 >>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception >>> ]--- >> ^ permalink raw reply related [flat|nested] 21+ messages in thread
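The list_del() -> list_del_init() change proposed in this message relies on a property of the kernel's list helpers: the dequeue paths only act when list_empty(page_deferred_list(page)) is false, and list_empty() tests whether the node points back to itself. list_del() leaves the removed node poisoned rather than self-linked, so a later "is it still queued?" check cannot tell "already removed" from "still queued", and a second unlink then trips over LIST_POISON1 exactly as in the original splat. Below is a small self-contained model of that difference -- simplified re-implementations for illustration, not include/linux/list.h; the poison values mirror the ones visible in the report.

/*
 * Simplified list helpers for illustration -- not include/linux/list.h.
 */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_POISON1 ((struct list_head *)0xdead000000000100UL)
#define LIST_POISON2 ((struct list_head *)0xdead000000000122UL)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

static void __unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_del(struct list_head *e)       /* unlink, then poison */
{
	__unlink(e);
	e->next = LIST_POISON1;
	e->prev = LIST_POISON2;
}

static void list_del_init(struct list_head *e)  /* unlink, then self-link */
{
	__unlink(e);
	INIT_LIST_HEAD(e);
}

int main(void)
{
	struct list_head queue, a, b;

	INIT_LIST_HEAD(&queue);
	list_add_tail(&a, &queue);
	list_add_tail(&b, &queue);

	list_del(&a);
	list_del_init(&b);

	/*
	 * free_transhuge_page()/split_huge_page_to_list() guard the dequeue
	 * with "if (!list_empty(...))".  After list_del() the entry still
	 * looks non-empty (next == LIST_POISON1 != &a), so a second unlink
	 * would dereference the poison -- the BUG reported in this thread.
	 * After list_del_init() the entry reads as empty and the second
	 * unlink is correctly skipped.
	 */
	printf("after list_del:      list_empty=%d (next=%p)\n",
	       list_empty(&a), (void *)a.next);
	printf("after list_del_init: list_empty=%d\n", list_empty(&b));
	return 0;
}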
* Re: list corruption in deferred_split_scan() 2019-07-12 19:12 ` Yang Shi @ 2019-07-13 4:41 ` Yang Shi 2019-07-15 21:23 ` Qian Cai 2019-07-19 0:54 ` Qian Cai 2 siblings, 0 replies; 21+ messages in thread From: Yang Shi @ 2019-07-13 4:41 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On 7/12/19 12:12 PM, Yang Shi wrote: > > > On 7/11/19 2:07 PM, Qian Cai wrote: >> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote: >>> Hi Qian, >>> >>> >>> Thanks for reporting the issue. But, I can't reproduce it on my >>> machine. >>> Could you please share more details about your test? How often did you >>> run into this problem? >> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 >> server. Here >> is some more information. >> >> # cat .config >> >> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > I tried your kernel config, but I still can't reproduce it. My > compiler doesn't have retpoline support, so CONFIG_RETPOLINE is > disabled in my test, but I don't think this would make any difference > for this case. > > According to the bug call trace in the earlier email, it looks > deferred _split_scan lost race with put_compound_page. The > put_compound_page would call free_transhuge_page() which delete the > page from the deferred split queue, but it may still appear on the > deferred list due to some reason. > > Would you please try the below patch? > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index b7f709d..66bd9db 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, > struct list_head *list) > if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { > if (!list_empty(page_deferred_list(head))) { > ds_queue->split_queue_len--; > - list_del(page_deferred_list(head)); > + list_del_init(page_deferred_list(head)); This line should not be changed. Please just apply the below part. 
> } > if (mapping) > __dec_node_page_state(page, NR_SHMEM_THPS); > @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page) > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > if (!list_empty(page_deferred_list(page))) { > ds_queue->split_queue_len--; > - list_del(page_deferred_list(page)); > + list_del_init(page_deferred_list(page)); > } > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > free_compound_page(page); > >> >> # numactl -H >> available: 8 nodes (0-7) >> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71 >> node 0 size: 19984 MB >> node 0 free: 7251 MB >> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79 >> node 1 size: 0 MB >> node 1 free: 0 MB >> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87 >> node 2 size: 0 MB >> node 2 free: 0 MB >> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95 >> node 3 size: 0 MB >> node 3 free: 0 MB >> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103 >> node 4 size: 31524 MB >> node 4 free: 25165 MB >> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111 >> node 5 size: 0 MB >> node 5 free: 0 MB >> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119 >> node 6 size: 0 MB >> node 6 free: 0 MB >> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127 >> node 7 size: 0 MB >> node 7 free: 0 MB >> node distances: >> node 0 1 2 3 4 5 6 7 >> 0: 10 16 16 16 32 32 32 32 >> 1: 16 10 16 16 32 32 32 32 >> 2: 16 16 10 16 32 32 32 32 >> 3: 16 16 16 10 32 32 32 32 >> 4: 32 32 32 32 10 16 16 16 >> 5: 32 32 32 32 16 10 16 16 >> 6: 32 32 32 32 16 16 10 16 >> 7: 32 32 32 32 16 16 16 10 >> >> # lscpu >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Byte Order: Little Endian >> CPU(s): 128 >> On-line CPU(s) list: 0-127 >> Thread(s) per core: 2 >> Core(s) per socket: 32 >> Socket(s): 2 >> NUMA node(s): 8 >> Vendor ID: AuthenticAMD >> CPU family: 23 >> Model: 1 >> Model name: AMD EPYC 7601 32-Core Processor >> Stepping: 2 >> CPU MHz: 2713.551 >> BogoMIPS: 4391.39 >> Virtualization: AMD-V >> L1d cache: 32K >> L1i cache: 64K >> L2 cache: 512K >> L3 cache: 8192K >> NUMA node0 CPU(s): 0-7,64-71 >> NUMA node1 CPU(s): 8-15,72-79 >> NUMA node2 CPU(s): 16-23,80-87 >> NUMA node3 CPU(s): 24-31,88-95 >> NUMA node4 CPU(s): 32-39,96-103 >> NUMA node5 CPU(s): 40-47,104-111 >> NUMA node6 CPU(s): 48-55,112-119 >> NUMA node7 CPU(s): 56-63,120-127 >> >> Another possible lead is that without reverting the those commits >> below, kdump >> kernel would always also crash in shrink_slab_memcg() at this line, >> >> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, >> true); > > This looks a little bit weird. It seems nodeinfo[nid] is NULL? I > didn't think of where nodeinfo was freed but memcg was still online. 
> Maybe a check is needed: > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index a0301ed..bacda49 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t > gfp_mask, int nid, > if (!mem_cgroup_online(memcg)) > return 0; > > + if (!memcg->nodeinfo[nid]) > + return 0; > + > if (!down_read_trylock(&shrinker_rwsem)) > return 0; > >> >> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in >> shrink_slab+0x111/0x440 >> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task >> swapper/0/1 >> [ 9.072036][ T1] >> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted >> 5.2.0-next- >> 20190711+ #10 >> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 >> Gen10/ProLiant DL385 >> Gen10, BIOS A40 01/25/2019 >> [ 9.072036][ T1] Call Trace: >> [ 9.072036][ T1] dump_stack+0x62/0x9a >> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 >> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 >> [ 9.072036][ T1] ? shrink_slab+0x111/0x440 >> [ 9.072036][ T1] kasan_report+0xc/0xe >> [ 9.072036][ T1] __asan_load8+0x71/0xa0 >> [ 9.072036][ T1] shrink_slab+0x111/0x440 >> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 >> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 >> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260 >> [ 9.072036][ T1] shrink_node+0x31e/0xa30 >> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 >> [ 9.072036][ T1] ? ktime_get+0x93/0x110 >> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 >> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 >> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 >> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 >> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 >> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >> [ 9.072036][ T1] ? unwind_dump+0x260/0x260 >> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 >> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0 >> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 >> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 >> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 >> [ 9.072036][ T1] allocate_slab+0x600/0x11f0 >> [ 9.072036][ T1] new_slab+0x46/0x70 >> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 >> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 >> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >> [ 9.072036][ T1] __slab_alloc+0x12/0x20 >> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20 >> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 >> [ 9.072036][ T1] create_object+0x3a/0x3e0 >> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 >> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 >> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 >> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 >> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 >> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb >> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed >> [ 9.072036][ T1] ? 
acpi_ns_find_ini_methods+0xa2/0xa2 >> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >> [ 9.072036][ T1] acpi_load_tables+0x61/0x80 >> [ 9.072036][ T1] acpi_init+0x10d/0x44b >> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 >> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 >> [ 9.072036][ T1] ? kernfs_get+0x13/0x20 >> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 >> [ 9.072036][ T1] ? kset_register+0x31/0x50 >> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 >> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a >> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 >> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 >> [ 9.072036][ T1] ? up_write+0x6b/0x190 >> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 >> [ 9.072036][ T1] ? rest_init+0x188/0x188 >> [ 9.072036][ T1] kernel_init+0x11/0x138 >> [ 9.072036][ T1] ? rest_init+0x188/0x188 >> [ 9.072036][ T1] ret_from_fork+0x22/0x40 >> [ 9.072036][ T1] >> ================================================================== >> [ 9.072036][ T1] Disabling lock debugging due to kernel taint >> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: >> 0000000000000dc8 >> [ 9.152036][ T1] #PF: supervisor read access in kernel mode >> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page >> [ 9.152036][ T1] PGD 0 P4D 0 >> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI >> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: >> G B 5.2.0-next-20190711+ #10 >> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 >> Gen10/ProLiant DL385 >> Gen10, BIOS A40 01/25/2019 >> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f >> 84 e2 02 00 >> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 >> 0e 00 <4f> >> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >> ffffffff8112f288 >> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >> ffffffff824e0440 >> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >> fffffbfff049c088 >> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >> 00000000000001b8 >> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >> ffff88905757f440 >> [ 9.152036][ T1] FS: 0000000000000000(0000) >> GS:ffff889062800000(0000) >> knlGS:0000000000000000 >> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: >> 00000000001406b0 >> [ 9.152036][ T1] Call Trace: >> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 >> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 >> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 >> [ 9.152036][ T1] shrink_node+0x31e/0xa30 >> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 >> [ 9.152036][ T1] ? ktime_get+0x93/0x110 >> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 >> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 >> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 >> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 >> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 >> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >> [ 9.152036][ T1] ? 
gfp_pfmemalloc_allowed+0xc0/0xc0 >> [ 9.152036][ T1] ? unwind_dump+0x260/0x260 >> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 >> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 >> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 >> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 >> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 >> [ 9.152036][ T1] allocate_slab+0x600/0x11f0 >> [ 9.152036][ T1] new_slab+0x46/0x70 >> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 >> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 >> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >> [ 9.152036][ T1] __slab_alloc+0x12/0x20 >> [ 9.152036][ T1] ? __slab_alloc+0x12/0x20 >> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 >> [ 9.152036][ T1] create_object+0x3a/0x3e0 >> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 >> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 >> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 >> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 >> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 >> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb >> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed >> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >> [ 9.152036][ T1] acpi_load_tables+0x61/0x80 >> [ 9.152036][ T1] acpi_init+0x10d/0x44b >> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 >> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 >> [ 9.152036][ T1] ? kernfs_get+0x13/0x20 >> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 >> [ 9.152036][ T1] ? kset_register+0x31/0x50 >> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 >> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a >> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 >> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 >> [ 9.152036][ T1] ? up_write+0x6b/0x190 >> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 >> [ 9.152036][ T1] ? rest_init+0x188/0x188 >> [ 9.152036][ T1] kernel_init+0x11/0x138 >> [ 9.152036][ T1] ? 
rest_init+0x188/0x188 >> [ 9.152036][ T1] ret_from_fork+0x22/0x40 >> [ 9.152036][ T1] Modules linked in: >> [ 9.152036][ T1] CR2: 0000000000000dc8 >> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- >> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f >> 84 e2 02 00 >> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 >> 0e 00 <4f> >> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >> ffffffff8112f288 >> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >> ffffffff824e0440 >> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >> fffffbfff049c088 >> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >> 00000000000001b8 >> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >> ffff88905757f440 >> [ 9.152036][ T1] FS: 0000000000000000(0000) >> GS:ffff889062800000(0000) >> knlGS:00000000 >> >>> >>> Regards, >>> >>> Yang >>> >>> >>> >>> On 7/10/19 2:43 PM, Qian Cai wrote: >>>> Running LTP oom01 test case with swap triggers a crash below. >>>> Revert the >>>> series >>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue. >>>> >>>> aefde94195ca mm: thp: make deferred split shrinker memcg aware >>>> cf402211cacc >>>> mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix >>>> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 >>>> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix >>>> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem >>>> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of >>>> __page_cache_release() >>>> 4e050f2df876 mm: thp: extract split_queue_* into a struct >>>> >>>> [1] >>>> https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang. >>>> >>>> shi@ >>>> linux.alibaba.com/ >>>> >>>> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is >>>> LIST_POISON1 (dead000000000100) >>>> [ 1145.739763][ T5764] ------------[ cut here ]------------ >>>> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! 
>>>> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP >>>> DEBUG_PAGEALLOC KASAN >>>> NOPTI >>>> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: >>>> G W 5.2.0-next-20190710+ #7 >>>> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 >>>> Gen10/ProLiant >>>> DL385 >>>> Gen10, BIOS A40 01/25/2019 >>>> [ 1145.776000][ T5764] RIP: >>>> 0010:__list_del_entry_valid.cold.0+0x12/0x4a >>>> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 >>>> c7 c7 80 >>>> 9e >>>> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c >>>> fe bc ff >>>> <0f> >>>> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 >>>> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082 >>>> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 >>>> RCX: >>>> ffffffffae95d318 >>>> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 >>>> RDI: >>>> ffff8888440bd380 >>>> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 >>>> R09: >>>> ffffed1108817a70 >>>> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 >>>> R12: >>>> dead000000000122 >>>> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 >>>> R15: >>>> dead000000000100 >>>> [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) >>>> GS:ffff888844080000(0000) >>>> knlGS:0000000000000000 >>>> [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: >>>> 0000000080050033 >>>> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 >>>> CR4: >>>> 00000000001406a0 >>>> [ 1145.870664][ T5764] Call Trace: >>>> [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 >>>> [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 >>>> [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 >>>> [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 >>>> [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 >>>> [ 1145.900159][ T5764] shrink_slab+0x253/0x440 >>>> [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 >>>> [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 >>>> [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260 >>>> [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 >>>> [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560 >>>> [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 >>>> [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 >>>> [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 >>>> [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 >>>> [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 >>>> [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 >>>> [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820 >>>> [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 >>>> [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 >>>> [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>>> [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 >>>> [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 >>>> [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 >>>> [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 >>>> [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 >>>> [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 >>>> [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 >>>> [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 >>>> [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 >>>> [ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490 >>>> [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 >>>> [ 1146.031461][ T5764] ? 
call_function_interrupt+0xa/0x20 >>>> [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 >>>> [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 >>>> [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 >>>> [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 >>>> [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 >>>> [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 >>>> [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 >>>> [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf >>>> [ 1146.075426][ T5764] ? page_fault+0x5/0x20 >>>> [ 1146.079553][ T5764] page_fault+0x1b/0x20 >>>> [ 1146.083594][ T5764] RIP: 0033:0x410be0 >>>> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 >>>> 86 00 00 >>>> 00 >>>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 >>>> 48 98 90 >>>> <c6> >>>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f >>>> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206 >>>> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 >>>> RCX: >>>> 00007f98f2674497 >>>> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 >>>> RDI: >>>> 0000000000000000 >>>> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff >>>> R09: >>>> 0000000000000000 >>>> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ >>>> 1147.588181][ >>>> T5764] Shutting down cpus with NMI >>>> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from >>>> 0xffffffff81000000 >>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>>> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal >>>> exception >>>> ]--- >>> > ^ permalink raw reply [flat|nested] 21+ messages in thread
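A side note on the free_transhuge_page() hunk quoted above: the original splat is __list_del_entry_valid() finding page_deferred_list(page)->next already set to LIST_POISON1, which means the entry had been taken off a deferred split queue with list_del() once before. list_del() poisons next/prev, so the !list_empty() guard in free_transhuge_page() still "passes" afterwards and a second removal operates on poisoned pointers; list_del_init() leaves the entry self-linked instead, so a repeat pass sees it as empty and skips it. Below is a minimal userspace sketch of that difference; the helpers are simplified stand-ins for include/linux/list.h and the poison values are illustrative only, not the kernel's.

#include <stdio.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_POISON1 ((struct list_head *)0x100)   /* illustrative values only */
#define LIST_POISON2 ((struct list_head *)0x122)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* list_del(): unlinks and poisons the entry. */
static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = LIST_POISON1;
	e->prev = LIST_POISON2;
}

/* list_del_init(): unlinks and leaves the entry self-linked. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static bool list_empty(const struct list_head *h) { return h->next == h; }

int main(void)
{
	struct list_head queue, entry;

	INIT_LIST_HEAD(&queue);
	list_add(&entry, &queue);
	list_del(&entry);
	/* A later !list_empty() guard still "passes" here, and a second
	 * list_del() would operate on LIST_POISON1, the exact splat at the
	 * top of the thread. */
	printf("after list_del:      next=%p list_empty=%d\n",
	       (void *)entry.next, list_empty(&entry));

	INIT_LIST_HEAD(&queue);
	list_add(&entry, &queue);
	list_del_init(&entry);
	/* Self-linked again: list_empty() is true, a repeat removal is a no-op. */
	printf("after list_del_init: next=%p list_empty=%d\n",
	       (void *)entry.next, list_empty(&entry));
	return 0;
}

Whether that alone explains the race the thread is chasing is a separate question; the sketch only shows why the two helpers behave differently once an entry has already been removed.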
* Re: list corruption in deferred_split_scan() 2019-07-12 19:12 ` Yang Shi 2019-07-13 4:41 ` Yang Shi @ 2019-07-15 21:23 ` Qian Cai 2019-07-16 0:22 ` Yang Shi 2019-07-19 0:54 ` Qian Cai 2 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-07-15 21:23 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: > > Another possible lead is that without reverting the those commits below, > > kdump > > kernel would always also crash in shrink_slab_memcg() at this line, > > > > map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); > > This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't > think of where nodeinfo was freed but memcg was still online. Maybe a > check is needed: Actually, "memcg" is NULL. > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index a0301ed..bacda49 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t > gfp_mask, int nid, > if (!mem_cgroup_online(memcg)) > return 0; > > + if (!memcg->nodeinfo[nid]) > + return 0; > + > if (!down_read_trylock(&shrinker_rwsem)) > return 0; > > > > > [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 > > [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task > > swapper/0/1 > > [ 9.072036][ T1] > > [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- > > 20190711+ #10 > > [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant > > DL385 > > Gen10, BIOS A40 01/25/2019 > > [ 9.072036][ T1] Call Trace: > > [ 9.072036][ T1] dump_stack+0x62/0x9a > > [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 > > [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 > > [ 9.072036][ T1] ? shrink_slab+0x111/0x440 > > [ 9.072036][ T1] kasan_report+0xc/0xe > > [ 9.072036][ T1] __asan_load8+0x71/0xa0 > > [ 9.072036][ T1] shrink_slab+0x111/0x440 > > [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 > > [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 > > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 > > [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260 > > [ 9.072036][ T1] shrink_node+0x31e/0xa30 > > [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 > > [ 9.072036][ T1] ? ktime_get+0x93/0x110 > > [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 > > [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 > > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 > > [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 > > [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 > > [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 > > [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 > > [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > > [ 9.072036][ T1] ? unwind_dump+0x260/0x260 > > [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 > > [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0 > > [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 > > [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 > > [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 > > [ 9.072036][ T1] allocate_slab+0x600/0x11f0 > > [ 9.072036][ T1] new_slab+0x46/0x70 > > [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 > > [ 9.072036][ T1] ? create_object+0x3a/0x3e0 > > [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 > > [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 > > [ 9.072036][ T1] ? create_object+0x3a/0x3e0 > > [ 9.072036][ T1] __slab_alloc+0x12/0x20 > > [ 9.072036][ T1] ? 
__slab_alloc+0x12/0x20 > > [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 > > [ 9.072036][ T1] create_object+0x3a/0x3e0 > > [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 > > [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 > > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 > > [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 > > [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 > > [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d > > [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 > > [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 > > [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 > > [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 > > [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > > [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > > [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb > > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > > [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed > > [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 > > [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb > > [ 9.072036][ T1] acpi_load_tables+0x61/0x80 > > [ 9.072036][ T1] acpi_init+0x10d/0x44b > > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > > [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 > > [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 > > [ 9.072036][ T1] ? kernfs_get+0x13/0x20 > > [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 > > [ 9.072036][ T1] ? kset_register+0x31/0x50 > > [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 > > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > > [ 9.072036][ T1] do_one_initcall+0xfe/0x45a > > [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 > > [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 > > [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 > > [ 9.072036][ T1] ? up_write+0x6b/0x190 > > [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 > > [ 9.072036][ T1] ? rest_init+0x188/0x188 > > [ 9.072036][ T1] kernel_init+0x11/0x138 > > [ 9.072036][ T1] ? 
rest_init+0x188/0x188 > > [ 9.072036][ T1] ret_from_fork+0x22/0x40 > > [ 9.072036][ T1] > > ================================================================== > > [ 9.072036][ T1] Disabling lock debugging due to kernel taint > > [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: > > 0000000000000dc8 > > [ 9.152036][ T1] #PF: supervisor read access in kernel mode > > [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page > > [ 9.152036][ T1] PGD 0 P4D 0 > > [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI > > [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: > > G B 5.2.0-next-20190711+ #10 > > [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant > > DL385 > > Gen10, BIOS A40 01/25/2019 > > [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 > > [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 > > 00 > > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 > > <4f> > > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 > > [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 > > [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: > > ffffffff8112f288 > > [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: > > ffffffff824e0440 > > [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: > > fffffbfff049c088 > > [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: > > 00000000000001b8 > > [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: > > ffff88905757f440 > > [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) > > knlGS:0000000000000000 > > [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: > > 00000000001406b0 > > [ 9.152036][ T1] Call Trace: > > [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 > > [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 > > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 > > [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 > > [ 9.152036][ T1] shrink_node+0x31e/0xa30 > > [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 > > [ 9.152036][ T1] ? ktime_get+0x93/0x110 > > [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 > > [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 > > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 > > [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 > > [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 > > [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 > > [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 > > [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > > [ 9.152036][ T1] ? unwind_dump+0x260/0x260 > > [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 > > [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 > > [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 > > [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 > > [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 > > [ 9.152036][ T1] allocate_slab+0x600/0x11f0 > > [ 9.152036][ T1] new_slab+0x46/0x70 > > [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 > > [ 9.152036][ T1] ? create_object+0x3a/0x3e0 > > [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 > > [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 > > [ 9.152036][ T1] ? create_object+0x3a/0x3e0 > > [ 9.152036][ T1] __slab_alloc+0x12/0x20 > > [ 9.152036][ T1] ? 
__slab_alloc+0x12/0x20 > > [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 > > [ 9.152036][ T1] create_object+0x3a/0x3e0 > > [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 > > [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 > > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 > > [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140 > > [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 > > [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d > > [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 > > [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 > > [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 > > [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 > > [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > > [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 > > [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb > > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > > [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed > > [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 > > [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb > > [ 9.152036][ T1] acpi_load_tables+0x61/0x80 > > [ 9.152036][ T1] acpi_init+0x10d/0x44b > > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > > [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 > > [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 > > [ 9.152036][ T1] ? kernfs_get+0x13/0x20 > > [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 > > [ 9.152036][ T1] ? kset_register+0x31/0x50 > > [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 > > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 > > [ 9.152036][ T1] do_one_initcall+0xfe/0x45a > > [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 > > [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 > > [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 > > [ 9.152036][ T1] ? up_write+0x6b/0x190 > > [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 > > [ 9.152036][ T1] ? rest_init+0x188/0x188 > > [ 9.152036][ T1] kernel_init+0x11/0x138 > > [ 9.152036][ T1] ? rest_init+0x188/0x188 > > [ 9.152036][ T1] ret_from_fork+0x22/0x40 > > [ 9.152036][ T1] Modules linked in: > > [ 9.152036][ T1] CR2: 0000000000000dc8 > > [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- > > [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 > > [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 > > 00 > > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 > > <4f> > > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 > > [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 > > [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: > > ffffffff8112f288 > > [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: > > ffffffff824e0440 > > [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: > > fffffbfff049c088 > > [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: > > 00000000000001b8 > > [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: > > ffff88905757f440 > > [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) > > knlGS:00000000 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-15 21:23 ` Qian Cai @ 2019-07-16 0:22 ` Yang Shi 2019-07-16 1:36 ` Qian Cai 0 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-07-16 0:22 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On 7/15/19 2:23 PM, Qian Cai wrote: > On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: >>> Another possible lead is that without reverting the those commits below, >>> kdump >>> kernel would always also crash in shrink_slab_memcg() at this line, >>> >>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); >> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't >> think of where nodeinfo was freed but memcg was still online. Maybe a >> check is needed: > Actually, "memcg" is NULL. It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. > >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index a0301ed..bacda49 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t >> gfp_mask, int nid, >> if (!mem_cgroup_online(memcg)) >> return 0; >> >> + if (!memcg->nodeinfo[nid]) >> + return 0; >> + >> if (!down_read_trylock(&shrinker_rwsem)) >> return 0; >> >>> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 >>> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task >>> swapper/0/1 >>> [ 9.072036][ T1] >>> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- >>> 20190711+ #10 >>> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>> DL385 >>> Gen10, BIOS A40 01/25/2019 >>> [ 9.072036][ T1] Call Trace: >>> [ 9.072036][ T1] dump_stack+0x62/0x9a >>> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 >>> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 >>> [ 9.072036][ T1] ? shrink_slab+0x111/0x440 >>> [ 9.072036][ T1] kasan_report+0xc/0xe >>> [ 9.072036][ T1] __asan_load8+0x71/0xa0 >>> [ 9.072036][ T1] shrink_slab+0x111/0x440 >>> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 >>> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 >>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260 >>> [ 9.072036][ T1] shrink_node+0x31e/0xa30 >>> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 >>> [ 9.072036][ T1] ? ktime_get+0x93/0x110 >>> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 >>> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 >>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 >>> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 >>> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 >>> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >>> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>> [ 9.072036][ T1] ? unwind_dump+0x260/0x260 >>> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 >>> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0 >>> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 >>> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 >>> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 >>> [ 9.072036][ T1] allocate_slab+0x600/0x11f0 >>> [ 9.072036][ T1] new_slab+0x46/0x70 >>> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 >>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >>> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >>> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 >>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >>> [ 9.072036][ T1] __slab_alloc+0x12/0x20 >>> [ 9.072036][ T1] ? 
__slab_alloc+0x12/0x20 >>> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 >>> [ 9.072036][ T1] create_object+0x3a/0x3e0 >>> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 >>> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 >>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >>> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 >>> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >>> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >>> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 >>> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 >>> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb >>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed >>> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >>> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >>> [ 9.072036][ T1] acpi_load_tables+0x61/0x80 >>> [ 9.072036][ T1] acpi_init+0x10d/0x44b >>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 >>> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 >>> [ 9.072036][ T1] ? kernfs_get+0x13/0x20 >>> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 >>> [ 9.072036][ T1] ? kset_register+0x31/0x50 >>> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 >>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a >>> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 >>> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >>> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 >>> [ 9.072036][ T1] ? up_write+0x6b/0x190 >>> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 >>> [ 9.072036][ T1] ? rest_init+0x188/0x188 >>> [ 9.072036][ T1] kernel_init+0x11/0x138 >>> [ 9.072036][ T1] ? 
rest_init+0x188/0x188 >>> [ 9.072036][ T1] ret_from_fork+0x22/0x40 >>> [ 9.072036][ T1] >>> ================================================================== >>> [ 9.072036][ T1] Disabling lock debugging due to kernel taint >>> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: >>> 0000000000000dc8 >>> [ 9.152036][ T1] #PF: supervisor read access in kernel mode >>> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page >>> [ 9.152036][ T1] PGD 0 P4D 0 >>> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI >>> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: >>> G B 5.2.0-next-20190711+ #10 >>> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>> DL385 >>> Gen10, BIOS A40 01/25/2019 >>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 >>> 00 >>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 >>> <4f> >>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >>> ffffffff8112f288 >>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >>> ffffffff824e0440 >>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >>> fffffbfff049c088 >>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >>> 00000000000001b8 >>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >>> ffff88905757f440 >>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) >>> knlGS:0000000000000000 >>> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: >>> 00000000001406b0 >>> [ 9.152036][ T1] Call Trace: >>> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 >>> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 >>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 >>> [ 9.152036][ T1] shrink_node+0x31e/0xa30 >>> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 >>> [ 9.152036][ T1] ? ktime_get+0x93/0x110 >>> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 >>> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 >>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 >>> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 >>> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 >>> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >>> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>> [ 9.152036][ T1] ? unwind_dump+0x260/0x260 >>> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 >>> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 >>> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 >>> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 >>> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 >>> [ 9.152036][ T1] allocate_slab+0x600/0x11f0 >>> [ 9.152036][ T1] new_slab+0x46/0x70 >>> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 >>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >>> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >>> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 >>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >>> [ 9.152036][ T1] __slab_alloc+0x12/0x20 >>> [ 9.152036][ T1] ? 
__slab_alloc+0x12/0x20 >>> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 >>> [ 9.152036][ T1] create_object+0x3a/0x3e0 >>> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 >>> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 >>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >>> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 >>> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >>> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >>> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 >>> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 >>> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb >>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed >>> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >>> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >>> [ 9.152036][ T1] acpi_load_tables+0x61/0x80 >>> [ 9.152036][ T1] acpi_init+0x10d/0x44b >>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 >>> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 >>> [ 9.152036][ T1] ? kernfs_get+0x13/0x20 >>> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 >>> [ 9.152036][ T1] ? kset_register+0x31/0x50 >>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 >>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a >>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 >>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 >>> [ 9.152036][ T1] ? up_write+0x6b/0x190 >>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 >>> [ 9.152036][ T1] ? rest_init+0x188/0x188 >>> [ 9.152036][ T1] kernel_init+0x11/0x138 >>> [ 9.152036][ T1] ? rest_init+0x188/0x188 >>> [ 9.152036][ T1] ret_from_fork+0x22/0x40 >>> [ 9.152036][ T1] Modules linked in: >>> [ 9.152036][ T1] CR2: 0000000000000dc8 >>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- >>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 >>> 00 >>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 >>> <4f> >>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >>> ffffffff8112f288 >>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >>> ffffffff824e0440 >>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >>> fffffbfff049c088 >>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >>> 00000000000001b8 >>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >>> ffff88905757f440 >>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) >>> knlGS:00000000 >>> ^ permalink raw reply [flat|nested] 21+ messages in thread
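To make the "mem_cgroup_iter() does pin the memcg" point above concrete: in the 5.2-era shrink_node(), the memcg handed to shrink_slab() always comes out of mem_cgroup_iter(), which keeps a reference on it across the loop body, so lifetime should indeed not be the problem. A rough userspace model of that loop shape follows; every helper here is a stand-in, not the real mm/ code. Note the do/while form, which still runs the body once when the very first mem_cgroup_iter() call returns NULL, which is the situation Qian Cai describes in the next message.

#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>

struct mem_cgroup { int refcnt; const char *name; };

static bool memory_controller_disabled;		/* cgroup_disable=memory */
static struct mem_cgroup groups[] = { { 0, "root" }, { 0, "A" }, { 0, "B" } };
static const size_t ngroups = sizeof(groups) / sizeof(groups[0]);

/* Stand-in for mem_cgroup_iter(): pins the next memcg, unpins the previous
 * one, and returns NULL outright when the controller is disabled. */
static struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *prev)
{
	size_t next = prev ? (size_t)(prev - groups) + 1 : 0;

	if (prev)
		prev->refcnt--;				/* css_put() stand-in */
	if (memory_controller_disabled || next >= ngroups)
		return NULL;
	groups[next].refcnt++;				/* css_tryget() stand-in */
	return &groups[next];
}

static void shrink_slab(struct mem_cgroup *memcg)
{
	if (memcg)
		printf("shrink_slab(%s), pinned refcnt=%d\n",
		       memcg->name, memcg->refcnt);
	else
		printf("shrink_slab(NULL)  <-- body still ran once\n");
}

int main(void)
{
	struct mem_cgroup *memcg = mem_cgroup_iter(NULL);

	/* Controller enabled: each memcg is pinned while shrink_slab() runs. */
	do {
		shrink_slab(memcg);
	} while ((memcg = mem_cgroup_iter(memcg)) != NULL);

	/* Same loop on a cgroup_disable=memory kernel: the iterator returns
	 * NULL immediately, yet the do/while body still executes once. */
	memory_controller_disabled = true;
	memcg = mem_cgroup_iter(NULL);
	do {
		shrink_slab(memcg);
	} while ((memcg = mem_cgroup_iter(memcg)) != NULL);

	return 0;
}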
* Re: list corruption in deferred_split_scan() 2019-07-16 0:22 ` Yang Shi @ 2019-07-16 1:36 ` Qian Cai 2019-07-16 3:00 ` Yang Shi 0 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-07-16 1:36 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, LKML > On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: > > > > On 7/15/19 2:23 PM, Qian Cai wrote: >> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: >>>> Another possible lead is that without reverting the those commits below, >>>> kdump >>>> kernel would always also crash in shrink_slab_memcg() at this line, >>>> >>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); >>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't >>> think of where nodeinfo was freed but memcg was still online. Maybe a >>> check is needed: >> Actually, "memcg" is NULL. > > It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) + if (!mem_cgroup_online(memcg)) return 0; Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, if (mem_cgroup_disabled()) return NULL; > >> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index a0301ed..bacda49 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t >>> gfp_mask, int nid, >>> if (!mem_cgroup_online(memcg)) >>> return 0; >>> >>> + if (!memcg->nodeinfo[nid]) >>> + return 0; >>> + >>> if (!down_read_trylock(&shrinker_rwsem)) >>> return 0; >>> >>>> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 >>>> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task >>>> swapper/0/1 >>>> [ 9.072036][ T1] >>>> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- >>>> 20190711+ #10 >>>> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>>> DL385 >>>> Gen10, BIOS A40 01/25/2019 >>>> [ 9.072036][ T1] Call Trace: >>>> [ 9.072036][ T1] dump_stack+0x62/0x9a >>>> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 >>>> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 >>>> [ 9.072036][ T1] ? shrink_slab+0x111/0x440 >>>> [ 9.072036][ T1] kasan_report+0xc/0xe >>>> [ 9.072036][ T1] __asan_load8+0x71/0xa0 >>>> [ 9.072036][ T1] shrink_slab+0x111/0x440 >>>> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 >>>> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 >>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>>> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260 >>>> [ 9.072036][ T1] shrink_node+0x31e/0xa30 >>>> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 >>>> [ 9.072036][ T1] ? ktime_get+0x93/0x110 >>>> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 >>>> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 >>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>>> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 >>>> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 >>>> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 >>>> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >>>> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>>> [ 9.072036][ T1] ? unwind_dump+0x260/0x260 >>>> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 >>>> [ 9.072036][ T1] ? 
arch_stack_walk+0x8f/0xf0 >>>> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 >>>> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 >>>> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 >>>> [ 9.072036][ T1] allocate_slab+0x600/0x11f0 >>>> [ 9.072036][ T1] new_slab+0x46/0x70 >>>> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 >>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >>>> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >>>> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 >>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >>>> [ 9.072036][ T1] __slab_alloc+0x12/0x20 >>>> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20 >>>> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 >>>> [ 9.072036][ T1] create_object+0x3a/0x3e0 >>>> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 >>>> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 >>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>>> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >>>> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 >>>> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >>>> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >>>> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 >>>> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 >>>> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb >>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed >>>> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >>>> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >>>> [ 9.072036][ T1] acpi_load_tables+0x61/0x80 >>>> [ 9.072036][ T1] acpi_init+0x10d/0x44b >>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 >>>> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 >>>> [ 9.072036][ T1] ? kernfs_get+0x13/0x20 >>>> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 >>>> [ 9.072036][ T1] ? kset_register+0x31/0x50 >>>> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 >>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a >>>> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 >>>> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >>>> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 >>>> [ 9.072036][ T1] ? up_write+0x6b/0x190 >>>> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 >>>> [ 9.072036][ T1] ? rest_init+0x188/0x188 >>>> [ 9.072036][ T1] kernel_init+0x11/0x138 >>>> [ 9.072036][ T1] ? 
rest_init+0x188/0x188 >>>> [ 9.072036][ T1] ret_from_fork+0x22/0x40 >>>> [ 9.072036][ T1] >>>> ================================================================== >>>> [ 9.072036][ T1] Disabling lock debugging due to kernel taint >>>> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: >>>> 0000000000000dc8 >>>> [ 9.152036][ T1] #PF: supervisor read access in kernel mode >>>> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page >>>> [ 9.152036][ T1] PGD 0 P4D 0 >>>> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI >>>> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: >>>> G B 5.2.0-next-20190711+ #10 >>>> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>>> DL385 >>>> Gen10, BIOS A40 01/25/2019 >>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 >>>> 00 >>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 >>>> <4f> >>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >>>> ffffffff8112f288 >>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >>>> ffffffff824e0440 >>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >>>> fffffbfff049c088 >>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >>>> 00000000000001b8 >>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >>>> ffff88905757f440 >>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) >>>> knlGS:0000000000000000 >>>> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: >>>> 00000000001406b0 >>>> [ 9.152036][ T1] Call Trace: >>>> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 >>>> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 >>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>>> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 >>>> [ 9.152036][ T1] shrink_node+0x31e/0xa30 >>>> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 >>>> [ 9.152036][ T1] ? ktime_get+0x93/0x110 >>>> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 >>>> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 >>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>>> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 >>>> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 >>>> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 >>>> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >>>> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>>> [ 9.152036][ T1] ? unwind_dump+0x260/0x260 >>>> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 >>>> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 >>>> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 >>>> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 >>>> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 >>>> [ 9.152036][ T1] allocate_slab+0x600/0x11f0 >>>> [ 9.152036][ T1] new_slab+0x46/0x70 >>>> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 >>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >>>> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >>>> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 >>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >>>> [ 9.152036][ T1] __slab_alloc+0x12/0x20 >>>> [ 9.152036][ T1] ? 
__slab_alloc+0x12/0x20 >>>> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 >>>> [ 9.152036][ T1] create_object+0x3a/0x3e0 >>>> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 >>>> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 >>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>>> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >>>> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 >>>> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >>>> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >>>> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 >>>> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 >>>> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb >>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed >>>> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >>>> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >>>> [ 9.152036][ T1] acpi_load_tables+0x61/0x80 >>>> [ 9.152036][ T1] acpi_init+0x10d/0x44b >>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 >>>> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 >>>> [ 9.152036][ T1] ? kernfs_get+0x13/0x20 >>>> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 >>>> [ 9.152036][ T1] ? kset_register+0x31/0x50 >>>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 >>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a >>>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 >>>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >>>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 >>>> [ 9.152036][ T1] ? up_write+0x6b/0x190 >>>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 >>>> [ 9.152036][ T1] ? rest_init+0x188/0x188 >>>> [ 9.152036][ T1] kernel_init+0x11/0x138 >>>> [ 9.152036][ T1] ? rest_init+0x188/0x188 >>>> [ 9.152036][ T1] ret_from_fork+0x22/0x40 >>>> [ 9.152036][ T1] Modules linked in: >>>> [ 9.152036][ T1] CR2: 0000000000000dc8 >>>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- >>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 >>>> 00 >>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 >>>> <4f> >>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >>>> ffffffff8112f288 >>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >>>> ffffffff824e0440 >>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >>>> fffffbfff049c088 >>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >>>> 00000000000001b8 >>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >>>> ffff88905757f440 >>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) >>>> knlGS:00000000 >>>> > ^ permalink raw reply [flat|nested] 21+ messages in thread
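Condensing the explanation above into a sketch: the pre-series guard in shrink_slab_memcg() bailed out on !memcg_kmem_enabled(), which is always true on a cgroup_disable=memory kernel, so the NULL memcg returned by mem_cgroup_iter() was never touched. The post-series guard keeps only the mem_cgroup_online() half, and that helper (at least in the 5.2-era definition) reports online for the disabled case without looking at the pointer, so the NULL memcg survives up to the memcg->nodeinfo[nid] access. The struct layout and helper bodies below are stand-ins chosen only to reproduce the small fault offset from the quoted oops (CR2: 0000000000000dc8); this is not the real struct mem_cgroup or mm/vmscan.c.

#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>

struct mem_cgroup_per_node { unsigned long *shrinker_map; };

struct mem_cgroup {
	char css_and_friends[0xdc8];	/* pretend fields before nodeinfo[] */
	struct mem_cgroup_per_node *nodeinfo[1];
};

static const bool cgroup_disable_memory = true;	/* kdump kernel command line */

static bool mem_cgroup_disabled(void) { return cgroup_disable_memory; }
static bool memcg_kmem_enabled(void)  { return !mem_cgroup_disabled(); }

/* Stand-in for the helper: with the controller disabled it claims "online"
 * without ever dereferencing the pointer. */
static bool mem_cgroup_online(struct mem_cgroup *memcg)
{
	if (mem_cgroup_disabled())
		return true;
	return memcg != NULL;		/* stand-in for the real CSS_ONLINE test */
}

int main(void)
{
	struct mem_cgroup *memcg = NULL;	/* what mem_cgroup_iter() returns here */

	/* Pre-series guard: rejects the disabled case before memcg is touched. */
	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
		printf("old check: bail out, memcg never dereferenced\n");

	/* Post-series guard is only the mem_cgroup_online() half; with
	 * memcg == NULL the very next access is memcg->nodeinfo[nid], i.e.
	 * a load from NULL plus a small offset: */
	if (mem_cgroup_online(memcg))
		printf("new check: passes; would read NULL + 0x%zx\n",
		       offsetof(struct mem_cgroup, nodeinfo[0]));

	return 0;
}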
* Re: list corruption in deferred_split_scan() 2019-07-16 1:36 ` Qian Cai @ 2019-07-16 3:00 ` Yang Shi 2019-07-16 23:36 ` Shakeel Butt 0 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-07-16 3:00 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, LKML On 7/15/19 6:36 PM, Qian Cai wrote: > >> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: >> >> >> >> On 7/15/19 2:23 PM, Qian Cai wrote: >>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: >>>>> Another possible lead is that without reverting the those commits below, >>>>> kdump >>>>> kernel would always also crash in shrink_slab_memcg() at this line, >>>>> >>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); >>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't >>>> think of where nodeinfo was freed but memcg was still online. Maybe a >>>> check is needed: >>> Actually, "memcg" is NULL. >> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. > Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), > > - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) > + if (!mem_cgroup_online(memcg)) > return 0; > > Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, > > if (mem_cgroup_disabled()) > return NULL; Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). Thanks for figuring this out. I think we need add mem_cgroup_dsiabled() check before calling shrink_slab_memcg() as below: diff --git a/mm/vmscan.c b/mm/vmscan.c index a0301ed..2f03c61 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, unsigned long ret, freed = 0; struct shrinker *shrinker; - if (!mem_cgroup_is_root(memcg)) + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) return shrink_slab_memcg(gfp_mask, nid, memcg, priority); if (!down_read_trylock(&shrinker_rwsem)) > >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>> index a0301ed..bacda49 100644 >>>> --- a/mm/vmscan.c >>>> +++ b/mm/vmscan.c >>>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t >>>> gfp_mask, int nid, >>>> if (!mem_cgroup_online(memcg)) >>>> return 0; >>>> >>>> + if (!memcg->nodeinfo[nid]) >>>> + return 0; >>>> + >>>> if (!down_read_trylock(&shrinker_rwsem)) >>>> return 0; >>>> >>>>> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 >>>>> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task >>>>> swapper/0/1 >>>>> [ 9.072036][ T1] >>>>> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- >>>>> 20190711+ #10 >>>>> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>>>> DL385 >>>>> Gen10, BIOS A40 01/25/2019 >>>>> [ 9.072036][ T1] Call Trace: >>>>> [ 9.072036][ T1] dump_stack+0x62/0x9a >>>>> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4 >>>>> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50 >>>>> [ 9.072036][ T1] ? shrink_slab+0x111/0x440 >>>>> [ 9.072036][ T1] kasan_report+0xc/0xe >>>>> [ 9.072036][ T1] __asan_load8+0x71/0xa0 >>>>> [ 9.072036][ T1] shrink_slab+0x111/0x440 >>>>> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840 >>>>> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110 >>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>>>> [ 9.072036][ T1] ? 
mem_cgroup_protected+0x39/0x260 >>>>> [ 9.072036][ T1] shrink_node+0x31e/0xa30 >>>>> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560 >>>>> [ 9.072036][ T1] ? ktime_get+0x93/0x110 >>>>> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820 >>>>> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30 >>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>>>> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0 >>>>> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0 >>>>> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820 >>>>> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >>>>> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>>>> [ 9.072036][ T1] ? unwind_dump+0x260/0x260 >>>>> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0 >>>>> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0 >>>>> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40 >>>>> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130 >>>>> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110 >>>>> [ 9.072036][ T1] allocate_slab+0x600/0x11f0 >>>>> [ 9.072036][ T1] new_slab+0x46/0x70 >>>>> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0 >>>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >>>>> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >>>>> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0 >>>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0 >>>>> [ 9.072036][ T1] __slab_alloc+0x12/0x20 >>>>> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20 >>>>> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400 >>>>> [ 9.072036][ T1] create_object+0x3a/0x3e0 >>>>> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0 >>>>> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400 >>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20 >>>>> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >>>>> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122 >>>>> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >>>>> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >>>>> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61 >>>>> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189 >>>>> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >>>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>>> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb >>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>>> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed >>>>> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >>>>> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >>>>> [ 9.072036][ T1] acpi_load_tables+0x61/0x80 >>>>> [ 9.072036][ T1] acpi_init+0x10d/0x44b >>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>>> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30 >>>>> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980 >>>>> [ 9.072036][ T1] ? kernfs_get+0x13/0x20 >>>>> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10 >>>>> [ 9.072036][ T1] ? kset_register+0x31/0x50 >>>>> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0 >>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>>> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a >>>>> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150 >>>>> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >>>>> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20 >>>>> [ 9.072036][ T1] ? up_write+0x6b/0x190 >>>>> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7 >>>>> [ 9.072036][ T1] ? rest_init+0x188/0x188 >>>>> [ 9.072036][ T1] kernel_init+0x11/0x138 >>>>> [ 9.072036][ T1] ? 
rest_init+0x188/0x188 >>>>> [ 9.072036][ T1] ret_from_fork+0x22/0x40 >>>>> [ 9.072036][ T1] >>>>> ================================================================== >>>>> [ 9.072036][ T1] Disabling lock debugging due to kernel taint >>>>> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: >>>>> 0000000000000dc8 >>>>> [ 9.152036][ T1] #PF: supervisor read access in kernel mode >>>>> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page >>>>> [ 9.152036][ T1] PGD 0 P4D 0 >>>>> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI >>>>> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: >>>>> G B 5.2.0-next-20190711+ #10 >>>>> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant >>>>> DL385 >>>>> Gen10, BIOS A40 01/25/2019 >>>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >>>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 >>>>> 00 >>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 >>>>> <4f> >>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >>>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >>>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >>>>> ffffffff8112f288 >>>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >>>>> ffffffff824e0440 >>>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >>>>> fffffbfff049c088 >>>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >>>>> 00000000000001b8 >>>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >>>>> ffff88905757f440 >>>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: >>>>> 00000000001406b0 >>>>> [ 9.152036][ T1] Call Trace: >>>>> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840 >>>>> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110 >>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>>>> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260 >>>>> [ 9.152036][ T1] shrink_node+0x31e/0xa30 >>>>> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560 >>>>> [ 9.152036][ T1] ? ktime_get+0x93/0x110 >>>>> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820 >>>>> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30 >>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>>>> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0 >>>>> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0 >>>>> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820 >>>>> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0 >>>>> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>>>> [ 9.152036][ T1] ? unwind_dump+0x260/0x260 >>>>> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0 >>>>> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0 >>>>> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40 >>>>> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130 >>>>> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110 >>>>> [ 9.152036][ T1] allocate_slab+0x600/0x11f0 >>>>> [ 9.152036][ T1] new_slab+0x46/0x70 >>>>> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0 >>>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >>>>> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30 >>>>> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0 >>>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0 >>>>> [ 9.152036][ T1] __slab_alloc+0x12/0x20 >>>>> [ 9.152036][ T1] ? 
__slab_alloc+0x12/0x20 >>>>> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400 >>>>> [ 9.152036][ T1] create_object+0x3a/0x3e0 >>>>> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0 >>>>> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400 >>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20 >>>>> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140 >>>>> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122 >>>>> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d >>>>> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84 >>>>> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61 >>>>> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189 >>>>> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2 >>>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61 >>>>> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb >>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>>> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed >>>>> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 >>>>> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb >>>>> [ 9.152036][ T1] acpi_load_tables+0x61/0x80 >>>>> [ 9.152036][ T1] acpi_init+0x10d/0x44b >>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>>> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30 >>>>> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980 >>>>> [ 9.152036][ T1] ? kernfs_get+0x13/0x20 >>>>> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10 >>>>> [ 9.152036][ T1] ? kset_register+0x31/0x50 >>>>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0 >>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36 >>>>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a >>>>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150 >>>>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930 >>>>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20 >>>>> [ 9.152036][ T1] ? up_write+0x6b/0x190 >>>>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7 >>>>> [ 9.152036][ T1] ? rest_init+0x188/0x188 >>>>> [ 9.152036][ T1] kernel_init+0x11/0x138 >>>>> [ 9.152036][ T1] ? rest_init+0x188/0x188 >>>>> [ 9.152036][ T1] ret_from_fork+0x22/0x40 >>>>> [ 9.152036][ T1] Modules linked in: >>>>> [ 9.152036][ T1] CR2: 0000000000000dc8 >>>>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]--- >>>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440 >>>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 >>>>> 00 >>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 >>>>> <4f> >>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24 >>>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282 >>>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: >>>>> ffffffff8112f288 >>>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: >>>>> ffffffff824e0440 >>>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: >>>>> fffffbfff049c088 >>>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: >>>>> 00000000000001b8 >>>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: >>>>> ffff88905757f440 >>>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000) >>>>> knlGS:00000000 >>>>> ^ permalink raw reply related [flat|nested] 21+ messages in thread
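The failure mode in the message above comes down to check ordering: with "cgroup_disable=memory" on the command line, mem_cgroup_iter() hands back NULL, and once the memcg_kmem_enabled() short-circuit is gone nothing keeps shrink_slab_memcg() from reading memcg->nodeinfo[nid] through that NULL pointer, consistent with the 0xdc8 read in the kdump oops. Below is a stand-alone user-space sketch of just that ordering; every name in it (cgroup_disabled, iter_memcg, scan_memcg_slabs) is invented for the illustration and only models the behaviour of the kernel functions, it is not the kernel code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct node_info { unsigned long *shrinker_map; };
struct memcg     { struct node_info *nodeinfo[1]; };

static bool cgroup_disabled = true;     /* models booting with cgroup_disable=memory */
static struct memcg root_memcg;

/* models mem_cgroup_iter(): returns NULL when the controller is disabled */
static struct memcg *iter_memcg(void)
{
        return cgroup_disabled ? NULL : &root_memcg;
}

/* models shrink_slab_memcg(): reads memcg->nodeinfo[nid] with no NULL check */
static void scan_memcg_slabs(struct memcg *memcg)
{
        printf("shrinker_map slot at %p\n", (void *)memcg->nodeinfo[0]);
}

int main(void)
{
        struct memcg *memcg = iter_memcg();

        /*
         * Old ordering: the memcg_kmem_enabled() stand-in short-circuits
         * first, so the (possibly NULL) memcg pointer is never touched.
         */
        if (!cgroup_disabled && memcg)
                scan_memcg_slabs(memcg);
        else
                printf("memcg controller disabled, nothing to scan\n");

        /*
         * New ordering without the proposed mem_cgroup_disabled() guard
         * would be the commented-out call below: memcg is NULL here, so
         * the nodeinfo[] read faults at a small offset from 0.
         *
         * scan_memcg_slabs(memcg);
         */
        return 0;
}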
* Re: list corruption in deferred_split_scan() 2019-07-16 3:00 ` Yang Shi @ 2019-07-16 23:36 ` Shakeel Butt 2019-07-17 0:12 ` Yang Shi 0 siblings, 1 reply; 21+ messages in thread From: Shakeel Butt @ 2019-07-16 23:36 UTC (permalink / raw) To: Yang Shi, Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko, Johannes Weiner, Roman Gushchin Cc: Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML Adding related people. The thread starts at: http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote: > > > > On 7/15/19 6:36 PM, Qian Cai wrote: > > > >> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: > >> > >> > >> > >> On 7/15/19 2:23 PM, Qian Cai wrote: > >>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: > >>>>> Another possible lead is that without reverting the those commits below, > >>>>> kdump > >>>>> kernel would always also crash in shrink_slab_memcg() at this line, > >>>>> > >>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); > >>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't > >>>> think of where nodeinfo was freed but memcg was still online. Maybe a > >>>> check is needed: > >>> Actually, "memcg" is NULL. > >> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. > > Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), > > > > - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) > > + if (!mem_cgroup_online(memcg)) > > return 0; > > > > Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, > > > > if (mem_cgroup_disabled()) > > return NULL; > > Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). > Thanks for figuring this out. I think we need add mem_cgroup_dsiabled() > check before calling shrink_slab_memcg() as below: > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index a0301ed..2f03c61 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int > nid, > unsigned long ret, freed = 0; > struct shrinker *shrinker; > > - if (!mem_cgroup_is_root(memcg)) > + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) > return shrink_slab_memcg(gfp_mask, nid, memcg, priority); > > if (!down_read_trylock(&shrinker_rwsem)) > We were seeing unneeded oom-kills on kernels with "cgroup_disabled=memory" and Yang's patch series basically expose the bug to crash. I think the commit aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()") missed the case for "cgroup_disabled=memory". However I am surprised that root_mem_cgroup is allocated even for "cgroup_disabled=memory" and it seems like css_alloc() is called even before checking if the corresponding controller is disabled. Yang, can you please send the above change with signed-off and CC to stable as well? thanks, Shakeel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-16 23:36 ` Shakeel Butt @ 2019-07-17 0:12 ` Yang Shi 2019-07-17 17:02 ` Shakeel Butt 0 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-07-17 0:12 UTC (permalink / raw) To: Shakeel Butt, Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko, Johannes Weiner, Roman Gushchin Cc: Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML On 7/16/19 4:36 PM, Shakeel Butt wrote: > Adding related people. > > The thread starts at: > http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw > > On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote: >> >> >> On 7/15/19 6:36 PM, Qian Cai wrote: >>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: >>>> >>>> >>>> >>>> On 7/15/19 2:23 PM, Qian Cai wrote: >>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: >>>>>>> Another possible lead is that without reverting the those commits below, >>>>>>> kdump >>>>>>> kernel would always also crash in shrink_slab_memcg() at this line, >>>>>>> >>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); >>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't >>>>>> think of where nodeinfo was freed but memcg was still online. Maybe a >>>>>> check is needed: >>>>> Actually, "memcg" is NULL. >>>> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. >>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), >>> >>> - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) >>> + if (!mem_cgroup_online(memcg)) >>> return 0; >>> >>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, >>> >>> if (mem_cgroup_disabled()) >>> return NULL; >> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). >> Thanks for figuring this out. I think we need add mem_cgroup_dsiabled() >> check before calling shrink_slab_memcg() as below: >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index a0301ed..2f03c61 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int >> nid, >> unsigned long ret, freed = 0; >> struct shrinker *shrinker; >> >> - if (!mem_cgroup_is_root(memcg)) >> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) >> return shrink_slab_memcg(gfp_mask, nid, memcg, priority); >> >> if (!down_read_trylock(&shrinker_rwsem)) >> > We were seeing unneeded oom-kills on kernels with > "cgroup_disabled=memory" and Yang's patch series basically expose the > bug to crash. I think the commit aeed1d325d42 ("mm/vmscan.c: > generalize shrink_slab() calls in shrink_node()") missed the case for > "cgroup_disabled=memory". However I am surprised that root_mem_cgroup > is allocated even for "cgroup_disabled=memory" and it seems like > css_alloc() is called even before checking if the corresponding > controller is disabled. I'm surprised too. A quick test with drgn shows root memcg is definitely allocated: >>> prog['root_mem_cgroup'] *(struct mem_cgroup *)0xffff8902cf058000 = { [snip] But, isn't this a bug? Thanks, Yang > > Yang, can you please send the above change with signed-off and CC to > stable as well? > > thanks, > Shakeel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-17 0:12 ` Yang Shi @ 2019-07-17 17:02 ` Shakeel Butt 2019-07-17 17:09 ` Yang Shi 0 siblings, 1 reply; 21+ messages in thread From: Shakeel Butt @ 2019-07-17 17:02 UTC (permalink / raw) To: Yang Shi Cc: Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko, Johannes Weiner, Roman Gushchin, Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML On Tue, Jul 16, 2019 at 5:12 PM Yang Shi <yang.shi@linux.alibaba.com> wrote: > > > > On 7/16/19 4:36 PM, Shakeel Butt wrote: > > Adding related people. > > > > The thread starts at: > > http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw > > > > On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote: > >> > >> > >> On 7/15/19 6:36 PM, Qian Cai wrote: > >>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: > >>>> > >>>> > >>>> > >>>> On 7/15/19 2:23 PM, Qian Cai wrote: > >>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: > >>>>>>> Another possible lead is that without reverting the those commits below, > >>>>>>> kdump > >>>>>>> kernel would always also crash in shrink_slab_memcg() at this line, > >>>>>>> > >>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); > >>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't > >>>>>> think of where nodeinfo was freed but memcg was still online. Maybe a > >>>>>> check is needed: > >>>>> Actually, "memcg" is NULL. > >>>> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. > >>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), > >>> > >>> - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) > >>> + if (!mem_cgroup_online(memcg)) > >>> return 0; > >>> > >>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, > >>> > >>> if (mem_cgroup_disabled()) > >>> return NULL; > >> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). > >> Thanks for figuring this out. I think we need add mem_cgroup_dsiabled() > >> check before calling shrink_slab_memcg() as below: > >> > >> diff --git a/mm/vmscan.c b/mm/vmscan.c > >> index a0301ed..2f03c61 100644 > >> --- a/mm/vmscan.c > >> +++ b/mm/vmscan.c > >> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int > >> nid, > >> unsigned long ret, freed = 0; > >> struct shrinker *shrinker; > >> > >> - if (!mem_cgroup_is_root(memcg)) > >> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) > >> return shrink_slab_memcg(gfp_mask, nid, memcg, priority); > >> > >> if (!down_read_trylock(&shrinker_rwsem)) > >> > > We were seeing unneeded oom-kills on kernels with > > "cgroup_disabled=memory" and Yang's patch series basically expose the > > bug to crash. I think the commit aeed1d325d42 ("mm/vmscan.c: > > generalize shrink_slab() calls in shrink_node()") missed the case for > > "cgroup_disabled=memory". However I am surprised that root_mem_cgroup > > is allocated even for "cgroup_disabled=memory" and it seems like > > css_alloc() is called even before checking if the corresponding > > controller is disabled. > > I'm surprised too. A quick test with drgn shows root memcg is definitely > allocated: > > >>> prog['root_mem_cgroup'] > *(struct mem_cgroup *)0xffff8902cf058000 = { > [snip] > > But, isn't this a bug? 
It can be treated as a bug as this is not expected but we can discuss and take care of it later. I think we need your patch urgently as memory reclaim and /proc/sys/vm/drop_caches is broken for "cgroup_disabled=memory" kernel. So, please send your patch asap. thanks, Shakeel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-17 17:02 ` Shakeel Butt @ 2019-07-17 17:09 ` Yang Shi 0 siblings, 0 replies; 21+ messages in thread From: Yang Shi @ 2019-07-17 17:09 UTC (permalink / raw) To: Shakeel Butt Cc: Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko, Johannes Weiner, Roman Gushchin, Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML On 7/17/19 10:02 AM, Shakeel Butt wrote: > On Tue, Jul 16, 2019 at 5:12 PM Yang Shi <yang.shi@linux.alibaba.com> wrote: >> >> >> On 7/16/19 4:36 PM, Shakeel Butt wrote: >>> Adding related people. >>> >>> The thread starts at: >>> http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw >>> >>> On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote: >>>> >>>> On 7/15/19 6:36 PM, Qian Cai wrote: >>>>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 7/15/19 2:23 PM, Qian Cai wrote: >>>>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: >>>>>>>>> Another possible lead is that without reverting the those commits below, >>>>>>>>> kdump >>>>>>>>> kernel would always also crash in shrink_slab_memcg() at this line, >>>>>>>>> >>>>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); >>>>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't >>>>>>>> think of where nodeinfo was freed but memcg was still online. Maybe a >>>>>>>> check is needed: >>>>>>> Actually, "memcg" is NULL. >>>>>> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. >>>>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), >>>>> >>>>> - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) >>>>> + if (!mem_cgroup_online(memcg)) >>>>> return 0; >>>>> >>>>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, >>>>> >>>>> if (mem_cgroup_disabled()) >>>>> return NULL; >>>> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). >>>> Thanks for figuring this out. I think we need add mem_cgroup_dsiabled() >>>> check before calling shrink_slab_memcg() as below: >>>> >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>> index a0301ed..2f03c61 100644 >>>> --- a/mm/vmscan.c >>>> +++ b/mm/vmscan.c >>>> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int >>>> nid, >>>> unsigned long ret, freed = 0; >>>> struct shrinker *shrinker; >>>> >>>> - if (!mem_cgroup_is_root(memcg)) >>>> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) >>>> return shrink_slab_memcg(gfp_mask, nid, memcg, priority); >>>> >>>> if (!down_read_trylock(&shrinker_rwsem)) >>>> >>> We were seeing unneeded oom-kills on kernels with >>> "cgroup_disabled=memory" and Yang's patch series basically expose the >>> bug to crash. I think the commit aeed1d325d42 ("mm/vmscan.c: >>> generalize shrink_slab() calls in shrink_node()") missed the case for >>> "cgroup_disabled=memory". However I am surprised that root_mem_cgroup >>> is allocated even for "cgroup_disabled=memory" and it seems like >>> css_alloc() is called even before checking if the corresponding >>> controller is disabled. >> I'm surprised too. A quick test with drgn shows root memcg is definitely >> allocated: >> >> >>> prog['root_mem_cgroup'] >> *(struct mem_cgroup *)0xffff8902cf058000 = { >> [snip] >> >> But, isn't this a bug? 
> It can be treated as a bug as this is not expected but we can discuss > and take care of it later. I think we need your patch urgently as > memory reclaim and /proc/sys/vm/drop_caches is broken for > "cgroup_disabled=memory" kernel. So, please send your patch asap. Sure. I'm going to post the patch soon. > > thanks, > Shakeel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-12 19:12 ` Yang Shi 2019-07-13 4:41 ` Yang Shi 2019-07-15 21:23 ` Qian Cai @ 2019-07-19 0:54 ` Qian Cai 2019-07-19 0:59 ` Yang Shi 2 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-07-19 0:54 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel > On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: > > > > On 7/11/19 2:07 PM, Qian Cai wrote: >> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote: >>> Hi Qian, >>> >>> >>> Thanks for reporting the issue. But, I can't reproduce it on my machine. >>> Could you please share more details about your test? How often did you >>> run into this problem? >> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here >> is some more information. >> >> # cat .config >> >> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case. > > According to the bug call trace in the earlier email, it looks deferred _split_scan lost race with put_compound_page. The put_compound_page would call free_transhuge_page() which delete the page from the deferred split queue, but it may still appear on the deferred list due to some reason. > > Would you please try the below patch? > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index b7f709d..66bd9db 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) > if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { > if (!list_empty(page_deferred_list(head))) { > ds_queue->split_queue_len--; > - list_del(page_deferred_list(head)); > + list_del_init(page_deferred_list(head)); > } > if (mapping) > __dec_node_page_state(page, NR_SHMEM_THPS); > @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page) > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > if (!list_empty(page_deferred_list(page))) { > ds_queue->split_queue_len--; > - list_del(page_deferred_list(page)); > + list_del_init(page_deferred_list(page)); > } > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > free_compound_page(page); Unfortunately, I am no longer be able to reproduce the original list corruption with today’s linux-next. ^ permalink raw reply [flat|nested] 21+ messages in thread
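The list_del() -> list_del_init() change in the patch quoted above matters because list_del() poisons the entry rather than emptying it: a later "if (!list_empty(...)) list_del(...)" pair on the same entry sails past the guard and trips the LIST_POISON1 check, which is the splat in the original report. The following stand-alone user-space model shows the difference; the few list helpers only mirror include/linux/list.h closely enough for the demonstration and are not the kernel code itself.

#include <stdbool.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* same poison values the x86_64 kernel uses, hence "dead000000000100" in the splat */
#define LIST_POISON1 ((struct list_head *)0xdead000000000100UL)
#define LIST_POISON2 ((struct list_head *)0xdead000000000122UL)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
        head->next->prev = entry;
        entry->next = head->next;
        entry->prev = head;
        head->next = entry;
}

static void list_del(struct list_head *entry)           /* unlink, then poison */
{
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}

static void list_del_init(struct list_head *entry)      /* unlink, then self-link */
{
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        INIT_LIST_HEAD(entry);
}

int main(void)
{
        struct list_head queue, page;

        INIT_LIST_HEAD(&queue);

        /* current code: plain list_del() on the deferred-list entry */
        list_add(&page, &queue);
        list_del(&page);
        /*
         * A second deletion path (say free_transhuge_page() after
         * split_huge_page_to_list() already unlinked the entry) checks
         * !list_empty() first, but a poisoned entry is not "empty":
         */
        if (!list_empty(&page))
                printf("guard passed, would list_del() again with next=%p\n",
                       (void *)page.next);

        /* proposed patch: list_del_init() leaves the entry self-linked */
        list_add(&page, &queue);
        list_del_init(&page);
        if (list_empty(&page))
                printf("entry reads as empty, second unlink is skipped\n");

        return 0;
}

This only models the guard behaviour; whether a double unlink is really what is happening on this machine is still what the thread is trying to pin down.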
* Re: list corruption in deferred_split_scan() 2019-07-19 0:54 ` Qian Cai @ 2019-07-19 0:59 ` Yang Shi 2019-07-24 18:10 ` Qian Cai 0 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-07-19 0:59 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel On 7/18/19 5:54 PM, Qian Cai wrote: > >> On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: >> >> >> >> On 7/11/19 2:07 PM, Qian Cai wrote: >>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote: >>>> Hi Qian, >>>> >>>> >>>> Thanks for reporting the issue. But, I can't reproduce it on my machine. >>>> Could you please share more details about your test? How often did you >>>> run into this problem? >>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here >>> is some more information. >>> >>> # cat .config >>> >>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config >> I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case. >> >> According to the bug call trace in the earlier email, it looks deferred _split_scan lost race with put_compound_page. The put_compound_page would call free_transhuge_page() which delete the page from the deferred split queue, but it may still appear on the deferred list due to some reason. >> >> Would you please try the below patch? >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index b7f709d..66bd9db 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) >> if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { >> if (!list_empty(page_deferred_list(head))) { >> ds_queue->split_queue_len--; >> - list_del(page_deferred_list(head)); >> + list_del_init(page_deferred_list(head)); >> } >> if (mapping) >> __dec_node_page_state(page, NR_SHMEM_THPS); >> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page) >> spin_lock_irqsave(&ds_queue->split_queue_lock, flags); >> if (!list_empty(page_deferred_list(page))) { >> ds_queue->split_queue_len--; >> - list_del(page_deferred_list(page)); >> + list_del_init(page_deferred_list(page)); >> } >> spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); >> free_compound_page(page); > Unfortunately, I am no longer be able to reproduce the original list corruption with today’s linux-next. It is because the patches have been dropped from -mm tree by Andrew due to this problem I guess. You have to use next-20190711, or apply the patches on today's linux-next. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-19 0:59 ` Yang Shi @ 2019-07-24 18:10 ` Qian Cai 0 siblings, 0 replies; 21+ messages in thread From: Qian Cai @ 2019-07-24 18:10 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel On Thu, 2019-07-18 at 17:59 -0700, Yang Shi wrote: > > On 7/18/19 5:54 PM, Qian Cai wrote: > > > > > On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: > > > > > > > > > > > > On 7/11/19 2:07 PM, Qian Cai wrote: > > > > On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote: > > > > > Hi Qian, > > > > > > > > > > > > > > > Thanks for reporting the issue. But, I can't reproduce it on my > > > > > machine. > > > > > Could you please share more details about your test? How often did you > > > > > run into this problem? > > > > > > > > I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 > > > > server. Here > > > > is some more information. > > > > > > > > # cat .config > > > > > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > > > > > I tried your kernel config, but I still can't reproduce it. My compiler > > > doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my > > > test, but I don't think this would make any difference for this case. > > > > > > According to the bug call trace in the earlier email, it looks deferred > > > _split_scan lost race with put_compound_page. The put_compound_page would > > > call free_transhuge_page() which delete the page from the deferred split > > > queue, but it may still appear on the deferred list due to some reason. > > > > > > Would you please try the below patch? > > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > index b7f709d..66bd9db 100644 > > > --- a/mm/huge_memory.c > > > +++ b/mm/huge_memory.c > > > @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, > > > struct list_head *list) > > > if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { > > > if (!list_empty(page_deferred_list(head))) { > > > ds_queue->split_queue_len--; > > > - list_del(page_deferred_list(head)); > > > + list_del_init(page_deferred_list(head)); > > > } > > > if (mapping) > > > __dec_node_page_state(page, NR_SHMEM_THPS); > > > @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page) > > > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > > > if (!list_empty(page_deferred_list(page))) { > > > ds_queue->split_queue_len--; > > > - list_del(page_deferred_list(page)); > > > + list_del_init(page_deferred_list(page)); > > > } > > > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > > > free_compound_page(page); > > > > Unfortunately, I am no longer be able to reproduce the original list > > corruption with today’s linux-next. > > It is because the patches have been dropped from -mm tree by Andrew due > to this problem I guess. You have to use next-20190711, or apply the > patches on today's linux-next. > The patch you have here does not help. Only applied the part for free_transhuge_page() as you requested. [ 375.006307][ T3580] list_del corruption. next->prev should be ffffea0030e10098, but was ffff888ea8d0cdb8 [ 375.015928][ T3580] ------------[ cut here ]------------ [ 375.021296][ T3580] kernel BUG at lib/list_debug.c:56! 
[ 375.026491][ T3580] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 375.033680][ T3580] CPU: 84 PID: 3580 Comm: oom01 Tainted: G W 5.2.0-next-20190711+ #2 [ 375.042964][ T3580] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019 [ 375.052256][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6 [ 375.058135][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7 c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f> 0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7 [ 375.077722][ T3580] RSP: 0018:ffff888ebc4b73c0 EFLAGS: 00010082 [ 375.083684][ T3580] RAX: 0000000000000054 RBX: ffffea0030e10098 RCX: ffffffffb015d728 [ 375.091566][ T3580] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88903263d380 [ 375.099448][ T3580] RBP: ffff888ebc4b73d8 R08: ffffed12064c7a71 R09: ffffed12064c7a70 [ 375.107330][ T3580] R10: ffffed12064c7a70 R11: ffff88903263d387 R12: ffffea0030e10098 [ 375.115212][ T3580] R13: ffffea0031d40098 R14: ffffea0030e10034 R15: ffffea0031d40098 [ 375.123095][ T3580] FS: 00007fc3dc851700(0000) GS:ffff889032600000(0000) knlGS:0000000000000000 [ 375.131937][ T3580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 375.138421][ T3580] CR2: 00007fc25fa39000 CR3: 0000000884762000 CR4: 00000000001406a0 [ 375.146301][ T3580] Call Trace: [ 375.149472][ T3580] deferred_split_scan+0x337/0x740 [ 375.154475][ T3580] ? split_huge_page_to_list+0xe30/0xe30 [ 375.160002][ T3580] ? __sched_text_start+0x8/0x8 [ 375.164743][ T3580] ? __radix_tree_lookup+0x12d/0x1e0 [ 375.169923][ T3580] do_shrink_slab+0x244/0x5a0 [ 375.174490][ T3580] shrink_slab+0x253/0x440 [ 375.178794][ T3580] ? unregister_shrinker+0x110/0x110 [ 375.183972][ T3580] ? kasan_check_read+0x11/0x20 [ 375.188715][ T3580] ? mem_cgroup_protected+0x20f/0x260 [ 375.193976][ T3580] ? shrink_node+0x1ad/0xa30 [ 375.198453][ T3580] shrink_node+0x31e/0xa30 [ 375.202755][ T3580] ? shrink_node_memcg+0x1560/0x1560 [ 375.207934][ T3580] ? ktime_get+0x93/0x110 [ 375.212147][ T3580] do_try_to_free_pages+0x22f/0x820 [ 375.217236][ T3580] ? shrink_node+0xa30/0xa30 [ 375.221711][ T3580] ? kasan_check_read+0x11/0x20 [ 375.226450][ T3580] ? check_chain_key+0x1df/0x2e0 [ 375.231277][ T3580] try_to_free_pages+0x242/0x4d0 [ 375.236102][ T3580] ? do_try_to_free_pages+0x820/0x820 [ 375.241370][ T3580] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 375.246721][ T3580] ? kasan_check_read+0x11/0x20 [ 375.251459][ T3580] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 375.256722][ T3580] ? kasan_check_read+0x11/0x20 [ 375.261458][ T3580] ? check_chain_key+0x1df/0x2e0 [ 375.266287][ T3580] ? do_anonymous_page+0x343/0xe30 [ 375.271289][ T3580] ? lock_downgrade+0x390/0x390 [ 375.276029][ T3580] ? __count_memcg_events+0x8b/0x1c0 [ 375.281204][ T3580] ? kasan_check_read+0x11/0x20 [ 375.285945][ T3580] ? __lru_cache_add+0x122/0x160 [ 375.290774][ T3580] alloc_pages_vma+0x89/0x2c0 [ 375.295339][ T3580] do_anonymous_page+0x3e1/0xe30 [ 375.300168][ T3580] ? __update_load_avg_cfs_rq+0x2c/0x490 [ 375.305692][ T3580] ? finish_fault+0x120/0x120 [ 375.310257][ T3580] ? alloc_pages_vma+0x21e/0x2c0 [ 375.315085][ T3580] handle_pte_fault+0x457/0x12c0 [ 375.319912][ T3580] __handle_mm_fault+0x79a/0xa50 [ 375.324738][ T3580] ? vmf_insert_mixed_mkwrite+0x20/0x20 [ 375.330175][ T3580] ? kasan_check_read+0x11/0x20 [ 375.334913][ T3580] ? 
__count_memcg_events+0x8b/0x1c0 [ 375.340090][ T3580] handle_mm_fault+0x17f/0x370 [ 375.344745][ T3580] __do_page_fault+0x25b/0x5d0 [ 375.349398][ T3580] do_page_fault+0x4c/0x2cf [ 375.353793][ T3580] ? page_fault+0x5/0x20 [ 375.357920][ T3580] page_fault+0x1b/0x20 [ 375.361959][ T3580] RIP: 0033:0x410be0 [ 375.365737][ T3580] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f [ 375.385323][ T3580] RSP: 002b:00007fc3dc850ec0 EFLAGS: 00010206 [ 375.391283][ T3580] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007fda6c168497 [ 375.399164][ T3580] RDX: 00000000041e9000 RSI: 00000000c0000000 RDI: 0000000000000000 [ 375.407047][ T3580] RBP: 00007fc25b850000 R08: 00000000ffffffff R09: 0000000000000000 [ 375.414928][ T3580] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001 [ 375.422812][ T3580] R13: 00007ffc4a58701f R14: 0000000000000000 R15: 00007fc3dc850fc0 [ 375.430694][ T3580] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod efivarfs [ 375.455820][ T3580] ---[ end trace 82d52f9627313e53 ]--- [ 375.461172][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6 [ 375.467048][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7 c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f> 0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7 [ 375.486635][ T3580] RSP: 0018:ffff888ebc4b73c0 EFLAGS: 00010082 [ 375.492597][ T3580] RAX: 0000000000000054 RBX: ffffea0030e10098 RCX: ffffffffb015d728 [ 375.500479][ T3580] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88903263d380 [ 375.508361][ T3580] RBP: ffff888ebc4b73d8 R08: ffffed12064c7a71 R09: ffffed12064c7a70 [ 375.516244][ T3580] R10: ffffed12064c7a70 R11: ffff88903263d387 R12: ffffea0030e10098 [ 375.524124][ T3580] R13: ffffea0031d40098 R14: ffffea0030e10034 R15: ffffea0031d40098 [ 375.532007][ T3580] FS: 00007fc3dc851700(0000) GS:ffff889032600000(0000) knlGS:0000000000000000 [ 375.540851][ T3580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 375.547335][ T3580] CR2: 00007fc25fa39000 CR3: 0000000884762000 CR4: 00000000001406a0 [ 375.555217][ T3580] Kernel panic - not syncing: Fatal exception [ 376.868640][ T3580] Shutting down cpus with NMI [ 376.873223][ T3580] Kernel Offset: 0x2ec00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 376.884878][ T3580] ---[ end Kernel panic - not syncing: Fatal exception ]--- ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan() 2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai 2019-07-11 0:16 ` Yang Shi @ 2019-07-15 4:52 ` Yang Shi 2019-07-24 21:13 ` Qian Cai 2 siblings, 0 replies; 21+ messages in thread From: Yang Shi @ 2019-07-15 4:52 UTC (permalink / raw) To: Hillf Danton, Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On 7/13/19 8:53 PM, Hillf Danton wrote: > On Wed, 10 Jul 2019 14:43:28 -0700 (PDT) Qian Cai wrote: >> Running LTP oom01 test case with swap triggers a crash below. Revert the series >> "Make deferred split shrinker memcg aware" [1] seems fix the issue. >> >> aefde94195ca mm: thp: make deferred split shrinker memcg aware >> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix >> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 >> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix >> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem >> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release() >> 4e050f2df876 mm: thp: extract split_queue_* into a struct >> >> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/ >> >> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is LIST_POISON1 (dead000000000100) >> [ 1145.739763][ T5764] ------------[ cut here ]------------ >> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! >> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI >> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G W 5.2.0-next-20190710+ #7 >> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 >> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a >> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e >> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f> >> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 >> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082 >> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: ffffffffae95d318 >> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888440bd380 >> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: ffffed1108817a70 >> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: dead000000000122 >> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: dead000000000100 >> [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000) knlGS:0000000000000000 >> [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: 00000000001406a0 >> [ 1145.870664][ T5764] Call Trace: >> [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 >> [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 >> [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 >> [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 >> [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 >> [ 1145.900159][ T5764] shrink_slab+0x253/0x440 >> [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 >> [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 >> [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260 >> [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 >> [ 1145.923949][ T5764] ? 
shrink_node_memcg+0x1560/0x1560 >> [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 >> [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 >> [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 >> [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 >> [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 >> [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 >> [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820 >> [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 >> [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 >> [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >> [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 >> [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 >> [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 >> [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 >> [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 >> [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 >> [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 >> [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 >> [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 >> [ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490 >> [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 >> [ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20 >> [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 >> [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 >> [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 >> [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 >> [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 >> [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 >> [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 >> [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf >> [ 1146.075426][ T5764] ? page_fault+0x5/0x20 >> [ 1146.079553][ T5764] page_fault+0x1b/0x20 >> [ 1146.083594][ T5764] RIP: 0033:0x410be0 >> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 >> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> >> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f >> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206 >> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f98f2674497 >> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: 0000000000000000 >> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: 0000000000000000 >> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000 >> [ 1147.588181][ T5764] Shutting down cpus with NMI >> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > Ignore the noise if there is no chance you think to corrupt the local list walk > in some way like: > > CPU0 CPU1 > ---- ---- > take no lock spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > list_for_each_safe(pos, next, > &list) > list_del(page_deferred_list(page)); > page = list_entry((void *)pos, > struct page, mapping); > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); IMHO, I didn't see the race could happen really. list_del() is called at 3 places: 1. Parallel free_transhuge_page(): The refcount bump should prevent from the race. 2. Parallel reclaimer: split_queue_lock should prevent this, so the other reclaimer should not see the same page. 3. Parallel split_huge_page(): I'm not sure about this one. 
But, page lock should be acquired before calling split_huge_page() in other call paths too. I'm not sure if I miss anything, please feel free to correct me. > > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) > if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { > if (!list_empty(page_deferred_list(head))) { > ds_queue->split_queue_len--; > - list_del(page_deferred_list(head)); > + list_del_init(page_deferred_list(head)); > } > if (mapping) > __dec_node_page_state(page, NR_SHMEM_THPS); > @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page) > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > if (!list_empty(page_deferred_list(page))) { > ds_queue->split_queue_len--; > - list_del(page_deferred_list(page)); > + list_del_init(page_deferred_list(page)); > } > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > free_compound_page(page); > -- I proposed the similar thing. > The major important is listed above; the minor trivial part below. > Both are only for thought collectings. > > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2869,9 +2869,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, > struct pglist_data *pgdata = NODE_DATA(sc->nid); > struct deferred_split *ds_queue; > unsigned long flags; > - LIST_HEAD(list), *pos, *next; > struct page *page; > - int split = 0; > + unsigned long nr_split = 0; > > #ifdef CONFIG_MEMCG > if (sc->memcg) > @@ -2884,44 +2883,44 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, > > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > /* Take pin on all head pages to avoid freeing them under us */ > - list_for_each_safe(pos, next, &ds_queue->split_queue) { > - page = list_entry((void *)pos, struct page, mapping); > + while (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) { > + bool locked, pinned; > + > + page = list_first_entry(&ds_queue->split_queue, struct page, > + mapping); > page = compound_head(page); > + > if (get_page_unless_zero(page)) { > - list_move(page_deferred_list(page), &list); > + pinned = true; > + locked = trylock_page(page); > } else { > /* We lost race with put_compound_page() */ > - list_del_init(page_deferred_list(page)); > - ds_queue->split_queue_len--; > + pinned = false; > + locked = false; > + } > + list_del_init(page_deferred_list(page)); > + ds_queue->split_queue_len--; > + --sc->nr_to_scan; > + if (!pinned) > + continue; > + spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > + if (locked) { > + if (!split_huge_page(page)) > + nr_split++; > + unlock_page(page); > } > - if (!--sc->nr_to_scan) > - break; > - } > - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > - > - list_for_each_safe(pos, next, &list) { > - page = list_entry((void *)pos, struct page, mapping); > - if (!trylock_page(page)) > - goto next; > - /* split_huge_page() removes page from list on success */ > - if (!split_huge_page(page)) > - split++; > - unlock_page(page); > -next: > put_page(page); > + spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > } > - > - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > - list_splice_tail(&list, &ds_queue->split_queue); > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > > /* > * Stop shrinker if we didn't split any page, but the queue is empty. > * This can happen if pages were freed under us. 
> */ > - if (!split && list_empty(&ds_queue->split_queue)) > + if (!nr_split && list_empty(&ds_queue->split_queue)) > return SHRINK_STOP; > - return split; > + return nr_split; > } > > static struct shrinker deferred_split_shrinker = { > -- ^ permalink raw reply [flat|nested] 21+ messages in thread
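Point 1 above -- that get_page_unless_zero() pins the page, so the free path cannot run underneath the scanner once the page has been moved to the local list -- comes down to refcount semantics that can be modelled in a few lines of user space. The names below (obj, get_unless_zero, put) are invented for the sketch; only the counting behaviour is meant to match.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
        atomic_int refcount;
        bool freed;
};

/* models get_page_unless_zero(): pin only while the object is still live */
static bool get_unless_zero(struct obj *o)
{
        int c = atomic_load(&o->refcount);

        while (c != 0) {
                if (atomic_compare_exchange_weak(&o->refcount, &c, c + 1))
                        return true;
        }
        return false;
}

/* models put_page(): dropping the last reference runs the free path */
static void put(struct obj *o)
{
        if (atomic_fetch_sub(&o->refcount, 1) == 1) {
                o->freed = true;        /* free_transhuge_page() would run here */
                printf("free path runs\n");
        }
}

int main(void)
{
        struct obj page = { .refcount = 1, .freed = false };

        /* the scanner pins the page before touching it */
        if (get_unless_zero(&page)) {
                /* even if the original owner drops its reference now ... */
                put(&page);
                printf("freed while pinned? %s\n", page.freed ? "yes" : "no");
                /* ... the free path only runs once the scanner's pin is gone */
                put(&page);
                printf("freed after unpin?  %s\n", page.freed ? "yes" : "no");
        } else {
                /* refcount already zero: "we lost race with put_compound_page()" */
                printf("lost race with the free path, skip the page\n");
        }
        return 0;
}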
* Re: list corruption in deferred_split_scan() 2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai 2019-07-11 0:16 ` Yang Shi 2019-07-15 4:52 ` Yang Shi @ 2019-07-24 21:13 ` Qian Cai 2019-07-25 21:46 ` Yang Shi 2 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-07-24 21:13 UTC (permalink / raw) To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote: > Running LTP oom01 test case with swap triggers a crash below. Revert the > series > "Make deferred split shrinker memcg aware" [1] seems fix the issue. You might want to look harder on this commit, as reverted it alone on the top of 5.2.0-next-20190711 fixed the issue. aefde94195ca mm: thp: make deferred split shrinker memcg aware [1] [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@ linux.alibaba.com/ There are all console output while running LTP oom01 before the crash that might be useful. [ 656.302886][ T3384] WARNING: CPU: 79 PID: 3384 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1a8a/0x1bc0 [ 656.304395][ T3409] kmemleak: Cannot allocate a kmemleak_object structure [ 656.312714][ T3384] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure dax_pmem irqbypass dax_pmem_core efivars ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs [ 656.320916][ T3409] kmemleak: Kernel memory leak detector disabled [ 656.344509][ T3384] CPU: 79 PID: 3384 Comm: oom01 Not tainted 5.2.0-next- 20190711+ #3 [ 656.344523][ T3384] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019 [ 656.352100][ T829] kmemleak: Automatic memory scanning thread ended [ 656.358648][ T3384] RIP: 0010:__alloc_pages_nodemask+0x1a8a/0x1bc0 [ 656.358658][ T3384] Code: 00 85 d2 0f 85 a1 00 00 00 48 c7 c7 e0 29 c3 a3 e8 3b 98 62 00 65 48 8b 1c 25 80 ee 01 00 e9 85 fa ff ff 0f 0b e9 3e fb ff ff <0f> 0b 48 8b b5 00 ff ff ff 8b 8d 84 fe ff ff 48 c7 c2 00 1d 6c a3 [ 656.358675][ T3384] RSP: 0000:ffff888efa4a6210 EFLAGS: 00010046 [ 656.406140][ T3384] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffa2b28be2 [ 656.414033][ T3384] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffffa4d15d60 [ 656.421926][ T3384] RBP: ffff888efa4a6420 R08: fffffbfff49a2bad R09: fffffbfff49a2bac [ 656.429818][ T3384] R10: fffffbfff49a2bac R11: 0000000000000003 R12: ffffffffa4d15d60 [ 656.437711][ T3384] R13: 0000000000000000 R14: 0000000000000800 R15: 0000000000000000 [ 656.445605][ T3384] FS: 00007ff44adfc700(0000) GS:ffff889032f80000(0000) knlGS:0000000000000000 [ 656.454459][ T3384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 656.460952][ T3384] CR2: 00007ff2f05e1000 CR3: 0000001012e44000 CR4: 00000000001406a0 [ 656.468843][ T3384] Call Trace: [ 656.472026][ T3384] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 656.477303][ T3384] ? stack_depot_save+0x215/0x58b [ 656.482228][ T3384] ? lock_downgrade+0x390/0x390 [ 656.486976][ T3384] ? stack_depot_save+0x183/0x58b [ 656.491900][ T3384] ? kasan_check_read+0x11/0x20 [ 656.496647][ T3384] ? do_raw_spin_unlock+0xa8/0x140 [ 656.501658][ T3384] ? stack_depot_save+0x215/0x58b [ 656.506582][ T3384] alloc_pages_current+0x9c/0x110 [ 656.511505][ T3384] allocate_slab+0x351/0x11f0 [ 656.516077][ T3384] ? kasan_slab_alloc+0x11/0x20 [ 656.520824][ T3384] new_slab+0x46/0x70 [ 656.524702][ T3384] ? pageout.isra.4+0x3e5/0xa00 [ 656.529449][ T3384] ___slab_alloc+0x5d4/0x9c0 [ 656.533933][ T3384] ? 
try_to_free_pages+0x242/0x4d0 [ 656.538941][ T3384] ? __alloc_pages_nodemask+0x9ce/0x1bc0 [ 656.544476][ T3384] ? alloc_pages_vma+0x89/0x2c0 [ 656.549226][ T3384] ? __do_page_fault+0x25b/0x5d0 [ 656.554064][ T3384] ? create_object+0x3a/0x3e0 [ 656.558637][ T3384] ? init_object+0x7e/0x90 [ 656.562947][ T3384] ? create_object+0x3a/0x3e0 [ 656.567520][ T3384] __slab_alloc+0x12/0x20 [ 656.571742][ T3384] ? __slab_alloc+0x12/0x20 [ 656.576142][ T3384] kmem_cache_alloc+0x32a/0x400 [ 656.580890][ T3384] create_object+0x3a/0x3e0 [ 656.585291][ T3384] ? stack_depot_save+0x183/0x58b [ 656.590215][ T3384] kmemleak_alloc+0x71/0xa0 [ 656.594611][ T3384] kmem_cache_alloc+0x272/0x400 [ 656.599361][ T3384] ? ___might_sleep+0xab/0xc0 [ 656.603934][ T3384] ? mempool_free+0x170/0x170 [ 656.608507][ T3384] mempool_alloc_slab+0x2d/0x40 [ 656.613254][ T3384] mempool_alloc+0x10a/0x29e [ 656.617739][ T3384] ? alloc_pages_vma+0x89/0x2c0 [ 656.622485][ T3384] ? mempool_resize+0x390/0x390 [ 656.627233][ T3384] ? __read_once_size_nocheck.constprop.2+0x10/0x10 [ 656.633730][ T3384] bio_alloc_bioset+0x150/0x330 [ 656.638477][ T3384] ? bvec_alloc+0x1b0/0x1b0 [ 656.642892][ T3384] alloc_io+0x2f/0x230 [dm_mod] [ 656.647654][ T3384] __split_and_process_bio+0x99/0x630 [dm_mod] [ 656.653714][ T3384] ? blk_rq_map_sg+0x9f0/0x9f0 [ 656.658388][ T3384] ? __send_empty_flush.constprop.11+0x1f0/0x1f0 [dm_mod] [ 656.665407][ T3384] ? check_chain_key+0x1df/0x2e0 [ 656.670244][ T3384] ? kasan_check_read+0x11/0x20 [ 656.674992][ T3384] ? blk_queue_split+0x60/0x90 [ 656.679654][ T3384] ? __blk_queue_split+0x970/0x970 [ 656.684679][ T3384] dm_process_bio+0x33f/0x520 [dm_mod] [ 656.690054][ T3384] ? __process_bio+0x230/0x230 [dm_mod] [ 656.695515][ T3384] dm_make_request+0xbd/0x150 [dm_mod] [ 656.700888][ T3384] ? dm_wq_work+0x1b0/0x1b0 [dm_mod] [ 656.706073][ T3384] ? lock_downgrade+0x390/0x390 [ 656.710821][ T3384] generic_make_request+0x179/0x4a0 [ 656.715917][ T3384] ? blk_queue_exit+0xc0/0xc0 [ 656.720489][ T3384] ? __unlock_page_memcg+0x4f/0x90 [ 656.725495][ T3384] ? unlock_page_memcg+0x1f/0x30 [ 656.730329][ T3384] submit_bio+0xaa/0x270 [ 656.734466][ T3384] ? generic_make_request+0x4a0/0x4a0 [ 656.739739][ T3384] __swap_writepage+0x8f5/0xba0 [ 656.744484][ T3384] ? __x64_sys_madvise.cold.0+0x22/0x22 [ 656.749931][ T3384] ? generic_swapfile_activate+0x2a0/0x2a0 [ 656.755638][ T3384] ? do_raw_spin_lock+0x118/0x1d0 [ 656.760559][ T3384] ? rwlock_bug.part.0+0x60/0x60 [ 656.765393][ T3384] ? page_swapcount+0x68/0xc0 [ 656.769967][ T3384] ? kasan_check_read+0x11/0x20 [ 656.774713][ T3384] ? do_raw_spin_unlock+0xa8/0x140 [ 656.779724][ T3384] ? __frontswap_store+0x103/0x2b0 [ 656.784735][ T3384] swap_writepage+0x65/0xb0 [ 656.789134][ T3384] pageout.isra.4+0x3e5/0xa00 [ 656.793707][ T3384] ? shrink_slab+0x440/0x440 [ 656.798192][ T3384] ? kasan_check_read+0x11/0x20 [ 656.802939][ T3384] shrink_page_list+0x159f/0x2650 [ 656.807860][ T3384] ? page_evictable+0x150/0x150 [ 656.812606][ T3384] ? kasan_check_read+0x11/0x20 [ 656.817352][ T3384] ? check_chain_key+0x1df/0x2e0 [ 656.822185][ T3384] ? shrink_inactive_list+0x2ea/0x770 [ 656.827456][ T3384] ? lock_downgrade+0x390/0x390 [ 656.832202][ T3384] ? do_raw_spin_lock+0x118/0x1d0 [ 656.837126][ T3384] ? rwlock_bug.part.0+0x60/0x60 [ 656.841959][ T3384] ? kasan_check_read+0x11/0x20 [ 656.846706][ T3384] ? do_raw_spin_unlock+0xa8/0x140 [ 656.851715][ T3384] shrink_inactive_list+0x373/0x770 [ 656.856812][ T3384] ? move_pages_to_lru+0xb60/0xb60 [ 656.861820][ T3384] ? 
shrink_node_memcg+0xcfa/0x1560 [ 656.866917][ T3384] ? lock_downgrade+0x390/0x390 [ 656.871665][ T3384] ? find_next_bit+0x2c/0xa0 [ 656.876151][ T3384] shrink_node_memcg+0x4ff/0x1560 [ 656.881075][ T3384] ? shrink_active_list+0xa10/0xa10 [ 656.886173][ T3384] ? dev_ifsioc+0xb0/0x4d0 [ 656.890485][ T3384] ? mem_cgroup_iter+0x18e/0x840 [ 656.895319][ T3384] ? kasan_check_read+0x11/0x20 [ 656.900066][ T3384] ? mem_cgroup_protected+0x20f/0x260 [ 656.905334][ T3384] shrink_node+0x1d3/0xa30 [ 656.909644][ T3384] ? shrink_node_memcg+0x1560/0x1560 [ 656.914828][ T3384] ? ktime_get+0x93/0x110 [ 656.919050][ T3384] do_try_to_free_pages+0x22f/0x820 [ 656.924146][ T3384] ? shrink_node+0xa30/0xa30 [ 656.928632][ T3384] ? kasan_check_read+0x11/0x20 [ 656.933379][ T3384] ? check_chain_key+0x1df/0x2e0 [ 656.938212][ T3384] try_to_free_pages+0x242/0x4d0 [ 656.943046][ T3384] ? do_try_to_free_pages+0x820/0x820 [ 656.948318][ T3384] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 656.953677][ T3384] ? kasan_check_read+0x11/0x20 [ 656.958424][ T3384] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 656.963697][ T3384] ? kasan_check_read+0x11/0x20 [ 656.968443][ T3384] ? check_chain_key+0x1df/0x2e0 [ 656.973277][ T3384] ? do_anonymous_page+0x343/0xe30 [ 656.978288][ T3384] ? lock_downgrade+0x390/0x390 [ 656.983035][ T3384] ? __count_memcg_events+0x8b/0x1c0 [ 656.988218][ T3384] ? kasan_check_read+0x11/0x20 [ 656.992966][ T3384] ? __lru_cache_add+0x122/0x160 [ 656.997802][ T3384] alloc_pages_vma+0x89/0x2c0 [ 657.002375][ T3384] do_anonymous_page+0x3e1/0xe30 [ 657.007211][ T3384] ? __update_load_avg_cfs_rq+0x2c/0x490 [ 657.012743][ T3384] ? finish_fault+0x120/0x120 [ 657.017314][ T3384] ? alloc_pages_vma+0x21e/0x2c0 [ 657.022148][ T3384] handle_pte_fault+0x457/0x12c0 [ 657.026984][ T3384] __handle_mm_fault+0x79a/0xa50 [ 657.031819][ T3384] ? vmf_insert_mixed_mkwrite+0x20/0x20 [ 657.037267][ T3384] ? kasan_check_read+0x11/0x20 [ 657.042013][ T3384] ? __count_memcg_events+0x8b/0x1c0 [ 657.047199][ T3384] handle_mm_fault+0x17f/0x370 [ 657.051863][ T3384] __do_page_fault+0x25b/0x5d0 [ 657.056521][ T3384] do_page_fault+0x4c/0x2cf [ 657.060922][ T3384] ? page_[ 659.105948][ T3124] kworker/2:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4 [ 659.106045][ T1598] kworker/10:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4 [ 659.118049][ T3124] CPU: 2 PID: 3124 Comm: kworker/2:1H Tainted: G W 5.2.0-next-20190711+ #3 [ 659.137325][ T762] ODEBUG: Out of memory. ODEBUG disabled [ 659.140015][ T3124] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019 [ 659.140032][ T3124] Workqueue: kblockd blk_mq_run_work_fn [ 659.160266][ T3124] Call Trace: [ 659.163442][ T3124] dump_stack+0x62/0x9a [ 659.167487][ T3124] warn_alloc.cold.45+0x8a/0x12a [ 659.172315][ T3124] ? zone_watermark_ok_safe+0x1a0/0x1a0 [ 659.177756][ T3124] ? __read_once_size_nocheck.constprop.2+0x10/0x10 [ 659.184252][ T3124] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.190658][ T3124] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.197060][ T3124] ? __isolate_free_page+0x390/0x390 [ 659.202239][ T3124] __alloc_pages_nodemask+0x1aab/0x1bc0 [ 659.207680][ T3124] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 659.212949][ T3124] ? stack_trace_save+0x87/0xb0 [ 659.217689][ T3124] ? freezing_slow_path.cold.1+0x35/0x35 [ 659.223219][ T3124] ? __kasan_kmalloc.part.0+0x81/0xc0 [ 659.228485][ T3124] ? 
__kasan_kmalloc.part.0+0x44/0xc0 [ 659.233750][ T3124] ? __kasan_kmalloc.constprop.1+0xac/0xc0 [ 659.239451][ T3124] ? kasan_slab_alloc+0x11/0x20 [ 659.244196][ T3124] ? kmem_cache_alloc+0x17a/0x400 [ 659.249113][ T3124] ? alloc_iova+0x33/0x210 [ 659.253418][ T3124] ? alloc_iova_fast+0x47/0xba [ 659.258073][ T3124] ? dma_ops_alloc_iova.isra.5+0x86/0xa0 [ 659.263603][ T3124] ? map_sg+0x99/0x2f0 [ 659.267558][ T3124] ? scsi_dma_map+0xc6/0x160 [ 659.272042][ T3124] ? pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 659.280020][ T3124] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.286421][ T3124] ? scsi_queue_rq+0x7c6/0x1280 [ 659.291163][ T3124] ? ftrace_graph_ret_addr+0x2a/0xb0 [ 659.296340][ T3124] ? stack_trace_save+0x87/0xb0 [ 659.301081][ T3124] alloc_pages_current+0x9c/0x110 [ 659.305998][ T3124] allocate_slab+0x351/0x11f0 [ 659.310564][ T3124] new_slab+0x46/0x70 [ 659.314433][ T3124] ___slab_alloc+0x5d4/0x9c0 [ 659.318913][ T3124] ? should_fail+0x107/0x3bc [ 659.323393][ T3124] ? alloc_iova+0x33/0x210 [ 659.327700][ T3124] ? lock_downgrade+0x390/0x390 [ 659.332441][ T3124] ? lock_downgrade+0x390/0x390 [ 659.337183][ T3124] ? alloc_iova+0x33/0x210 [ 659.341487][ T3124] __slab_alloc+0x12/0x20 [ 659.345704][ T3124] ? __slab_alloc+0x12/0x20 [ 659.350096][ T3124] kmem_cache_alloc+0x32a/0x400 [ 659.354838][ T3124] ? kasan_check_read+0x11/0x20 [ 659.359580][ T3124] ? do_raw_spin_unlock+0xa8/0x140 [ 659.364585][ T3124] alloc_iova+0x33/0x210 [ 659.368714][ T3124] ? iova_rcache_get+0x1a1/0x300 [ 659.373545][ T3124] alloc_iova_fast+0x47/0xba [ 659.378026][ T3124] dma_ops_alloc_iova.isra.5+0x86/0xa0 [ 659.383381][ T3124] map_sg+0x99/0x2f0 [ 659.387161][ T3124] scsi_dma_map+0xc6/0x160 [ 659.391470][ T3124] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 659.399274][ T3124] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi] [ 659.405507][ T3124] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.411733][ T3124] ? scsi_init_io+0x102/0x150 [ 659.416306][ T3124] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod] [ 659.422713][ T3124] ? pqi_event_worker+0xdf0/0xdf0 [smartpqi] [ 659.428593][ T3124] ? sd_init_command+0x88b/0x930 [sd_mod] [ 659.434211][ T3124] ? blk_add_timer+0xd7/0x110 [ 659.438780][ T3124] scsi_queue_rq+0x7c6/0x1280 [ 659.443350][ T3124] blk_mq_dispatch_rq_list+0x9d3/0xba0 [ 659.448702][ T3124] ? blk_mq_flush_busy_ctxs+0x1c5/0x450 [ 659.454145][ T3124] ? blk_mq_get_driver_tag+0x290/0x290 [ 659.459498][ T3124] ? __lock_acquire.isra.13+0x430/0x830 [ 659.464938][ T3124] blk_mq_sched_dispatch_requests+0x2f4/0x300 [ 659.470903][ T3124] ? blk_mq_sched_restart+0x60/0x60 [ 659.475993][ T3124] __blk_mq_run_hw_queue+0x156/0x230 [ 659.481172][ T3124] ? hctx_lock+0xc0/0xc0 [ 659.485301][ T3124] ? process_one_work+0x426/0xa70 [ 659.490217][ T3124] blk_mq_run_work_fn+0x3b/0x40 [ 659.494959][ T3124] process_one_work+0x53b/0xa70 [ 659.499703][ T3124] ? pwq_dec_nr_in_flight+0x170/0x170 [ 659.504967][ T3124] worker_thread+0x63/0x5b0 [ 659.509361][ T3124] kthread+0x1df/0x200 [ 659.513316][ T3124] ? process_one_work+0xa70/0xa70 [ 659.518231][ T3124] ? 
kthread_park+0xc0/0xc0 [ 659.522625][ T3124] ret_from_fork+0x22/0x40 [ 659.526937][ T1598] CPU: 10 PID: 1598 Comm: kworker/10:1H Tainted: G W 5.2.0-next-20190711+ #3 [ 659.526991][ T3124] Mem-Info: [ 659.536921][ T1598] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019 [ 659.536934][ T1598] Workqueue: kblockd blk_mq_run_work_fn [ 659.540067][ T3124] active_anon:4662210 inactive_anon:359358 isolated_anon:2005 [ 659.540067][ T3124] active_file:10032 inactive_file:12947 isolated_file:0 [ 659.540067][ T3124] unevictable:0 dirty:12 writeback:0 unstable:0 [ 659.540067][ T3124] slab_reclaimable:71207 slab_unreclaimable:1252996 [ 659.540067][ T3124] mapped:17530 shmem:1850 pagetables:11491 bounce:0 [ 659.540067][ T3124] free:54096 free_pcp:5994 free_cma:84 [ 659.549192][ T1598] Call Trace: [ 659.549203][ T1598] dump_stack+0x62/0x9a [ 659.554639][ T3124] Node 0 active_anon:2246440kB inactive_anon:572540kB active_file:19500kB inactive_file:19016kB unevictable:0kB isolated(anon):7708kB isolated(file):0kB mapped:24840kB dirty:8kB writeback:0kB shmem:1372kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1689600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.593619][ T1598] warn_alloc.cold.45+0x8a/0x12a [ 659.596785][ T3124] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.600821][ T1598] ? zone_watermark_ok_safe+0x1a0/0x1a0 [ 659.630195][ T3124] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.635021][ T1598] ? __read_once_size_nocheck.constprop.2+0x10/0x10 [ 659.661328][ T3124] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.661337][ T3124] Node 4 active_anon:16402112kB inactive_anon:865180kB active_file:20600kB inactive_file:32712kB unevictable:0kB isolated(anon):304kB isolated(file):0kB mapped:45216kB dirty:40kB writeback:12kB shmem:6028kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 15167488kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.666778][ T1598] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.693086][ T3124] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.693096][ T3124] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.699583][ T1598] ? 
pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.725894][ T3124] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 659.755524][ T1598] ? __isolate_free_page+0x390/0x390 [ 659.761953][ T3124] Node 0 DMA free:15908kB min:24kB low:36kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 659.788234][ T1598] __alloc_pages_nodemask+0x1aab/0x1bc0 [ 659.814544][ T3124] lowmem_reserve[]: 0 1532 19982 19982 19982 [ 659.820945][ T1598] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 659.847287][ T3124] Node 0 DMA32 free:73504kB min:2676kB low:4244kB high:5812kB active_anon:1190128kB inactive_anon:362496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1923080kB managed:1634348kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1432kB local_pcp:0kB free_cma:0kB [ 659.852428][ T1598] ? stack_trace_save+0x87/0xb0 [ 659.852435][ T1598] ? freezing_slow_path.cold.1+0x35/0x35 [ 659.879003][ T3124] lowmem_reserve[]: 0 0 18450 18450 18450 [ 659.884446][ T1598] ? __kasan_kmalloc.part.0+0x81/0xc0 [ 659.890346][ T3124] Node 0 Normal free:47760kB min:137264kB low:156156kB high:175048kB active_anon:1056208kB inactive_anon:209672kB active_file:19456kB inactive_file:18996kB unevictable:0kB writepending:0kB present:27262976kB managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB bounce:0kB free_pcp:9340kB local_pcp:164kB free_cma:0kB [ 659.895574][ T1598] ? __kasan_kmalloc.part.0+0x44/0xc0 [ 659.895581][ T1598] ? __kasan_kmalloc.constprop.1+0xac/0xc0 [ 659.924420][ T3124] lowmem_reserve[]: 0 0 0 0 0 [ 659.929163][ T1598] ? kasan_slab_alloc+0x11/0x20 [ 659.929170][ T1598] ? kmem_cache_alloc+0x17a/0x400 [ 659.934724][ T3124] Node 4 Normal free:72728kB min:234904kB low:267232kB high:299560kB active_anon:16401776kB inactive_anon:865580kB active_file:20596kB inactive_file:32692kB unevictable:0kB writepending:40kB present:33538048kB managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB bounce:0kB free_pcp:12956kB local_pcp:24kB free_cma:336kB [ 659.940301][ T1598] ? alloc_iova+0x33/0x210 [ 659.940307][ T1598] ? alloc_iova_fast+0x47/0xba [ 659.945563][ T3124] lowmem_reserve[]: 0 0 0 0 0 [ 659.976773][ T1598] ? dma_ops_alloc_iova.isra.5+0x86/0xa0 [ 659.976780][ T1598] ? map_sg+0x99/0x2f0 [ 659.982039][ T3124] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB [ 659.987736][ T1598] ? scsi_dma_map+0xc6/0x160 [ 659.987747][ T1598] ? pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 659.992300][ T3124] Node 0 DMA32: 0*4kB 0*8kB 2*16kB (M) 5*32kB (UM) 17*64kB (UM) 8*128kB (UM) 12*256kB (UM) 11*512kB (UM) 10*1024kB (UM) 2*2048kB (UM) 12*4096kB (M) = 74496kB [ 659.997045][ T1598] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 659.997051][ T1598] ? scsi_queue_rq+0x7c6/0x1280 [ 660.001958][ T3124] Node 0 Normal: 0*4kB 0*8kB 198*16kB (MEH) 356*32kB (ME) 83*64kB (UME) 15*128kB (UME) 101*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47648kB [ 660.033521][ T1598] ? ftrace_graph_ret_addr+0x2a/0xb0 [ 660.033528][ T1598] ? 
stack_trace_save+0x87/0xb0 [ 660.037828][ T3124] Node 4 Normal: 0*4kB 0*8kB 211*16kB (UME) 441*32kB (UME) 449*64kB (UME) 71*128kB (ME) 62*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71184kB [ 660.042481][ T1598] alloc_pages_current+0x9c/0x110 [ 660.047042][ T3124] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 660.052569][ T1598] allocate_slab+0x351/0x11f0 [ 660.056516][ T3124] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 660.056521][ T3124] Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 660.070694][ T1598] new_slab+0x46/0x70 [ 660.075169][ T3124] Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 660.083141][ T1598] ___slab_alloc+0x5d4/0x9c0 [ 660.098879][ T3124] 26058 total pagecache pages [ 660.098894][ T3124] 1298 pages in swap cache [ 660.105279][ T1598] ? should_fail+0x107/0x3bc [ 660.105285][ T1598] ? alloc_iova+0x33/0x210 [ 660.110020][ T3124] Swap cache stats: add 2607, delete 1311, find 0/1 [ 660.110024][ T3124] Free swap = 32919548kB [ 660.124719][ T1598] ? lock_downgrade+0x390/0x390 [ 660.124725][ T1598] ? lock_downgrade+0x390/0x390 [ 660.129894][ T3124] Total swap = 32952316kB [ 660.129899][ T3124] 15685025 pages RAM [ 660.134637][ T1598] ? alloc_iova+0x33/0x210 [ 660.149328][ T3124] 0 pages HighMem/MovableOnly [ 660.149332][ T3124] 2465994 pages reserved [ 660.154245][ T1598] __slab_alloc+0x12/0x20 [ 660.154252][ T1598] ? __slab_alloc+0x12/0x20 [ 660.163701][ T3124] 16384 pages cma reserved [ 660.163763][ T3124] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) [ 660.168269][ T1598] kmem_cache_alloc+0x32a/0x400 [ 660.168276][ T1598] ? kasan_check_read+0x11/0x20 [ 660.177465][ T3124] cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 [ 660.177470][ T3124] node 0: slabs: 10580, objs: 95220, free: 0 [ 660.186924][ T1598] ? do_raw_spin_unlock+0xa8/0x140 [ 660.186930][ T1598] alloc_iova+0x33/0x210 [ 660.190792][ T3124] node 4: slabs: 2292, objs: 20628, free: 25 [ 660.199982][ T1598] ? iova_rcache_get+0x1a1/0x300 [ 660.199989][ T1598] alloc_iova_fast+0x47/0xba [ 660.204513][ T3124] kworker/2:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4 [ 660.209026][ T1598] dma_ops_alloc_iova.isra.5+0x86/0xa0 [ 660.351109][ T1598] map_sg+0x99/0x2f0 [ 660.354891][ T1598] ? __debug_object_init+0x412/0x7a0 [ 660.360070][ T1598] scsi_dma_map+0xc6/0x160 [ 660.364381][ T1598] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 660.372184][ T1598] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi] [ 660.378415][ T1598] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 660.384644][ T1598] ? scsi_init_io+0x102/0x150 [ 660.389217][ T1598] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod] [ 660.395622][ T1598] ? pqi_event_worker+0xdf0/0xdf0 [smartpqi] [ 660.401503][ T1598] ? sd_init_command+0x88b/0x930 [sd_mod] [ 660.407119][ T1598] ? blk_add_timer+0xd7/0x110 [ 660.411686][ T1598] scsi_queue_rq+0x7c6/0x1280 [ 660.416252][ T1598] blk_mq_dispatch_rq_list+0x9d3/0xba0 [ 660.421604][ T1598] ? blk_mq_flush_busy_ctxs+0x1c5/0x450 [ 660.427045][ T1598] ? blk_mq_get_driver_tag+0x290/0x290 [ 660.432396][ T1598] ? __lock_acquire.isra.13+0xT3124] __blk_mq_run_hw_queue+0x156/0x230 [ 660.822569][ T3124] ? hctx_lock+0xc0/0xc0 [ 660.826700][ T3124] ? 
process_one_work+0x426/0xa70 [ 660.831617][ T3124] blk_mq_run_work_fn+0x3b/0x40 [ 660.836358][ T3124] process_one_work+0x53b/0xa70 [ 660.841100][ T3124] ? pwq_dec_nr_in_flight+0x170/0x170 [ 660.846365][ T3124] worker_thread+0x63/0x5b0 [ 660.850756][ T3124] kthread+0x1df/0x200 [ 660.854712][ T3124] ? process_one_work+0xa70/0xa70 [ 660.859626][ T3124] ? kthread_park+0xc0/0xc0 [ 660.864021][ T3124] ret_from_fork+0x22/0x40 [ 660.868328][ T3124] warn_alloc_show_mem: 1 callbacks suppressed [ 660.868332][ T1598] CPU: 10 PID: 1598 Comm: kworker/10:1H Tainted: G W 5.2.0-next-20190711+ #3 [ 660.868335][ T3124] Mem-Info: [ 660.868485][ T3124] active_anon:4662011 inactive_anon:359383 isolated_anon:2155 [ 660.868485][ T3124] active_file:10012 inactive_file:12922 isolated_file:0 [ 660.868485][ T3124] unevictable:0 dirty:12 writeback:0 unstable:0 [ 660.868485][ T3h:175048kB active_anon:1056208kB inactive_anon:209448kB active_file:19452kB inactive_file:18996kB unevictable:0kB writepending:0kB present:27262976kB managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB bounce:0kB free_pcp:8784kB local_pcp:164kB free_cma:0kB [ 661.222532][ T1598] ? kernel_poison_pages.cold.2+0x8c/0x8c [ 661.228397][ T3124] lowmem_reserve[]: 0 0 0 0 0 [ 661.233138][ T1598] ? vprintk_default+0x1f/0x30 [ 661.233146][ T1598] alloc_pages_current+0x9c/0x110 [ 661.238174][ T3124] Node 4 Normal free:71384kB min:234904kB low:267232kB high:299560kB active_anon:16401776kB inactive_anon:865588kB active_file:20596kB inactive_file:32692kB unevictable:0kB writepending:40kB present:33538048kB managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB bounce:0kB free_pcp:12872kB local_pcp:24kB free_cma:336kB [ 661.266900][ T1598] allocate_slab+0x351/0x11f0 [ 661.266905][ T1598] new_slab+0x46/0x70 [ 661.271461][ T3124] lowmem_reserve[]: 0 0 0 0 0 [ 661.275941][ T1598] ___slab_alloc+0x5d4/0x9c0 [ 661.275948][ T1598] ? should0 [ 661.543007][ T3132] cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 [ 661.543011][ T3203] node 0: slabs: 10582, objs: 95238, free: 7 [ 661.543016][ T3132] node 0: slabs: 10582, objs: 95238, free: 7 [ 661.543020][ T3203] node 4: slabs: 2293, objs: 20637, free: 30 [ 661.543026][ T3132] node 4: slabs: 2293, objs: 20637, free: 30 [ 661.543040][ T3203] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) [ 661.543046][ T3203] cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 [ 661.543052][ T3203] node 0: slabs: 10582, objs: 95238, free: 7 [ 661.543057][ T3132] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) [ 661.543061][ T3203] node 4: slabs: 2293, objs: 20637, free: 30 [ 661.543066][ T3132] cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 [ 661.543072][ T3132] node 0: slabs: 10582, objs: 95238, free: 7 [ 661.543078][ T3132] node 4: slabs: 2293, objs: 20637, free: 30 [ 661.543544][ T3205] SLUB: Unable to allocnevictable:0kB isolated(anon):352kB isolated(file):0kB mapped:45056kB dirty:40kB writeback:52kB shmem:6028kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 15167488kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 662.181289][ T1598] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 662.207607][ T3209] ? 
__read_once_size_nocheck.constprop.2+0x10/0x10 [ 662.212434][ T1598] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 662.238751][ T3209] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 662.244187][ T1598] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(ano alloc_iova_fast+0x47/0xba [ 662.835750][ T3209] dma_ops_alloc_iova.isra.5+0x86/0xa0 [ 662.841103][ T3209] map_sg+0x99/0x2f0 [ 662.844886][ T3209] ? kasan_check_read+0x11/0x20 [ 662.849627][ T3209] scsi_dma_map+0xc6/0x160 [ 662.853938][ T3209] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 662.861740][ T3209] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi] [ 662.867971][ T3209] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 662.874198][ T3209] ? scsi_init_io+0x102/0x150 [ 662.878768][ T3209] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod] [ 662.885176][ T3209] ? pqi_event_worker+0xdf0/0xdf0 [smartpqi] [ 662.891055][ T3209] ? sd_init_command+0x88b/0x930 [sd_mod] [ 662.896672][ T3209] ? blk_add_timer+0xd7/0x110 [ 662.901240][ T3209] scsi_queue_rq+0x7c6/0x1280 [ 662.905807][ T3209] blk_mq_dispatch_rq_list+0x9d3/0xba0 [ 662.911159][ T3209] ? blk_mq_flush_busy_ctxs+0x1c5/0x450 [ 662.916601][ T3209] ? blk_mq_get_driver_tag+0x290/0x290 [ 662.921953][ T3209] ? __lock_acquire.isra.13+0x430/0x830 [ 662.927394][ T3209] blk_mq_sched_diag+0x290/0x290 [ 663.313403][ T3146] ? __lock_acquire.isra.13+0x430/0x830 [ 663.318844][ T3146] blk_mq_sched_dispatch_requests+0x2f4/0x300 [ 663.324807][ T3146] ? blk_mq_sched_restart+0x60/0x60 [ 663.329898][ T3146] __blk_mq_run_hw_queue+0x156/0x230 [ 663.335076][ T3146] ? hctx_lock+0xc0/0xc0 [ 663.339211][ T3146] ? process_one_work+0x426/0xa70 [ 663.344128][ T3146] blk_mq_run_work_fn+0x3b/0x40 [ 663.348870][ T3146] process_one_work+0x53b/0xa70 [ 663.353613][ T3146] ? pwq_dec_nr_in_flight+0x170/0x170 [ 663.358880][ T3146] worker_thread+0x63/0x5b0 [ 663.363277][ T3146] kthread+0x1df/0x200 [ 663.367233][ T3146] ? process_one_work+0xa70/0xa70 [ 663.372148][ T3146] ? kthread_park+0xc0/0xc0 [ 663.376543][ T3146] ret_from_fork+0x22/0x40 [ 663.380848][ T3146] warn_alloc_show_mem: 1 callbacks suppressed [ 663.380855][ T3123] CPU: 1 PID: 3123 Comm: kworker/1:1H Tainted: G W 5.2.0-next-20190711+ #3 [ 663.380857][ T3146] Mem-Info: [ 663.381000][ T3146] active_anon:4654271 inactive_anon:367023 isolated_anon:2263 [ 663.381000T3123] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 663.744691][ T3146] Node 0 Normal free:74264kB min:137264kB low:156156kB high:175048kB active_anon:1055816kB inactive_anon:209292kB active_file:19416kB inactive_file:18964kB unevictable:0kB writepending:248kB present:27262976kB managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB bounce:0kB free_pcp:9356kB local_pcp:124kB free_cma:0kB [ 663.750101][ T3123] ? lock_downgrade+0x390/0x390 [ 663.778942][ T3146] lowmem_reserve[]: 0 0 0 0 0 [ 663.783688][ T3123] ? 
do_raw_spin_lock+0x118/0x1d0 [ 663.789326][ T3146] Node 4 Normal free:81632kB min:234904kB low:267232kB high:299560kB active_anon:16368972kB inactive_anon:898504kB active_file:20548kB inactive_file:32468kB unevictable:0kB writepending:104kB present:33538048kB managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB bounce:0kB free_pcp:11372kB local_pcp:160kB free_cma:0kB [ 663.794556][ T3123] ? rwlock_bug.part.0+0x60/0x60 [ 663.794563][ T3123] ? get_partial_node+0x48/0x540 [ 663.825936][ T3146] lowmem_reserve[]: 0 0 0 0 0 [ 663.830678][ T3123] #3 [ 664.269661][ T3202] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019 [ 664.278993][ T3202] Workqueue: kblockd blk_mq_run_work_fn [ 664.284453][ T3202] Call Trace: [ 664.287655][ T3202] dump_stack+0x62/0x9a [ 664.291721][ T3202] warn_alloc.cold.45+0x8a/0x12a [ 664.296577][ T3202] ? zone_watermark_ok_safe+0x1a0/0x1a0 [ 664.302044][ T3202] ? __read_once_size_nocheck.constprop.2+0x10/0x10 [ 664.308564][ T3202] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 664.314996][ T3202] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 664.321420][ T3202] ? __isolate_free_page+0x390/0x390 [ 664.326613][ T3202] __alloc_pages_nodemask+0x1aab/0x1bc0 [ 664.332062][ T3202] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 664.337345][ T3202] ? stack_trace_save+0x87/0xb0 [ 664.342103][ T3202] ? freezing_slow_path.cold.1+0x35/0x35 [ 664.347647][ T3202] ? __kasan_kmalloc.part.0+0x81/0xc0 [ 664.352925][ T3202] ? __kasan_kmalloc.part.0+0x44/0xc0 [ 664.358204][ T3202] ? __kasan_kmalloc.constprop.1+0xac/0xc0 [ 664.363922][ hmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 664.759472][ T3127] ? __read_once_size_nocheck.constprop.2+0x10/0x10 [ 664.759508][ T3127] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 664.785836][ T3202] Node 4 active_anon:15362196kB inactive_anon:1296156kB active_file:15052kB inactive_file:17752kB unevictable:0kB isolated(anon):66644kB isolated(file):112kB mapped:30596kB dirty:0kB writeback:3968kB shmem:1080kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14735360kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 664.789031][ T3127] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 664.793056][ T3202] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 664.819386][ T3127] ? __isolate_free_page+0x390/0x390 [ 664.819401][ T3127] __alloc_pages_nodemask+0x1aab/0x1bc0 [ 664.824245][ T3202] Node 6 active_anon7] map_sg+0x99/0x2f0 [ 665.159320][ T3202] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 665.191157][ T3127] ? kasan_check_read+0x11/0x20 [ 665.191176][ T3127] scsi_dma_map+0xc6/0x160 [ 665.195480][ T3202] Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 665.195490][ T3202] Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 665.200248][ T3127] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 665.204805][ T3202] 69668 total pagecache pages [ 665.209566][ T3127] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi] [ 665.213886][ T3202] 65404 pages in swap cache [ 665.228054][ T3127] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 665.228074][ T3127] ? 
scsi_init_io+0x102/0x150 [ 665.232285][ T3202] Swap cache stats: add 486050, delete 428240, find 59/149 [ 665.232294][ T3202] Free swap = 30975484kB [ 665.236832][ T3127] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod] [ 665.236858][ T3127] ? pqi_event_worker+0xdf0/0xdf0 [smar390 [ 665.806891][ T3141] ? lock_downgrade+0x390/0x390 [ 665.811664][ T3141] ? alloc_iova+0x33/0x210 [ 665.815987][ T3141] __slab_alloc+0x12/0x20 [ 665.820232][ T3141] ? __slab_alloc+0x12/0x20 [ 665.824654][ T3141] kmem_cache_alloc+0x32a/0x400 [ 665.829413][ T3141] ? kasan_check_read+0x11/0x20 [ 665.834179][ T3141] ? do_raw_spin_unlock+0xa8/0x140 [ 665.839221][ T3141] alloc_iova+0x33/0x210 [ 665.843369][ T3141] ? iova_rcache_get+0x1a1/0x300 [ 665.848225][ T3141] alloc_iova_fast+0x47/0xba [ 665.852736][ T3141] dma_ops_alloc_iova.isra.5+0x86/0xa0 [ 665.858122][ T3141] map_sg+0x99/0x2f0 [ 665.861957][ T3141] ? kasan_check_read+0x11/0x20 [ 665.866759][ T3141] scsi_dma_map+0xc6/0x160 [ 665.871098][ T3141] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] [ 665.878918][ T3141] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi] [ 665.885172][ T3141] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi] [ 665.891435][ T3141] ? scsi_init_io+0x102/0x150 [ 665.896103][ T3141] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod] [ 665.902619][ T3141] ? pqie:0kB unevictable:0kB writepending:0kB present:15996kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 666.300385][ T3141] lowmem_reserve[]: 0 1532 19982 19982 19982 [ 666.306395][ T3141] Node 0 DMA32 free:75568kB min:2676kB low:4244kB high:5812kB active_anon:749752kB inactive_anon:395332kB active_file:128kB inactive_file:168kB unevictable:0kB writepending:0kB present:1923080kB managed:1634348kB mlocked:0kB kernel_stack:0kB pagetables:28kB bounce:0kB free_pcp:55484kB local_pcp:248kB free_cma:0kB [ 666.335894][ T3141] lowmem_reserve[]: 0 0 18450 18450 18450 [ 666.341762][ T3141] Node 0 Normal free:52856kB min:52716kB low:71608kB high:90500kB active_anon:1127696kB inactive_anon:80184kB active_file:492kB inactive_file:656kB unevictable:0kB writepending:2208kB present:27262976kB managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10372kB bounce:0kB free_pcp:12848kB local_pcp:36kB free_cma:0kB [ 666.372602][ T3141] lowmem_reserve[]: 0 0 0 0 0 [ 666.377419][ T3141] Node 4 Normal free:234488kB m[ 685.274656][ T3456] list_del corruption. prev->next should be ffffea0022b10098, but was 0000000000000000 [ 685.284254][ T3456] ------------[ cut here ]------------ [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53! 
[ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted: G W 5.2.0-next-20190711+ #3 [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019 [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6 [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082 [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX: ffffffffa2d5d708 [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888442bd380 [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09: ffffed1108857a70 [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12: 0000000000000000 [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15: ffffea0022b10098 [ 685.391348][ T3456] FS: 00007fbe26db4700(0000) GS:ffff888844280000(0000) knlGS:0000000000000000 [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4: 00000000001406a0 [ 685.414563][ T3456] Call Trace: [ 685.417736][ T3456] deferred_split_scan+0x337/0x740 [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10 [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0 [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40 [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0 [ 685.444071][ T3456] shrink_slab+0x253/0x440 [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110 [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20 [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260 [ 685.463555][ T3456] shrink_node+0x31e/0xa30 [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560 [ 685.473036][ T3456] ? ktime_get+0x93/0x110 [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820 [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30 [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20 [ 685.491556][ T3456] ? check_chain_key+0x1df/0x2e0 [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0 [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820 [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20 [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0 [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30 [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390 [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0 [ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160 [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0 [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30 [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490 [ 685.560796][ T3456] ? finish_fault+0x120/0x120 [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0 [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0 [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50 [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20 [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20 [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0 [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370 [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0 [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf [ 685.608892][ T3456] ? 
page_fault+0x5/0x20 [ 685.613019][ T3456] page_fault+0x1b/0x20 [ 685.617058][ T3456] RIP: 0033:0x410be0 [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f [ 68[ 687.120156][ T3456] Shutting down cpus with NMI [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]--- ^ permalink raw reply [flat|nested] 21+ messages in thread
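A note on what the BUG above means: the "prev->next should be ..., but was ..." message comes from the CONFIG_DEBUG_LIST sanity checks that run before list_del() actually unlinks a node. The sketch below is a rough userspace paraphrase of those checks, written only to show which broken invariant each message corresponds to; it is an approximation for illustration, not the exact lib/list_debug.c source.

#include <stdio.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_POISON1 ((struct list_head *)0xdead000000000100)
#define LIST_POISON2 ((struct list_head *)0xdead000000000122)

/* Paraphrase of the pre-deletion checks; returns false where the kernel
 * would print the corresponding message and BUG with CONFIG_DEBUG_LIST. */
static bool list_del_entry_valid_sketch(struct list_head *entry)
{
	if (entry->next == LIST_POISON1) {
		/* The node was already deleted once (double list_del). */
		fprintf(stderr, "list_del corruption, %p->next is LIST_POISON1\n",
			(void *)entry);
		return false;
	}
	if (entry->prev == LIST_POISON2) {
		fprintf(stderr, "list_del corruption, %p->prev is LIST_POISON2\n",
			(void *)entry);
		return false;
	}
	if (entry->prev->next != entry) {
		/* A neighbour was rewritten underneath us; this is the
		 * "prev->next should be X, but was Y" case seen above. */
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return false;
	}
	return true;
}

int main(void)
{
	/* A node whose links already hold the poison values, i.e. it was
	 * deleted once before; the first check fires. */
	struct list_head stale = { .next = LIST_POISON1, .prev = LIST_POISON2 };

	if (!list_del_entry_valid_sketch(&stale))
		puts("with CONFIG_DEBUG_LIST the kernel would BUG here");
	return 0;
}

Either flavour of the report indicates that the node or one of its neighbours was manipulated without the expected serialization, which is what the rest of the thread tracks down.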
* Re: list corruption in deferred_split_scan()
  2019-07-24 21:13 ` Qian Cai
@ 2019-07-25 21:46 ` Yang Shi
  2019-08-05 22:15 ` Yang Shi
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-25 21:46 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

On 7/24/19 2:13 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>> Running LTP oom01 test case with swap triggers a crash below. Revert the
>> series
>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
> You might want to look harder on this commit, as reverted it alone on the top of
> 5.2.0-next-20190711 fixed the issue.
>
> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
> linux.alibaba.com/

This is the real meat of the patch series; it is the commit that actually
converts the deferred split queue to be per memcg.

>
>
> list_del corruption. prev->next should be ffffea0022b10098, but was
> 0000000000000000

I could finally reproduce the list corruption issue on my machine with THP
swap (the swap device is a fast device). I should have checked this with you
in the first place. The problem can't be reproduced with a rotating swap
device, so I assume you were using THP swap too.

Actually, I found two issues with THP swap:

1. free_transhuge_page() is called in the reclaim path instead of via
put_page(). mem_cgroup_uncharge() is called before free_transhuge_page() in
the reclaim path, which leaves page->mem_cgroup NULL, so the wrong
deferred_split_queue is used and the THP is not deleted from the memcg's
list at all. The page might then be split or reused later, and page->mapping
would be overwritten.

2. There is a race condition caused by try_to_unmap() with THP swap.
try_to_unmap() calls page_remove_rmap(), which adds the THP to the deferred
split queue from the reclaim path. This can lead to the race below, which
corrupts the list:

           A                                   B
    deferred_split_scan
        list_move
                                        try_to_unmap
                                            list_add_tail

        list_splice  <-- The list might get corrupted here

                                        free_transhuge_page
                                            list_del  <-- kernel bug triggered

I hope the patch below solves your problem (tested locally).


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..d6612ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)

 	VM_BUG_ON_PAGE(!PageTransHuge(page), page);

+	/*
+	 * The try_to_unmap() in page reclaim path might reach here too,
+	 * this may cause a race condition to corrupt deferred split queue.
+	 * And, if page reclaim is already handling the same page, it is
+	 * unnecessary to handle it again in shrinker.
+	 *
+	 * Check PageSwapCache to determine if the page is being
+	 * handled by page reclaim since THP swap would add the page into
+	 * swap cache before reaching try_to_unmap().
+	 */
+	if (PageSwapCache(page))
+		return;
+
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..40c684a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * Is there need to periodically free_page_list?
It would * appear not as the counts should be low */ - if (unlikely(PageTransHuge(page))) { - mem_cgroup_uncharge(page); + if (unlikely(PageTransHuge(page))) (*get_compound_page_dtor(page))(page); - } else + else list_add(&page->lru, &free_pages); continue; @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, if (unlikely(PageCompound(page))) { spin_unlock_irq(&pgdat->lru_lock); - mem_cgroup_uncharge(page); (*get_compound_page_dtor(page))(page); spin_lock_irq(&pgdat->lru_lock); } else > [ 685.284254][ T3456] ------------[ cut here ]------------ > [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53! > [ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI > [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted: > G W 5.2.0-next-20190711+ #3 > [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 > Gen10, BIOS A40 06/24/2019 > [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6 > [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00 > 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f> > 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b > [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082 > [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX: > ffffffffa2d5d708 > [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI: > ffff8888442bd380 > [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09: > ffffed1108857a70 > [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12: > 0000000000000000 > [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15: > ffffea0022b10098 > [ 685.391348][ T3456] FS: 00007fbe26db4700(0000) GS:ffff888844280000(0000) > knlGS:0000000000000000 > [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4: > 00000000001406a0 > [ 685.414563][ T3456] Call Trace: > [ 685.417736][ T3456] deferred_split_scan+0x337/0x740 > [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10 > [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0 > [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40 > [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0 > [ 685.444071][ T3456] shrink_slab+0x253/0x440 > [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110 > [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20 > [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260 > [ 685.463555][ T3456] shrink_node+0x31e/0xa30 > [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560 > [ 685.473036][ T3456] ? ktime_get+0x93/0x110 > [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820 > [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30 > [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20 > [ 685.491556][ T3456] ? check_chain_key+0x1df/0x2e0 > [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0 > [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820 > [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0 > [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20 > [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0 > [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30 > [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390 > [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0 > [ 685.541050][ T3456] ? 
__lru_cache_add+0x108/0x160 > [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0 > [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30 > [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490 > [ 685.560796][ T3456] ? finish_fault+0x120/0x120 > [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0 > [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0 > [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50 > [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20 > [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20 > [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0 > [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370 > [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0 > [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf > [ 685.608892][ T3456] ? page_fault+0x5/0x20 > [ 685.613019][ T3456] page_fault+0x1b/0x20 > [ 685.617058][ T3456] RIP: 0033:0x410be0 > [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 > 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> > 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f > [ 68[ 687.120156][ T3456] Shutting down cpus with NMI > [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000 > (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]--- ^ permalink raw reply related [flat|nested] 21+ messages in thread
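To make the corruption mechanism easier to picture without the kernel sources at hand, here is a self-contained userspace sketch. The list primitives are simplified copies of the kernel's list_head helpers, and the scenario and variable names are assumptions chosen for illustration rather than the actual THP code paths. It shows, deterministically, how linking a node onto a second list while another list still references it leaves stale links behind, the same class of damage that a later list_del() reports.

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void __list_add(struct list_head *new, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = (struct list_head *)0xdead000000000100; /* LIST_POISON1 */
	entry->prev = (struct list_head *)0xdead000000000122; /* LIST_POISON2 */
}

int main(void)
{
	struct list_head queue, local, page;

	INIT_LIST_HEAD(&queue);
	INIT_LIST_HEAD(&local);

	/* The "page" sits on the deferred split queue. */
	list_add_tail(&page, &queue);

	/* Another path links the same node onto a second list without
	 * removing it from the first one (roughly what an unserialized
	 * list_add_tail() racing with a concurrent splice amounts to). */
	list_add_tail(&page, &local);

	/* The first queue still points at the node ... */
	printf("queue.next == &page?  %s\n", queue.next == &page ? "yes" : "no");
	/* ... but the node no longer points back, so the queue is broken. */
	printf("page.prev == &queue?  %s\n", page.prev == &queue ? "yes" : "no");

	/* Freeing the page now unlinks it from the second list only and
	 * poisons its pointers; anything that still reaches the node via
	 * the first queue sees LIST_POISON values, which is exactly what
	 * CONFIG_DEBUG_LIST complains about. */
	list_del(&page);
	printf("queue.next after list_del: %p (an empty queue would be %p)\n",
	       (void *)queue.next, (void *)&queue);
	return 0;
}

In the real kernel the interleaving needs two CPUs and unlucky timing; the sketch just compresses it into one deterministic sequence so the resulting pointer damage is visible.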
* Re: list corruption in deferred_split_scan() 2019-07-25 21:46 ` Yang Shi @ 2019-08-05 22:15 ` Yang Shi 2019-08-06 1:05 ` Qian Cai 0 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-08-05 22:15 UTC (permalink / raw) To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel On 7/25/19 2:46 PM, Yang Shi wrote: > > > On 7/24/19 2:13 PM, Qian Cai wrote: >> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote: >>> Running LTP oom01 test case with swap triggers a crash below. Revert >>> the >>> series >>> "Make deferred split shrinker memcg aware" [1] seems fix the issue. >> You might want to look harder on this commit, as reverted it alone on >> the top of >> 5.2.0-next-20190711 fixed the issue. >> >> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1] >> >> [1] >> https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@ >> linux.alibaba.com/ > > This is the real meat of the patch series, which converted to memcg > deferred split queue actually. > >> >> >> list_del corruption. prev->next should be ffffea0022b10098, but was >> 0000000000000000 > > Finally I could reproduce the list corruption issue on my machine with > THP swap (swap device is fast device). I should checked this with you > at the first place. The problem can't be reproduced with rotate swap > device. So, I'm supposed you were using THP swap too. > > Actually, I found two issues with THP swap: > 1. free_transhuge_page() is called in reclaim path instead of > put_page. The mem_cgroup_uncharge() is called before > free_transhuge_page() in reclaim path, which causes page->mem_cgroup > is NULL so the wrong deferred_split_queue would be used, so the THP > was not deleted from the memcg's list at all. Then the page might be > split or reused later, page->mapping would be override. > > 2. There is a race condition caused by try_to_unmap() with THP swap. > The try_to_unmap() just calls page_remove_rmap() to add THP to > deferred split queue in reclaim path. This might cause the below race > condition to corrupt the list: > > A B > deferred_split_scan > list_move > try_to_unmap > list_add_tail > > list_splice <-- The list might get corrupted here > > free_transhuge_page > list_del <-- > kernel bug triggered > > I hope the below patch would solve your problem (tested locally). Hi Qian, Did the below patch solve your problem? I would like the fold the fix into the series then target to 5.4 release. Thanks, Yang > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index b7f709d..d6612ec 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page) > > VM_BUG_ON_PAGE(!PageTransHuge(page), page); > > + /* > + * The try_to_unmap() in page reclaim path might reach here too, > + * this may cause a race condition to corrupt deferred split > queue. > + * And, if page reclaim is already handling the same page, it is > + * unnecessary to handle it again in shrinker. > + * > + * Check PageSwapCache to determine if the page is being > + * handled by page reclaim since THP swap would add the page into > + * swap cache before reaching try_to_unmap(). 
> + */ > + if (PageSwapCache(page)) > + return; > + > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > if (list_empty(page_deferred_list(page))) { > count_vm_event(THP_DEFERRED_SPLIT_PAGE); > diff --git a/mm/vmscan.c b/mm/vmscan.c > index a0301ed..40c684a 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct > list_head *page_list, > * Is there need to periodically free_page_list? It would > * appear not as the counts should be low > */ > - if (unlikely(PageTransHuge(page))) { > - mem_cgroup_uncharge(page); > + if (unlikely(PageTransHuge(page))) > (*get_compound_page_dtor(page))(page); > - } else > + else > list_add(&page->lru, &free_pages); > continue; > > @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack > move_pages_to_lru(struct lruvec *lruvec, > > if (unlikely(PageCompound(page))) { > spin_unlock_irq(&pgdat->lru_lock); > - mem_cgroup_uncharge(page); > (*get_compound_page_dtor(page))(page); > spin_lock_irq(&pgdat->lru_lock); > } else > >> [ 685.284254][ T3456] ------------[ cut here ]------------ >> [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53! >> [ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC >> KASAN NOPTI >> [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted: >> G W 5.2.0-next-20190711+ #3 >> [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 >> Gen10/ProLiant DL385 >> Gen10, BIOS A40 06/24/2019 >> [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6 >> [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b >> b8 01 00 00 >> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa >> bc ff <0f> >> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b >> [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082 >> [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX: >> ffffffffa2d5d708 >> [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI: >> ffff8888442bd380 >> [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09: >> ffffed1108857a70 >> [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12: >> 0000000000000000 >> [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15: >> ffffea0022b10098 >> [ 685.391348][ T3456] FS: 00007fbe26db4700(0000) >> GS:ffff888844280000(0000) >> knlGS:0000000000000000 >> [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4: >> 00000000001406a0 >> [ 685.414563][ T3456] Call Trace: >> [ 685.417736][ T3456] deferred_split_scan+0x337/0x740 >> [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10 >> [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0 >> [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40 >> [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0 >> [ 685.444071][ T3456] shrink_slab+0x253/0x440 >> [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110 >> [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20 >> [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260 >> [ 685.463555][ T3456] shrink_node+0x31e/0xa30 >> [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560 >> [ 685.473036][ T3456] ? ktime_get+0x93/0x110 >> [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820 >> [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30 >> [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20 >> [ 685.491556][ T3456] ? 
check_chain_key+0x1df/0x2e0 >> [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0 >> [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820 >> [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0 >> [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >> [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20 >> [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0 >> [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30 >> [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390 >> [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0 >> [ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160 >> [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0 >> [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30 >> [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490 >> [ 685.560796][ T3456] ? finish_fault+0x120/0x120 >> [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0 >> [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0 >> [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50 >> [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20 >> [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20 >> [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0 >> [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370 >> [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0 >> [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf >> [ 685.608892][ T3456] ? page_fault+0x5/0x20 >> [ 685.613019][ T3456] page_fault+0x1b/0x20 >> [ 685.617058][ T3456] RIP: 0033:0x410be0 >> [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 >> 86 00 00 00 >> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 >> 98 90 <c6> >> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f >> [ 68[ 687.120156][ T3456] Shutting down cpus with NMI >> [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000 >> (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal >> exception ]--- > ^ permalink raw reply [flat|nested] 21+ messages in thread
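For the first of the two issues quoted above (mem_cgroup_uncharge() running before free_transhuge_page() in the reclaim path), the effect is easiest to see as a bookkeeping problem: the queue is chosen from page->mem_cgroup both when the THP is queued and when it is freed, so clearing that pointer in between makes the free path operate on a different queue than the one the page was accounted to. The toy model below only captures that queue-selection aspect; all structures and names are invented for illustration, and the real code also has per-queue locking that the sketch ignores.

#include <stdio.h>

/* Stand-ins for a memcg's and a node's deferred split queue; only the
 * length counter is modeled. */
struct queue {
	const char *name;
	int len;	/* plays the role of split_queue_len */
};

/* Stand-in for struct page; memcg_queue plays the role of page->mem_cgroup. */
struct page {
	struct queue *memcg_queue;
};

static struct queue memcg_q = { "memcg queue", 0 };
static struct queue node_q  = { "node queue", 0 };

/* Pick the memcg's queue while the page is charged, the node's otherwise;
 * this mirrors the idea of the queue selection, not the kernel function. */
static struct queue *queue_for(const struct page *p)
{
	return p->memcg_queue ? p->memcg_queue : &node_q;
}

static void deferred_split_add(struct page *p)
{
	struct queue *q = queue_for(p);
	q->len++;
	printf("queued on %s (len=%d)\n", q->name, q->len);
}

static void free_transhuge(struct page *p)
{
	struct queue *q = queue_for(p);	/* recomputed at free time */
	q->len--;
	printf("dequeued from %s (len=%d)\n", q->name, q->len);
}

int main(void)
{
	struct page thp = { .memcg_queue = &memcg_q };

	deferred_split_add(&thp);	/* accounted to the memcg queue */
	thp.memcg_queue = NULL;		/* models mem_cgroup_uncharge() */
	free_transhuge(&thp);		/* now picks the node queue instead */

	/* The memcg queue believes it still holds a page while the node
	 * queue has gone negative: the two paths no longer agree on which
	 * queue (and, in the kernel, which lock) covers this page. */
	printf("memcg queue len=%d, node queue len=%d\n", memcg_q.len, node_q.len);
	return 0;
}

This is why the posted fix moves the uncharge after the compound destructor: the free path then still sees the memcg and picks the same queue the page was accounted to.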
* Re: list corruption in deferred_split_scan() 2019-08-05 22:15 ` Yang Shi @ 2019-08-06 1:05 ` Qian Cai 0 siblings, 0 replies; 21+ messages in thread From: Qian Cai @ 2019-08-06 1:05 UTC (permalink / raw) To: Yang Shi Cc: Kirill A. Shutemov, Andrew Morton, Linux-MM, Linux List Kernel Mailing > On Aug 5, 2019, at 6:15 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote: > > > > On 7/25/19 2:46 PM, Yang Shi wrote: >> >> >> On 7/24/19 2:13 PM, Qian Cai wrote: >>> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote: >>>> Running LTP oom01 test case with swap triggers a crash below. Revert the >>>> series >>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue. >>> You might want to look harder on this commit, as reverted it alone on the top of >>> 5.2.0-next-20190711 fixed the issue. >>> >>> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1] >>> >>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@ >>> linux.alibaba.com/ >> >> This is the real meat of the patch series, which converted to memcg deferred split queue actually. >> >>> >>> >>> list_del corruption. prev->next should be ffffea0022b10098, but was >>> 0000000000000000 >> >> Finally I could reproduce the list corruption issue on my machine with THP swap (swap device is fast device). I should checked this with you at the first place. The problem can't be reproduced with rotate swap device. So, I'm supposed you were using THP swap too. >> >> Actually, I found two issues with THP swap: >> 1. free_transhuge_page() is called in reclaim path instead of put_page. The mem_cgroup_uncharge() is called before free_transhuge_page() in reclaim path, which causes page->mem_cgroup is NULL so the wrong deferred_split_queue would be used, so the THP was not deleted from the memcg's list at all. Then the page might be split or reused later, page->mapping would be override. >> >> 2. There is a race condition caused by try_to_unmap() with THP swap. The try_to_unmap() just calls page_remove_rmap() to add THP to deferred split queue in reclaim path. This might cause the below race condition to corrupt the list: >> >> A B >> deferred_split_scan >> list_move >> try_to_unmap >> list_add_tail >> >> list_splice <-- The list might get corrupted here >> >> free_transhuge_page >> list_del <-- kernel bug triggered >> >> I hope the below patch would solve your problem (tested locally). > > Hi Qian, > > Did the below patch solve your problem? I would like the fold the fix into the series then target to 5.4 release. It is going to take a while before I would be able to access that system again. Since you can reproduce this and test yourself now, I’d say go ahead posting the patch. > > Thanks, > Yang > >> >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index b7f709d..d6612ec 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page) >> >> VM_BUG_ON_PAGE(!PageTransHuge(page), page); >> >> + /* >> + * The try_to_unmap() in page reclaim path might reach here too, >> + * this may cause a race condition to corrupt deferred split queue. >> + * And, if page reclaim is already handling the same page, it is >> + * unnecessary to handle it again in shrinker. >> + * >> + * Check PageSwapCache to determine if the page is being >> + * handled by page reclaim since THP swap would add the page into >> + * swap cache before reaching try_to_unmap(). 
>> + */ >> + if (PageSwapCache(page)) >> + return; >> + >> spin_lock_irqsave(&ds_queue->split_queue_lock, flags); >> if (list_empty(page_deferred_list(page))) { >> count_vm_event(THP_DEFERRED_SPLIT_PAGE); >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index a0301ed..40c684a 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list, >> * Is there need to periodically free_page_list? It would >> * appear not as the counts should be low >> */ >> - if (unlikely(PageTransHuge(page))) { >> - mem_cgroup_uncharge(page); >> + if (unlikely(PageTransHuge(page))) >> (*get_compound_page_dtor(page))(page); >> - } else >> + else >> list_add(&page->lru, &free_pages); >> continue; >> >> @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, >> >> if (unlikely(PageCompound(page))) { >> spin_unlock_irq(&pgdat->lru_lock); >> - mem_cgroup_uncharge(page); >> (*get_compound_page_dtor(page))(page); >> spin_lock_irq(&pgdat->lru_lock); >> } else >> >>> [ 685.284254][ T3456] ------------[ cut here ]------------ >>> [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53! >>> [ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI >>> [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted: >>> G W 5.2.0-next-20190711+ #3 >>> [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 >>> Gen10, BIOS A40 06/24/2019 >>> [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6 >>> [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00 >>> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f> >>> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b >>> [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082 >>> [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX: >>> ffffffffa2d5d708 >>> [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI: >>> ffff8888442bd380 >>> [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09: >>> ffffed1108857a70 >>> [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12: >>> 0000000000000000 >>> [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15: >>> ffffea0022b10098 >>> [ 685.391348][ T3456] FS: 00007fbe26db4700(0000) GS:ffff888844280000(0000) >>> knlGS:0000000000000000 >>> [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4: >>> 00000000001406a0 >>> [ 685.414563][ T3456] Call Trace: >>> [ 685.417736][ T3456] deferred_split_scan+0x337/0x740 >>> [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10 >>> [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0 >>> [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40 >>> [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0 >>> [ 685.444071][ T3456] shrink_slab+0x253/0x440 >>> [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110 >>> [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20 >>> [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260 >>> [ 685.463555][ T3456] shrink_node+0x31e/0xa30 >>> [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560 >>> [ 685.473036][ T3456] ? ktime_get+0x93/0x110 >>> [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820 >>> [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30 >>> [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20 >>> [ 685.491556][ T3456] ? 
check_chain_key+0x1df/0x2e0 >>> [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0 >>> [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820 >>> [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0 >>> [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0 >>> [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20 >>> [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0 >>> [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30 >>> [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390 >>> [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0 >>> [ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160 >>> [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0 >>> [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30 >>> [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490 >>> [ 685.560796][ T3456] ? finish_fault+0x120/0x120 >>> [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0 >>> [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0 >>> [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50 >>> [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20 >>> [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20 >>> [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0 >>> [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370 >>> [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0 >>> [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf >>> [ 685.608892][ T3456] ? page_fault+0x5/0x20 >>> [ 685.613019][ T3456] page_fault+0x1b/0x20 >>> [ 685.617058][ T3456] RIP: 0033:0x410be0 >>> [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 >>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6> >>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f >>> [ 68[ 687.120156][ T3456] Shutting down cpus with NMI >>> [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000 >>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>> [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]--- >> > ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2019-08-06  1:05 UTC | newest]

Thread overview: 21+ messages
2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
2019-07-11  0:16 ` Yang Shi
2019-07-11 21:07 ` Qian Cai
2019-07-12 19:12 ` Yang Shi
2019-07-13  4:41 ` Yang Shi
2019-07-15 21:23 ` Qian Cai
2019-07-16  0:22 ` Yang Shi
2019-07-16  1:36 ` Qian Cai
2019-07-16  3:00 ` Yang Shi
2019-07-16 23:36 ` Shakeel Butt
2019-07-17  0:12 ` Yang Shi
2019-07-17 17:02 ` Shakeel Butt
2019-07-17 17:09 ` Yang Shi
2019-07-19  0:54 ` Qian Cai
2019-07-19  0:59 ` Yang Shi
2019-07-24 18:10 ` Qian Cai
2019-07-15  4:52 ` Yang Shi
2019-07-24 21:13 ` Qian Cai
2019-07-25 21:46 ` Yang Shi
2019-08-05 22:15 ` Yang Shi
2019-08-06  1:05 ` Qian Cai