[1.] One line summary of the problem:

zswap with z3fold makes swap stuck

[2.] Full description of the problem/report:

I've enabled zswap using kernel parameters:
zswap.enabled=1 zswap.zpool=z3fold

When the issue occurs, every process that is swapping gets stuck.

I can reproduce the issue almost always in vanilla v5.3-rc4 by running the tool "stress" repeatedly.

The issue starts with messages such as these:
[ 41.818966] BUG: unable to handle page fault for address: fffff54cf8000028
[ 14.458709] general protection fault: 0000 [#1] SMP PTI
[ 14.143173] kernel BUG at lib/list_debug.c:54!
[ 127.971860] kernel BUG at include/linux/mm.h:607!

[3.] Keywords (i.e., modules, networking, kernel):

zswap z3fold swapping swap bisect

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):

$ cat /proc/version
Linux version 5.3.0-rc4 (maage@workstation.lan) (gcc version 9.1.1 20190503 (Red Hat 9.1.1-1) (GCC)) #69 SMP Fri Aug 16 19:52:23 EEST 2019

[4.2.] Kernel .config file:

Attached as config-5.3.0-rc4

My vanilla kernel config is based on the Fedora kernel config, but with most drivers not used in the test machine disabled to speed up test builds.

[5.] Most recent kernel version which did not have the bug:

I'm able to reproduce the issue in vanilla v5.3-rc4 and in every commit that git bisect marked as bad between v5.1 (good) and v5.3-rc4 (bad).

I can also reproduce the issue with some Fedora kernels, at least from 5.2.1-200.fc30.x86_64 on.
About Fedora kernels: https://bugzilla.redhat.com/show_bug.cgi?id=1740690

Result from git bisect:

7c2b8baa61fe578af905342938ad12f8dbaeae79 is the first bad commit
commit 7c2b8baa61fe578af905342938ad12f8dbaeae79
Author: Vitaly Wool
Date: Mon May 13 17:22:49 2019 -0700

mm/z3fold.c: add structure for buddy handles

For z3fold to be able to move its pages per request of the memory subsystem, it should not use direct object addresses in handles. Instead, it will create abstract handles (3 per page) which will contain pointers to z3fold objects.
Thus, it will be possible to change these pointers when z3fold page is moved.

Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
Signed-off-by: Vitaly Wool
Cc: Bartlomiej Zolnierkiewicz
Cc: Dan Streetman
Cc: Krzysztof Kozlowski
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

:040000 040000 1a27b311b3ad8556062e45fff84d46a57ba8a4b1 a79e463e14ab8ea271a89fb5f3069c3c84221478 M mm
bisect run success

[6.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/admin-guide/bug-hunting.rst)

1st Full dmesg attached: dmesg-5.3.0-rc4-1566111932.476354086.txt

[ 105.710330] BUG: unable to handle page fault for address: ffffd2df8a000028
[ 105.714547] #PF: supervisor read access in kernel mode
[ 105.717893] #PF: error_code(0x0000) - not-present page
[ 105.721227] PGD 0 P4D 0
[ 105.722884] Oops: 0000 [#1] SMP PTI
[ 105.725152] CPU: 0 PID: 1240 Comm: stress Not tainted 5.3.0-rc4 #69
[ 105.729219] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[ 105.734756] RIP: 0010:z3fold_zpool_map+0x52/0x110
[ 105.737801] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d 70 eb e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 4e eb e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89
[ 105.749901] RSP: 0018:ffffa82d809a33f8 EFLAGS: 00010286
[ 105.753230] RAX: 0000000000000000 RBX: ffffd2df8a000000 RCX: 0000000000000000
[ 105.757754] RDX: 0000000080000000 RSI: ffff90edbab538d8 RDI: ffff90edb5fdd600
[ 105.762362] RBP: 0000000000000000 R08: ffff90edb5fdd600 R09: 0000000000000000
[ 105.766973] R10: 0000000000000003 R11: 0000000000000000 R12: ffff90edbab538d8
[ 105.771577] R13: ffff90edb5fdd6a0 R14: ffff90edb5fdd600 R15: ffffa82d809a3438
[ 105.776190] FS: 00007ff6a887b740(0000) GS:ffff90edbe400000(0000) knlGS:0000000000000000
[ 105.780549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.781436] CR2:
ffffd2df8a000028 CR3: 0000000036fde006 CR4: 0000000000160ef0 [ 105.782365] Call Trace: [ 105.782668] zswap_writeback_entry+0x50/0x410 [ 105.783199] z3fold_zpool_shrink+0x4a6/0x540 [ 105.783717] zswap_frontswap_store+0x424/0x7c1 [ 105.784329] __frontswap_store+0xc4/0x162 [ 105.784815] swap_writepage+0x39/0x70 [ 105.785282] pageout.isra.0+0x12c/0x5d0 [ 105.785730] shrink_page_list+0x1124/0x1830 [ 105.786335] shrink_inactive_list+0x1da/0x460 [ 105.786882] ? lruvec_lru_size+0x10/0x130 [ 105.787472] shrink_node_memcg+0x202/0x770 [ 105.788011] ? sched_clock_cpu+0xc/0xc0 [ 105.788594] shrink_node+0xdc/0x4a0 [ 105.789012] do_try_to_free_pages+0xdb/0x3c0 [ 105.789528] try_to_free_pages+0x112/0x2e0 [ 105.790009] __alloc_pages_slowpath+0x422/0x1000 [ 105.790547] ? __lock_acquire+0x247/0x1900 [ 105.791040] __alloc_pages_nodemask+0x37f/0x400 [ 105.791580] alloc_pages_vma+0x79/0x1e0 [ 105.792064] __read_swap_cache_async+0x1ec/0x3e0 [ 105.792639] swap_cluster_readahead+0x184/0x330 [ 105.793194] ? find_held_lock+0x32/0x90 [ 105.793681] swapin_readahead+0x2b4/0x4e0 [ 105.794182] ? 
sched_clock_cpu+0xc/0xc0 [ 105.794668] do_swap_page+0x3ac/0xc30 [ 105.795658] __handle_mm_fault+0x8dd/0x1900 [ 105.796729] handle_mm_fault+0x159/0x340 [ 105.797723] do_user_addr_fault+0x1fe/0x480 [ 105.798736] do_page_fault+0x31/0x210 [ 105.799700] page_fault+0x3e/0x50 [ 105.800597] RIP: 0033:0x56076f49e298 [ 105.801561] Code: 7e 01 00 00 89 df e8 47 e1 ff ff 44 8b 2d 84 4d 00 00 4d 85 ff 7e 40 31 c0 eb 0f 0f 1f 80 00 00 00 00 4c 01 f0 49 39 c7 7e 2d <80> 7c 05 00 5a 4c 8d 54 05 00 74 ec 4c 89 14 24 45 85 ed 0f 89 de [ 105.804770] RSP: 002b:00007ffe5fc72e70 EFLAGS: 00010206 [ 105.805931] RAX: 00000000013ad000 RBX: ffffffffffffffff RCX: 00007ff6a8974156 [ 105.807300] RDX: 0000000000000000 RSI: 000000000b78d000 RDI: 0000000000000000 [ 105.808679] RBP: 00007ff69d0ee010 R08: 00007ff69d0ee010 R09: 0000000000000000 [ 105.810055] R10: 00007ff69e49a010 R11: 0000000000000246 R12: 000056076f4a0004 [ 105.811383] R13: 0000000000000002 R14: 0000000000001000 R15: 000000000b78cc00 [ 105.812713] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon virtio_net net_failover intel_agp failover intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw agpgart virtio_blk virtio_console qemu_fw_cfg [ 105.821561] CR2: ffffd2df8a000028 [ 105.822552] ---[ end trace d5f24e2cb83a2b76 ]--- [ 105.823659] RIP: 0010:z3fold_zpool_map+0x52/0x110 [ 105.824785] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d 70 eb e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 4e eb e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89 [ 105.828082] RSP: 0018:ffffa82d809a33f8 
EFLAGS: 00010286 [ 105.829287] RAX: 0000000000000000 RBX: ffffd2df8a000000 RCX: 0000000000000000 [ 105.830713] RDX: 0000000080000000 RSI: ffff90edbab538d8 RDI: ffff90edb5fdd600 [ 105.832157] RBP: 0000000000000000 R08: ffff90edb5fdd600 R09: 0000000000000000 [ 105.833607] R10: 0000000000000003 R11: 0000000000000000 R12: ffff90edbab538d8 [ 105.835054] R13: ffff90edb5fdd6a0 R14: ffff90edb5fdd600 R15: ffffa82d809a3438 [ 105.836489] FS: 00007ff6a887b740(0000) GS:ffff90edbe400000(0000) knlGS:0000000000000000 [ 105.838103] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 105.839405] CR2: ffffd2df8a000028 CR3: 0000000036fde006 CR4: 0000000000160ef0 [ 105.840883] ------------[ cut here ]------------ (gdb) l *zswap_writeback_entry+0x50 0xffffffff812e8490 is in zswap_writeback_entry (/src/linux/mm/zswap.c:858). 853 .sync_mode = WB_SYNC_NONE, 854 }; 855 856 /* extract swpentry from data */ 857 zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); 858 swpentry = zhdr->swpentry; /* here */ 859 zpool_unmap_handle(pool, handle); 860 tree = zswap_trees[swp_type(swpentry)]; 861 offset = swp_offset(swpentry); (gdb) l *z3fold_zpool_map+0x52 0xffffffff81337b32 is in z3fold_zpool_map (/src/linux/arch/x86/include/asm/bitops.h:207). 202 return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btc), *addr, c, "Ir", nr); 203 } 204 205 static __always_inline bool constant_test_bit(long nr, const volatile unsigned long *addr) 206 { 207 return ((1UL << (nr & (BITS_PER_LONG-1))) & 208 (addr[nr >> _BITOPS_LONG_SHIFT])) != 0; 209 } 210 211 static __always_inline bool variable_test_bit(long nr, volatile const unsigned long *addr) (gdb) l *z3fold_zpool_shrink+0x4a6 0xffffffff81338796 is in z3fold_zpool_shrink (/src/linux/mm/z3fold.c:1173). 
1168            ret = pool->ops->evict(pool, first_handle);
1169            if (ret)
1170                goto next;
1171        }
1172        if (last_handle) {
1173            ret = pool->ops->evict(pool, last_handle);
1174            if (ret)
1175                goto next;
1176        }
1177 next:

Because of the test setup and the swapping, ssh sessions, shells etc. are usually stuck, so it is not possible to get dmesg output for the other cases. I've therefore used console logging. The console logs miss the other boot messages, but those should be about the same as in the 1st case.

2nd console log attached: console-1566133726.340057021.log

[ 14.324867] general protection fault: 0000 [#1] SMP PTI
[ 14.330269] CPU: 1 PID: 150 Comm: kswapd0 Tainted: G W 5.3.0-rc4 #69
[ 14.331359] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[ 14.332511] RIP: 0010:handle_to_buddy+0x20/0x30
[ 14.333478] Code: 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 53 48 89 fb 83 e7 01 0f 85 01 26 00 00 48 8b 03 5b 48 89 c2 48 81 e2 00 f0 ff ff <0f> b6 92 ca 00 00 00 29 d0 83 e0 03 c3 0f 1f 00 0f 1f 44 00 00 55
[ 14.336310] RSP: 0000:ffffb6cc0019f820 EFLAGS: 00010206
[ 14.337112] RAX: 00ffff8b24c22ed0 RBX: fffff46a4008bb40 RCX: 0000000000000000
[ 14.338174] RDX: 00ffff8b24c22000 RSI: ffff8b24fe7d89c8 RDI: ffff8b24fe7d89c8
[ 14.339112] RBP: ffff8b24c22ed000 R08: ffff8b24fe7d89c8 R09: 0000000000000000
[ 14.340407] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b24c22ed001
[ 14.341445] R13: ffff8b24c22ed010 R14: ffff8b24f5f70a00 R15: ffffb6cc0019f868
[ 14.342439] FS: 0000000000000000(0000) GS:ffff8b24fe600000(0000) knlGS:0000000000000000
[ 14.343937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.344771] CR2: 00007f37563d4010 CR3: 0000000008212005 CR4: 0000000000160ee0
[ 14.345816] Call Trace:
[ 14.346182] z3fold_zpool_map+0x76/0x110
[ 14.347111] zswap_writeback_entry+0x50/0x410
[ 14.347828] z3fold_zpool_shrink+0x3c4/0x540
[ 14.348457] zswap_frontswap_store+0x424/0x7c1
[ 14.349134] __frontswap_store+0xc4/0x162
[ 14.349746] swap_writepage+0x39/0x70
[ 14.350292] pageout.isra.0+0x12c/0x5d0
[
14.350899] shrink_page_list+0x1124/0x1830 [ 14.351473] shrink_inactive_list+0x1da/0x460 [ 14.352068] shrink_node_memcg+0x202/0x770 [ 14.352697] shrink_node+0xdc/0x4a0 [ 14.353204] balance_pgdat+0x2e7/0x580 [ 14.353773] kswapd+0x239/0x500 [ 14.354241] ? finish_wait+0x90/0x90 [ 14.355003] kthread+0x108/0x140 [ 14.355619] ? balance_pgdat+0x580/0x580 [ 14.356216] ? kthread_park+0x80/0x80 [ 14.356782] ret_from_fork+0x3a/0x50 [ 14.357859] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net net_failover virtio_balloon failover intel_agp intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_blk virtio_console agpgart qemu_fw_cfg [ 14.369818] ---[ end trace 351ba6e5814522bd ]--- (gdb) l *z3fold_zpool_map+0x76 0xffffffff81337b56 is in z3fold_zpool_map (/src/linux/mm/z3fold.c:1239). 1234 if (test_bit(PAGE_HEADLESS, &page->private)) 1235 goto out; 1236 1237 z3fold_page_lock(zhdr); 1238 buddy = handle_to_buddy(handle); 1239 switch (buddy) { 1240 case FIRST: 1241 addr += ZHDR_SIZE_ALIGNED; 1242 break; 1243 case MIDDLE: (gdb) l *z3fold_zpool_shrink+0x3c4 0xffffffff813386b4 is in z3fold_zpool_shrink (/src/linux/mm/z3fold.c:1168). 1163 ret = pool->ops->evict(pool, middle_handle); 1164 if (ret) 1165 goto next; 1166 } 1167 if (first_handle) { 1168 ret = pool->ops->evict(pool, first_handle); 1169 if (ret) 1170 goto next; 1171 } 1172 if (last_handle) { (gdb) l *handle_to_buddy+0x20 0xffffffff81337550 is in handle_to_buddy (/src/linux/mm/z3fold.c:425). 
420     unsigned long addr;
421
422     WARN_ON(handle & (1 << PAGE_HEADLESS));
423     addr = *(unsigned long *)handle;
424     zhdr = (struct z3fold_header *)(addr & PAGE_MASK);
425     return (addr - zhdr->first_num) & BUDDY_MASK;
426 }
427
428 static inline struct z3fold_pool *zhdr_to_pool(struct z3fold_header *zhdr)
429 {

3rd console log attached: console-1566146080.512045588.log

[ 4180.615506] kernel BUG at lib/list_debug.c:54!
[ 4180.617034] invalid opcode: 0000 [#1] SMP PTI
[ 4180.618059] CPU: 3 PID: 2129 Comm: stress Tainted: G W 5.3.0-rc4 #69
[ 4180.619811] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[ 4180.621757] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x55
[ 4180.623035] Code: c7 c7 20 fb 11 8f e8 55 7e bf ff 0f 0b 48 89 fe 48 c7 c7 b0 fb 11 8f e8 44 7e bf ff 0f 0b 48 c7 c7 60 fc 11 8f e8 36 7e bf ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 20 fc 11 8f e8 22 7e bf ff 0f 0b
[ 4180.627262] RSP: 0000:ffffacfcc097f4c8 EFLAGS: 00010246
[ 4180.628459] RAX: 0000000000000054 RBX: ffff88a102053000 RCX: 0000000000000000
[ 4180.630077] RDX: 0000000000000000 RSI: ffff88a13bbd89c8 RDI: ffff88a13bbd89c8
[ 4180.631693] RBP: ffff88a102053000 R08: ffff88a13bbd89c8 R09: 0000000000000000
[ 4180.633271] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88a13098a200
[ 4180.634899] R13: ffff88a13098a208 R14: 0000000000000000 R15: ffff88a102053010
[ 4180.636539] FS: 00007f86b900e740(0000) GS:ffff88a13ba00000(0000) knlGS:0000000000000000
[ 4180.638394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4180.639733] CR2: 00007f86b1e1f010 CR3: 000000002f21e002 CR4: 0000000000160ee0
[ 4180.641383] Call Trace:
[ 4180.641965] z3fold_zpool_malloc+0x106/0xa40
[ 4180.642965] zswap_frontswap_store+0x2e8/0x7c1
[ 4180.643978] __frontswap_store+0xc4/0x162
[ 4180.644875] swap_writepage+0x39/0x70
[ 4180.645695] pageout.isra.0+0x12c/0x5d0
[ 4180.646553] shrink_page_list+0x1124/0x1830
[ 4180.647538] shrink_inactive_list+0x1da/0x460
[ 4180.648564]
shrink_node_memcg+0x202/0x770 [ 4180.649529] ? sched_clock_cpu+0xc/0xc0 [ 4180.650432] shrink_node+0xdc/0x4a0 [ 4180.651258] do_try_to_free_pages+0xdb/0x3c0 [ 4180.652261] try_to_free_pages+0x112/0x2e0 [ 4180.653217] __alloc_pages_slowpath+0x422/0x1000 [ 4180.654294] ? __lock_acquire+0x247/0x1900 [ 4180.655254] __alloc_pages_nodemask+0x37f/0x400 [ 4180.656312] alloc_pages_vma+0x79/0x1e0 [ 4180.657169] __read_swap_cache_async+0x1ec/0x3e0 [ 4180.658197] swap_cluster_readahead+0x184/0x330 [ 4180.659211] ? find_held_lock+0x32/0x90 [ 4180.660111] swapin_readahead+0x2b4/0x4e0 [ 4180.661046] ? sched_clock_cpu+0xc/0xc0 [ 4180.661949] do_swap_page+0x3ac/0xc30 [ 4180.662807] __handle_mm_fault+0x8dd/0x1900 [ 4180.663790] handle_mm_fault+0x159/0x340 [ 4180.664713] do_user_addr_fault+0x1fe/0x480 [ 4180.665691] do_page_fault+0x31/0x210 [ 4180.666552] page_fault+0x3e/0x50 [ 4180.667818] RIP: 0033:0x555b3127d298 [ 4180.669153] Code: 7e 01 00 00 89 df e8 47 e1 ff ff 44 8b 2d 84 4d 00 00 4d 85 ff 7e 40 31 c0 eb 0f 0f 1f 80 00 00 00 00 4c 01 f0 49 39 c7 7e 2d <80> 7c 05 00 5a 4c 8d 54 05 00 74 ec 4c 89 14 24 45 85 ed 0f 89 de [ 4180.676117] RSP: 002b:00007ffc7a9f9bf0 EFLAGS: 00010206 [ 4180.678515] RAX: 0000000000038000 RBX: ffffffffffffffff RCX: 00007f86b9107156 [ 4180.681657] RDX: 0000000000000000 RSI: 000000000b805000 RDI: 0000000000000000 [ 4180.684762] RBP: 00007f86ad809010 R08: 00007f86ad809010 R09: 0000000000000000 [ 4180.687846] R10: 00007f86ad840010 R11: 0000000000000246 R12: 0000555b3127f004 [ 4180.690919] R13: 0000000000000002 R14: 0000000000001000 R15: 000000000b804000 [ 4180.693967] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
virtio_net virtio_balloon net_failover intel_agp failover intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_blk virtio_console agpgart qemu_fw_cfg [ 4180.715768] ---[ end trace 6eab0ae003d4d2ea ]--- [ 4180.718021] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x55 [ 4180.720602] Code: c7 c7 20 fb 11 8f e8 55 7e bf ff 0f 0b 48 89 fe 48 c7 c7 b0 fb 11 8f e8 44 7e bf ff 0f 0b 48 c7 c7 60 fc 11 8f e8 36 7e bf ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 20 fc 11 8f e8 22 7e bf ff 0f 0b [ 4180.728474] RSP: 0000:ffffacfcc097f4c8 EFLAGS: 00010246 [ 4180.730969] RAX: 0000000000000054 RBX: ffff88a102053000 RCX: 0000000000000000 [ 4180.734130] RDX: 0000000000000000 RSI: ffff88a13bbd89c8 RDI: ffff88a13bbd89c8 [ 4180.737285] RBP: ffff88a102053000 R08: ffff88a13bbd89c8 R09: 0000000000000000 [ 4180.740442] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88a13098a200 [ 4180.743609] R13: ffff88a13098a208 R14: 0000000000000000 R15: ffff88a102053010 [ 4180.746774] FS: 00007f86b900e740(0000) GS:ffff88a13ba00000(0000) knlGS:0000000000000000 [ 4180.750294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4180.752986] CR2: 00007f86b1e1f010 CR3: 000000002f21e002 CR4: 0000000000160ee0 [ 4180.756176] ------------[ cut here ]------------ (gdb) l *z3fold_zpool_malloc+0x106 0xffffffff81338936 is in z3fold_zpool_malloc (/src/linux/include/linux/list.h:190). 185 * list_del_init - deletes entry from list and reinitialize it. 186 * @entry: the element to delete from the list. 187 */ 188 static inline void list_del_init(struct list_head *entry) 189 { 190 __list_del_entry(entry); 191 INIT_LIST_HEAD(entry); 192 } 193 194 /** (gdb) l *zswap_frontswap_store+0x2e8 0xffffffff812e8b38 is in zswap_frontswap_store (/src/linux/mm/zswap.c:1073). 1068 goto put_dstmem; 1069 } 1070 1071 /* store */ 1072 hlen = zpool_evictable(entry->pool->zpool) ? 
sizeof(zhdr) : 0; 1073 ret = zpool_malloc(entry->pool->zpool, hlen + dlen, 1074 __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM, 1075 &handle); 1076 if (ret == -ENOSPC) { 1077 zswap_reject_compress_poor++; 4th console log attached: console-1566151496.204958451.log [ 66.090333] BUG: unable to handle page fault for address: ffffeab2e2000028 [ 66.091245] #PF: supervisor read access in kernel mode [ 66.091904] #PF: error_code(0x0000) - not-present page [ 66.092552] PGD 0 P4D 0 [ 66.092885] Oops: 0000 [#1] SMP PTI [ 66.093332] CPU: 2 PID: 1193 Comm: stress Not tainted 5.3.0-rc4 #69 [ 66.094127] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014 [ 66.095204] RIP: 0010:z3fold_zpool_map+0x52/0x110 [ 66.095799] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d 70 eb e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 4e eb e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89 [ 66.098132] RSP: 0000:ffffb7a2009375e8 EFLAGS: 00010286 [ 66.098792] RAX: 0000000000000000 RBX: ffffeab2e2000000 RCX: 0000000000000000 [ 66.099685] RDX: 0000000080000000 RSI: ffff9f67bb10e688 RDI: ffff9f67b39bca00 [ 66.100579] RBP: 0000000000000000 R08: ffff9f67b39bca00 R09: 0000000000000000 [ 66.101477] R10: 0000000000000003 R11: 0000000000000000 R12: ffff9f67bb10e688 [ 66.102367] R13: ffff9f67b39bcaa0 R14: ffff9f67b39bca00 R15: ffffb7a200937628 [ 66.103263] FS: 00007f33df62b740(0000) GS:ffff9f67be800000(0000) knlGS:0000000000000000 [ 66.104264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 66.104988] CR2: ffffeab2e2000028 CR3: 000000003798a001 CR4: 0000000000160ee0 [ 66.105878] Call Trace: [ 66.106202] zswap_writeback_entry+0x50/0x410 [ 66.106761] z3fold_zpool_shrink+0x29d/0x540 [ 66.107305] zswap_frontswap_store+0x424/0x7c1 [ 66.107870] __frontswap_store+0xc4/0x162 [ 66.108383] swap_writepage+0x39/0x70 [ 66.108847] pageout.isra.0+0x12c/0x5d0 [ 66.109340] shrink_page_list+0x1124/0x1830 [ 66.109872] 
shrink_inactive_list+0x1da/0x460 [ 66.110430] shrink_node_memcg+0x202/0x770 [ 66.110955] shrink_node+0xdc/0x4a0 [ 66.111403] do_try_to_free_pages+0xdb/0x3c0 [ 66.111946] try_to_free_pages+0x112/0x2e0 [ 66.112468] __alloc_pages_slowpath+0x422/0x1000 [ 66.113064] ? __lock_acquire+0x247/0x1900 [ 66.113596] __alloc_pages_nodemask+0x37f/0x400 [ 66.114179] alloc_pages_vma+0x79/0x1e0 [ 66.114675] __handle_mm_fault+0x99c/0x1900 [ 66.115218] handle_mm_fault+0x159/0x340 [ 66.115719] do_user_addr_fault+0x1fe/0x480 [ 66.116256] do_page_fault+0x31/0x210 [ 66.116730] page_fault+0x3e/0x50 [ 66.117168] RIP: 0033:0x556945873250 [ 66.117624] Code: 0f 84 88 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f f1 00 00 00 31 c0 4d 85 ff 7e 12 0f 1f 44 00 00 44 05 00 5a 4c 01 f0 49 39 c7 7f f3 48 85 db 0f 84 dd 01 00 00 [ 66.120514] RSP: 002b:00007fffa5fc06c0 EFLAGS: 00010206 [ 66.121722] RAX: 000000000a0ad000 RBX: ffffffffffffffff RCX: 00007f33df724156 [ 66.123171] RDX: 0000000000000000 RSI: 000000000b7a4000 RDI: 0000000000000000 [ 66.124616] RBP: 00007f33d3e87010 R08: 00007f33d3e87010 R09: 0000000000000000 [ 66.126064] R10: 0000000000000022 R11: 0000000000000246 R12: 0000556945875004 [ 66.127499] R13: 0000000000000002 R14: 0000000000001000 R15: 000000000b7a3000 [ 66.128936] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon intel_agp virtio_net net_failover failover intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_blk virtio_console agpgart qemu_fw_cfg [ 66.138533] CR2: ffffeab2e2000028 [ 66.139562] ---[ end trace bfa9f40a545e4544 ]--- [ 66.140733] RIP: 
0010:z3fold_zpool_map+0x52/0x110 [ 66.141886] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d 70 eb e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 4e eb e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89 [ 66.145387] RSP: 0000:ffffb7a2009375e8 EFLAGS: 00010286 [ 66.146654] RAX: 0000000000000000 RBX: ffffeab2e2000000 RCX: 0000000000000000 [ 66.148137] RDX: 0000000080000000 RSI: ffff9f67bb10e688 RDI: ffff9f67b39bca00 [ 66.149626] RBP: 0000000000000000 R08: ffff9f67b39bca00 R09: 0000000000000000 [ 66.151128] R10: 0000000000000003 R11: 0000000000000000 R12: ffff9f67bb10e688 [ 66.152606] R13: ffff9f67b39bcaa0 R14: ffff9f67b39bca00 R15: ffffb7a200937628 [ 66.154076] FS: 00007f33df62b740(0000) GS:ffff9f67be800000(0000) knlGS:0000000000000000 [ 66.155695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 66.157020] CR2: ffffeab2e2000028 CR3: 000000003798a001 CR4: 0000000000160ee0 [ 66.158535] ------------[ cut here ]------------ (gdb) l *z3fold_zpool_shrink+0x29d 0xffffffff8133858d is in z3fold_zpool_shrink (/src/linux/mm/z3fold.c:1168). 1163 ret = pool->ops->evict(pool, middle_handle); 1164 if (ret) 1165 goto next; 1166 } 1167 if (first_handle) { 1168 ret = pool->ops->evict(pool, first_handle); 1169 if (ret) 1170 goto next; 1171 } 1172 if (last_handle) { 5th console log is: console-1566152424.019311951.log [ 22.529023] kernel BUG at include/linux/mm.h:607! 
[ 22.529092] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 22.531789] #PF: supervisor read access in kernel mode [ 22.532954] #PF: error_code(0x0000) - not-present page [ 22.533722] PGD 0 P4D 0 [ 22.534097] Oops: 0000 [#1] SMP PTI [ 22.534585] CPU: 0 PID: 186 Comm: kworker/u8:4 Not tainted 5.3.0-rc4 #69 [ 22.535488] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014 [ 22.536633] Workqueue: zswap1 compact_page_work [ 22.537263] RIP: 0010:__list_add_valid+0x3/0x40 [ 22.537868] Code: f4 ff ff ff e9 3a ff ff ff 49 c7 07 00 00 00 00 41 c7 47 08 00 00 00 00 e9 66 ff ff ff e8 15 f6 b6 ff 90 90 90 90 90 49 89 d0 <48> 8b 52 08 48 39 f2 0f 85 7c 00 00 00 4c 8b 0a 4d 39 c1 0f 85 98 [ 22.540322] RSP: 0000:ffffa073802cfdf8 EFLAGS: 00010206 [ 22.540953] RAX: 00000000000003c0 RBX: ffff8d69ad052000 RCX: 8888888888888889 [ 22.541838] RDX: 0000000000000000 RSI: ffffc0737f6012e8 RDI: ffff8d69ad052000 [ 22.542747] RBP: ffffc0737f6012e8 R08: 0000000000000000 R09: 0000000000000001 [ 22.543660] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 22.544614] R13: ffff8d69bd0dfc00 R14: ffff8d69bd0dfc08 R15: ffff8d69ad052010 [ 22.545578] FS: 0000000000000000(0000) GS:ffff8d69be400000(0000) knlGS:0000000000000000 [ 22.546662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 22.547452] CR2: 0000000000000008 CR3: 0000000035304001 CR4: 0000000000160ef0 [ 22.548488] Call Trace: [ 22.548845] do_compact_page+0x31e/0x430 [ 22.549406] process_one_work+0x272/0x5a0 [ 22.549972] worker_thread+0x50/0x3b0 [ 22.550488] kthread+0x108/0x140 [ 22.550939] ? process_one_work+0x5a0/0x5a0 [ 22.551531] ? 
kthread_park+0x80/0x80 [ 22.552034] ret_from_fork+0x3a/0x50 [ 22.552554] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon virtio_net net_failover intel_agp intel_gtt failover qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_console virtio_blk agpgart qemu_fw_cfg [ 22.559889] CR2: 0000000000000008 [ 22.560328] ---[ end trace cfa4596e38137687 ]--- [ 22.560330] invalid opcode: 0000 [#2] SMP PTI [ 22.560981] RIP: 0010:__list_add_valid+0x3/0x40 [ 22.561515] CPU: 2 PID: 1063 Comm: stress Tainted: G D 5.3.0-rc4 #69 [ 22.562143] Code: f4 ff ff ff e9 3a ff ff ff 49 c7 07 00 00 00 00 41 c7 47 08 00 00 00 00 e9 66 ff ff ff e8 15 f6 b6 ff 90 90 90 90 90 49 89 d0 <48> 8b 52 08 48 39 f2 0f 85 7c 00 00 00 4c 8b 0a 4d 39 c1 0f 85 98 [ 22.563034] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014 [ 22.565759] RSP: 0000:ffffa073802cfdf8 EFLAGS: 00010206 [ 22.565760] RAX: 00000000000003c0 RBX: ffff8d69ad052000 RCX: 8888888888888889 [ 22.565761] RDX: 0000000000000000 RSI: ffffc0737f6012e8 RDI: ffff8d69ad052000 [ 22.565761] RBP: ffffc0737f6012e8 R08: 0000000000000000 R09: 0000000000000001 [ 22.565762] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 22.565763] R13: ffff8d69bd0dfc00 R14: ffff8d69bd0dfc08 R15: ffff8d69ad052010 [ 22.565765] FS: 0000000000000000(0000) GS:ffff8d69be400000(0000) knlGS:0000000000000000 [ 22.565766] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 22.565766] CR2: 0000000000000008 CR3: 0000000035304001 CR4: 0000000000160ef0 [ 22.565797] note: kworker/u8:4[186] exited with preempt_count 3 [ 22.581957] 
RIP: 0010:__free_pages+0x2d/0x30 [ 22.583146] Code: 00 00 8b 47 34 85 c0 74 15 f0 ff 4f 34 75 09 85 f6 75 06 e9 75 ff ff ff c3 e9 4f e2 ff ff 48 c7 c6 e8 8c 0a bb e8 d3 7f fd ff <0f> 0b 90 0f 1f 44 00 00 89 f1 41 bb 01 00 00 00 49 89 fa 41 d3 e3 [ 22.586649] RSP: 0018:ffffa073809ef4d0 EFLAGS: 00010246 [ 22.587963] RAX: 000000000000003e RBX: ffff8d6992d10000 RCX: 0000000000000006 [ 22.589579] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffbb0e5774 [ 22.591181] RBP: ffffd090004b4408 R08: 000000053ed5634a R09: 0000000000000000 [ 22.592781] R10: 0000000000000000 R11: 0000000000000000 R12: ffffd090004b4400 [ 22.594339] R13: ffff8d69bd0dfca0 R14: ffff8d69bd0dfc00 R15: ffff8d69bd0dfc08 [ 22.595832] FS: 00007f48316b7740(0000) GS:ffff8d69be800000(0000) knlGS:0000000000000000 [ 22.598649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 22.601196] CR2: 00007fbcae5049b0 CR3: 00000000352fe002 CR4: 0000000000160ee0 [ 22.603539] Call Trace: [ 22.605103] z3fold_zpool_shrink+0x25f/0x540 [ 22.607218] zswap_frontswap_store+0x424/0x7c1 [ 22.609115] __frontswap_store+0xc4/0x162 [ 22.610819] swap_writepage+0x39/0x70 [ 22.612525] pageout.isra.0+0x12c/0x5d0 [ 22.613957] shrink_page_list+0x1124/0x1830 [ 22.615130] shrink_inactive_list+0x1da/0x460 [ 22.616311] shrink_node_memcg+0x202/0x770 [ 22.617473] ? sched_clock_cpu+0xc/0xc0 [ 22.619145] shrink_node+0xdc/0x4a0 [ 22.620279] do_try_to_free_pages+0xdb/0x3c0 [ 22.621450] try_to_free_pages+0x112/0x2e0 [ 22.622582] __alloc_pages_slowpath+0x422/0x1000 [ 22.623749] ? __lock_acquire+0x247/0x1900 [ 22.624876] __alloc_pages_nodemask+0x37f/0x400 [ 22.626007] alloc_pages_vma+0x79/0x1e0 [ 22.627040] __read_swap_cache_async+0x1ec/0x3e0 [ 22.628143] swap_cluster_readahead+0x184/0x330 [ 22.629234] ? find_held_lock+0x32/0x90 [ 22.630292] swapin_readahead+0x2b4/0x4e0 [ 22.631370] ? 
sched_clock_cpu+0xc/0xc0 [ 22.632379] do_swap_page+0x3ac/0xc30 [ 22.633356] __handle_mm_fault+0x8dd/0x1900 [ 22.634373] handle_mm_fault+0x159/0x340 [ 22.635714] do_user_addr_fault+0x1fe/0x480 [ 22.636738] do_page_fault+0x31/0x210 [ 22.637674] page_fault+0x3e/0x50 [ 22.638559] RIP: 0033:0x562b503bd298 [ 22.639476] Code: 7e 01 00 00 89 df e8 47 e1 ff ff 44 8b 2d 84 4d 00 00 4d 85 ff 7e 40 31 c0 eb 0f 0f 1f 80 00 00 00 00 4c 01 f0 49 39 c7 7e 2d <80> 7c 05 00 5a 4c 8d 54 05 00 74 ec 4c 89 14 24 45 85 ed 0f 89 de [ 22.642658] RSP: 002b:00007ffd83e31e80 EFLAGS: 00010206 [ 22.643900] RAX: 0000000000f09000 RBX: ffffffffffffffff RCX: 00007f48317b0156 [ 22.645242] RDX: 0000000000000000 RSI: 000000000b276000 RDI: 0000000000000000 [ 22.646571] RBP: 00007f4826441010 R08: 00007f4826441010 R09: 0000000000000000 [ 22.647888] R10: 00007f4827349010 R11: 0000000000000246 R12: 0000562b503bf004 [ 22.649210] R13: 0000000000000002 R14: 0000000000001000 R15: 000000000b275800 [ 22.650518] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon virtio_net net_failover intel_agp intel_gtt failover qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_console virtio_blk agpgart qemu_fw_cfg [ 22.659276] ---[ end trace cfa4596e38137688 ]--- [ 22.660398] RIP: 0010:__list_add_valid+0x3/0x40 [ 22.661493] Code: f4 ff ff ff e9 3a ff ff ff 49 c7 07 00 00 00 00 41 c7 47 08 00 00 00 00 e9 66 ff ff ff e8 15 f6 b6 ff 90 90 90 90 90 49 89 d0 <48> 8b 52 08 48 39 f2 0f 85 7c 00 00 00 4c 8b 0a 4d 39 c1 0f 85 98 [ 22.664800] RSP: 0000:ffffa073802cfdf8 EFLAGS: 00010206 [ 22.666779] RAX: 00000000000003c0 RBX: 
ffff8d69ad052000 RCX: 8888888888888889 [ 22.669830] RDX: 0000000000000000 RSI: ffffc0737f6012e8 RDI: ffff8d69ad052000 [ 22.672878] RBP: ffffc0737f6012e8 R08: 0000000000000000 R09: 0000000000000001 [ 22.675920] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 22.678966] R13: ffff8d69bd0dfc00 R14: ffff8d69bd0dfc08 R15: ffff8d69ad052010 [ 22.682014] FS: 00007f48316b7740(0000) GS:ffff8d69be800000(0000) knlGS:0000000000000000 [ 22.685399] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 22.687991] CR2: 00007fbcae5049b0 CR3: 00000000352fe002 CR4: 0000000000160ee0 [ 22.691068] ------------[ cut here ]------------ (gdb) l *__list_add_valid+0x3 0xffffffff81551b43 is in __list_add_valid (/srv/s_maage/pkg/linux/linux/lib/list_debug.c:23). 18 */ 19 20 bool __list_add_valid(struct list_head *new, struct list_head *prev, 21 struct list_head *next) 22 { 23 if (CHECK_DATA_CORRUPTION(next->prev != prev, 24 "list_add corruption. next->prev should be prev (%px), but was %px. (next=%px).\n", 25 prev, next->prev, next) || 26 CHECK_DATA_CORRUPTION(prev->next != next, 27 "list_add corruption. prev->next should be next (%px), but was %px. (prev=%px).\n", (gdb) l *do_compact_page+0x31e 0xffffffff813396fe is in do_compact_page (/srv/s_maage/pkg/linux/linux/include/linux/list.h:60). 55 */ 56 static inline void __list_add(struct list_head *new, 57 struct list_head *prev, 58 struct list_head *next) 59 { 60 if (!__list_add_valid(new, prev, next)) 61 return; 62 63 next->prev = new; 64 new->next = next; (gdb) l *z3fold_zpool_shrink+0x25f 0xffffffff8133854f is in z3fold_zpool_shrink (/srv/s_maage/pkg/linux/linux/arch/x86/include/asm/atomic64_64.h:102). 97 * 98 * Atomically decrements @v by 1. 
99       */
100     static __always_inline void arch_atomic64_dec(atomic64_t *v)
101     {
102             asm volatile(LOCK_PREFIX "decq %0"
103                          : "=m" (v->counter)
104                          : "m" (v->counter) : "memory");
105     }
106     #define arch_atomic64_dec arch_atomic64_dec

(gdb) l *zswap_frontswap_store+0x424
0xffffffff812e8c74 is in zswap_frontswap_store (/srv/s_maage/pkg/linux/linux/mm/zswap.c:955).
950
951             pool = zswap_pool_last_get();
952             if (!pool)
953                     return -ENOENT;
954
955             ret = zpool_shrink(pool->zpool, 1, NULL);
956
957             zswap_pool_put(pool);
958
959             return ret;

[7.] A small shell script or example program which triggers the problem (if possible)

for tmout in 10 10 10 20 20 20 30 120 $((3600/2)) 10; do
    stress --vm $(($(nproc)+2)) --vm-bytes $(($(awk '/MemAvail/{print $2}' /proc/meminfo)*1024/$(nproc))) --timeout "$tmout"
done

[8.] Environment

My test machine is a Fedora 30 (minimal install) virtual machine with 4 vCPUs, 1GiB RAM and 2GiB swap. Originally I noticed the problem on other machines (Fedora 30). I guess any amount of memory pressure and zswap activation can cause problems.

The test machine only has whatever comes from the install and whatever is enabled by default. I've also enabled the serial console with "console=tty0 console=ttyS0", enabled passwordless sudo to help testing, and installed "stress".

The stress package version is stress-1.0.4-22.fc30

[8.1.] Software (add the output of the ver_linux script here)

$ ./ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
Linux localhost.localdomain 5.3.0-rc4 #69 SMP Fri Aug 16 19:52:23 EEST 2019 x86_64 x86_64 x86_64 GNU/Linux Util-linux 2.33.2 Mount 2.33.2 Module-init-tools 25 E2fsprogs 1.44.6 Linux C Library 2.29 Dynamic linker (ldd) 2.29 Linux C++ Library 6.0.26 Procps 3.3.15 Kbd 2.0.4 Console-tools 2.0.4 Sh-utils 8.31 Udev 241 Modules Loaded agpgart crc32c_intel crc32_pclmul crct10dif_pclmul drm drm_kms_helper failover fb_sys_fops ghash_clmulni_intel intel_agp intel_gtt ip6table_filter ip6table_mangle ip6table_nat ip6table_raw ip6_tables ip6table_security ip6t_REJECT ip6t_rpfilter ip_set iptable_filter iptable_mangle iptable_nat iptable_raw ip_tables iptable_security ipt_REJECT libcrc32c net_failover nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 nf_nat nfnetlink nf_reject_ipv4 nf_reject_ipv6 qemu_fw_cfg qxl serio_raw syscopyarea sysfillrect sysimgblt ttm virtio_balloon virtio_blk virtio_console virtio_net xt_conntrack [8.2.] Processor information (from /proc/cpuinfo): $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel Core Processor (Haswell, no TSX, IBRS) stepping : 1 microcode : 0x1 cpu MHz : 3198.099 cache size : 16384 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 6396.19 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual 
power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel Core Processor (Haswell, no TSX, IBRS) stepping : 1 microcode : 0x1 cpu MHz : 3198.099 cache size : 16384 KB physical id : 1 siblings : 1 core id : 0 cpu cores : 1 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 6468.62 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel Core Processor (Haswell, no TSX, IBRS) stepping : 1 microcode : 0x1 cpu MHz : 3198.099 cache size : 16384 KB physical id : 2 siblings : 1 core id : 0 cpu cores : 1 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 6627.92 clflush size : 64 
cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel Core Processor (Haswell, no TSX, IBRS) stepping : 1 microcode : 0x1 cpu MHz : 3198.099 cache size : 16384 KB physical id : 3 siblings : 1 core id : 0 cpu cores : 1 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 6662.16 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: [8.3.] 
Module information (from /proc/modules): $ cat /proc/modules ip6t_rpfilter 16384 1 - Live 0x0000000000000000 ip6t_REJECT 16384 2 - Live 0x0000000000000000 nf_reject_ipv6 20480 1 ip6t_REJECT, Live 0x0000000000000000 ipt_REJECT 16384 2 - Live 0x0000000000000000 nf_reject_ipv4 16384 1 ipt_REJECT, Live 0x0000000000000000 xt_conntrack 16384 13 - Live 0x0000000000000000 ip6table_nat 16384 1 - Live 0x0000000000000000 ip6table_mangle 16384 1 - Live 0x0000000000000000 ip6table_raw 16384 1 - Live 0x0000000000000000 ip6table_security 16384 1 - Live 0x0000000000000000 iptable_nat 16384 1 - Live 0x0000000000000000 nf_nat 126976 2 ip6table_nat,iptable_nat, Live 0x0000000000000000 iptable_mangle 16384 1 - Live 0x0000000000000000 iptable_raw 16384 1 - Live 0x0000000000000000 iptable_security 16384 1 - Live 0x0000000000000000 nf_conntrack 241664 2 xt_conntrack,nf_nat, Live 0x0000000000000000 nf_defrag_ipv6 24576 1 nf_conntrack, Live 0x0000000000000000 nf_defrag_ipv4 16384 1 nf_conntrack, Live 0x0000000000000000 libcrc32c 16384 2 nf_nat,nf_conntrack, Live 0x0000000000000000 ip_set 69632 0 - Live 0x0000000000000000 nfnetlink 20480 1 ip_set, Live 0x0000000000000000 ip6table_filter 16384 1 - Live 0x0000000000000000 ip6_tables 36864 7 ip6table_nat,ip6table_mangle,ip6table_raw,ip6table_security,ip6table_filter, Live 0x0000000000000000 iptable_filter 16384 1 - Live 0x0000000000000000 ip_tables 32768 5 iptable_nat,iptable_mangle,iptable_raw,iptable_security,iptable_filter, Live 0x0000000000000000 crct10dif_pclmul 16384 1 - Live 0x0000000000000000 crc32_pclmul 16384 0 - Live 0x0000000000000000 ghash_clmulni_intel 16384 0 - Live 0x0000000000000000 virtio_net 61440 0 - Live 0x0000000000000000 virtio_balloon 24576 0 - Live 0x0000000000000000 net_failover 24576 1 virtio_net, Live 0x0000000000000000 failover 16384 1 net_failover, Live 0x0000000000000000 intel_agp 24576 0 - Live 0x0000000000000000 intel_gtt 24576 1 intel_agp, Live 0x0000000000000000 qxl 77824 0 - Live 0x0000000000000000 
drm_kms_helper 221184 3 qxl, Live 0x0000000000000000 syscopyarea 16384 1 drm_kms_helper, Live 0x0000000000000000 sysfillrect 16384 1 drm_kms_helper, Live 0x0000000000000000 sysimgblt 16384 1 drm_kms_helper, Live 0x0000000000000000 fb_sys_fops 16384 1 drm_kms_helper, Live 0x0000000000000000 ttm 126976 1 qxl, Live 0x0000000000000000 drm 602112 4 qxl,drm_kms_helper,ttm, Live 0x0000000000000000 crc32c_intel 24576 5 - Live 0x0000000000000000 serio_raw 20480 0 - Live 0x0000000000000000 virtio_blk 20480 3 - Live 0x0000000000000000 virtio_console 45056 0 - Live 0x0000000000000000 qemu_fw_cfg 20480 0 - Live 0x0000000000000000 agpgart 53248 4 intel_agp,intel_gtt,ttm,drm, Live 0x0000000000000000 [8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem) $ cat /proc/ioports 0000-0000 : PCI Bus 0000:00 0000-0000 : dma1 0000-0000 : pic1 0000-0000 : timer0 0000-0000 : timer1 0000-0000 : keyboard 0000-0000 : keyboard 0000-0000 : rtc0 0000-0000 : dma page reg 0000-0000 : pic2 0000-0000 : dma2 0000-0000 : fpu 0000-0000 : vga+ 0000-0000 : serial 0000-0000 : QEMU0002:00 0000-0000 : fw_cfg_io 0000-0000 : 0000:00:1f.0 0000-0000 : ACPI PM1a_EVT_BLK 0000-0000 : ACPI PM1a_CNT_BLK 0000-0000 : ACPI PM_TMR 0000-0000 : ACPI GPE0_BLK 0000-0000 : 0000:00:1f.3 0000-0000 : PCI conf1 0000-0000 : PCI Bus 0000:00 0000-0000 : PCI Bus 0000:01 0000-0000 : PCI Bus 0000:02 0000-0000 : PCI Bus 0000:03 0000-0000 : PCI Bus 0000:04 0000-0000 : PCI Bus 0000:05 0000-0000 : PCI Bus 0000:06 0000-0000 : PCI Bus 0000:07 0000-0000 : 0000:00:01.0 0000-0000 : 0000:00:1f.2 0000-0000 : ahci $ cat /proc/iomem 00000000-00000000 : Reserved 00000000-00000000 : System RAM 00000000-00000000 : Reserved 00000000-00000000 : PCI Bus 0000:00 00000000-00000000 : Video ROM 00000000-00000000 : Adapter ROM 00000000-00000000 : Adapter ROM 00000000-00000000 : Reserved 00000000-00000000 : System ROM 00000000-00000000 : System RAM 00000000-00000000 : Kernel code 00000000-00000000 : Kernel data 00000000-00000000 : Kernel 
bss 00000000-00000000 : Reserved 00000000-00000000 : PCI MMCONFIG 0000 [bus 00-ff] 00000000-00000000 : Reserved 00000000-00000000 : PCI Bus 0000:00 00000000-00000000 : 0000:00:01.0 00000000-00000000 : 0000:00:01.0 00000000-00000000 : PCI Bus 0000:07 00000000-00000000 : PCI Bus 0000:06 00000000-00000000 : PCI Bus 0000:05 00000000-00000000 : PCI Bus 0000:04 00000000-00000000 : 0000:04:00.0 00000000-00000000 : PCI Bus 0000:03 00000000-00000000 : 0000:03:00.0 00000000-00000000 : PCI Bus 0000:02 00000000-00000000 : 0000:02:00.0 00000000-00000000 : xhci-hcd 00000000-00000000 : PCI Bus 0000:01 00000000-00000000 : 0000:01:00.0 00000000-00000000 : 0000:01:00.0 00000000-00000000 : 0000:00:1b.0 00000000-00000000 : 0000:00:01.0 00000000-00000000 : 0000:00:02.0 00000000-00000000 : 0000:00:02.1 00000000-00000000 : 0000:00:02.2 00000000-00000000 : 0000:00:02.3 00000000-00000000 : 0000:00:02.4 00000000-00000000 : 0000:00:02.5 00000000-00000000 : 0000:00:02.6 00000000-00000000 : 0000:00:1f.2 00000000-00000000 : ahci 00000000-00000000 : PCI Bus 0000:07 00000000-00000000 : PCI Bus 0000:06 00000000-00000000 : 0000:06:00.0 00000000-00000000 : virtio-pci-modern 00000000-00000000 : PCI Bus 0000:05 00000000-00000000 : 0000:05:00.0 00000000-00000000 : virtio-pci-modern 00000000-00000000 : PCI Bus 0000:04 00000000-00000000 : 0000:04:00.0 00000000-00000000 : virtio-pci-modern 00000000-00000000 : PCI Bus 0000:03 00000000-00000000 : 0000:03:00.0 00000000-00000000 : virtio-pci-modern 00000000-00000000 : PCI Bus 0000:02 00000000-00000000 : PCI Bus 0000:01 00000000-00000000 : 0000:01:00.0 00000000-00000000 : virtio-pci-modern 00000000-00000000 : IOAPIC 0 00000000-00000000 : Reserved 00000000-00000000 : Local APIC 00000000-00000000 : Reserved 00000000-00000000 : Reserved 00000000-00000000 : PCI Bus 0000:00 [8.5.] PCI information ('lspci -vvv' as root) Attached as: lspci-vvv-5.3.0-rc4.txt [8.6.] 
SCSI information (from /proc/scsi/scsi)

$ cat //proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: QEMU Model: QEMU DVD-ROM Rev: 2.5+
  Type: CD-ROM ANSI SCSI revision: 05

[8.7.] Other information that might be relevant to the problem

During testing it looks like this:

$ egrep -r ^ /sys/module/zswap/parameters
/sys/module/zswap/parameters/same_filled_pages_enabled:Y
/sys/module/zswap/parameters/enabled:Y
/sys/module/zswap/parameters/max_pool_percent:20
/sys/module/zswap/parameters/compressor:lzo
/sys/module/zswap/parameters/zpool:z3fold

$ cat /proc/meminfo
MemTotal: 983056 kB
MemFree: 377876 kB
MemAvailable: 660820 kB
Buffers: 14896 kB
Cached: 368028 kB
SwapCached: 0 kB
Active: 247500 kB
Inactive: 193120 kB
Active(anon): 58016 kB
Inactive(anon): 280 kB
Active(file): 189484 kB
Inactive(file): 192840 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 57712 kB
Mapped: 81984 kB
Shmem: 596 kB
KReclaimable: 56272 kB
Slab: 128128 kB
SReclaimable: 56272 kB
SUnreclaim: 71856 kB
KernelStack: 2208 kB
PageTables: 1632 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4685828 kB
Committed_AS: 268512 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9764 kB
VmallocChunk: 0 kB
Percpu: 9312 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 110452 kB
DirectMap2M: 937984 kB
DirectMap1G: 0 kB

[9.] Other notes

My workaround is to disable zswap:

sudo bash -c 'echo 0 > /sys/module/zswap/parameters/enabled'

Sometimes stress can die just because it is out of memory. Also some other programs might die because of page allocation failures etc. But that is not relevant here.
Generally the stress command actually looks like this:

stress --vm 6 --vm-bytes 228608000 --timeout 10

It seems to be essential to start and stop stress runs repeatedly. Sometimes the problem does not trigger until much later. To be sure there are no problems, I'd suggest running stress for at least an hour (--timeout 3600) and also a couple of hundred times with a short timeout. I've used 90 minutes as the mark of a "good" run during bisect (start of).

I'm not sure if there is only one issue here.

I reboot the machine with the kernel under test, run uname -r and collect the boot logs over ssh, and then ssh in with the test script. No other commands are run.

Some timestamps of errors, to give an idea of how long to wait for the test to give results. Testing starts when the machine has been up for about 8 or 9 seconds.

[ 13.805105] general protection fault: 0000 [#1] SMP PTI
[ 14.059768] general protection fault: 0000 [#1] SMP PTI
[ 14.324867] general protection fault: 0000 [#1] SMP PTI
[ 14.458709] general protection fault: 0000 [#1] SMP PTI
[ 41.818966] BUG: unable to handle page fault for address: fffff54cf8000028
[ 105.710330] BUG: unable to handle page fault for address: ffffd2df8a000028
[ 135.390332] BUG: unable to handle page fault for address: ffffe5a34a000028
[ 166.793041] BUG: unable to handle page fault for address: ffffd1be6f000028
[ 311.602285] BUG: unable to handle page fault for address: fffff7f409000028
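
For anyone repeating the test, here is a minimal sketch of the arithmetic behind the stress parameters and of a filter for the first error timestamp in a captured log. The function names (vm_workers, vm_bytes, first_error_time) are hypothetical helpers made up for this illustration and are not part of the original test script. The MemAvailable value of 893000 kB in the comment is an assumption chosen so that the result matches the example command above; it was not recorded in this report.

```shell
#!/bin/sh
# Number of stress workers: CPU count plus two (4 vCPUs -> 6).
vm_workers() {
    echo $(( $1 + 2 ))
}

# Bytes per worker: MemAvailable in kB, times 1024, divided by CPU count.
# Assumed example: vm_bytes 893000 4 gives the 228608000 used above.
vm_bytes() {
    echo $(( $1 * 1024 / $2 ))
}

# Read a kernel log on stdin and print the timestamp of the first line
# that matches one of the failure messages seen in this report.
first_error_time() {
    grep -E 'general protection fault|BUG: unable to handle page fault|kernel BUG at' |
        head -n 1 |
        sed -E 's/^\[ *([0-9]+\.[0-9]+)\].*/\1/'
}

# Possible usage on the test machine (not executed here):
#   stress --vm "$(vm_workers "$(nproc)")" \
#          --vm-bytes "$(vm_bytes "$(awk '/MemAvail/{print $2}' /proc/meminfo)" "$(nproc)")" \
#          --timeout 10
#   dmesg | first_error_time
```

The helpers only print values, so they can be checked on any machine without zswap enabled; the stress and dmesg invocations are left as comments because they need the test environment described above.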