* xfs_inode not reclaimed/memory leak on 5.2.16
From: Florian Weimer @ 2019-09-30  7:28 UTC
To: linux-xfs; +Cc: linux-fsdevel

Simply running “du -hc” on a large directory tree causes du to be
killed because of kernel paging request failure in the XFS code.

I ran slabtop, and it showed tons of xfs_inode objects.  The system
was rather unhappy after that, so I wasn't able to capture much more
information.

Is this a known issue on Linux 5.2?  I don't see it with kernel
5.0.20.  Those are plain upstream kernels built for x86-64, with no
unusual config options (that I know of).
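[Editor's note: the xfs_inode counts that slabtop shows come from /proc/slabinfo, and snapshotting that one line periodically is cheap to collect even on a struggling system. A minimal sketch of parsing it — the sample line and its numbers are made up to illustrate the format, and reading the real file requires root:]

```shell
# Parse an xfs_inode line in /proc/slabinfo format.  On a live system
# the line would come from:  grep '^xfs_inode ' /proc/slabinfo
# The sample values below are made up for illustration.
line='xfs_inode 1904560 1904560 1024 32 8 : tunables 0 0 0 : slabdata 59518 59518 0'
set -- $line
# Field order: name, active_objs, num_objs, objsize, objperslab, pagesperslab, ...
echo "active=$2 total=$3 objsize=${4}B"
```

Logged once a minute alongside MemFree from /proc/meminfo, this shows whether the inode cache is actually failing to shrink or is merely large.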
* Re: xfs_inode not reclaimed/memory leak on 5.2.16
From: Dave Chinner @ 2019-09-30  8:54 UTC
To: Florian Weimer; +Cc: linux-xfs, linux-fsdevel

On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
> Simply running “du -hc” on a large directory tree causes du to be
> killed because of kernel paging request failure in the XFS code.

dmesg output? if the system was still running, then you might be
able to pull the trace from syslog. But we can't do much without
knowing what the actual failure was....

FWIW, one of my regular test workloads is iterating a directory tree
with 50 million inodes in several different ways to stress reclaim
algorithms in ways that users do. I haven't seen issues with that
test for a while, so it's not an obvious problem whatever you came
across.

> I ran slabtop, and it showed tons of xfs_inode objects.

Sure, because your workload is iterating inodes.

> The system was rather unhappy after that, so I wasn't able to capture
> much more information.
>
> Is this a known issue on Linux 5.2?

Not that I know of.

> I don't see it with kernel
> 5.0.20.  Those are plain upstream kernels built for x86-64, with no
> unusual config options (that I know of).

We've had quite a few memory reclaim regressions in recent times that
have displayed similar symptoms - XFS is often just the messenger
because the inode cache is generating the memory pressure. e.g. the
shrinker infrastructure was broken in 4.16 and then broken differently
in 4.17 to try to fix it, and we didn't hear about them until about
4.18/4.19 when users started to trip over them. I fixed those problems
in 5.0, but there's every chance that there have been new regressions
since then.

Cheers,

Dave.
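[Editor's note: pulling the trace from syslog, as suggested, can be sketched like so; the short sample below stands in for the real log (file names and the exact log prefix vary by distro, and on systemd machines `journalctl -k -b -1` covers the kernel log of the previous boot):]

```shell
# Sketch: filter the oops-relevant lines out of a saved syslog excerpt.
# A short sample stands in for the real input (/var/log/syslog,
# "dmesg", or "journalctl -k" output).
syslog='kernel: [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
kernel: [ 4001.238446] Call Trace:
kernel: [ 4001.238450] __reset_isolation_suitable+0x9b/0x120'
printf '%s\n' "$syslog" | grep -E 'BUG:|Oops:|RIP:|Call Trace:'
```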
-- 
Dave Chinner
david@fromorbit.com
* Re: xfs_inode not reclaimed/memory leak on 5.2.16
From: Florian Weimer @ 2019-09-30 19:07 UTC
To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

* Dave Chinner:

> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>> Simply running “du -hc” on a large directory tree causes du to be
>> killed because of kernel paging request failure in the XFS code.
>
> dmesg output? if the system was still running, then you might be
> able to pull the trace from syslog. But we can't do much without
> knowing what the actual failure was....

Huh.  I actually have something in syslog:

[ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 4001.238415] #PF: supervisor read access in kernel mode
[ 4001.238417] #PF: error_code(0x0000) - not-present page
[ 4001.238418] PGD 0 P4D 0
[ 4001.238420] Oops: 0000 [#1] SMP PTI
[ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
[ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[ 4001.238433] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
[ 4001.238435] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
[ 4001.238437] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
[ 4001.238438] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
[ 4001.238439] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
[ 4001.238440] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
[ 4001.238442] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
[ 4001.238444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.238445] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
[ 4001.238446] Call Trace:
[ 4001.238450]  __reset_isolation_suitable+0x9b/0x120
[ 4001.238453]  reset_isolation_suitable+0x3b/0x40
[ 4001.238456]  kswapd+0x98/0x300
[ 4001.238460]  ? wait_woken+0x80/0x80
[ 4001.238463]  kthread+0x114/0x130
[ 4001.238465]  ? balance_pgdat+0x450/0x450
[ 4001.238467]  ? kthread_park+0x80/0x80
[ 4001.238470]  ret_from_fork+0x1f/0x30
[ 4001.238472] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
[ 4001.238509] CR2: 0000000000000000
[ 4001.238511] ---[ end trace 3cdcc14b40255fe6 ]---
[ 4001.238514] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[ 4001.238516] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[ 4001.238518] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
[ 4001.238519] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
[ 4001.238521] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
[ 4001.238522] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
[ 4001.238523] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
[ 4001.238524] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
[ 4001.238526] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
[ 4001.238528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.238529] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
[ 4001.709169] BUG: unable to handle page fault for address: ffffc900003e7ec8
[ 4001.709172] #PF: supervisor read access in kernel mode
[ 4001.709173] #PF: error_code(0x0000) - not-present page
[ 4001.709174] PGD 33201a067 P4D 33201a067 PUD 33201b067 PMD 3322af067 PTE 0
[ 4001.709177] Oops: 0000 [#2] SMP PTI
[ 4001.709179] CPU: 1 PID: 10507 Comm: du Tainted: G D I 5.2.16fw+ #1
[ 4001.709180] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[ 4001.709184] RIP: 0010:__wake_up_common+0x3c/0x130
[ 4001.709186] Code: 85 c9 74 0a 41 f6 01 04 0f 85 9f 00 00 00 48 8b 47 08 48 8d 5f 08 48 83 e8 18 48 8d 78 18 48 39 fb 0f 84 ca 00 00 00 89 75 d4 <48> 8b 70 18 4d 89 cd 45 31 e4 4c 89 45 c8 89 4d d0 89 55 c4 4c 8d
[ 4001.709187] RSP: 0018:ffffc900043db5e0 EFLAGS: 00010012
[ 4001.709188] RAX: ffffc900003e7eb0 RBX: ffffffff82066f00 RCX: 0000000000000000
[ 4001.709189] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffc900003e7ec8
[ 4001.709190] RBP: ffffc900043db620 R08: 0000000000000000 R09: ffffc900043db638
[ 4001.709191] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
[ 4001.709192] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000000
[ 4001.709193] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
[ 4001.709194] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.709195] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
[ 4001.709195] Call Trace:
[ 4001.709198]  __wake_up_common_lock+0x6c/0x90
[ 4001.709200]  __wake_up+0xe/0x10
[ 4001.709203]  wakeup_kswapd+0xf4/0x120
[ 4001.709206]  get_page_from_freelist+0x52e/0xc80
[ 4001.709208]  __alloc_pages_nodemask+0xf0/0xcc0
[ 4001.709209]  ? get_page_from_freelist+0xa8d/0xc80
[ 4001.709212]  ? radix_tree_lookup+0xd/0x10
[ 4001.709215]  ? kmem_cache_alloc+0x80/0xa0
[ 4001.709217]  xfs_buf_allocate_memory+0x20e/0x320
[ 4001.709219]  xfs_buf_get_map+0xe8/0x190
[ 4001.709220]  xfs_buf_read_map+0x25/0x100
[ 4001.709223]  xfs_trans_read_buf_map+0xb2/0x1f0
[ 4001.709225]  xfs_imap_to_bp+0x53/0xa0
[ 4001.709226]  xfs_iread+0x76/0x1b0
[ 4001.709229]  xfs_iget+0x1e5/0x700
[ 4001.709231]  xfs_lookup+0x63/0x90
[ 4001.709232]  xfs_vn_lookup+0x47/0x80
[ 4001.709235]  __lookup_slow+0x7f/0x120
[ 4001.709236]  lookup_slow+0x35/0x50
[ 4001.709238]  walk_component+0x193/0x2a0
[ 4001.709239]  ? path_init+0x112/0x2f0
[ 4001.709240]  path_lookupat.isra.16+0x5c/0x200
[ 4001.709242]  filename_lookup.part.27+0x88/0x100
[ 4001.709243]  ? xfs_ilock+0x39/0x90
[ 4001.709245]  ? __check_object_size+0xf6/0x187
[ 4001.709248]  ? strncpy_from_user+0x56/0x1c0
[ 4001.709249]  user_path_at_empty+0x39/0x40
[ 4001.709250]  vfs_statx+0x62/0xb0
[ 4001.709252]  __se_sys_newfstatat+0x26/0x50
[ 4001.709254]  __x64_sys_newfstatat+0x19/0x20
[ 4001.709255]  do_syscall_64+0x4b/0x260
[ 4001.709257]  ? page_fault+0x8/0x30
[ 4001.709259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4001.709261] RIP: 0033:0x7f0090c47e49
[ 4001.709262] Code: 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 89 f0 48 89 d6 83 ff 01 77 36 89 c7 45 89 c2 48 89 ca b8 06 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 11 10 0d 00
[ 4001.709263] RSP: 002b:00007fffee0929e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[ 4001.709264] RAX: ffffffffffffffda RBX: 00005635d95cfe00 RCX: 00007f0090c47e49
[ 4001.709265] RDX: 00005635d95cfe78 RSI: 00005635d95cff08 RDI: 0000000000000004
[ 4001.709266] RBP: 00005635d95cfe78 R08: 0000000000000100 R09: 0000000000000001
[ 4001.709267] R10: 0000000000000100 R11: 0000000000000246 R12: 00005635d8c1e5c0
[ 4001.709268] R13: 00005635d95cfe00 R14: 00005635d8c1e650 R15: 000000000000000b
[ 4001.709269] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
[ 4001.709293] CR2: ffffc900003e7ec8
[ 4001.709295] ---[ end trace 3cdcc14b40255fe7 ]---
[ 4001.709297] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[ 4001.709299] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[ 4001.709299] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
[ 4001.709300] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
[ 4001.709301] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
[ 4001.709302] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
[ 4001.709303] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
[ 4001.709304] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
[ 4001.709305] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
[ 4001.709306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.709307] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0

So XFS wasn't *that* unhappy if it could still write to the file
system.

> FWIW, one of my regular test workloads is iterating a directory tree
> with 50 million inodes in several different ways to stress reclaim
> algorithms in ways that users do. I haven't seen issues with that
> test for a while, so it's not an obvious problem whatever you came
> across.

Right, I should have tried to reproduce it first.  I actually can't.
* [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
From: Dave Chinner @ 2019-09-30 21:17 UTC
To: Florian Weimer; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
> * Dave Chinner:
>
> > dmesg output? if the system was still running, then you might be
> > able to pull the trace from syslog. But we can't do much without
> > knowing what the actual failure was....
>
> Huh.  I actually have something in syslog:
>
> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [...]
> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0

That's memory compaction code it's crashed in.

> [ 4001.238446] Call Trace:
> [ 4001.238450]  __reset_isolation_suitable+0x9b/0x120
> [ 4001.238453]  reset_isolation_suitable+0x3b/0x40
> [ 4001.238456]  kswapd+0x98/0x300
> [ 4001.238460]  ? wait_woken+0x80/0x80
> [ 4001.238463]  kthread+0x114/0x130
> [ 4001.238465]  ? balance_pgdat+0x450/0x450
> [ 4001.238467]  ? kthread_park+0x80/0x80
> [ 4001.238470]  ret_from_fork+0x1f/0x30

Ok, so the memory compaction code has had a null pointer dereference
which has killed kswapd. memory reclaim is going to have serious
problems from this point on as kswapd does most of the reclaim.

I have no idea why this might have happened - are there any other
unexpected events or clues in the syslog that might point to a
memory corruption or some sign of badness before this crash?

> [ 4001.709179] CPU: 1 PID: 10507 Comm: du Tainted: G D I 5.2.16fw+ #1
> [...]
> [ 4001.709184] RIP: 0010:__wake_up_common+0x3c/0x130

Then half a second later, the du process has crashed in
__wake_up_common (core scheduler code)....

> [ 4001.709195] Call Trace:
> [ 4001.709198]  __wake_up_common_lock+0x6c/0x90
> [ 4001.709200]  __wake_up+0xe/0x10
> [ 4001.709203]  wakeup_kswapd+0xf4/0x120

...trying to wake up kswapd. This may have crashed because the
kswapd task has been killed and it hasn't been removed from
the wait list and so there's a dead/freed task being woken.
Regardless, this looks like a follow-on issue, not a root cause.

> [ 4001.709217]  xfs_buf_allocate_memory+0x20e/0x320
> [ 4001.709219]  xfs_buf_get_map+0xe8/0x190
> [ 4001.709220]  xfs_buf_read_map+0x25/0x100
> [...]
> [ 4001.709231]  xfs_lookup+0x63/0x90
> [ 4001.709232]  xfs_vn_lookup+0x47/0x80

The XFS part of this is that it triggers the memory allocation that
trips over the bad kswapd state, nothing else.

IOWs, this doesn't look like an XFS problem at all, but more likely
something going wrong with memory compaction or memory reclaim, so
I'd suggest linux-mm@kvack.org [cc'd] is the first port of call for
further triage.

> So XFS wasn't *that* unhappy if it could still write to the file
> system.

Right, as long as it doesn't trip over any of the leaked state (e.g.
locks) from the du process that was killed, it'll keep going as long
as direct memory reclaim can keep reclaiming memory.

> > FWIW, one of my regular test workloads is iterating a directory tree
> > with 50 million inodes in several different ways to stress reclaim
> > algorithms in ways that users do. I haven't seen issues with that
> > test for a while, so it's not an obvious problem whatever you came
> > across.
>
> Right, I should have tried to reproduce it first.  I actually can't.

Not surprising, it has the smell of "random crash" to it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
From: Florian Weimer @ 2019-09-30 21:42 UTC
To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel, linux-mm

* Dave Chinner:

>> [ 4001.238446] Call Trace:
>> [ 4001.238450]  __reset_isolation_suitable+0x9b/0x120
>> [ 4001.238453]  reset_isolation_suitable+0x3b/0x40
>> [ 4001.238456]  kswapd+0x98/0x300
>> [ 4001.238460]  ? wait_woken+0x80/0x80
>> [ 4001.238463]  kthread+0x114/0x130
>> [ 4001.238465]  ? balance_pgdat+0x450/0x450
>> [ 4001.238467]  ? kthread_park+0x80/0x80
>> [ 4001.238470]  ret_from_fork+0x1f/0x30
>
> Ok, so the memory compaction code has had a null pointer dereference
> which has killed kswapd. memory reclaim is going to have serious
> problems from this point on as kswapd does most of the reclaim.

Sorry, no. OpenVPN opened a tun device at the same time (same
second), and udevd reacted to that, but that's it. I also
double-checked, and there haven't been any recent previous
occurrences of that crash.
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
From: Vlastimil Babka @ 2019-10-01  9:10 UTC
To: Dave Chinner, Florian Weimer; +Cc: linux-xfs, linux-fsdevel, linux-mm, Mel Gorman

On 9/30/19 11:17 PM, Dave Chinner wrote:
> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>> Huh.  I actually have something in syslog:
>>
>> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [...]
>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>
> That's memory compaction code it's crashed in.
>
>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7

Tried to decode it, but couldn't match it to the source code; my
version of the compiled code is too different. Would it be possible
to either send mm/compaction.o from the matching build, or the
output of 'objdump -d -l' for the __reset_isolation_pfn function?
page_fault+0x8/0x30 >> [ 4001.709259] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 4001.709261] RIP: 0033:0x7f0090c47e49 >> [ 4001.709262] Code: 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 89 f0 48 89 d6 83 ff 01 77 36 89 c7 45 89 c2 48 89 ca b8 06 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 11 10 0d 00 >> [ 4001.709263] RSP: 002b:00007fffee0929e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106 >> [ 4001.709264] RAX: ffffffffffffffda RBX: 00005635d95cfe00 RCX: 00007f0090c47e49 >> [ 4001.709265] RDX: 00005635d95cfe78 RSI: 00005635d95cff08 RDI: 0000000000000004 >> [ 4001.709266] RBP: 00005635d95cfe78 R08: 0000000000000100 R09: 0000000000000001 >> [ 4001.709267] R10: 0000000000000100 R11: 0000000000000246 R12: 00005635d8c1e5c0 >> [ 4001.709268] R13: 00005635d95cfe00 R14: 00005635d8c1e650 R15: 000000000000000b >> [ 4001.709269] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod >> [ 4001.709293] CR2: ffffc900003e7ec8 >> [ 4001.709295] ---[ end trace 3cdcc14b40255fe7 ]--- >> [ 4001.709297] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0 >> [ 4001.709299] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 
89 f7 >> [ 4001.709299] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246 >> [ 4001.709300] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000 >> [ 4001.709301] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000 >> [ 4001.709302] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000 >> [ 4001.709303] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000 >> [ 4001.709304] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001 >> [ 4001.709305] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000 >> [ 4001.709306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 4001.709307] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0 >> >> So XFS wasn't *that* unhappy if it could still write to the file >> system. > > RIght, as long as it doesn't trip over any of the leaked state (e.g. > locks) from the du process that was killed, it'll keep going as long > as direct memory reclaim can keep reclaiming memory. > >> >>> FWIW, one of my regular test workloads is iterating a directory tree >>> with 50 million inodes in several different ways to stress reclaim >>> algorithms in ways that users do. I haven't seen issues with that >>> test for a while, so it's not an obvious problem whatever you came >>> across. >> >> Right, I should have tried to reproduce it first. I actually can't. > > Not surprising, it has the smell of "random crash" to it. > > Cheers, > > Dave. > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16] 2019-10-01 9:10 ` Vlastimil Babka @ 2019-10-01 19:40 ` Florian Weimer 2019-10-07 13:28 ` Vlastimil Babka 0 siblings, 1 reply; 10+ messages in thread From: Florian Weimer @ 2019-10-01 19:40 UTC (permalink / raw) To: Vlastimil Babka Cc: Dave Chinner, linux-xfs, linux-fsdevel, linux-mm, Mel Gorman * Vlastimil Babka: > On 9/30/19 11:17 PM, Dave Chinner wrote: >> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote: >>> * Dave Chinner: >>> >>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote: >>>>> Simply running “du -hc” on a large directory tree causes du to be >>>>> killed because of kernel paging request failure in the XFS code. >>>> >>>> dmesg output? if the system was still running, then you might be >>>> able to pull the trace from syslog. But we can't do much without >>>> knowing what the actual failure was.... >>> >>> Huh. I actually have something in syslog: >>> >>> [ 4001.238411] BUG: kernel NULL pointer dereference, address: >>> 0000000000000000 >>> [ 4001.238415] #PF: supervisor read access in kernel mode >>> [ 4001.238417] #PF: error_code(0x0000) - not-present page >>> [ 4001.238418] PGD 0 P4D 0 >>> [ 4001.238420] Oops: 0000 [#1] SMP PTI >>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ >>> #1 >>> [ 4001.238424] Hardware name: System manufacturer System Product >>> Name/P6X58D-E, BIOS 0701 05/10/2011 >>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0 >> >> That's memory compaction code it's crashed in. >> >>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 >>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 >>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 >>> 00 00 4c 89 f7 > > Tried to decode it, but couldn't match it to source code, my version of > compiled code is too different. 
Would it be possible to either send > mm/compaction.o from the matching build, or output of 'objdump -d -l' > for the __reset_isolation_pfn function? See below. I don't have debuginfo for this build, and the binary does not reproduce for some reason. Due to the heavy inlining, it might be quite hard to figure out what's going on. I've switched to kernel builds with debuginfo from now on. I'm surprised that it's not the default. 0000000000000120 <__reset_isolation_pfn>: __reset_isolation_pfn(): 120: 48 89 f0 mov %rsi,%rax 123: 48 c1 e8 0f shr $0xf,%rax 127: 48 3d ff ff 07 00 cmp $0x7ffff,%rax 12d: 0f 87 83 00 00 00 ja 1b6 <__reset_isolation_pfn+0x96> 133: 4c 8b 0d 00 00 00 00 mov 0x0(%rip),%r9 # 13a <__reset_isolation_pfn+0x1a> 136: R_X86_64_PC32 mem_section-0x4 13a: 4d 85 c9 test %r9,%r9 13d: 74 77 je 1b6 <__reset_isolation_pfn+0x96> 13f: 49 89 f2 mov %rsi,%r10 142: 49 c1 ea 17 shr $0x17,%r10 146: 4f 8b 0c d1 mov (%r9,%r10,8),%r9 14a: 4d 85 c9 test %r9,%r9 14d: 74 67 je 1b6 <__reset_isolation_pfn+0x96> 14f: 0f b6 c0 movzbl %al,%eax 152: 48 c1 e0 04 shl $0x4,%rax 156: 4c 01 c8 add %r9,%rax 159: 74 5b je 1b6 <__reset_isolation_pfn+0x96> 15b: 4c 8b 08 mov (%rax),%r9 15e: 41 f6 c1 02 test $0x2,%r9b 162: 74 52 je 1b6 <__reset_isolation_pfn+0x96> 164: 48 6b c6 38 imul $0x38,%rsi,%rax 168: 55 push %rbp 169: 49 83 e1 f8 and $0xfffffffffffffff8,%r9 16d: 48 89 e5 mov %rsp,%rbp 170: 41 57 push %r15 172: 41 56 push %r14 174: 4d 89 ce mov %r9,%r14 177: 41 55 push %r13 179: 41 54 push %r12 17b: 53 push %rbx 17c: 48 83 ec 10 sub $0x10,%rsp 180: 49 01 c6 add %rax,%r14 183: 74 1c je 1a1 <__reset_isolation_pfn+0x81> 185: 49 8b 06 mov (%r14),%rax 188: 48 c1 e8 2b shr $0x2b,%rax 18c: 83 e0 03 and $0x3,%eax 18f: 48 69 c0 80 05 00 00 imul $0x580,%rax,%rax 196: 48 05 00 00 00 00 add $0x0,%rax 198: R_X86_64_32S contig_page_data 19c: 48 39 c7 cmp %rax,%rdi 19f: 74 1c je 1bd <__reset_isolation_pfn+0x9d> 1a1: 45 31 d2 xor %r10d,%r10d 1a4: 48 83 c4 10 add $0x10,%rsp 1a8: 44 89 d0 mov 
%r10d,%eax 1ab: 5b pop %rbx 1ac: 41 5c pop %r12 1ae: 41 5d pop %r13 1b0: 41 5e pop %r14 1b2: 41 5f pop %r15 1b4: 5d pop %rbp 1b5: c3 retq 1b6: 45 31 d2 xor %r10d,%r10d 1b9: 44 89 d0 mov %r10d,%eax 1bc: c3 retq 1bd: 48 89 7d d0 mov %rdi,-0x30(%rbp) 1c1: 41 89 cc mov %ecx,%r12d 1c4: 41 89 cd mov %ecx,%r13d 1c7: 4c 89 f7 mov %r14,%rdi 1ca: 89 d1 mov %edx,%ecx 1cc: 41 89 d7 mov %edx,%r15d 1cf: 48 89 f3 mov %rsi,%rbx 1d2: 89 4d cc mov %ecx,-0x34(%rbp) 1d5: e8 46 fe ff ff callq 20 <pageblock_skip_persistent> 1da: 84 c0 test %al,%al 1dc: 75 c3 jne 1a1 <__reset_isolation_pfn+0x81> 1de: 45 89 fa mov %r15d,%r10d 1e1: 45 20 e2 and %r12b,%r10b 1e4: 0f 85 f3 01 00 00 jne 3dd <__reset_isolation_pfn+0x2bd> 1ea: 80 7d cc 01 cmpb $0x1,-0x34(%rbp) 1ee: 74 6d je 25d <__reset_isolation_pfn+0x13d> 1f0: 45 84 e4 test %r12b,%r12b 1f3: 74 68 je 25d <__reset_isolation_pfn+0x13d> 1f5: 49 8b 0e mov (%r14),%rcx 1f8: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 1ff <__reset_isolation_pfn+0xdf> 1fb: R_X86_64_PC32 mem_section-0x4 1ff: 48 89 ca mov %rcx,%rdx 202: 48 c1 ea 2d shr $0x2d,%rdx 206: 48 85 c0 test %rax,%rax 209: 74 17 je 222 <__reset_isolation_pfn+0x102> 20b: 48 c1 e9 35 shr $0x35,%rcx 20f: 48 8b 04 c8 mov (%rax,%rcx,8),%rax 213: 48 85 c0 test %rax,%rax 216: 74 0a je 222 <__reset_isolation_pfn+0x102> 218: 0f b6 d2 movzbl %dl,%edx 21b: 48 c1 e2 04 shl $0x4,%rdx 21f: 48 01 d0 add %rdx,%rax 222: 48 8b 00 mov (%rax),%rax 225: 4c 89 f6 mov %r14,%rsi 228: b9 07 00 00 00 mov $0x7,%ecx 22d: ba 02 00 00 00 mov $0x2,%edx 232: 4c 89 f7 mov %r14,%rdi 235: 48 83 e0 f8 and $0xfffffffffffffff8,%rax 239: 48 29 c6 sub %rax,%rsi 23c: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax 243: db b6 6d 246: 48 c1 fe 03 sar $0x3,%rsi 24a: 48 0f af f0 imul %rax,%rsi 24e: e8 00 00 00 00 callq 253 <__reset_isolation_pfn+0x133> 24f: R_X86_64_PLT32 get_pfnblock_flags_mask-0x4 253: 48 83 f8 01 cmp $0x1,%rax 257: 0f 85 44 ff ff ff jne 1a1 <__reset_isolation_pfn+0x81> 25d: 48 8b 7d d0 mov -0x30(%rbp),%rdi 261: 48 81 
e3 00 fe ff ff and $0xfffffffffffffe00,%rbx 268: 48 8b 47 58 mov 0x58(%rdi),%rax 26c: 48 39 c3 cmp %rax,%rbx 26f: 48 89 c1 mov %rax,%rcx 272: 48 0f 43 cb cmovae %rbx,%rcx 276: 48 03 47 68 add 0x68(%rdi),%rax 27a: 48 81 c3 00 02 00 00 add $0x200,%rbx 281: 48 89 ca mov %rcx,%rdx 284: 48 c1 ea 0f shr $0xf,%rdx 288: 48 83 e8 01 sub $0x1,%rax 28c: 48 39 d8 cmp %rbx,%rax 28f: 48 0f 47 c3 cmova %rbx,%rax 293: 48 89 c6 mov %rax,%rsi 296: 48 c1 ee 0f shr $0xf,%rsi 29a: 48 81 fa ff ff 07 00 cmp $0x7ffff,%rdx 2a1: 0f 87 ab 01 00 00 ja 452 <__reset_isolation_pfn+0x332> 2a7: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 2ae <__reset_isolation_pfn+0x18e> 2aa: R_X86_64_PC32 mem_section-0x4 2ae: 48 85 ff test %rdi,%rdi 2b1: 0f 84 ea fe ff ff je 1a1 <__reset_isolation_pfn+0x81> 2b7: 49 89 ca mov %rcx,%r10 2ba: 49 c1 ea 17 shr $0x17,%r10 2be: 4e 8b 14 d7 mov (%rdi,%r10,8),%r10 2c2: 4d 85 d2 test %r10,%r10 2c5: 74 23 je 2ea <__reset_isolation_pfn+0x1ca> 2c7: 0f b6 d2 movzbl %dl,%edx 2ca: 48 c1 e2 04 shl $0x4,%rdx 2ce: 4c 01 d2 add %r10,%rdx 2d1: 74 17 je 2ea <__reset_isolation_pfn+0x1ca> 2d3: 48 8b 12 mov (%rdx),%rdx 2d6: f6 c2 02 test $0x2,%dl 2d9: 74 0f je 2ea <__reset_isolation_pfn+0x1ca> 2db: 48 6b c9 38 imul $0x38,%rcx,%rcx 2df: 48 83 e2 f8 and $0xfffffffffffffff8,%rdx 2e3: 48 01 ca add %rcx,%rdx 2e6: 4c 0f 45 f2 cmovne %rdx,%r14 2ea: 48 81 fe 00 00 08 00 cmp $0x80000,%rsi 2f1: 0f 84 aa fe ff ff je 1a1 <__reset_isolation_pfn+0x81> 2f7: 48 89 c2 mov %rax,%rdx 2fa: 48 c1 ea 17 shr $0x17,%rdx 2fe: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx 302: 48 85 d2 test %rdx,%rdx 305: 0f 84 96 fe ff ff je 1a1 <__reset_isolation_pfn+0x81> 30b: 40 0f b6 f6 movzbl %sil,%esi 30f: 48 c1 e6 04 shl $0x4,%rsi 313: 48 01 f2 add %rsi,%rdx 316: 0f 84 85 fe ff ff je 1a1 <__reset_isolation_pfn+0x81> 31c: 48 8b 12 mov (%rdx),%rdx 31f: f6 c2 02 test $0x2,%dl 322: 0f 84 79 fe ff ff je 1a1 <__reset_isolation_pfn+0x81> 328: 48 6b c0 38 imul $0x38,%rax,%rax 32c: 48 83 e2 f8 and $0xfffffffffffffff8,%rdx 330: 48 01 c2 add 
%rax,%rdx 333: 75 2e jne 363 <__reset_isolation_pfn+0x243> 335: e9 67 fe ff ff jmpq 1a1 <__reset_isolation_pfn+0x81> 33a: 45 84 e4 test %r12b,%r12b 33d: 74 14 je 353 <__reset_isolation_pfn+0x233> 33f: 41 8b 46 30 mov 0x30(%r14),%eax 343: 25 80 00 00 f0 and $0xf0000080,%eax 348: 3d 00 00 00 f0 cmp $0xf0000000,%eax 34d: 0f 84 21 01 00 00 je 474 <__reset_isolation_pfn+0x354> 353: 49 81 c6 c0 01 00 00 add $0x1c0,%r14 35a: 4c 39 f2 cmp %r14,%rdx 35d: 0f 86 3e fe ff ff jbe 1a1 <__reset_isolation_pfn+0x81> 363: 45 84 ff test %r15b,%r15b 366: 74 d2 je 33a <__reset_isolation_pfn+0x21a> 368: 49 8b 4e 08 mov 0x8(%r14),%rcx 36c: 48 8d 41 ff lea -0x1(%rcx),%rax 370: 83 e1 01 and $0x1,%ecx 373: 49 0f 44 c6 cmove %r14,%rax 377: 48 8b 00 mov (%rax),%rax 37a: a8 10 test $0x10,%al 37c: 74 bc je 33a <__reset_isolation_pfn+0x21a> 37e: 49 8b 16 mov (%r14),%rdx 381: 48 89 d0 mov %rdx,%rax 384: 48 c1 ea 35 shr $0x35,%rdx 388: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx 38c: 48 c1 e8 2d shr $0x2d,%rax 390: 48 85 d2 test %rdx,%rdx 393: 74 0a je 39f <__reset_isolation_pfn+0x27f> 395: 0f b6 c0 movzbl %al,%eax 398: 48 c1 e0 04 shl $0x4,%rax 39c: 48 01 c2 add %rax,%rdx 39f: 48 8b 02 mov (%rdx),%rax 3a2: 4c 89 f2 mov %r14,%rdx 3a5: 41 b8 01 00 00 00 mov $0x1,%r8d 3ab: 31 f6 xor %esi,%esi 3ad: b9 03 00 00 00 mov $0x3,%ecx 3b2: 4c 89 f7 mov %r14,%rdi 3b5: 48 83 e0 f8 and $0xfffffffffffffff8,%rax 3b9: 48 29 c2 sub %rax,%rdx 3bc: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax 3c3: db b6 6d 3c6: 48 c1 fa 03 sar $0x3,%rdx 3ca: 48 0f af d0 imul %rax,%rdx 3ce: e8 00 00 00 00 callq 3d3 <__reset_isolation_pfn+0x2b3> 3cf: R_X86_64_PLT32 set_pfnblock_flags_mask-0x4 3d3: 44 0f b6 55 cc movzbl -0x34(%rbp),%r10d 3d8: e9 c7 fd ff ff jmpq 1a4 <__reset_isolation_pfn+0x84> 3dd: 49 8b 0e mov (%r14),%rcx 3e0: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 3e7 <__reset_isolation_pfn+0x2c7> 3e3: R_X86_64_PC32 mem_section-0x4 3e7: 48 89 ca mov %rcx,%rdx 3ea: 48 c1 ea 2d shr $0x2d,%rdx 3ee: 48 85 c0 test %rax,%rax 3f1: 74 17 
je 40a <__reset_isolation_pfn+0x2ea> 3f3: 48 c1 e9 35 shr $0x35,%rcx 3f7: 48 8b 04 c8 mov (%rax,%rcx,8),%rax 3fb: 48 85 c0 test %rax,%rax 3fe: 74 0a je 40a <__reset_isolation_pfn+0x2ea> 400: 0f b6 d2 movzbl %dl,%edx 403: 48 c1 e2 04 shl $0x4,%rdx 407: 48 01 d0 add %rdx,%rax 40a: 48 8b 00 mov (%rax),%rax 40d: 4c 89 f6 mov %r14,%rsi 410: b9 01 00 00 00 mov $0x1,%ecx 415: ba 03 00 00 00 mov $0x3,%edx 41a: 4c 89 f7 mov %r14,%rdi 41d: 44 88 55 cb mov %r10b,-0x35(%rbp) 421: 48 83 e0 f8 and $0xfffffffffffffff8,%rax 425: 48 29 c6 sub %rax,%rsi 428: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax 42f: db b6 6d 432: 48 c1 fe 03 sar $0x3,%rsi 436: 48 0f af f0 imul %rax,%rsi 43a: e8 00 00 00 00 callq 43f <__reset_isolation_pfn+0x31f> 43b: R_X86_64_PLT32 get_pfnblock_flags_mask-0x4 43f: 44 0f b6 55 cb movzbl -0x35(%rbp),%r10d 444: 48 85 c0 test %rax,%rax 447: 0f 85 10 fe ff ff jne 25d <__reset_isolation_pfn+0x13d> 44d: e9 52 fd ff ff jmpq 1a4 <__reset_isolation_pfn+0x84> 452: 48 81 fe 00 00 08 00 cmp $0x80000,%rsi 459: 0f 84 42 fd ff ff je 1a1 <__reset_isolation_pfn+0x81> 45f: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 466 <__reset_isolation_pfn+0x346> 462: R_X86_64_PC32 mem_section-0x4 466: 48 85 ff test %rdi,%rdi 469: 0f 85 88 fe ff ff jne 2f7 <__reset_isolation_pfn+0x1d7> 46f: e9 2d fd ff ff jmpq 1a1 <__reset_isolation_pfn+0x81> 474: 49 8b 06 mov (%r14),%rax 477: 48 89 c2 mov %rax,%rdx 47a: 48 c1 e8 35 shr $0x35,%rax 47e: 48 8b 04 c7 mov (%rdi,%rax,8),%rax 482: 48 c1 ea 2d shr $0x2d,%rdx 486: 48 85 c0 test %rax,%rax 489: 74 0a je 495 <__reset_isolation_pfn+0x375> 48b: 0f b6 d2 movzbl %dl,%edx 48e: 48 c1 e2 04 shl $0x4,%rdx 492: 48 01 d0 add %rdx,%rax 495: 48 8b 00 mov (%rax),%rax 498: 4c 89 f2 mov %r14,%rdx 49b: 41 b8 01 00 00 00 mov $0x1,%r8d 4a1: 31 f6 xor %esi,%esi 4a3: b9 03 00 00 00 mov $0x3,%ecx 4a8: 4c 89 f7 mov %r14,%rdi 4ab: 48 83 e0 f8 and $0xfffffffffffffff8,%rax 4af: 48 29 c2 sub %rax,%rdx 4b2: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax 4b9: db b6 
6d 4bc: 48 c1 fa 03 sar $0x3,%rdx 4c0: 48 0f af d0 imul %rax,%rdx 4c4: e8 00 00 00 00 callq 4c9 <__reset_isolation_pfn+0x3a9> 4c5: R_X86_64_PLT32 set_pfnblock_flags_mask-0x4 4c9: 45 89 ea mov %r13d,%r10d 4cc: e9 d3 fc ff ff jmpq 1a4 <__reset_isolation_pfn+0x84> 4d1: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) 4d8: 00 00 00 00 4dc: 0f 1f 40 00 nopl 0x0(%rax) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
  2019-10-01 19:40           ` Florian Weimer
@ 2019-10-07 13:28             ` Vlastimil Babka
  2019-10-07 13:56               ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-10-07 13:28 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dave Chinner, linux-xfs, linux-fsdevel, linux-mm, Mel Gorman

On 10/1/19 9:40 PM, Florian Weimer wrote:
> * Vlastimil Babka:
>
>> On 9/30/19 11:17 PM, Dave Chinner wrote:
>>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>>> * Dave Chinner:
>>>>
>>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>>> killed because of kernel paging request failure in the XFS code.
>>>>>
>>>>> dmesg output? if the system was still running, then you might be
>>>>> able to pull the trace from syslog. But we can't do much without
>>>>> knowing what the actual failure was....
>>>>
>>>> Huh. I actually have something in syslog:
>>>>
>>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address:
>>>> 0000000000000000
>>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>>> [ 4001.238418] PGD 0 P4D 0
>>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+
>>>> #1
>>>> [ 4001.238424] Hardware name: System manufacturer System Product
>>>> Name/P6X58D-E, BIOS 0701 05/10/2011
>>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>>
>>> That's memory compaction code it's crashed in.
>>>
>>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0
>>>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1
>>>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00
>>>> 00 00 4c 89 f7
>>
>> Tried to decode it, but couldn't match it to source code, my version of
>> compiled code is too different. Would it be possible to either send
>> mm/compaction.o from the matching build, or output of 'objdump -d -l'
>> for the __reset_isolation_pfn function?
>
> See below. I don't have debuginfo for this build, and the binary does
> not reproduce for some reason. Due to the heavy inlining, it might be
> quite hard to figure out what's going on.

Thanks, but I'm still not able to "decompile" that in my head.

> I've switched to kernel builds with debuginfo from now on. I'm
> surprised that it's not the default.

Let's see if you can reproduce it with that.

However, I've noticed at least something weird:

> 37e:	49 8b 16             	mov    (%r14),%rdx
> 381:	48 89 d0             	mov    %rdx,%rax
> 384:	48 c1 ea 35          	shr    $0x35,%rdx
> 388:	48 8b 14 d7          	mov    (%rdi,%rdx,8),%rdx
> 38c:	48 c1 e8 2d          	shr    $0x2d,%rax
> 390:	48 85 d2             	test   %rdx,%rdx
> 393:	74 0a                	je     39f <__reset_isolation_pfn+0x27f>

IIUC, this will jump to 39f when rdx is zero.

> 395:	0f b6 c0             	movzbl %al,%eax
> 398:	48 c1 e0 04          	shl    $0x4,%rax
> 39c:	48 01 c2             	add    %rax,%rdx
> 39f:	48 8b 02             	mov    (%rdx),%rax

And this is where we crash because rdx is zero. So the test+branch might
have sent us directly here to crash. Sounds like an inverted condition
somewhere? Or possibly a result of optimizations.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
  2019-10-07 13:28             ` Vlastimil Babka
@ 2019-10-07 13:56               ` Vlastimil Babka
  2019-10-08  8:52                 ` Mel Gorman
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-10-07 13:56 UTC (permalink / raw)
  To: Florian Weimer, Mel Gorman
  Cc: Dave Chinner, linux-xfs, linux-fsdevel, linux-mm

On 10/7/19 3:28 PM, Vlastimil Babka wrote:
> On 10/1/19 9:40 PM, Florian Weimer wrote:
>> * Vlastimil Babka:
>>
>>
>> See below. I don't have debuginfo for this build, and the binary does
>> not reproduce for some reason. Due to the heavy inlining, it might be
>> quite hard to figure out what's going on.
>
> Thanks, but I'm still not able to "decompile" that in my head.

While staring at the code, I think I found two probably unrelated bugs.
One is that pfn and page might be desynced when the zone starts in the
middle of a pageblock, as the max() is only applied to page and not pfn.
But that only effectively affects the later pfn_valid_within() checks,
which should always be true on x86.

The second is that "end of pageblock online and valid" should refer to
the last pfn of the pageblock, not the first pfn of the next pageblock.
Otherwise we might return false needlessly. Mel, what do you think?

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -270,14 +270,15 @@ __reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
 
 	/* Ensure the start of the pageblock or zone is online and valid */
 	block_pfn = pageblock_start_pfn(pfn);
-	block_page = pfn_to_online_page(max(block_pfn, zone->zone_start_pfn));
+	block_pfn = max(block_pfn, zone->zone_start_pfn);
+	block_page = pfn_to_online_page(block_pfn);
 	if (block_page) {
 		page = block_page;
 		pfn = block_pfn;
 	}
 
 	/* Ensure the end of the pageblock or zone is online and valid */
-	block_pfn += pageblock_nr_pages;
+	block_pfn = pageblock_end_pfn(pfn) - 1;
 	block_pfn = min(block_pfn, zone_end_pfn(zone) - 1);
 	end_page = pfn_to_online_page(block_pfn);
 	if (!end_page)

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
  2019-10-07 13:56               ` Vlastimil Babka
@ 2019-10-08  8:52                 ` Mel Gorman
  0 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2019-10-08 8:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Florian Weimer, Dave Chinner, linux-xfs, linux-fsdevel, linux-mm

On Mon, Oct 07, 2019 at 03:56:41PM +0200, Vlastimil Babka wrote:
> On 10/7/19 3:28 PM, Vlastimil Babka wrote:
> > On 10/1/19 9:40 PM, Florian Weimer wrote:
> > > * Vlastimil Babka:
> > >
> > >
> > > See below. I don't have debuginfo for this build, and the binary does
> > > not reproduce for some reason. Due to the heavy inlining, it might be
> > > quite hard to figure out what's going on.
> >
> > Thanks, but I'm still not able to "decompile" that in my head.
>
> While staring at the code, I think I found two probably unrelated bugs.
> One is that pfn and page might be desynced when the zone starts in the
> middle of a pageblock, as the max() is only applied to page and not pfn.
> But that only effectively affects the later pfn_valid_within() checks,
> which should always be true on x86.
>
> The second is that "end of pageblock online and valid" should refer to
> the last pfn of the pageblock, not the first pfn of the next pageblock.
> Otherwise we might return false needlessly. Mel, what do you think?
>

I think you are correct in both cases. It's perfectly possible I would
not have observed a problem in testing if zones were aligned, which I
think is generally the case on my test machines.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-10-08  9:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-30  7:28 xfs_inode not reclaimed/memory leak on 5.2.16 Florian Weimer
2019-09-30  8:54 ` Dave Chinner
2019-09-30 19:07   ` Florian Weimer
2019-09-30 21:17     ` [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16] Dave Chinner
2019-09-30 21:42       ` Florian Weimer
2019-10-01  9:10         ` Vlastimil Babka
2019-10-01 19:40           ` Florian Weimer
2019-10-07 13:28             ` Vlastimil Babka
2019-10-07 13:56               ` Vlastimil Babka
2019-10-08  8:52                 ` Mel Gorman