* xfs_inode not reclaimed/memory leak on 5.2.16
@ 2019-09-30 7:28 Florian Weimer
2019-09-30 8:54 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2019-09-30 7:28 UTC (permalink / raw)
To: linux-xfs; +Cc: linux-fsdevel
Simply running “du -hc” on a large directory tree causes du to be
killed because of kernel paging request failure in the XFS code.
I ran slabtop, and it showed tons of xfs_inode objects.
The system was rather unhappy after that, so I wasn't able to capture
much more information.
Is this a known issue on Linux 5.2? I don't see it with kernel
5.0.20. Those are plain upstream kernels built for x86-64, with no
unusual config options (that I know of).
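For scale, the slab usage slabtop reports can also be estimated directly from /proc/slabinfo. A minimal sketch; the sample line and its numbers below are illustrative only, not taken from the affected machine:

```shell
# Estimate memory pinned by a slab cache from its /proc/slabinfo columns:
# <name> <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
# The sample line is illustrative; on a live system use the grep shown below.
line="xfs_inode 1500000 1500000 1024 32 8"
echo "$line" | awk '{ printf "%s: ~%d MiB pinned\n", $1, $2 * $4 / 1048576 }'
# live system equivalent: grep '^xfs_inode ' /proc/slabinfo
```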
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs_inode not reclaimed/memory leak on 5.2.16
2019-09-30 7:28 xfs_inode not reclaimed/memory leak on 5.2.16 Florian Weimer
@ 2019-09-30 8:54 ` Dave Chinner
2019-09-30 19:07 ` Florian Weimer
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2019-09-30 8:54 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-xfs, linux-fsdevel
On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
> Simply running “du -hc” on a large directory tree causes du to be
> killed because of kernel paging request failure in the XFS code.
dmesg output? if the system was still running, then you might be
able to pull the trace from syslog. But we can't do much without
knowing what the actual failure was....
FWIW, one of my regular test workloads is iterating a directory tree
with 50 million inodes in several different ways to stress reclaim
algorithms in ways that users do. I haven't seen issues with that
test for a while, so it's not an obvious problem whatever you came
across.
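A directory-walk stress pass of the kind described above can be sketched as follows (a sketch only; the path argument is hypothetical, and nothing here recreates the 50-million-inode setup):

```shell
# walk_tree: iterate a directory tree in two different ways to build up
# inode/dentry cache pressure. A sketch of the workload shape only.
walk_tree() {
    find "$1" -type f -print > /dev/null   # warms the dentry/inode caches
    du -sk "$1"                            # stat()s every entry, like the reported du -hc
}
# e.g.: walk_tree /mnt/scratch/bigtree   (hypothetical path)
```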
> I ran slabtop, and it showed tons of xfs_inode objects.
Sure, because your workload is iterating inodes.
> The system was rather unhappy after that, so I wasn't able to capture
> much more information.
>
> Is this a known issue on Linux 5.2?
Not that I know of.
> I don't see it with kernel
> 5.0.20. Those are plain upstream kernels built for x86-64, with no
> unusual config options (that I know of).
We've had quite a few memory reclaim regressions in recent times
that have displayed similar symptoms - XFS is often just the
messenger because the inode cache is generating the memory pressure.
e.g. the shrinker infrastructure was broken in 4.16 and then broken
differently in 4.17 to try to fix it, and we didn't hear about them
until about 4.18/4.19 when users started to trip over them. I fixed
those problems in 5.0, but there's every chance that there have been
new regressions since then.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs_inode not reclaimed/memory leak on 5.2.16
2019-09-30 8:54 ` Dave Chinner
@ 2019-09-30 19:07 ` Florian Weimer
2019-09-30 21:17 ` [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16] Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2019-09-30 19:07 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel
* Dave Chinner:
> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>> Simply running “du -hc” on a large directory tree causes du to be
>> killed because of kernel paging request failure in the XFS code.
>
> dmesg output? if the system was still running, then you might be
> able to pull the trace from syslog. But we can't do much without
> knowing what the actual failure was....
Huh. I actually have something in syslog:
[ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 4001.238415] #PF: supervisor read access in kernel mode
[ 4001.238417] #PF: error_code(0x0000) - not-present page
[ 4001.238418] PGD 0 P4D 0
[ 4001.238420] Oops: 0000 [#1] SMP PTI
[ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
[ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[ 4001.238433] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
[ 4001.238435] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
[ 4001.238437] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
[ 4001.238438] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
[ 4001.238439] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
[ 4001.238440] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
[ 4001.238442] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
[ 4001.238444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.238445] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
[ 4001.238446] Call Trace:
[ 4001.238450] __reset_isolation_suitable+0x9b/0x120
[ 4001.238453] reset_isolation_suitable+0x3b/0x40
[ 4001.238456] kswapd+0x98/0x300
[ 4001.238460] ? wait_woken+0x80/0x80
[ 4001.238463] kthread+0x114/0x130
[ 4001.238465] ? balance_pgdat+0x450/0x450
[ 4001.238467] ? kthread_park+0x80/0x80
[ 4001.238470] ret_from_fork+0x1f/0x30
[ 4001.238472] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
[ 4001.238509] CR2: 0000000000000000
[ 4001.238511] ---[ end trace 3cdcc14b40255fe6 ]---
[ 4001.238514] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[ 4001.238516] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[ 4001.238518] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
[ 4001.238519] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
[ 4001.238521] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
[ 4001.238522] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
[ 4001.238523] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
[ 4001.238524] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
[ 4001.238526] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
[ 4001.238528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.238529] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
[ 4001.709169] BUG: unable to handle page fault for address: ffffc900003e7ec8
[ 4001.709172] #PF: supervisor read access in kernel mode
[ 4001.709173] #PF: error_code(0x0000) - not-present page
[ 4001.709174] PGD 33201a067 P4D 33201a067 PUD 33201b067 PMD 3322af067 PTE 0
[ 4001.709177] Oops: 0000 [#2] SMP PTI
[ 4001.709179] CPU: 1 PID: 10507 Comm: du Tainted: G D I 5.2.16fw+ #1
[ 4001.709180] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[ 4001.709184] RIP: 0010:__wake_up_common+0x3c/0x130
[ 4001.709186] Code: 85 c9 74 0a 41 f6 01 04 0f 85 9f 00 00 00 48 8b 47 08 48 8d 5f 08 48 83 e8 18 48 8d 78 18 48 39 fb 0f 84 ca 00 00 00 89 75 d4 <48> 8b 70 18 4d 89 cd 45 31 e4 4c 89 45 c8 89 4d d0 89 55 c4 4c 8d
[ 4001.709187] RSP: 0018:ffffc900043db5e0 EFLAGS: 00010012
[ 4001.709188] RAX: ffffc900003e7eb0 RBX: ffffffff82066f00 RCX: 0000000000000000
[ 4001.709189] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffc900003e7ec8
[ 4001.709190] RBP: ffffc900043db620 R08: 0000000000000000 R09: ffffc900043db638
[ 4001.709191] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
[ 4001.709192] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000000
[ 4001.709193] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
[ 4001.709194] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.709195] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
[ 4001.709195] Call Trace:
[ 4001.709198] __wake_up_common_lock+0x6c/0x90
[ 4001.709200] __wake_up+0xe/0x10
[ 4001.709203] wakeup_kswapd+0xf4/0x120
[ 4001.709206] get_page_from_freelist+0x52e/0xc80
[ 4001.709208] __alloc_pages_nodemask+0xf0/0xcc0
[ 4001.709209] ? get_page_from_freelist+0xa8d/0xc80
[ 4001.709212] ? radix_tree_lookup+0xd/0x10
[ 4001.709215] ? kmem_cache_alloc+0x80/0xa0
[ 4001.709217] xfs_buf_allocate_memory+0x20e/0x320
[ 4001.709219] xfs_buf_get_map+0xe8/0x190
[ 4001.709220] xfs_buf_read_map+0x25/0x100
[ 4001.709223] xfs_trans_read_buf_map+0xb2/0x1f0
[ 4001.709225] xfs_imap_to_bp+0x53/0xa0
[ 4001.709226] xfs_iread+0x76/0x1b0
[ 4001.709229] xfs_iget+0x1e5/0x700
[ 4001.709231] xfs_lookup+0x63/0x90
[ 4001.709232] xfs_vn_lookup+0x47/0x80
[ 4001.709235] __lookup_slow+0x7f/0x120
[ 4001.709236] lookup_slow+0x35/0x50
[ 4001.709238] walk_component+0x193/0x2a0
[ 4001.709239] ? path_init+0x112/0x2f0
[ 4001.709240] path_lookupat.isra.16+0x5c/0x200
[ 4001.709242] filename_lookup.part.27+0x88/0x100
[ 4001.709243] ? xfs_ilock+0x39/0x90
[ 4001.709245] ? __check_object_size+0xf6/0x187
[ 4001.709248] ? strncpy_from_user+0x56/0x1c0
[ 4001.709249] user_path_at_empty+0x39/0x40
[ 4001.709250] vfs_statx+0x62/0xb0
[ 4001.709252] __se_sys_newfstatat+0x26/0x50
[ 4001.709254] __x64_sys_newfstatat+0x19/0x20
[ 4001.709255] do_syscall_64+0x4b/0x260
[ 4001.709257] ? page_fault+0x8/0x30
[ 4001.709259] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4001.709261] RIP: 0033:0x7f0090c47e49
[ 4001.709262] Code: 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 89 f0 48 89 d6 83 ff 01 77 36 89 c7 45 89 c2 48 89 ca b8 06 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 11 10 0d 00
[ 4001.709263] RSP: 002b:00007fffee0929e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[ 4001.709264] RAX: ffffffffffffffda RBX: 00005635d95cfe00 RCX: 00007f0090c47e49
[ 4001.709265] RDX: 00005635d95cfe78 RSI: 00005635d95cff08 RDI: 0000000000000004
[ 4001.709266] RBP: 00005635d95cfe78 R08: 0000000000000100 R09: 0000000000000001
[ 4001.709267] R10: 0000000000000100 R11: 0000000000000246 R12: 00005635d8c1e5c0
[ 4001.709268] R13: 00005635d95cfe00 R14: 00005635d8c1e650 R15: 000000000000000b
[ 4001.709269] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
[ 4001.709293] CR2: ffffc900003e7ec8
[ 4001.709295] ---[ end trace 3cdcc14b40255fe7 ]---
[ 4001.709297] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[ 4001.709299] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[ 4001.709299] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
[ 4001.709300] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
[ 4001.709301] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
[ 4001.709302] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
[ 4001.709303] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
[ 4001.709304] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
[ 4001.709305] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
[ 4001.709306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4001.709307] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
So XFS wasn't *that* unhappy if it could still write to the file
system.
> FWIW, one of my regular test workloads is iterating a directory tree
> with 50 million inodes in several different ways to stress reclaim
> algorithms in ways that users do. I haven't seen issues with that
> test for a while, so it's not an obvious problem whatever you came
> across.
Right, I should have tried to reproduce it first. I actually can't.
* [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-09-30 19:07 ` Florian Weimer
@ 2019-09-30 21:17 ` Dave Chinner
2019-09-30 21:42 ` Florian Weimer
2019-10-01 9:10 ` Vlastimil Babka
0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2019-09-30 21:17 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-xfs, linux-fsdevel, linux-mm
On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
> * Dave Chinner:
>
> > On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
> >> Simply running “du -hc” on a large directory tree causes du to be
> >> killed because of kernel paging request failure in the XFS code.
> >
> > dmesg output? if the system was still running, then you might be
> > able to pull the trace from syslog. But we can't do much without
> > knowing what the actual failure was....
>
> Huh. I actually have something in syslog:
>
> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 4001.238415] #PF: supervisor read access in kernel mode
> [ 4001.238417] #PF: error_code(0x0000) - not-present page
> [ 4001.238418] PGD 0 P4D 0
> [ 4001.238420] Oops: 0000 [#1] SMP PTI
> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
> [ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
That's memory compaction code it's crashed in.
> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
> [ 4001.238433] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
> [ 4001.238435] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
> [ 4001.238437] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
> [ 4001.238438] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
> [ 4001.238439] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
> [ 4001.238440] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
> [ 4001.238442] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
> [ 4001.238444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4001.238445] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
> [ 4001.238446] Call Trace:
> [ 4001.238450] __reset_isolation_suitable+0x9b/0x120
> [ 4001.238453] reset_isolation_suitable+0x3b/0x40
> [ 4001.238456] kswapd+0x98/0x300
> [ 4001.238460] ? wait_woken+0x80/0x80
> [ 4001.238463] kthread+0x114/0x130
> [ 4001.238465] ? balance_pgdat+0x450/0x450
> [ 4001.238467] ? kthread_park+0x80/0x80
> [ 4001.238470] ret_from_fork+0x1f/0x30
Ok, so the memory compaction code has had a null pointer dereference
which has killed kswapd. memory reclaim is going to have serious
problems from this point on as kswapd does most of the reclaim.
I have no idea why this might have happened - are there any other
unexpected events or clues in the syslog that might point to
memory corruption or some sign of badness before this crash?
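One way to look for such clues, sketched below: filter dmesg-format log lines for common corruption markers whose timestamps precede the first oops (the marker list and cutoff are only suggestions):

```shell
# scan_log CUTOFF FILE: print dmesg-format lines whose [seconds] timestamp
# precedes CUTOFF and that carry common memory-corruption markers.
# A sketch; here the cutoff would be the first oops at ~4001s.
scan_log() {
    awk -v cutoff="$1" '
        /BUG|Oops|corrupt|Bad page/ {
            t = $0
            sub(/^\[ */, "", t); sub(/\].*/, "", t)   # extract the timestamp
            if (t + 0 < cutoff) print
        }' "$2"
}
# e.g.: dmesg > kern.log && scan_log 4001.238 kern.log
```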
> [ 4001.238472] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
> [ 4001.238509] CR2: 0000000000000000
> [ 4001.238511] ---[ end trace 3cdcc14b40255fe6 ]---
> [ 4001.238514] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
> [ 4001.238516] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
> [ 4001.238518] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
> [ 4001.238519] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
> [ 4001.238521] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
> [ 4001.238522] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
> [ 4001.238523] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
> [ 4001.238524] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
> [ 4001.238526] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
> [ 4001.238528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4001.238529] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
> [ 4001.709169] BUG: unable to handle page fault for address: ffffc900003e7ec8
> [ 4001.709172] #PF: supervisor read access in kernel mode
> [ 4001.709173] #PF: error_code(0x0000) - not-present page
> [ 4001.709174] PGD 33201a067 P4D 33201a067 PUD 33201b067 PMD 3322af067 PTE 0
> [ 4001.709177] Oops: 0000 [#2] SMP PTI
> [ 4001.709179] CPU: 1 PID: 10507 Comm: du Tainted: G D I 5.2.16fw+ #1
> [ 4001.709180] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
> [ 4001.709184] RIP: 0010:__wake_up_common+0x3c/0x130
Then half a second later, the du process has crashed in
__wake_up_common (core scheduler code)....
> [ 4001.709186] Code: 85 c9 74 0a 41 f6 01 04 0f 85 9f 00 00 00 48 8b 47 08 48 8d 5f 08 48 83 e8 18 48 8d 78 18 48 39 fb 0f 84 ca 00 00 00 89 75 d4 <48> 8b 70 18 4d 89 cd 45 31 e4 4c 89 45 c8 89 4d d0 89 55 c4 4c 8d
> [ 4001.709187] RSP: 0018:ffffc900043db5e0 EFLAGS: 00010012
> [ 4001.709188] RAX: ffffc900003e7eb0 RBX: ffffffff82066f00 RCX: 0000000000000000
> [ 4001.709189] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffc900003e7ec8
> [ 4001.709190] RBP: ffffc900043db620 R08: 0000000000000000 R09: ffffc900043db638
> [ 4001.709191] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
> [ 4001.709192] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000000
> [ 4001.709193] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
> [ 4001.709194] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4001.709195] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
> [ 4001.709195] Call Trace:
> [ 4001.709198] __wake_up_common_lock+0x6c/0x90
> [ 4001.709200] __wake_up+0xe/0x10
> [ 4001.709203] wakeup_kswapd+0xf4/0x120
...trying to wake up kswapd. This may have crashed because the
kswapd task has been killed but hasn't been removed from the wait
list, so there's a dead/freed task being woken.
Regardless, this looks like a follow-on issue, not a root cause.
> [ 4001.709206] get_page_from_freelist+0x52e/0xc80
> [ 4001.709208] __alloc_pages_nodemask+0xf0/0xcc0
> [ 4001.709209] ? get_page_from_freelist+0xa8d/0xc80
> [ 4001.709212] ? radix_tree_lookup+0xd/0x10
> [ 4001.709215] ? kmem_cache_alloc+0x80/0xa0
> [ 4001.709217] xfs_buf_allocate_memory+0x20e/0x320
> [ 4001.709219] xfs_buf_get_map+0xe8/0x190
> [ 4001.709220] xfs_buf_read_map+0x25/0x100
> [ 4001.709223] xfs_trans_read_buf_map+0xb2/0x1f0
> [ 4001.709225] xfs_imap_to_bp+0x53/0xa0
> [ 4001.709226] xfs_iread+0x76/0x1b0
> [ 4001.709229] xfs_iget+0x1e5/0x700
> [ 4001.709231] xfs_lookup+0x63/0x90
> [ 4001.709232] xfs_vn_lookup+0x47/0x80
The XFS part of this is that it triggers the memory allocation that
trips over the bad kswapd state, nothing else.
IOWs, this doesn't look like an XFS problem at all, but more likely
something going wrong with memory compaction or memory reclaim, so
I'd suggest linux-mm@kvack.org [cc'd] is the first port of call for
further triage.
> [ 4001.709235] __lookup_slow+0x7f/0x120
> [ 4001.709236] lookup_slow+0x35/0x50
> [ 4001.709238] walk_component+0x193/0x2a0
> [ 4001.709239] ? path_init+0x112/0x2f0
> [ 4001.709240] path_lookupat.isra.16+0x5c/0x200
> [ 4001.709242] filename_lookup.part.27+0x88/0x100
> [ 4001.709243] ? xfs_ilock+0x39/0x90
> [ 4001.709245] ? __check_object_size+0xf6/0x187
> [ 4001.709248] ? strncpy_from_user+0x56/0x1c0
> [ 4001.709249] user_path_at_empty+0x39/0x40
> [ 4001.709250] vfs_statx+0x62/0xb0
> [ 4001.709252] __se_sys_newfstatat+0x26/0x50
> [ 4001.709254] __x64_sys_newfstatat+0x19/0x20
> [ 4001.709255] do_syscall_64+0x4b/0x260
> [ 4001.709257] ? page_fault+0x8/0x30
> [ 4001.709259] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 4001.709261] RIP: 0033:0x7f0090c47e49
> [ 4001.709262] Code: 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 89 f0 48 89 d6 83 ff 01 77 36 89 c7 45 89 c2 48 89 ca b8 06 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 11 10 0d 00
> [ 4001.709263] RSP: 002b:00007fffee0929e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
> [ 4001.709264] RAX: ffffffffffffffda RBX: 00005635d95cfe00 RCX: 00007f0090c47e49
> [ 4001.709265] RDX: 00005635d95cfe78 RSI: 00005635d95cff08 RDI: 0000000000000004
> [ 4001.709266] RBP: 00005635d95cfe78 R08: 0000000000000100 R09: 0000000000000001
> [ 4001.709267] R10: 0000000000000100 R11: 0000000000000246 R12: 00005635d8c1e5c0
> [ 4001.709268] R13: 00005635d95cfe00 R14: 00005635d8c1e650 R15: 000000000000000b
> [ 4001.709269] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
> [ 4001.709293] CR2: ffffc900003e7ec8
> [ 4001.709295] ---[ end trace 3cdcc14b40255fe7 ]---
> [ 4001.709297] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
> [ 4001.709299] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
> [ 4001.709299] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
> [ 4001.709300] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
> [ 4001.709301] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
> [ 4001.709302] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
> [ 4001.709303] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
> [ 4001.709304] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
> [ 4001.709305] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
> [ 4001.709306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4001.709307] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
>
> So XFS wasn't *that* unhappy if it could still write to the file
> system.
Right, as long as it doesn't trip over any of the leaked state (e.g.
locks) from the du process that was killed, it'll keep going as long
as direct memory reclaim can keep reclaiming memory.
>
> > FWIW, one of my regular test workloads is iterating a directory tree
> > with 50 million inodes in several different ways to stress reclaim
> > algorithms in ways that users do. I haven't seen issues with that
> > test for a while, so it's not an obvious problem whatever you came
> > across.
>
> Right, I should have tried to reproduce it first. I actually can't.
Not surprising, it has the smell of "random crash" to it.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-09-30 21:17 ` [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16] Dave Chinner
@ 2019-09-30 21:42 ` Florian Weimer
2019-10-01 9:10 ` Vlastimil Babka
1 sibling, 0 replies; 10+ messages in thread
From: Florian Weimer @ 2019-09-30 21:42 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel, linux-mm
* Dave Chinner:
>> [ 4001.238446] Call Trace:
>> [ 4001.238450] __reset_isolation_suitable+0x9b/0x120
>> [ 4001.238453] reset_isolation_suitable+0x3b/0x40
>> [ 4001.238456] kswapd+0x98/0x300
>> [ 4001.238460] ? wait_woken+0x80/0x80
>> [ 4001.238463] kthread+0x114/0x130
>> [ 4001.238465] ? balance_pgdat+0x450/0x450
>> [ 4001.238467] ? kthread_park+0x80/0x80
>> [ 4001.238470] ret_from_fork+0x1f/0x30
>
> Ok, so the memory compaction code has had a null pointer dereference
> which has killed kswapd. memory reclaim is going to have serious
> problems from this point on as kswapd does most of the reclaim.
Sorry, no. OpenVPN opened a tun device at the same time (same
second), and udevd reacted to that, but that's it.
I also double-checked, and there haven't been any recent previous
occurrences of that crash.
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-09-30 21:17 ` [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16] Dave Chinner
2019-09-30 21:42 ` Florian Weimer
@ 2019-10-01 9:10 ` Vlastimil Babka
2019-10-01 19:40 ` Florian Weimer
1 sibling, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-10-01 9:10 UTC (permalink / raw)
To: Dave Chinner, Florian Weimer
Cc: linux-xfs, linux-fsdevel, linux-mm, Mel Gorman
On 9/30/19 11:17 PM, Dave Chinner wrote:
> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>> * Dave Chinner:
>>
>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>> killed because of kernel paging request failure in the XFS code.
>>>
>>> dmesg output? if the system was still running, then you might be
>>> able to pull the trace from syslog. But we can't do much without
>>> knowing what the actual failure was....
>>
>> Huh. I actually have something in syslog:
>>
>> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [ 4001.238415] #PF: supervisor read access in kernel mode
>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>> [ 4001.238418] PGD 0 P4D 0
>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
>> [ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>
> That's memory compaction code it's crashed in.
>
>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
I tried to decode it, but couldn't match it to the source code; my
version of the compiled code is too different. Would it be possible
to either send mm/compaction.o from the matching build, or the output
of 'objdump -d -l' for the __reset_isolation_pfn function?
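The requested listing could be produced along these lines (a sketch; the mm/compaction.o path assumes the matching 5.2.16fw+ build tree, and the awk filter just trims the objdump output down to the one function):

```shell
# extract_fn FN: filter `objdump -d -l` output on stdin down to the
# listing of one function; the listing ends at the first blank line.
extract_fn() {
    awk -v fn="$1" '
        index($0, "<" fn ">:") { grab = 1 }   # function header line
        grab && /^$/ { exit }                 # blank line terminates the listing
        grab { print }
    '
}
# e.g. (assumed build-tree path):
#   objdump -d -l mm/compaction.o | extract_fn __reset_isolation_pfn
```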
>> [ 4001.238433] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
>> [ 4001.238435] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
>> [ 4001.238437] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
>> [ 4001.238438] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
>> [ 4001.238439] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
>> [ 4001.238440] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
>> [ 4001.238442] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
>> [ 4001.238444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4001.238445] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
>> [ 4001.238446] Call Trace:
>> [ 4001.238450] __reset_isolation_suitable+0x9b/0x120
>> [ 4001.238453] reset_isolation_suitable+0x3b/0x40
>> [ 4001.238456] kswapd+0x98/0x300
>> [ 4001.238460] ? wait_woken+0x80/0x80
>> [ 4001.238463] kthread+0x114/0x130
>> [ 4001.238465] ? balance_pgdat+0x450/0x450
>> [ 4001.238467] ? kthread_park+0x80/0x80
>> [ 4001.238470] ret_from_fork+0x1f/0x30
>
> Ok, so the memory compaction code has had a null pointer dereference
> which has killed kswapd. memory reclaim is going to have serious
> problems from this point on as kswapd does most of the reclaim.
>
> I have no idea why this might have happened - are there any other
> unexpected events or clues in the syslog that might point to
> memory corruption or some sign of badness before this crash?
>
>> [ 4001.238472] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
>> [ 4001.238509] CR2: 0000000000000000
>> [ 4001.238511] ---[ end trace 3cdcc14b40255fe6 ]---
>> [ 4001.238514] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>> [ 4001.238516] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
>> [ 4001.238518] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
>> [ 4001.238519] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
>> [ 4001.238521] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
>> [ 4001.238522] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
>> [ 4001.238523] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
>> [ 4001.238524] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
>> [ 4001.238526] FS: 0000000000000000(0000) GS:ffff888333cc0000(0000) knlGS:0000000000000000
>> [ 4001.238528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4001.238529] CR2: 0000000000000000 CR3: 000000000200a003 CR4: 00000000000206e0
>> [ 4001.709169] BUG: unable to handle page fault for address: ffffc900003e7ec8
>> [ 4001.709172] #PF: supervisor read access in kernel mode
>> [ 4001.709173] #PF: error_code(0x0000) - not-present page
>> [ 4001.709174] PGD 33201a067 P4D 33201a067 PUD 33201b067 PMD 3322af067 PTE 0
>> [ 4001.709177] Oops: 0000 [#2] SMP PTI
>> [ 4001.709179] CPU: 1 PID: 10507 Comm: du Tainted: G D I 5.2.16fw+ #1
>> [ 4001.709180] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
>> [ 4001.709184] RIP: 0010:__wake_up_common+0x3c/0x130
>
> Then half a second later, the du process has crashed in
> __wake_up_common (core scheduler code)....
>
>> [ 4001.709186] Code: 85 c9 74 0a 41 f6 01 04 0f 85 9f 00 00 00 48 8b 47 08 48 8d 5f 08 48 83 e8 18 48 8d 78 18 48 39 fb 0f 84 ca 00 00 00 89 75 d4 <48> 8b 70 18 4d 89 cd 45 31 e4 4c 89 45 c8 89 4d d0 89 55 c4 4c 8d
>> [ 4001.709187] RSP: 0018:ffffc900043db5e0 EFLAGS: 00010012
>> [ 4001.709188] RAX: ffffc900003e7eb0 RBX: ffffffff82066f00 RCX: 0000000000000000
>> [ 4001.709189] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffc900003e7ec8
>> [ 4001.709190] RBP: ffffc900043db620 R08: 0000000000000000 R09: ffffc900043db638
>> [ 4001.709191] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
>> [ 4001.709192] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000000
>> [ 4001.709193] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
>> [ 4001.709194] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4001.709195] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
>> [ 4001.709195] Call Trace:
>> [ 4001.709198] __wake_up_common_lock+0x6c/0x90
>> [ 4001.709200] __wake_up+0xe/0x10
>> [ 4001.709203] wakeup_kswapd+0xf4/0x120
>
> ...trying to wake up kswapd. This may have crashed because the
> kswapd task has been killed but hasn't been removed from the
> wait list, so a dead/freed task is being woken.
> Regardless, this looks like a follow-on issue, not a root cause.
>
>> [ 4001.709206] get_page_from_freelist+0x52e/0xc80
>> [ 4001.709208] __alloc_pages_nodemask+0xf0/0xcc0
>> [ 4001.709209] ? get_page_from_freelist+0xa8d/0xc80
>> [ 4001.709212] ? radix_tree_lookup+0xd/0x10
>> [ 4001.709215] ? kmem_cache_alloc+0x80/0xa0
>> [ 4001.709217] xfs_buf_allocate_memory+0x20e/0x320
>> [ 4001.709219] xfs_buf_get_map+0xe8/0x190
>> [ 4001.709220] xfs_buf_read_map+0x25/0x100
>> [ 4001.709223] xfs_trans_read_buf_map+0xb2/0x1f0
>> [ 4001.709225] xfs_imap_to_bp+0x53/0xa0
>> [ 4001.709226] xfs_iread+0x76/0x1b0
>> [ 4001.709229] xfs_iget+0x1e5/0x700
>> [ 4001.709231] xfs_lookup+0x63/0x90
>> [ 4001.709232] xfs_vn_lookup+0x47/0x80
>
> The XFS part of this is that it triggers the memory allocation that
> trips over the bad kswapd state, nothing else.
>
> IOWs, this doesn't look like an XFS problem at all, but more likely
> something going wrong with memory compaction or memory reclaim, so
> I'd suggest linux-mm@kvack.org [cc'd] is the first port of call for
> further triage.
>
>> [ 4001.709235] __lookup_slow+0x7f/0x120
>> [ 4001.709236] lookup_slow+0x35/0x50
>> [ 4001.709238] walk_component+0x193/0x2a0
>> [ 4001.709239] ? path_init+0x112/0x2f0
>> [ 4001.709240] path_lookupat.isra.16+0x5c/0x200
>> [ 4001.709242] filename_lookup.part.27+0x88/0x100
>> [ 4001.709243] ? xfs_ilock+0x39/0x90
>> [ 4001.709245] ? __check_object_size+0xf6/0x187
>> [ 4001.709248] ? strncpy_from_user+0x56/0x1c0
>> [ 4001.709249] user_path_at_empty+0x39/0x40
>> [ 4001.709250] vfs_statx+0x62/0xb0
>> [ 4001.709252] __se_sys_newfstatat+0x26/0x50
>> [ 4001.709254] __x64_sys_newfstatat+0x19/0x20
>> [ 4001.709255] do_syscall_64+0x4b/0x260
>> [ 4001.709257] ? page_fault+0x8/0x30
>> [ 4001.709259] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 4001.709261] RIP: 0033:0x7f0090c47e49
>> [ 4001.709262] Code: 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 89 f0 48 89 d6 83 ff 01 77 36 89 c7 45 89 c2 48 89 ca b8 06 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 11 10 0d 00
>> [ 4001.709263] RSP: 002b:00007fffee0929e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
>> [ 4001.709264] RAX: ffffffffffffffda RBX: 00005635d95cfe00 RCX: 00007f0090c47e49
>> [ 4001.709265] RDX: 00005635d95cfe78 RSI: 00005635d95cff08 RDI: 0000000000000004
>> [ 4001.709266] RBP: 00005635d95cfe78 R08: 0000000000000100 R09: 0000000000000001
>> [ 4001.709267] R10: 0000000000000100 R11: 0000000000000246 R12: 00005635d8c1e5c0
>> [ 4001.709268] R13: 00005635d95cfe00 R14: 00005635d8c1e650 R15: 000000000000000b
>> [ 4001.709269] Modules linked in: nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp snd_hda_intel kvm_intel snd_hda_codec serio_raw kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e ptp xhci_pci pps_core uhci_hcd ehci_pci xhci_hcd ehci_hcd sky2 usbcore ttm usb_common sd_mod
>> [ 4001.709293] CR2: ffffc900003e7ec8
>> [ 4001.709295] ---[ end trace 3cdcc14b40255fe7 ]---
>> [ 4001.709297] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>> [ 4001.709299] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
>> [ 4001.709299] RSP: 0018:ffffc900003e7de0 EFLAGS: 00010246
>> [ 4001.709300] RAX: 0000000000057285 RBX: 0000000000108000 RCX: 0000000000000000
>> [ 4001.709301] RDX: 0000000000000000 RSI: 0000000000000210 RDI: ffff88833fffa000
>> [ 4001.709302] RBP: ffffc900003e7e18 R08: 0000000000000004 R09: ffff888335000000
>> [ 4001.709303] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000000
>> [ 4001.709304] R13: 0000000000000000 R14: ffff8883389c01c0 R15: 0000000000000001
>> [ 4001.709305] FS: 00007f0090d20540(0000) GS:ffff888333c40000(0000) knlGS:0000000000000000
>> [ 4001.709306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4001.709307] CR2: ffffc900003e7ec8 CR3: 00000002b85dc004 CR4: 00000000000206e0
>>
>> So XFS wasn't *that* unhappy if it could still write to the file
>> system.
>
> Right, as long as it doesn't trip over any of the leaked state (e.g.
> locks) from the du process that was killed, it'll keep going as long
> as direct memory reclaim can keep reclaiming memory.
>
>>
>>> FWIW, one of my regular test workloads is iterating a directory tree
>>> with 50 million inodes in several different ways to stress reclaim
>>> algorithms in ways that users do. I haven't seen issues with that
>>> test for a while, so it's not an obvious problem whatever you came
>>> across.
>>
>> Right, I should have tried to reproduce it first. I actually can't.
>
> Not surprising, it has the smell of "random crash" to it.
>
> Cheers,
>
> Dave.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-10-01 9:10 ` Vlastimil Babka
@ 2019-10-01 19:40 ` Florian Weimer
2019-10-07 13:28 ` Vlastimil Babka
0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2019-10-01 19:40 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Dave Chinner, linux-xfs, linux-fsdevel, linux-mm, Mel Gorman
* Vlastimil Babka:
> On 9/30/19 11:17 PM, Dave Chinner wrote:
>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>> * Dave Chinner:
>>>
>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>> killed because of kernel paging request failure in the XFS code.
>>>>
>>>> dmesg output? if the system was still running, then you might be
>>>> able to pull the trace from syslog. But we can't do much without
>>>> knowing what the actual failure was....
>>>
>>> Huh. I actually have something in syslog:
>>>
>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000000
>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>> [ 4001.238418] PGD 0 P4D 0
>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+
>>> #1
>>> [ 4001.238424] Hardware name: System manufacturer System Product
>>> Name/P6X58D-E, BIOS 0701 05/10/2011
>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>
>> That's memory compaction code it's crashed in.
>>
>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0
>>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1
>>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00
>>> 00 00 4c 89 f7
>
> Tried to decode it, but couldn't match it to source code, my version of
> compiled code is too different. Would it be possible to either send
> mm/compaction.o from the matching build, or output of 'objdump -d -l'
> for the __reset_isolation_pfn function?
See below. I don't have debuginfo for this build, and the binary does
not reproduce for some reason. Due to the heavy inlining, it might be
quite hard to figure out what's going on.
I've switched to kernel builds with debuginfo from now on. I'm
surprised that it's not the default.
0000000000000120 <__reset_isolation_pfn>:
__reset_isolation_pfn():
120: 48 89 f0 mov %rsi,%rax
123: 48 c1 e8 0f shr $0xf,%rax
127: 48 3d ff ff 07 00 cmp $0x7ffff,%rax
12d: 0f 87 83 00 00 00 ja 1b6 <__reset_isolation_pfn+0x96>
133: 4c 8b 0d 00 00 00 00 mov 0x0(%rip),%r9 # 13a <__reset_isolation_pfn+0x1a>
136: R_X86_64_PC32 mem_section-0x4
13a: 4d 85 c9 test %r9,%r9
13d: 74 77 je 1b6 <__reset_isolation_pfn+0x96>
13f: 49 89 f2 mov %rsi,%r10
142: 49 c1 ea 17 shr $0x17,%r10
146: 4f 8b 0c d1 mov (%r9,%r10,8),%r9
14a: 4d 85 c9 test %r9,%r9
14d: 74 67 je 1b6 <__reset_isolation_pfn+0x96>
14f: 0f b6 c0 movzbl %al,%eax
152: 48 c1 e0 04 shl $0x4,%rax
156: 4c 01 c8 add %r9,%rax
159: 74 5b je 1b6 <__reset_isolation_pfn+0x96>
15b: 4c 8b 08 mov (%rax),%r9
15e: 41 f6 c1 02 test $0x2,%r9b
162: 74 52 je 1b6 <__reset_isolation_pfn+0x96>
164: 48 6b c6 38 imul $0x38,%rsi,%rax
168: 55 push %rbp
169: 49 83 e1 f8 and $0xfffffffffffffff8,%r9
16d: 48 89 e5 mov %rsp,%rbp
170: 41 57 push %r15
172: 41 56 push %r14
174: 4d 89 ce mov %r9,%r14
177: 41 55 push %r13
179: 41 54 push %r12
17b: 53 push %rbx
17c: 48 83 ec 10 sub $0x10,%rsp
180: 49 01 c6 add %rax,%r14
183: 74 1c je 1a1 <__reset_isolation_pfn+0x81>
185: 49 8b 06 mov (%r14),%rax
188: 48 c1 e8 2b shr $0x2b,%rax
18c: 83 e0 03 and $0x3,%eax
18f: 48 69 c0 80 05 00 00 imul $0x580,%rax,%rax
196: 48 05 00 00 00 00 add $0x0,%rax
198: R_X86_64_32S contig_page_data
19c: 48 39 c7 cmp %rax,%rdi
19f: 74 1c je 1bd <__reset_isolation_pfn+0x9d>
1a1: 45 31 d2 xor %r10d,%r10d
1a4: 48 83 c4 10 add $0x10,%rsp
1a8: 44 89 d0 mov %r10d,%eax
1ab: 5b pop %rbx
1ac: 41 5c pop %r12
1ae: 41 5d pop %r13
1b0: 41 5e pop %r14
1b2: 41 5f pop %r15
1b4: 5d pop %rbp
1b5: c3 retq
1b6: 45 31 d2 xor %r10d,%r10d
1b9: 44 89 d0 mov %r10d,%eax
1bc: c3 retq
1bd: 48 89 7d d0 mov %rdi,-0x30(%rbp)
1c1: 41 89 cc mov %ecx,%r12d
1c4: 41 89 cd mov %ecx,%r13d
1c7: 4c 89 f7 mov %r14,%rdi
1ca: 89 d1 mov %edx,%ecx
1cc: 41 89 d7 mov %edx,%r15d
1cf: 48 89 f3 mov %rsi,%rbx
1d2: 89 4d cc mov %ecx,-0x34(%rbp)
1d5: e8 46 fe ff ff callq 20 <pageblock_skip_persistent>
1da: 84 c0 test %al,%al
1dc: 75 c3 jne 1a1 <__reset_isolation_pfn+0x81>
1de: 45 89 fa mov %r15d,%r10d
1e1: 45 20 e2 and %r12b,%r10b
1e4: 0f 85 f3 01 00 00 jne 3dd <__reset_isolation_pfn+0x2bd>
1ea: 80 7d cc 01 cmpb $0x1,-0x34(%rbp)
1ee: 74 6d je 25d <__reset_isolation_pfn+0x13d>
1f0: 45 84 e4 test %r12b,%r12b
1f3: 74 68 je 25d <__reset_isolation_pfn+0x13d>
1f5: 49 8b 0e mov (%r14),%rcx
1f8: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 1ff <__reset_isolation_pfn+0xdf>
1fb: R_X86_64_PC32 mem_section-0x4
1ff: 48 89 ca mov %rcx,%rdx
202: 48 c1 ea 2d shr $0x2d,%rdx
206: 48 85 c0 test %rax,%rax
209: 74 17 je 222 <__reset_isolation_pfn+0x102>
20b: 48 c1 e9 35 shr $0x35,%rcx
20f: 48 8b 04 c8 mov (%rax,%rcx,8),%rax
213: 48 85 c0 test %rax,%rax
216: 74 0a je 222 <__reset_isolation_pfn+0x102>
218: 0f b6 d2 movzbl %dl,%edx
21b: 48 c1 e2 04 shl $0x4,%rdx
21f: 48 01 d0 add %rdx,%rax
222: 48 8b 00 mov (%rax),%rax
225: 4c 89 f6 mov %r14,%rsi
228: b9 07 00 00 00 mov $0x7,%ecx
22d: ba 02 00 00 00 mov $0x2,%edx
232: 4c 89 f7 mov %r14,%rdi
235: 48 83 e0 f8 and $0xfffffffffffffff8,%rax
239: 48 29 c6 sub %rax,%rsi
23c: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax
243: db b6 6d
246: 48 c1 fe 03 sar $0x3,%rsi
24a: 48 0f af f0 imul %rax,%rsi
24e: e8 00 00 00 00 callq 253 <__reset_isolation_pfn+0x133>
24f: R_X86_64_PLT32 get_pfnblock_flags_mask-0x4
253: 48 83 f8 01 cmp $0x1,%rax
257: 0f 85 44 ff ff ff jne 1a1 <__reset_isolation_pfn+0x81>
25d: 48 8b 7d d0 mov -0x30(%rbp),%rdi
261: 48 81 e3 00 fe ff ff and $0xfffffffffffffe00,%rbx
268: 48 8b 47 58 mov 0x58(%rdi),%rax
26c: 48 39 c3 cmp %rax,%rbx
26f: 48 89 c1 mov %rax,%rcx
272: 48 0f 43 cb cmovae %rbx,%rcx
276: 48 03 47 68 add 0x68(%rdi),%rax
27a: 48 81 c3 00 02 00 00 add $0x200,%rbx
281: 48 89 ca mov %rcx,%rdx
284: 48 c1 ea 0f shr $0xf,%rdx
288: 48 83 e8 01 sub $0x1,%rax
28c: 48 39 d8 cmp %rbx,%rax
28f: 48 0f 47 c3 cmova %rbx,%rax
293: 48 89 c6 mov %rax,%rsi
296: 48 c1 ee 0f shr $0xf,%rsi
29a: 48 81 fa ff ff 07 00 cmp $0x7ffff,%rdx
2a1: 0f 87 ab 01 00 00 ja 452 <__reset_isolation_pfn+0x332>
2a7: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 2ae <__reset_isolation_pfn+0x18e>
2aa: R_X86_64_PC32 mem_section-0x4
2ae: 48 85 ff test %rdi,%rdi
2b1: 0f 84 ea fe ff ff je 1a1 <__reset_isolation_pfn+0x81>
2b7: 49 89 ca mov %rcx,%r10
2ba: 49 c1 ea 17 shr $0x17,%r10
2be: 4e 8b 14 d7 mov (%rdi,%r10,8),%r10
2c2: 4d 85 d2 test %r10,%r10
2c5: 74 23 je 2ea <__reset_isolation_pfn+0x1ca>
2c7: 0f b6 d2 movzbl %dl,%edx
2ca: 48 c1 e2 04 shl $0x4,%rdx
2ce: 4c 01 d2 add %r10,%rdx
2d1: 74 17 je 2ea <__reset_isolation_pfn+0x1ca>
2d3: 48 8b 12 mov (%rdx),%rdx
2d6: f6 c2 02 test $0x2,%dl
2d9: 74 0f je 2ea <__reset_isolation_pfn+0x1ca>
2db: 48 6b c9 38 imul $0x38,%rcx,%rcx
2df: 48 83 e2 f8 and $0xfffffffffffffff8,%rdx
2e3: 48 01 ca add %rcx,%rdx
2e6: 4c 0f 45 f2 cmovne %rdx,%r14
2ea: 48 81 fe 00 00 08 00 cmp $0x80000,%rsi
2f1: 0f 84 aa fe ff ff je 1a1 <__reset_isolation_pfn+0x81>
2f7: 48 89 c2 mov %rax,%rdx
2fa: 48 c1 ea 17 shr $0x17,%rdx
2fe: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx
302: 48 85 d2 test %rdx,%rdx
305: 0f 84 96 fe ff ff je 1a1 <__reset_isolation_pfn+0x81>
30b: 40 0f b6 f6 movzbl %sil,%esi
30f: 48 c1 e6 04 shl $0x4,%rsi
313: 48 01 f2 add %rsi,%rdx
316: 0f 84 85 fe ff ff je 1a1 <__reset_isolation_pfn+0x81>
31c: 48 8b 12 mov (%rdx),%rdx
31f: f6 c2 02 test $0x2,%dl
322: 0f 84 79 fe ff ff je 1a1 <__reset_isolation_pfn+0x81>
328: 48 6b c0 38 imul $0x38,%rax,%rax
32c: 48 83 e2 f8 and $0xfffffffffffffff8,%rdx
330: 48 01 c2 add %rax,%rdx
333: 75 2e jne 363 <__reset_isolation_pfn+0x243>
335: e9 67 fe ff ff jmpq 1a1 <__reset_isolation_pfn+0x81>
33a: 45 84 e4 test %r12b,%r12b
33d: 74 14 je 353 <__reset_isolation_pfn+0x233>
33f: 41 8b 46 30 mov 0x30(%r14),%eax
343: 25 80 00 00 f0 and $0xf0000080,%eax
348: 3d 00 00 00 f0 cmp $0xf0000000,%eax
34d: 0f 84 21 01 00 00 je 474 <__reset_isolation_pfn+0x354>
353: 49 81 c6 c0 01 00 00 add $0x1c0,%r14
35a: 4c 39 f2 cmp %r14,%rdx
35d: 0f 86 3e fe ff ff jbe 1a1 <__reset_isolation_pfn+0x81>
363: 45 84 ff test %r15b,%r15b
366: 74 d2 je 33a <__reset_isolation_pfn+0x21a>
368: 49 8b 4e 08 mov 0x8(%r14),%rcx
36c: 48 8d 41 ff lea -0x1(%rcx),%rax
370: 83 e1 01 and $0x1,%ecx
373: 49 0f 44 c6 cmove %r14,%rax
377: 48 8b 00 mov (%rax),%rax
37a: a8 10 test $0x10,%al
37c: 74 bc je 33a <__reset_isolation_pfn+0x21a>
37e: 49 8b 16 mov (%r14),%rdx
381: 48 89 d0 mov %rdx,%rax
384: 48 c1 ea 35 shr $0x35,%rdx
388: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx
38c: 48 c1 e8 2d shr $0x2d,%rax
390: 48 85 d2 test %rdx,%rdx
393: 74 0a je 39f <__reset_isolation_pfn+0x27f>
395: 0f b6 c0 movzbl %al,%eax
398: 48 c1 e0 04 shl $0x4,%rax
39c: 48 01 c2 add %rax,%rdx
39f: 48 8b 02 mov (%rdx),%rax
3a2: 4c 89 f2 mov %r14,%rdx
3a5: 41 b8 01 00 00 00 mov $0x1,%r8d
3ab: 31 f6 xor %esi,%esi
3ad: b9 03 00 00 00 mov $0x3,%ecx
3b2: 4c 89 f7 mov %r14,%rdi
3b5: 48 83 e0 f8 and $0xfffffffffffffff8,%rax
3b9: 48 29 c2 sub %rax,%rdx
3bc: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax
3c3: db b6 6d
3c6: 48 c1 fa 03 sar $0x3,%rdx
3ca: 48 0f af d0 imul %rax,%rdx
3ce: e8 00 00 00 00 callq 3d3 <__reset_isolation_pfn+0x2b3>
3cf: R_X86_64_PLT32 set_pfnblock_flags_mask-0x4
3d3: 44 0f b6 55 cc movzbl -0x34(%rbp),%r10d
3d8: e9 c7 fd ff ff jmpq 1a4 <__reset_isolation_pfn+0x84>
3dd: 49 8b 0e mov (%r14),%rcx
3e0: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 3e7 <__reset_isolation_pfn+0x2c7>
3e3: R_X86_64_PC32 mem_section-0x4
3e7: 48 89 ca mov %rcx,%rdx
3ea: 48 c1 ea 2d shr $0x2d,%rdx
3ee: 48 85 c0 test %rax,%rax
3f1: 74 17 je 40a <__reset_isolation_pfn+0x2ea>
3f3: 48 c1 e9 35 shr $0x35,%rcx
3f7: 48 8b 04 c8 mov (%rax,%rcx,8),%rax
3fb: 48 85 c0 test %rax,%rax
3fe: 74 0a je 40a <__reset_isolation_pfn+0x2ea>
400: 0f b6 d2 movzbl %dl,%edx
403: 48 c1 e2 04 shl $0x4,%rdx
407: 48 01 d0 add %rdx,%rax
40a: 48 8b 00 mov (%rax),%rax
40d: 4c 89 f6 mov %r14,%rsi
410: b9 01 00 00 00 mov $0x1,%ecx
415: ba 03 00 00 00 mov $0x3,%edx
41a: 4c 89 f7 mov %r14,%rdi
41d: 44 88 55 cb mov %r10b,-0x35(%rbp)
421: 48 83 e0 f8 and $0xfffffffffffffff8,%rax
425: 48 29 c6 sub %rax,%rsi
428: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax
42f: db b6 6d
432: 48 c1 fe 03 sar $0x3,%rsi
436: 48 0f af f0 imul %rax,%rsi
43a: e8 00 00 00 00 callq 43f <__reset_isolation_pfn+0x31f>
43b: R_X86_64_PLT32 get_pfnblock_flags_mask-0x4
43f: 44 0f b6 55 cb movzbl -0x35(%rbp),%r10d
444: 48 85 c0 test %rax,%rax
447: 0f 85 10 fe ff ff jne 25d <__reset_isolation_pfn+0x13d>
44d: e9 52 fd ff ff jmpq 1a4 <__reset_isolation_pfn+0x84>
452: 48 81 fe 00 00 08 00 cmp $0x80000,%rsi
459: 0f 84 42 fd ff ff je 1a1 <__reset_isolation_pfn+0x81>
45f: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 466 <__reset_isolation_pfn+0x346>
462: R_X86_64_PC32 mem_section-0x4
466: 48 85 ff test %rdi,%rdi
469: 0f 85 88 fe ff ff jne 2f7 <__reset_isolation_pfn+0x1d7>
46f: e9 2d fd ff ff jmpq 1a1 <__reset_isolation_pfn+0x81>
474: 49 8b 06 mov (%r14),%rax
477: 48 89 c2 mov %rax,%rdx
47a: 48 c1 e8 35 shr $0x35,%rax
47e: 48 8b 04 c7 mov (%rdi,%rax,8),%rax
482: 48 c1 ea 2d shr $0x2d,%rdx
486: 48 85 c0 test %rax,%rax
489: 74 0a je 495 <__reset_isolation_pfn+0x375>
48b: 0f b6 d2 movzbl %dl,%edx
48e: 48 c1 e2 04 shl $0x4,%rdx
492: 48 01 d0 add %rdx,%rax
495: 48 8b 00 mov (%rax),%rax
498: 4c 89 f2 mov %r14,%rdx
49b: 41 b8 01 00 00 00 mov $0x1,%r8d
4a1: 31 f6 xor %esi,%esi
4a3: b9 03 00 00 00 mov $0x3,%ecx
4a8: 4c 89 f7 mov %r14,%rdi
4ab: 48 83 e0 f8 and $0xfffffffffffffff8,%rax
4af: 48 29 c2 sub %rax,%rdx
4b2: 48 b8 b7 6d db b6 6d movabs $0x6db6db6db6db6db7,%rax
4b9: db b6 6d
4bc: 48 c1 fa 03 sar $0x3,%rdx
4c0: 48 0f af d0 imul %rax,%rdx
4c4: e8 00 00 00 00 callq 4c9 <__reset_isolation_pfn+0x3a9>
4c5: R_X86_64_PLT32 set_pfnblock_flags_mask-0x4
4c9: 45 89 ea mov %r13d,%r10d
4cc: e9 d3 fc ff ff jmpq 1a4 <__reset_isolation_pfn+0x84>
4d1: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
4d8: 00 00 00 00
4dc: 0f 1f 40 00 nopl 0x0(%rax)
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-10-01 19:40 ` Florian Weimer
@ 2019-10-07 13:28 ` Vlastimil Babka
2019-10-07 13:56 ` Vlastimil Babka
0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-10-07 13:28 UTC (permalink / raw)
To: Florian Weimer
Cc: Dave Chinner, linux-xfs, linux-fsdevel, linux-mm, Mel Gorman
On 10/1/19 9:40 PM, Florian Weimer wrote:
> * Vlastimil Babka:
>
>> On 9/30/19 11:17 PM, Dave Chinner wrote:
>>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>>> * Dave Chinner:
>>>>
>>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>>> killed because of kernel paging request failure in the XFS code.
>>>>>
>>>>> dmesg output? if the system was still running, then you might be
>>>>> able to pull the trace from syslog. But we can't do much without
>>>>> knowing what the actual failure was....
>>>>
>>>> Huh. I actually have something in syslog:
>>>>
>>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address:
>>>> 0000000000000000
>>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>>> [ 4001.238418] PGD 0 P4D 0
>>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+
>>>> #1
>>>> [ 4001.238424] Hardware name: System manufacturer System Product
>>>> Name/P6X58D-E, BIOS 0701 05/10/2011
>>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>>
>>> That's memory compaction code it's crashed in.
>>>
>>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0
>>>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1
>>>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00
>>>> 00 00 4c 89 f7
>>
>> Tried to decode it, but couldn't match it to source code, my version of
>> compiled code is too different. Would it be possible to either send
>> mm/compaction.o from the matching build, or output of 'objdump -d -l'
>> for the __reset_isolation_pfn function?
>
> See below. I don't have debuginfo for this build, and the binary does
> not reproduce for some reason. Due to the heavy inlining, it might be
> quite hard to figure out what's going on.
Thanks, but I'm still not able to "decompile" that in my head.
> I've switched to kernel builds with debuginfo from now on. I'm
> surprised that it's not the default.
Let's see if you can reproduce it with that.
However, I've noticed at least something weird:
> 37e: 49 8b 16 mov (%r14),%rdx
> 381: 48 89 d0 mov %rdx,%rax
> 384: 48 c1 ea 35 shr $0x35,%rdx
> 388: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx
> 38c: 48 c1 e8 2d shr $0x2d,%rax
> 390: 48 85 d2 test %rdx,%rdx
> 393: 74 0a je 39f <__reset_isolation_pfn+0x27f>
IIUC, this will jump to 39f when rdx is zero.
> 395: 0f b6 c0 movzbl %al,%eax
> 398: 48 c1 e0 04 shl $0x4,%rax
> 39c: 48 01 c2 add %rax,%rdx
> 39f: 48 8b 02 mov (%rdx),%rax
And this is where we crash because rdx is zero. So the test+branch might
have sent us directly here to crash. Sounds like an inverted condition
somewhere? Or possibly a result of optimizations.
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-10-07 13:28 ` Vlastimil Babka
@ 2019-10-07 13:56 ` Vlastimil Babka
2019-10-08 8:52 ` Mel Gorman
0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-10-07 13:56 UTC (permalink / raw)
To: Florian Weimer, Mel Gorman
Cc: Dave Chinner, linux-xfs, linux-fsdevel, linux-mm
On 10/7/19 3:28 PM, Vlastimil Babka wrote:
> On 10/1/19 9:40 PM, Florian Weimer wrote:
>> * Vlastimil Babka:
>>
>>
>> See below. I don't have debuginfo for this build, and the binary does
>> not reproduce for some reason. Due to the heavy inlining, it might be
>> quite hard to figure out what's going on.
>
> Thanks, but I'm still not able to "decompile" that in my head.
While staring at the code, I think I found two probably unrelated bugs.
One is that pfn and page might be desynced when a zone starts in the middle
of a pageblock, as the max() is only applied to page and not pfn. But that
only effectively affects the later pfn_valid_within() checks, which should
always be true on x86.
The second is that "end of pageblock online and valid" should refer to
the last pfn of the pageblock, not the first pfn of the next pageblock.
Otherwise we might return false needlessly. Mel, what do you think?
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -270,14 +270,15 @@ __reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
/* Ensure the start of the pageblock or zone is online and valid */
block_pfn = pageblock_start_pfn(pfn);
- block_page = pfn_to_online_page(max(block_pfn, zone->zone_start_pfn));
+ block_pfn = max(block_pfn, zone->zone_start_pfn);
+ block_page = pfn_to_online_page(block_pfn);
if (block_page) {
page = block_page;
pfn = block_pfn;
}
/* Ensure the end of the pageblock or zone is online and valid */
- block_pfn += pageblock_nr_pages;
+ block_pfn = pageblock_end_pfn(pfn) - 1;
block_pfn = min(block_pfn, zone_end_pfn(zone) - 1);
end_page = pfn_to_online_page(block_pfn);
if (!end_page)
* Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
2019-10-07 13:56 ` Vlastimil Babka
@ 2019-10-08 8:52 ` Mel Gorman
0 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2019-10-08 8:52 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Florian Weimer, Dave Chinner, linux-xfs, linux-fsdevel, linux-mm
On Mon, Oct 07, 2019 at 03:56:41PM +0200, Vlastimil Babka wrote:
> On 10/7/19 3:28 PM, Vlastimil Babka wrote:
> > On 10/1/19 9:40 PM, Florian Weimer wrote:
> >> * Vlastimil Babka:
> >>
> >>
> >> See below. I don't have debuginfo for this build, and the binary does
> >> not reproduce for some reason. Due to the heavy inlining, it might be
> >> quite hard to figure out what's going on.
> >
> > Thanks, but I'm still not able to "decompile" that in my head.
>
> While staring at the code, I think I found two probably unrelated bugs.
> One is that pfn and page might be desynced when a zone starts in the middle
> of a pageblock, as the max() is only applied to page and not pfn. But that
> only effectively affects the later pfn_valid_within() checks, which should
> always be true on x86.
>
> The second is that "end of pageblock online and valid" should refer to
> the last pfn of the pageblock, not the first pfn of the next pageblock.
> Otherwise we might return false needlessly. Mel, what do you think?
>
I think you are correct in both cases. It's perfectly possible I would
not have observed a problem in testing if zones were aligned, which I
think is generally the case on my test machines.
--
Mel Gorman
SUSE Labs
end of thread, other threads:[~2019-10-08 9:02 UTC | newest]
2019-09-30 7:28 xfs_inode not reclaimed/memory leak on 5.2.16 Florian Weimer
2019-09-30 8:54 ` Dave Chinner
2019-09-30 19:07 ` Florian Weimer
2019-09-30 21:17 ` [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16] Dave Chinner
2019-09-30 21:42 ` Florian Weimer
2019-10-01 9:10 ` Vlastimil Babka
2019-10-01 19:40 ` Florian Weimer
2019-10-07 13:28 ` Vlastimil Babka
2019-10-07 13:56 ` Vlastimil Babka
2019-10-08 8:52 ` Mel Gorman