From: Florian Weimer <fw@deneb.enyo.de>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>,
linux-mm@kvack.org, Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
Date: Wed, 16 Oct 2019 21:38:49 +0200
Message-ID: <87blugh452.fsf@mid.deneb.enyo.de>
In-Reply-To: <96023250-6168-3806-320a-a3468f1cd8c9@suse.cz> (Vlastimil Babka's message of "Tue, 1 Oct 2019 11:10:22 +0200")
* Vlastimil Babka:
> On 9/30/19 11:17 PM, Dave Chinner wrote:
>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>> * Dave Chinner:
>>>
>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>> killed because of kernel paging request failure in the XFS code.
>>>>
>>>> dmesg output? if the system was still running, then you might be
>>>> able to pull the trace from syslog. But we can't do much without
>>>> knowing what the actual failure was....
>>>
>>> Huh. I actually have something in syslog:
>>>
>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000000
>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>> [ 4001.238418] PGD 0 P4D 0
>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+
>>> #1
>>> [ 4001.238424] Hardware name: System manufacturer System Product
>>> Name/P6X58D-E, BIOS 0701 05/10/2011
>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>
>> That's memory compaction code it's crashed in.
>>
>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0
>>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1
>>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00
>>> 00 00 4c 89 f7
>
> Tried to decode it, but couldn't match it to source code, my version of
> compiled code is too different. Would it be possible to either send
> mm/compaction.o from the matching build, or output of 'objdump -d -l'
> for the __reset_isolation_pfn function?

(dropping the fs lists)

I got another crash, this time triggered by rsync (a large tree with
many small files, only a few of which had changed).

Oops:

[41969.140117] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41969.140121] #PF: supervisor read access in kernel mode
[41969.140122] #PF: error_code(0x0000) - not-present page
[41969.140123] PGD 0 P4D 0
[41969.140125] Oops: 0000 [#1] SMP PTI
[41969.140127] CPU: 5 PID: 144 Comm: kswapd0 Tainted: G I 5.2.18fw+ #10
[41969.140128] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[41969.140133] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[41969.140134] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[41969.140135] RSP: 0018:ffffc900003ffde0 EFLAGS: 00010246
[41969.140137] RAX: 000000000004fdac RBX: 0000000000118000 RCX: 0000000000000000
[41969.140138] RDX: 0000000000000000 RSI: 0000000000000230 RDI: ffff88833fffa000
[41969.140138] RBP: ffffc900003ffe18 R08: 000000000000003c R09: ffff888335080000
[41969.140139] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000001
[41969.140140] R13: 0000000000000001 R14: ffff888338dc01c0 R15: 0000000000000001
[41969.140141] FS: 0000000000000000(0000) GS:ffff888333d40000(0000) knlGS:0000000000000000
[41969.140142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41969.140143] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000000206e0
[41969.140144] Call Trace:
[41969.140147] __reset_isolation_suitable+0x9b/0x120
[41969.140149] reset_isolation_suitable+0x3b/0x40
[41969.140152] kswapd+0x98/0x300
[41969.140154] ? wait_woken+0x80/0x80
[41969.140157] kthread+0x114/0x130
[41969.140158] ? balance_pgdat+0x450/0x450
[41969.140159] ? kthread_park+0x80/0x80
[41969.140162] ret_from_fork+0x1f/0x30
[41969.140163] Modules linked in: usb_storage nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp serio_raw snd_hda_intel kvm_intel snd_hda_codec kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e xhci_pci ptp ehci_pci uhci_hcd xhci_hcd pps_core ehci_hcd sky2 usbcore ttm usb_common sd_mod
[41969.140187] CR2: 0000000000000000
[41969.140189] ---[ end trace e27ddb472a95c047 ]---

This time, I've got a kernel with debugging information (still
5.2.18). The crash is at offset 0x39f in the object file, which is
__reset_isolation_pfn+0x27f, the same RIP as in the oops:

        if (!mem_section[SECTION_NR_TO_ROOT(nr)])
 384:   48 c1 ea 35             shr    $0x35,%rdx
 388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
 38c:   48 c1 e8 2d             shr    $0x2d,%rax
 390:   48 85 d2                test   %rdx,%rdx
 393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
        return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 395:   0f b6 c0                movzbl %al,%eax
 398:   48 c1 e0 04             shl    $0x4,%rax
 39c:   48 01 c2                add    %rax,%rdx
        unsigned long map = section->section_mem_map;
 39f:   48 8b 02                mov    (%rdx),%rax
        clear_pageblock_skip(page);
 3a2:   4c 89 f2                mov    %r14,%rdx
 3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
 3ab:   31 f6                   xor    %esi,%esi
 3ad:   b9 03 00 00 00          mov    $0x3,%ecx
 3b2:   4c 89 f7                mov    %r14,%rdi

Hmm, -l output is likely more helpful here:

/home/fw/src/linux/linux/mm/compaction.c:293
 37a:   a8 10                   test   $0x10,%al
 37c:   74 bc                   je     33a <__reset_isolation_pfn+0x21a>
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
 37e:   49 8b 16                mov    (%r14),%rdx
 381:   48 89 d0                mov    %rdx,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
 384:   48 c1 ea 35             shr    $0x35,%rdx
 388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
 38c:   48 c1 e8 2d             shr    $0x2d,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
 390:   48 85 d2                test   %rdx,%rdx
 393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
/home/fw/src/linux/linux/./include/linux/mmzone.h:1220
 395:   0f b6 c0                movzbl %al,%eax
 398:   48 c1 e0 04             shl    $0x4,%rax
 39c:   48 01 c2                add    %rax,%rdx
__section_mem_map_addr():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1247
 39f:   48 8b 02                mov    (%rdx),%rax
__reset_isolation_pfn():
/home/fw/src/linux/linux/mm/compaction.c:294
 3a2:   4c 89 f2                mov    %r14,%rdx
 3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
 3ab:   31 f6                   xor    %esi,%esi

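
To make the section lookup in the listing concrete, here is a rough
userspace model of it in C. The constants are read off the disassembly
(shr $0x2d, movzbl %al, shl $0x4) and are specific to this build.
NR_SECTION_ROOTS, section_nr_from_flags() and root_index() are names
made up for this sketch, and struct mem_section is a 16-byte stand-in,
not the kernel's definition. Assuming the fault was reached through the
je at 0x393, RAX in the oops (0x4fdac) is page->flags >> 45, i.e. the
section number, and RDX (0) is the root pointer that was just loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values inferred from the disassembly; other builds may differ. */
#define SECTIONS_PGSHIFT   45u    /* "shr $0x2d": section nr is the top of page->flags */
#define SECTIONS_PER_ROOT  256u   /* "movzbl %al": 8-bit index within one root */
#define SECTION_ROOT_MASK  (SECTIONS_PER_ROOT - 1)
/* Assumption: 64 - 45 = 19 section bits, so 2^19 / 256 roots. */
#define NR_SECTION_ROOTS   ((1u << (64 - SECTIONS_PGSHIFT)) / SECTIONS_PER_ROOT)

/* Stand-in type: 16 bytes, as implied by "shl $0x4". */
struct mem_section {
    uint64_t section_mem_map;
    uint64_t pad;                 /* hypothetical filler to reach 16 bytes */
};

/* Models page_to_section(): the section number lives in the high flag bits. */
static uint64_t section_nr_from_flags(uint64_t flags)
{
    return flags >> SECTIONS_PGSHIFT;
}

/* Models SECTION_NR_TO_ROOT(nr). */
static uint64_t root_index(uint64_t nr)
{
    return nr / SECTIONS_PER_ROOT;
}

/*
 * Models __nr_to_section(): returns NULL when the root is not populated.
 * A caller that reads ->section_mem_map without checking the result
 * dereferences NULL, which is the read at offset 0x39f.
 */
static struct mem_section *nr_to_section(struct mem_section **roots, uint64_t nr)
{
    assert(root_index(nr) < NR_SECTION_ROOTS);
    if (!roots[root_index(nr)])
        return NULL;
    return &roots[root_index(nr)][nr & SECTION_ROOT_MASK];
}
```

Feeding it the RAX value from the oops gives root index 0x4fd and index
0xac within that root. With that root unpopulated, nr_to_section()
returns NULL, which matches RDX = 0 and the NULL read at 0x39f. The
sketch says nothing about why the page's flags yield that section
number.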
It's this loop:

286         /*
287          * Only clear the hint if a sample indicates there is either a
288          * free page or an LRU page in the block. One or other condition
289          * is necessary for the block to be a migration source/target.
290          */
291         do {
292                 if (pfn_valid_within(pfn)) {
293                         if (check_source && PageLRU(page)) {
294                                 clear_pageblock_skip(page);
295                                 return true;
296                         }
297
298                         if (check_target && PageBuddy(page)) {
299                                 clear_pageblock_skip(page);
300                                 return true;
301                         }
302                 }
303
304                 page += (1 << PAGE_ALLOC_COSTLY_ORDER);
305                 pfn += (1 << PAGE_ALLOC_COSTLY_ORDER);
306         } while (page < end_page);
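
For readers who do not have the source at hand, the sampling pattern of
that loop can be modelled in userspace roughly as follows. This is only
a sketch: struct fake_page, block_has_candidate() and the samples
counter are invented for illustration, PAGEBLOCK_NR_PAGES = 512 is an
assumption about this configuration, and the pfn_valid_within() check
and the clear_pageblock_skip() call are left out. The stride of
1 << PAGE_ALLOC_COSTLY_ORDER pages is taken from the loop itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_ALLOC_COSTLY_ORDER 3          /* value used by the kernel */
#define SAMPLE_STRIDE (1UL << PAGE_ALLOC_COSTLY_ORDER)
#define PAGEBLOCK_NR_PAGES 512             /* assumption for this config */

/* Hypothetical stand-in for struct page: only the two tested flags. */
struct fake_page {
    bool lru;
    bool buddy;
};

/*
 * Walks [page, end_page) in steps of SAMPLE_STRIDE and reports whether
 * any sampled page qualifies as a migration source (LRU) or target
 * (buddy). *samples counts how many pages were actually inspected.
 */
static bool block_has_candidate(const struct fake_page *page,
                                const struct fake_page *end_page,
                                bool check_source, bool check_target,
                                size_t *samples)
{
    assert(page < end_page);
    do {
        ++*samples;
        if (check_source && page->lru)
            return true;
        if (check_target && page->buddy)
            return true;
        page += SAMPLE_STRIDE;
    } while (page < end_page);
    return false;
}
```

With these assumptions an empty block is sampled 64 times, and a flag
set on a page that is not a multiple of 8 pages into the block is never
seen. Every sampled page is dereferenced, so each one has to be backed
by a valid struct page.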