From: Matthew Wilcox <willy@infradead.org>
To: Zorro Lang <zlang@kernel.org>
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-ext4@vger.kernel.org
Subject: Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
Date: Thu, 29 Sep 2022 22:28:00 +0100
Message-ID: <YzYN4JqbKdxLd6oA@casper.infradead.org>
In-Reply-To: <20220927011720.7jmugevxc7ax26qw@zlang-mailbox>

On Tue, Sep 27, 2022 at 09:17:20AM +0800, Zorro Lang wrote:
> Hi mm and ppc list,
>
> Recently I started to hit a kernel panic [2] rarely on *ppc64le* with *1k
> blocksize* ext4. It's not easy to reproduce, but still has chance to trigger
> by loop running generic/048 on ppc64le (not sure all kind of ppc64le can
> reproduce it).
>
> Although I've reported a bug to ext4 [1] (more details refer to [1]), but I only
> hit it on ppc64le until now, and I'm not sure if it's an ext4 related bug, more
> likes folio related issue, so I cc mm and ppc mail list, hope to get more
> reviewing.

Argh. This is the wrong way to do it. Please stop using bugzilla.
Now there's discussion in two places and there's nowhere to see all of it.

> [ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069
> [ 4681.230922] Faulting instruction address: 0xc00000000068ee0c
> [ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc6+ #1
> [ 4681.230999] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000
> [ 4681.238525] REGS: c000000006c0b560 TRAP: 0380 Not tainted (6.0.0-rc6+)
> [ 4681.238532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24028242 XER: 00000000
> [ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0
> [ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700 c00c00000042f1c0
> [ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002 0000000000000000
> [ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0 0000000000000000
> [ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298 c0000001fff9c480
> [ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000 0000000000000000
> [ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8 5deadbeef0000100
> [ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00 c000000006c0b8e8
> [ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009 0000000000000009
> [ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
> [ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
> [ 4681.238650] Call Trace:
> [ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880 (unreliable)
> [ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00
> [ 4681.238670] [c000000006c0b890] [c000000000498708] filemap_release_folio+0x88/0xb0
> [ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0] shrink_active_list+0x490/0x750
> [ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
> [ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290
> [ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0
> [ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970
> [ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450
> [ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150
> [ 4681.238735] [c000000006c0be10] [c00000000000cbe4] ret_from_kernel_thread+0x5c/0x64
> [ 4681.238745] Instruction dump:
> [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
> [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378

Running that through scripts/decodecode (with some minor hacks .. how
do PPC people do this properly?) I get:

   0:   fb c1 ff f0     std     r30,-16(r1)
   4:   f8 21 ff c1     stdu    r1,-64(r1)
   8:   7c 7d 1b 78     mr      r29,r3
   c:   7c 9c 23 78     mr      r28,r4
  10:   eb c3 00 28     ld      r30,40(r3)
  14:   7f df f3 78     mr      r31,r30
  18:   48 00 00 18     b       0x30
  1c:   60 00 00 00     nop
  20:   60 00 00 00     nop
  24:   eb ff 00 08     ld      r31,8(r31)
  28:   7c 3e f8 40     cmpld   r30,r31
  2c:   41 82 00 48     beq     0x74
  30:*  81 5f 00 60     lwz     r10,96(r31)             <-- trapping instruction
  34:   e9 3f 00 00     ld      r9,0(r31)
  38:   55 29 07 7c     rlwinm  r9,r9,0,29,30
  3c:   7d 29 53 78     or      r9,r9,r10

That would seem to track; 96 is 0x60 and r31 contains 0x00..09, giving
us an effective address of 0x69.

It would be nice to know what source line that corresponds to. Could
you use scripts/faddr2line to turn drop_buffers.constprop.0+0x4c/0x1c0
into a line number? I can't because it needs the vmlinux you generated.
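
The arithmetic above can be double-checked by hand-decoding the trapping word. What follows is only a minimal sketch, not part of scripts/decodecode: the helper names `decode_dform` and `effective_address` are made up for illustration, and it assumes the standard PowerPC D-form layout (6-bit opcode, 5-bit RT, 5-bit RA, 16-bit signed displacement). The instruction word and the r31 value are taken from the oops.

```python
# Hypothetical helpers for illustration; they are not kernel or binutils code.

def decode_dform(word: int) -> tuple[int, int, int, int]:
    """Split a 32-bit PowerPC D-form instruction into (opcode, rt, ra, d)."""
    opcode = (word >> 26) & 0x3F   # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    d = word & 0xFFFF              # 16-bit displacement
    if d & 0x8000:                 # sign-extend the displacement
        d -= 0x10000
    return opcode, rt, ra, d


def effective_address(word: int, regs: dict[int, int]) -> int:
    """Compute base + displacement for a D-form load such as lwz."""
    _, _, ra, d = decode_dform(word)
    # For these loads, RA == 0 means a base of zero, not the contents of r0.
    base = 0 if ra == 0 else regs[ra]
    return (base + d) & 0xFFFFFFFFFFFFFFFF


TRAPPING_WORD = 0x815F0060   # the <815f0060> word from the instruction dump
REGS = {31: 0x9}             # GPR31 from the register dump

opcode, rt, ra, d = decode_dform(TRAPPING_WORD)
print(f"opcode={opcode} rt=r{rt} ra=r{ra} d={d}")   # opcode=32 rt=r10 ra=r31 d=96
print(hex(effective_address(TRAPPING_WORD, REGS)))  # 0x69
```

Primary opcode 32 is lwz, so this agrees with the disassembly (lwz r10,96(r31)) and with the faulting address 0x69 reported in the first line of the oops.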