* [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
From: Zorro Lang @ 2022-09-27 1:17 UTC
To: linux-mm; +Cc: linuxppc-dev, linux-ext4

Hi mm and ppc lists,

Recently I started to hit a rare kernel panic [2] on *ppc64le* with *1k
blocksize* ext4. It's not easy to reproduce, but it can still be triggered
by running generic/048 in a loop on ppc64le (I'm not sure whether every
kind of ppc64le machine can reproduce it).

Although I've reported a bug to ext4 [1] (refer to [1] for more details),
I have only hit it on ppc64le so far, and I'm not sure it's an ext4 bug;
it looks more like a folio-related issue. So I've cc'd the mm and ppc
mailing lists in the hope of getting more review.

Thanks,
Zorro

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216529
[2]
[ 4638.919160] run fstests generic/048 at 2022-09-23 21:00:41
[ 4641.700564] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota mode: none.
[ 4641.710999] EXT4-fs (sda3): shut down requested (1)
[ 4641.718544] Aborting journal on device sda3-8.
[ 4641.740342] EXT4-fs (sda3): unmounting filesystem.
[ 4643.000415] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota mode: none.
[ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069
[ 4681.230922] Faulting instruction address: 0xc00000000068ee0c
[ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1]
[ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 4681.230942] Modules linked in: dm_flakey ext2 dm_snapshot dm_bufio dm_zero dm_mod loop ext4 mbcache jbd2 bonding rfkill tls sunrpc pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc6+ #1
[ 4681.230999] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000
[ 4681.238525] REGS: c000000006c0b560 TRAP: 0380 Not tainted (6.0.0-rc6+)
[ 4681.238532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24028242 XER: 00000000
[ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0
[ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700 c00c00000042f1c0
[ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002 0000000000000000
[ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0 0000000000000000
[ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298 c0000001fff9c480
[ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000 0000000000000000
[ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8 5deadbeef0000100
[ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00 c000000006c0b8e8
[ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009 0000000000000009
[ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
[ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
[ 4681.238650] Call Trace:
[ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880 (unreliable)
[ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00
[ 4681.238670] [c000000006c0b890] [c000000000498708] filemap_release_folio+0x88/0xb0
[ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0] shrink_active_list+0x490/0x750
[ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
[ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290
[ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0
[ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970
[ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450
[ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150
[ 4681.238735] [c000000006c0be10] [c00000000000cbe4] ret_from_kernel_thread+0x5c/0x64
[ 4681.238745] Instruction dump:
[ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
[ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
[ 4681.238782] ---[ end trace 0000000000000000 ]---
[ 4681.270607]
[ 4681.337460] Kernel attempted to read user page (6a) - exploit attempt? (uid: 0)
[ 4681.337469] BUG: Kernel NULL pointer dereference on read at 0x0000006a
[ 4681.337474] Faulting instruction address: 0xc00000000068ee0c
[ 4681.337478] Oops: Kernel access of bad area, sig: 11 [#2]
[ 4681.337481] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 4681.337486] Modules linked in: dm_flakey ext2 dm_snapshot dm_bufio dm_zero dm_mod loop ext4 mbcache jbd2 bonding rfkill tls sunrpc pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[ 4681.337517] CPU: 2 PID: 704157 Comm: xfs_io Kdump: loaded Tainted: G D 6.0.0-rc6+ #1
[ 4681.337523] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000
[ 4681.337527] REGS: c000000036006ef0 TRAP: 0300 Tainted: G D (6.0.0-rc6+)
[ 4681.337532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28428242 XER: 00000001
[ 4681.337546] CFAR: c00000000000c80c DAR: 000000000000006a DSISR: 40000000 IRQMASK: 0
[ 4681.337546] GPR00: c00000000068f2b8 c000000036007190 c000000002cf1700 c00c000000424740
[ 4681.337546] GPR04: c0000000360071f0 0000000000000000 0000000000000002 0000000000000000
[ 4681.337546] GPR08: c000000002d404b0 0000000000000000 c00c000000424740 0000000000000002
[ 4681.337546] GPR12: 0000000000000000 c00000000ffce400 0000000000000000 c0000001fff9c480
[ 4681.337546] GPR16: c00000004960e050 0000000000000007 0000000000000000 0000000000000000
[ 4681.337546] GPR20: 0000000000000001 c000000036007288 c00000000146b9d8 5deadbeef0000100
[ 4681.337546] GPR24: 5deadbeef0000122 c00000004960e000 c000000036007678 c000000036007278
[ 4681.337546] GPR28: c0000000360071f0 c00c000000424740 000000000000000a 000000000000000a
[ 4681.337602] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
[ 4681.337608] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
[ 4681.337613] Call Trace:
[ 4681.337616] [c000000036007190] [c000000036007210] 0xc000000036007210 (unreliable)
[ 4681.337622] [c0000000360071d0] [c000000036007678] 0xc000000036007678
[ 4681.337627] [c000000036007220] [c000000000498708] filemap_release_folio+0x88/0xb0
[ 4681.337633] [c000000036007240] [c0000000004c51c0] shrink_active_list+0x490/0x750
[ 4681.337640] [c000000036007340] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
[ 4681.337645] [c000000036007430] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290
[ 4681.337651] [c0000000360074a0] [c0000000004ca3c4] shrink_node+0x174/0x6b0
[ 4681.337656] [c000000036007550] [c0000000004cbd34] shrink_zones.constprop.0+0xd4/0x3e0
[ 4681.337661] [c0000000360075d0] [c0000000004cc158] do_try_to_free_pages+0x118/0x470
[ 4681.337667] [c000000036007650] [c0000000004cd084] try_to_free_pages+0x194/0x4c0
[ 4681.337673] [c000000036007720] [c00000000054cca4] __alloc_pages_slowpath.constprop.0+0x4f4/0xd80
[ 4681.337680] [c000000036007880] [c00000000054d95c] __alloc_pages+0x42c/0x580
[ 4681.337686] [c000000036007910] [c000000000587d88] alloc_pages+0xd8/0x1d0
[ 4681.337692] [c000000036007960] [c000000000587eb4] folio_alloc+0x34/0x90
[ 4681.337698] [c000000036007990] [c000000000498bc0] filemap_alloc_folio+0x40/0x60
[ 4681.337703] [c0000000360079b0] [c0000000004a0f54] __filemap_get_folio+0x224/0x790
[ 4681.337709] [c000000036007ab0] [c0000000004b4830] pagecache_get_page+0x30/0xb0
[ 4681.337715] [c000000036007ae0] [c008000003a9e4dc] ext4_da_write_begin+0x1a4/0x4f0 [ext4]
[ 4681.337742] [c000000036007b70] [c000000000498e54] generic_perform_write+0xf4/0x2b0
[ 4681.337748] [c000000036007c20] [c008000003a7d190] ext4_buffered_write_iter+0xa8/0x1a0 [ext4]
[ 4681.337770] [c000000036007c70] [c000000000615fc8] vfs_write+0x358/0x4b0
[ 4681.337776] [c000000036007d40] [c0000000006161f4] sys_pwrite64+0xd4/0x120
[ 4681.337782] [c000000036007da0] [c0000000000318d0] system_call_exception+0x180/0x430
[ 4681.337788] [c000000036007e10] [c00000000000be68] system_call_vectored_common+0xe8/0x278
[ 4681.337795] --- interrupt: 3000 at 0x7fff95651da4
[ 4681.337799] NIP: 00007fff95651da4 LR: 0000000000000000 CTR: 0000000000000000
[ 4681.337803] REGS: c000000036007e80 TRAP: 3000 Tainted: G D (6.0.0-rc6+)
[ 4681.337807] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 48082402 XER: 00000000
[ 4681.337822] IRQMASK: 0
[ 4681.337822] GPR00: 00000000000000b4 00007ffffaa52530 00007fff95767200 0000000000000003
[ 4681.337822] GPR04: 0000010031ac0000 0000000000010000 0000000000490000 00007fff9581a5a0
[ 4681.337822] GPR08: 00007fff95812e68 0000000000000000 0000000000000000 0000000000000000
[ 4681.337822] GPR12: 0000000000000000 00007fff9581a5a0 0000000000a00000 ffffffffffffffff
[ 4681.337822] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 4681.337822] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000490000
[ 4681.337822] GPR24: 0000000000000049 0000000000000000 0000000000000000 0000000000010000
[ 4681.337822] GPR28: 0000010031ac0000 0000000000000003 0000000000000000 0000000000490000
[ 4681.337875] NIP [00007fff95651da4] 0x7fff95651da4
[ 4681.337878] LR [0000000000000000] 0x0
[ 4681.337881] --- interrupt: 3000
[ 4681.337884] Instruction dump:
[ 4681.337887] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
[ 4681.337897] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
[ 4681.337908] ---[ end trace 0000000000000000 ]---
* Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
From: Matthew Wilcox @ 2022-09-29 21:28 UTC
To: Zorro Lang; +Cc: linux-mm, linuxppc-dev, linux-ext4

On Tue, Sep 27, 2022 at 09:17:20AM +0800, Zorro Lang wrote:
> Hi mm and ppc lists,
>
> Recently I started to hit a rare kernel panic [2] on *ppc64le* with *1k
> blocksize* ext4. It's not easy to reproduce, but it can still be triggered
> by running generic/048 in a loop on ppc64le (I'm not sure whether every
> kind of ppc64le machine can reproduce it).
>
> Although I've reported a bug to ext4 [1] (refer to [1] for more details),
> I have only hit it on ppc64le so far, and I'm not sure it's an ext4 bug;
> it looks more like a folio-related issue. So I've cc'd the mm and ppc
> mailing lists in the hope of getting more review.

Argh. This is the wrong way to do it. Please stop using bugzilla.
Now there's discussion in two places and there's nowhere to see all
of it.

> [ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069
> [ 4681.230922] Faulting instruction address: 0xc00000000068ee0c
> [ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc6+ #1
> [ 4681.230999] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000
> [ 4681.238525] REGS: c000000006c0b560 TRAP: 0380 Not tainted (6.0.0-rc6+)
> [ 4681.238532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24028242 XER: 00000000
> [ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0
> [ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700 c00c00000042f1c0
> [ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002 0000000000000000
> [ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0 0000000000000000
> [ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298 c0000001fff9c480
> [ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000 0000000000000000
> [ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8 5deadbeef0000100
> [ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00 c000000006c0b8e8
> [ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009 0000000000000009
> [ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
> [ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
> [ 4681.238650] Call Trace:
> [ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880 (unreliable)
> [ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00
> [ 4681.238670] [c000000006c0b890] [c000000000498708] filemap_release_folio+0x88/0xb0
> [ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0] shrink_active_list+0x490/0x750
> [ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
> [ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290
> [ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0
> [ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970
> [ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450
> [ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150
> [ 4681.238735] [c000000006c0be10] [c00000000000cbe4] ret_from_kernel_thread+0x5c/0x64
> [ 4681.238745] Instruction dump:
> [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
> [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378

Running that through scripts/decodecode (with some minor hacks .. how
do PPC people do this properly?) I get:

   0:   fb c1 ff f0     std     r30,-16(r1)
   4:   f8 21 ff c1     stdu    r1,-64(r1)
   8:   7c 7d 1b 78     mr      r29,r3
   c:   7c 9c 23 78     mr      r28,r4
  10:   eb c3 00 28     ld      r30,40(r3)
  14:   7f df f3 78     mr      r31,r30
  18:   48 00 00 18     b       0x30
  1c:   60 00 00 00     nop
  20:   60 00 00 00     nop
  24:   eb ff 00 08     ld      r31,8(r31)
  28:   7c 3e f8 40     cmpld   r30,r31
  2c:   41 82 00 48     beq     0x74
  30:*  81 5f 00 60     lwz     r10,96(r31)     <-- trapping instruction
  34:   e9 3f 00 00     ld      r9,0(r31)
  38:   55 29 07 7c     rlwinm  r9,r9,0,29,30
  3c:   7d 29 53 78     or      r9,r9,r10

That would seem to track; 96 is 0x60 and r31 contains 0x00..09, giving
us an effective address of 0x69.

It would be nice to know what source line that corresponds to. Could
you use scripts/faddr2line to turn drop_buffers.constprop.0+0x4c/0x1c0
into a line number? I can't because it needs the vmlinux you generated.
* Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
From: Michael Ellerman @ 2022-09-30 2:01 UTC
To: Matthew Wilcox, Zorro Lang; +Cc: linux-mm, linux-ext4, linuxppc-dev

Matthew Wilcox <willy@infradead.org> writes:
> On Tue, Sep 27, 2022 at 09:17:20AM +0800, Zorro Lang wrote:
>> Hi mm and ppc lists,
>>
>> Recently I started to hit a rare kernel panic [2] on *ppc64le* with *1k
>> blocksize* ext4. It's not easy to reproduce, but it can still be triggered
>> by running generic/048 in a loop on ppc64le (I'm not sure whether every
>> kind of ppc64le machine can reproduce it).
>>
>> Although I've reported a bug to ext4 [1] (refer to [1] for more details),
>> I have only hit it on ppc64le so far, and I'm not sure it's an ext4 bug;
>> it looks more like a folio-related issue. So I've cc'd the mm and ppc
>> mailing lists in the hope of getting more review.
>
> Argh. This is the wrong way to do it. Please stop using bugzilla.
> Now there's discussion in two places and there's nowhere to see all
> of it.
>
>> [ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069
>> [ 4681.230922] Faulting instruction address: 0xc00000000068ee0c
>> [ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1]
>> [ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc6+ #1
>> [ 4681.230999] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000
>> [ 4681.238525] REGS: c000000006c0b560 TRAP: 0380 Not tainted (6.0.0-rc6+)
>> [ 4681.238532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24028242 XER: 00000000
>> [ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0
>> [ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700 c00c00000042f1c0
>> [ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002 0000000000000000
>> [ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0 0000000000000000
>> [ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298 c0000001fff9c480
>> [ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000 0000000000000000
>> [ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8 5deadbeef0000100
>> [ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00 c000000006c0b8e8
>> [ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009 0000000000000009
>> [ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
>> [ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
>> [ 4681.238650] Call Trace:
>> [ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880 (unreliable)
>> [ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00
>> [ 4681.238670] [c000000006c0b890] [c000000000498708] filemap_release_folio+0x88/0xb0
>> [ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0] shrink_active_list+0x490/0x750
>> [ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
>> [ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290
>> [ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0
>> [ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970
>> [ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450
>> [ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150
>> [ 4681.238735] [c000000006c0be10] [c00000000000cbe4] ret_from_kernel_thread+0x5c/0x64
>> [ 4681.238745] Instruction dump:
>> [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
>> [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
>
> Running that through scripts/decodecode (with some minor hacks .. how
> do PPC people do this properly?)

We've just always used our own scripts. Mine is here: https://github.com/mpe/misc-scripts/blob/master/ppc/ppc-disasm

I've added an issue to our tracker for us to get scripts/decodecode
working on our oopses (eventually).

> I get:
>
>    0:   fb c1 ff f0     std     r30,-16(r1)
>    4:   f8 21 ff c1     stdu    r1,-64(r1)
>    8:   7c 7d 1b 78     mr      r29,r3
>    c:   7c 9c 23 78     mr      r28,r4
>   10:   eb c3 00 28     ld      r30,40(r3)
>   14:   7f df f3 78     mr      r31,r30
>   18:   48 00 00 18     b       0x30
>   1c:   60 00 00 00     nop
>   20:   60 00 00 00     nop
>   24:   eb ff 00 08     ld      r31,8(r31)
>   28:   7c 3e f8 40     cmpld   r30,r31
>   2c:   41 82 00 48     beq     0x74
>   30:*  81 5f 00 60     lwz     r10,96(r31)     <-- trapping instruction
>   34:   e9 3f 00 00     ld      r9,0(r31)
>   38:   55 29 07 7c     rlwinm  r9,r9,0,29,30
>   3c:   7d 29 53 78     or      r9,r9,r10
>
> That would seem to track; 96 is 0x60 and r31 contains 0x00..09, giving
> us an effective address of 0x69.
>
> It would be nice to know what source line that corresponds to. Could
> you use scripts/faddr2line to turn drop_buffers.constprop.0+0x4c/0x1c0
> into a line number? I can't because it needs the vmlinux you generated.
You'll need:
  https://lore.kernel.org/all/20220927075211.897152-1-srikar@linux.vnet.ibm.com/

I don't have the same vmlinux obviously, but mine seems to match up
pretty closely, I get:

c0000000004e3900 <drop_buffers.constprop.0>:
c0000000004e3900:       b9 00 4c 3c     addis   r2,r12,185
c0000000004e3904:       00 c5 42 38     addi    r2,r2,-15104
c0000000004e3908:       a6 02 08 7c     mflr    r0
c0000000004e390c:       29 4f b8 4b     bl      c000000000068834 <_mcount>
# ^ entry & ftrace stuff
c0000000004e3910:       e0 ff 81 fb     std     r28,-32(r1)
c0000000004e3914:       e8 ff a1 fb     std     r29,-24(r1)
c0000000004e3918:       78 23 9c 7c     mr      r28,r4
c0000000004e391c:       78 1b 7d 7c     mr      r29,r3
c0000000004e3920:       f8 ff e1 fb     std     r31,-8(r1)
c0000000004e3924:       f0 ff c1 fb     std     r30,-16(r1)
c0000000004e3928:       c1 ff 21 f8     stdu    r1,-64(r1)
# ^ save regs and create stack frame
c0000000004e392c:       28 00 c3 eb     ld      r30,40(r3)      # r30 = folio->private (0000000000000009)
c0000000004e3930:       78 f3 df 7f     mr      r31,r30         # r31 = folio->private = head = bh
c0000000004e3934:       18 00 00 48     b       c0000000004e394c <drop_buffers.constprop.0+0x4c> ->
c0000000004e3938:       00 00 00 60     nop
c0000000004e393c:       00 00 42 60     ori     r2,r2,0
c0000000004e3940:       08 00 ff eb     ld      r31,8(r31)
c0000000004e3944:       40 f8 3e 7c     cmpld   r30,r31
c0000000004e3948:       48 00 82 41     beq     c0000000004e3990 <drop_buffers.constprop.0+0x90>
c0000000004e394c:       60 00 5f 81     lwz     r10,96(r31)     # r10 = bh->b_count

$ ./scripts/faddr2line .build/vmlinux drop_buffers.constprop.0+0x4c
drop_buffers.constprop.0+0x4c/0x170:
arch_atomic_read at arch/powerpc/include/asm/atomic.h:30
(inlined by) atomic_read at include/linux/atomic/atomic-instrumented.h:28
(inlined by) buffer_busy at fs/buffer.c:2859
(inlined by) drop_buffers at fs/buffer.c:2871

static inline int buffer_busy(struct buffer_head *bh)
{
	return atomic_read(&bh->b_count) |
		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
}

struct folio {
	union {
		struct {
			long unsigned int  flags;                  /*     0     8 */
			union {
				struct list_head lru;              /*     8    16 */
				struct {
					void *     __filler;       /*     8     8 */
					unsigned int mlock_count;  /*    16     4 */
				};                                 /*     8    16 */
			};                                         /*     8    16 */
			struct address_space * mapping;            /*    24     8 */
			long unsigned int  index;                  /*    32     8 */
			void *             private;                /*    40     8 */  <----

struct buffer_head {
	long unsigned int          b_state;                /*     0     8 */
	struct buffer_head *       b_this_page;            /*     8     8 */
	struct page *              b_page;                 /*    16     8 */
	sector_t                   b_blocknr;              /*    24     8 */
	size_t                     b_size;                 /*    32     8 */
	char *                     b_data;                 /*    40     8 */
	struct block_device *      b_bdev;                 /*    48     8 */
	bh_end_io_t *              b_end_io;               /*    56     8 */
	void *                     b_private;              /*    64     8 */
	struct list_head           b_assoc_buffers;        /*    72    16 */
	struct address_space *     b_assoc_map;            /*    88     8 */
	atomic_t                   b_count;                /*    96     4 */  <----

The buffer_head comes from folio_buffers(folio):

static bool
drop_buffers(struct folio *folio, struct buffer_head **buffers_to_free)
{
	struct buffer_head *head = folio_buffers(folio);

Which is == folio_get_private()

r3 and r29 still hold folio = c00c00000042f1c0

That's a valid looking vmemmap address.

So we have a valid folio, but its private field == 9 ?

Seems like all sorts of things get stuffed into page->private, so
presumably 9 is not necessarily a corrupt value, just not what we're
expecting. But I'm out of my depth so over to you :)

cheers
* Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
From: Matthew Wilcox @ 2022-09-30 18:59 UTC
To: Michael Ellerman; +Cc: Zorro Lang, linux-mm, linux-ext4, linuxppc-dev

On Fri, Sep 30, 2022 at 12:01:26PM +1000, Michael Ellerman wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> >> [ 4681.238745] Instruction dump:
> >> [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
> >> [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
> >
> > Running that through scripts/decodecode (with some minor hacks .. how
> > do PPC people do this properly?)
>
> We've just always used our own scripts. Mine is here: https://github.com/mpe/misc-scripts/blob/master/ppc/ppc-disasm
>
> I've added an issue to our tracker for us to get scripts/decodecode
> working on our oopses (eventually).

Would you be open to changing your oops printer to do
s/Instruction dump/Code/ ? That would make it work without any other
changes.

$ CROSS_COMPILE=powerpc-linux-gnu- ./scripts/decodecode
Code:
fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
^D

gives the right answer. You could also do as x86 does and put Code: on
the same line as the first set of hex (not that it matters; the parser
is fairly flexible).

This would also work ...
diff --git a/scripts/decodecode b/scripts/decodecode
index c711a196511c..0cadf1a37cbf 100755
--- a/scripts/decodecode
+++ b/scripts/decodecode
@@ -27,8 +27,8 @@ cont=
 while read i ; do

 case "$i" in
-*Code:*)
-	code=$i
+*Code:* | *'Instruction dump':*)
+	code=${i##*:}
 	cont=yes
 	;;
 *)
@@ -51,7 +51,7 @@ if [ -z "$code" ]; then
 fi

 echo $code
-code=`echo $code | sed -e 's/.*Code: //'`
+code=`echo $code`

 width=`expr index "$code" ' '`
 width=$((($width-1)/2))

(no, i don't know why i need that echo $code line; trimming trailing
spaces, maybe? shell is a terrible language)

> $ ./scripts/faddr2line .build/vmlinux drop_buffers.constprop.0+0x4c
> drop_buffers.constprop.0+0x4c/0x170:
> arch_atomic_read at arch/powerpc/include/asm/atomic.h:30
> (inlined by) atomic_read at include/linux/atomic/atomic-instrumented.h:28
> (inlined by) buffer_busy at fs/buffer.c:2859
> (inlined by) drop_buffers at fs/buffer.c:2871
>
> static inline int buffer_busy(struct buffer_head *bh)
> {
> 	return atomic_read(&bh->b_count) |
> 		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
> }
>
> struct folio {
> 	union {
> 		struct {
> 			long unsigned int  flags;                  /*     0     8 */
> 			union {
> 				struct list_head lru;              /*     8    16 */
> 				struct {
> 					void *     __filler;       /*     8     8 */
> 					unsigned int mlock_count;  /*    16     4 */
> 				};                                 /*     8    16 */
> 			};                                         /*     8    16 */
> 			struct address_space * mapping;            /*    24     8 */
> 			long unsigned int  index;                  /*    32     8 */
> 			void *             private;                /*    40     8 */  <----
>
> struct buffer_head {
> 	long unsigned int          b_state;                /*     0     8 */
> 	struct buffer_head *       b_this_page;            /*     8     8 */
> 	struct page *              b_page;                 /*    16     8 */
> 	sector_t                   b_blocknr;              /*    24     8 */
> 	size_t                     b_size;                 /*    32     8 */
> 	char *                     b_data;                 /*    40     8 */
> 	struct block_device *      b_bdev;                 /*    48     8 */
> 	bh_end_io_t *              b_end_io;               /*    56     8 */
> 	void *                     b_private;              /*    64     8 */
> 	struct list_head           b_assoc_buffers;        /*    72    16 */
> 	struct address_space *     b_assoc_map;            /*    88     8 */
> 	atomic_t                   b_count;                /*    96     4 */  <----
>
> The buffer_head comes from folio_buffers(folio):
>
> static bool
> drop_buffers(struct folio *folio, struct buffer_head **buffers_to_free)
> {
> 	struct buffer_head *head = folio_buffers(folio);
>
> Which is == folio_get_private()
>
> r3 and r29 still hold folio = c00c00000042f1c0
>
> That's a valid looking vmemmap address.
>
> So we have a valid folio, but its private field == 9 ?
>
> Seems like all sorts of things get stuffed into page->private, so
> presumably 9 is not necessarily a corrupt value, just not what we're
> expecting. But I'm out of my depth so over to you :)

Yes, all kinds of things do get stuffed into folio->private, alas.
However, for an ext4 folio, it should either be NULL or a pointer to a
buffer_head. It'd be interesting to insert ...

	if ((unsigned long)head < 4096)
		dump_page(&folio->page, "bad bh");

in drop_buffers() before we actually dereference the 'head'.

My suspicion is that page->private and PagePrivate have got out of sync
somehow; we're trying to reclaim the PG_private bit and there have been
some similar problems of this type in the past. I had success debugging
this kind of problem with this patch:

commit 80eba374eab3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Jun 21 07:04:32 2022 -0400

    mm: Add an assertion that PG_private and folio->private are in sync

    We are trying to eliminate the use of the PG_private flag. To do so,
    it must be in sync with the use of the ->private field. It usually is,
    and this assert should catch any cases where it isn't.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

diff --git a/mm/filemap.c b/mm/filemap.c
index 15800334147b..2f26c32ea1cd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1529,6 +1529,9 @@ void folio_unlock(struct folio *folio)
 	BUILD_BUG_ON(PG_waiters != 7);
 	BUILD_BUG_ON(PG_locked > 7);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio) &&
+			(folio_test_private(folio) ==
+			 !folio_get_private(folio)), folio);
 	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
 		folio_wake_bit(folio, PG_locked);
 }
* Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
From: Michael Ellerman @ 2022-10-05 11:13 UTC
To: Matthew Wilcox; +Cc: Zorro Lang, linux-mm, linux-ext4, linuxppc-dev

Matthew Wilcox <willy@infradead.org> writes:
> On Fri, Sep 30, 2022 at 12:01:26PM +1000, Michael Ellerman wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> >> [ 4681.238745] Instruction dump:
>> >> [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
>> >> [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
>> >
>> > Running that through scripts/decodecode (with some minor hacks .. how
>> > do PPC people do this properly?)
>>
>> We've just always used our own scripts. Mine is here: https://github.com/mpe/misc-scripts/blob/master/ppc/ppc-disasm
>>
>> I've added an issue to our tracker for us to get scripts/decodecode
>> working on our oopses (eventually).
>
> Would you be open to changing your oops printer to do
> s/Instruction dump/Code/ ? That would make it work without any other
> changes.

Yeah, we're the only arch that uses "Instruction dump". For userspace
instructions we already print "code".

I'll send a patch switching to "Code:".

cheers