From: Matthew Wilcox <willy@infradead.org>
To: Zorro Lang <zlang@kernel.org>
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	linux-ext4@vger.kernel.org
Subject: Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
Date: Thu, 29 Sep 2022 22:28:00 +0100	[thread overview]
Message-ID: <YzYN4JqbKdxLd6oA@casper.infradead.org> (raw)
In-Reply-To: <20220927011720.7jmugevxc7ax26qw@zlang-mailbox>

On Tue, Sep 27, 2022 at 09:17:20AM +0800, Zorro Lang wrote:
> Hi mm and ppc list,
> 
> Recently I started to hit a kernel panic [2], rarely, on *ppc64le* with *1k
> blocksize* ext4. It's not easy to reproduce, but it can still be triggered
> by running generic/048 in a loop on ppc64le (I'm not sure whether every
> ppc64le machine can reproduce it).
> 
> I've reported a bug to ext4 [1] (see [1] for more details), but so far I
> have only hit it on ppc64le, and I'm not sure it's an ext4 bug; it looks
> more like a folio-related issue, so I'm cc'ing the mm and ppc lists in the
> hope of getting more review.

Argh.  This is the wrong way to do it.  Please stop using bugzilla.
Now there's discussion in two places and there's nowhere to see all
of it.

> [ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069 
> [ 4681.230922] Faulting instruction address: 0xc00000000068ee0c 
> [ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1] 
> [ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
> [ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc6+ #1 
> [ 4681.230999] NIP:  c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000 
> [ 4681.238525] REGS: c000000006c0b560 TRAP: 0380   Not tainted  (6.0.0-rc6+) 
> [ 4681.238532] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24028242  XER: 00000000 
> [ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0  
> [ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700 c00c00000042f1c0  
> [ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002 0000000000000000  
> [ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0 0000000000000000  
> [ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298 c0000001fff9c480  
> [ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000 0000000000000000  
> [ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8 5deadbeef0000100  
> [ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00 c000000006c0b8e8  
> [ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009 0000000000000009  
> [ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0 
> [ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150 
> [ 4681.238650] Call Trace: 
> [ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880 (unreliable) 
> [ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00 
> [ 4681.238670] [c000000006c0b890] [c000000000498708] filemap_release_folio+0x88/0xb0 
> [ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0] shrink_active_list+0x490/0x750 
> [ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430 
> [ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290 
> [ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0 
> [ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970 
> [ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450 
> [ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150 
> [ 4681.238735] [c000000006c0be10] [c00000000000cbe4] ret_from_kernel_thread+0x5c/0x64 
> [ 4681.238745] Instruction dump: 
> [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000  
> [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378  

Running that through scripts/decodecode (with some minor hacks... how
do PPC people do this properly?) I get:

   0:	fb c1 ff f0 	std     r30,-16(r1)
   4:	f8 21 ff c1 	stdu    r1,-64(r1)
   8:	7c 7d 1b 78 	mr      r29,r3
   c:	7c 9c 23 78 	mr      r28,r4
  10:	eb c3 00 28 	ld      r30,40(r3)
  14:	7f df f3 78 	mr      r31,r30
  18:	48 00 00 18 	b       0x30
  1c:	60 00 00 00 	nop
  20:	60 00 00 00 	nop
  24:	eb ff 00 08 	ld      r31,8(r31)
  28:	7c 3e f8 40 	cmpld   r30,r31
  2c:	41 82 00 48 	beq     0x74
  30:*	81 5f 00 60 	lwz     r10,96(r31)		<-- trapping instruction
  34:	e9 3f 00 00 	ld      r9,0(r31)
  38:	55 29 07 7c 	rlwinm  r9,r9,0,29,30
  3c:	7d 29 53 78 	or      r9,r9,r10

That would seem to track: 96 is 0x60 and r31 contains 0x0000000000000009,
giving us an effective address of 0x9 + 0x60 = 0x69.

It would be nice to know what source line that corresponds to.  Could
you use scripts/faddr2line to turn drop_buffers.constprop.0+0x4c/0x1c0
into a line number?  I can't because it needs the vmlinux you generated.
