All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: fix use-after-free in xattr node block inactivation
@ 2022-07-07 22:21 Darrick J. Wong
  2022-07-08  4:26 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2022-07-07 22:21 UTC (permalink / raw)
  To: xfs; +Cc: hch, oliver.sang

From: Darrick J. Wong <djwong@kernel.org>

The kernel build robot reported a UAF error while running xfs/433
(edited somewhat for brevity):

 BUG: KASAN: use-after-free in xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs
 Read of size 4 at addr ffff88820ac2bd44 by task kworker/0:2/139

 CPU: 0 PID: 139 Comm: kworker/0:2 Tainted: G S                5.19.0-rc2-00004-g7cf2b0f9611b #1
 Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013
 Workqueue: xfs-inodegc/sdb4 xfs_inodegc_worker [xfs]
 Call Trace:
  <TASK>
 dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
 print_address_description+0x1f/0x200
 print_report.cold (mm/kasan/report.c:430)
 kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493)
 xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs
 xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs
 xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs
 xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs
 xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs
 process_one_work
 worker_thread
 kthread
 ret_from_fork
  </TASK>

 Allocated by task 139:
 kasan_save_stack (mm/kasan/common.c:39)
 __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
 kmem_cache_alloc (mm/slab.h:750 mm/slub.c:3214 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239)
 _xfs_buf_alloc (include/linux/instrumented.h:86 include/linux/atomic/atomic-instrumented.h:41 fs/xfs/xfs_buf.c:232) xfs
 xfs_buf_get_map (fs/xfs/xfs_buf.c:660) xfs
 xfs_buf_read_map (fs/xfs/xfs_buf.c:777) xfs
 xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289) xfs
 xfs_da_read_buf (fs/xfs/libxfs/xfs_da_btree.c:2652) xfs
 xfs_da3_node_read (fs/xfs/libxfs/xfs_da_btree.c:392) xfs
 xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:272) xfs
 xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs
 xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs
 xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs
 process_one_work
 worker_thread
 kthread
 ret_from_fork

 Freed by task 139:
 kasan_save_stack (mm/kasan/common.c:39)
 kasan_set_track (mm/kasan/common.c:45)
 kasan_set_free_info (mm/kasan/generic.c:372)
 __kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328 mm/kasan/common.c:374)
 kmem_cache_free (mm/slub.c:1753 mm/slub.c:3507 mm/slub.c:3524)
 xfs_buf_rele (fs/xfs/xfs_buf.c:1040) xfs
 xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:210) xfs
 xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs
 xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs
 xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs
 xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs
 process_one_work
 worker_thread
 kthread
 ret_from_fork

I reproduced this for my own satisfaction, and got the same report,
along with an extra morsel:

 The buggy address belongs to the object at ffff88802103a800
  which belongs to the cache xfs_buf of size 432
 The buggy address is located 396 bytes inside of
  432-byte region [ffff88802103a800, ffff88802103a9b0)

I tracked this code down to:

	error = xfs_trans_get_buf(*trans, mp->m_ddev_targp,
			child_blkno,
			XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0,
			&child_bp);
	if (error)
		return error;
	error = bp->b_error;

That doesn't look right -- I think this should be dereferencing
child_bp, not bp.  Looking through the codebase history, I think this
was added by commit 2911edb653b9 ("xfs: remove the mappedbno argument to
xfs_da_get_buf"), which replaced a call to xfs_da_get_buf with the
current call to xfs_trans_get_buf.  Not sure why we trans_brelse'd @bp
earlier in the function, but I'm guessing it's to avoid pinning too many
buffers in memory while we inactivate the bottom of the attr tree.
Hence we now have to get the buffer back.

I /think/ this was supposed to check child_bp->b_error and fail the rest
of the invalidation if child_bp had experienced any kind of IO or
corruption error.  I bet the xfs_da3_node_read earlier in the loop will
catch most cases of incoming on-disk corruption which makes this check
mostly moot unless someone corrupts the buffer and the AIL or someone
happens to try to push it out to disk while the buffer's unlocked.

However, this is clearly a UAF bug, so fix this.

Cc: hch@lst.de
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 2911edb653b9 ("xfs: remove the mappedbno argument to xfs_da_get_buf")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_inactive.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 0e83cab9cdde..e892040eb86b 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -158,6 +158,7 @@ xfs_attr3_node_inactive(
 	}
 	child_fsb = be32_to_cpu(ichdr.btree[0].before);
 	xfs_trans_brelse(*trans, bp);	/* no locks for later trans */
+	bp = NULL;
 
 	/*
 	 * If this is the node level just above the leaves, simply loop
@@ -211,7 +212,7 @@ xfs_attr3_node_inactive(
 				&child_bp);
 		if (error)
 			return error;
-		error = bp->b_error;
+		error = child_bp->b_error;
 		if (error) {
 			xfs_trans_brelse(*trans, child_bp);
 			return error;

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs: fix use-after-free in xattr node block inactivation
  2022-07-07 22:21 [PATCH] xfs: fix use-after-free in xattr node block inactivation Darrick J. Wong
@ 2022-07-08  4:26 ` Dave Chinner
  2022-07-08  4:59   ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2022-07-08  4:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, hch, oliver.sang

On Thu, Jul 07, 2022 at 03:21:25PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The kernel build robot reported a UAF error while running xfs/433
> (edited somewhat for brevity):
> 
>  BUG: KASAN: use-after-free in xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs
>  Read of size 4 at addr ffff88820ac2bd44 by task kworker/0:2/139
> 
>  CPU: 0 PID: 139 Comm: kworker/0:2 Tainted: G S                5.19.0-rc2-00004-g7cf2b0f9611b #1
>  Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013
>  Workqueue: xfs-inodegc/sdb4 xfs_inodegc_worker [xfs]
>  Call Trace:
>   <TASK>
>  dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
>  print_address_description+0x1f/0x200
>  print_report.cold (mm/kasan/report.c:430)
>  kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493)
>  xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs
>  xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs
>  xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs
>  xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs
>  xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs
>  process_one_work
>  worker_thread
>  kthread
>  ret_from_fork
>   </TASK>
> 
>  Allocated by task 139:
>  kasan_save_stack (mm/kasan/common.c:39)
>  __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
>  kmem_cache_alloc (mm/slab.h:750 mm/slub.c:3214 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239)
>  _xfs_buf_alloc (include/linux/instrumented.h:86 include/linux/atomic/atomic-instrumented.h:41 fs/xfs/xfs_buf.c:232) xfs
>  xfs_buf_get_map (fs/xfs/xfs_buf.c:660) xfs
>  xfs_buf_read_map (fs/xfs/xfs_buf.c:777) xfs
>  xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289) xfs
>  xfs_da_read_buf (fs/xfs/libxfs/xfs_da_btree.c:2652) xfs
>  xfs_da3_node_read (fs/xfs/libxfs/xfs_da_btree.c:392) xfs
>  xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:272) xfs
>  xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs
>  xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs
>  xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs
>  process_one_work
>  worker_thread
>  kthread
>  ret_from_fork
> 
>  Freed by task 139:
>  kasan_save_stack (mm/kasan/common.c:39)
>  kasan_set_track (mm/kasan/common.c:45)
>  kasan_set_free_info (mm/kasan/generic.c:372)
>  __kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328 mm/kasan/common.c:374)
>  kmem_cache_free (mm/slub.c:1753 mm/slub.c:3507 mm/slub.c:3524)
>  xfs_buf_rele (fs/xfs/xfs_buf.c:1040) xfs
>  xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:210) xfs
>  xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs
>  xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs
>  xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs
>  xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs
>  process_one_work
>  worker_thread
>  kthread
>  ret_from_fork
> 
> I reproduced this for my own satisfaction, and got the same report,
> along with an extra morsel:
> 
>  The buggy address belongs to the object at ffff88802103a800
>   which belongs to the cache xfs_buf of size 432
>  The buggy address is located 396 bytes inside of
>   432-byte region [ffff88802103a800, ffff88802103a9b0)
> 
> I tracked this code down to:
> 
> 	error = xfs_trans_get_buf(*trans, mp->m_ddev_targp,
> 			child_blkno,
> 			XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0,
> 			&child_bp);
> 	if (error)
> 		return error;
> 	error = bp->b_error;
> 
> That doesn't look right -- I think this should be dereferencing
> child_bp, not bp.

It shouldn't even be there. If xfs_trans_get_buf() returns a buffer,
it should not have a pending error on it at all. i.e. it's supposed
to return either an error or a buffer handle that is ready for use.

> Looking through the codebase history, I think this
> was added by commit 2911edb653b9 ("xfs: remove the mappedbno argument to
> xfs_da_get_buf"), which replaced a call to xfs_da_get_buf with the
> current call to xfs_trans_get_buf.  Not sure why we trans_brelse'd @bp
> earlier in the function, but I'm guessing it's to avoid pinning too many
> buffers in memory while we inactivate the bottom of the attr tree.
> Hence we now have to get the buffer back.

That whole loop just looks wrong.

WE do:

xfs_attr3_node_inactive(bp)
{

	parent_blkno = xfs_buf_daddr(bp);
	....
	child_fsb = be32_to_cpu(ichdr.btree[0].before);
	xfs_trans_brelse(*trans, bp);   /* no locks for later trans */

	for (0 < i < ichdr.count) {
		xfs_da3_node_read_mapped(child_fsb, &child_bp)

		child_blkno = xfs_buf_daddr(child_bp);

		if (child is node) {
			/* recurse! */
			xfs_attr3_node_inactive(child_bp);
			/* released child_bp */
		}
		....

		/* re-get child_bp */
		xfs_trans_get_buf(child_blkno, &child_bp);

		/* whacky bp reference here */

		xfs_trans_binval(child_bp)

		/* read next child_bp */
		xfs_da3_node_read_mapped(parent_blkno, &bp)
		child_fsb = be32_to_cpu(phdr.btree[i + 1].before);
		xfs_trans_brelse(*trans, bp);
	}

So, this is dropping parent buffers so they aren't held over
the recursion down the sub tree. I can't see why this would be done for
lock ordering reasons - parent->child is the only ordering that
happens in this recursion, and only the direct ancestors of any
given child node are locked at any point in time, which is the same
a happens with a path walk....

Oh, right.

It's the fact that it can only invalidate a single child buffer
at a time and it will only do one per transaction. Hence it needs to
do a re-read of the parent buffer rather than
xfs_trans_bhold() it because the code is depth-first recursive and
so it has no idea of how deep the child will need to go (or, for
that matter, how deep we already are).

Whoever wrote this didn't, for some reason, use the da btree path
tracking (i.e. a struct xfs_da_state) to keep track of all the
parent buffers of the current child being invalidated. That would
make this code a whole lot simpler and neater....

Ugh. Nasty, nasty code.

> I /think/ this was supposed to check child_bp->b_error and fail the rest
> of the invalidation if child_bp had experienced any kind of IO or
> corruption error.  I bet the xfs_da3_node_read earlier in the loop will
> catch most cases of incoming on-disk corruption which makes this check
> mostly moot unless someone corrupts the buffer and the AIL or someone
> happens to try to push it out to disk while the buffer's unlocked.

In which case, I think we probably already should have shut down and
errored out.

> However, this is clearly a UAF bug, so fix this.
> 
> Cc: hch@lst.de
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Fixes: 2911edb653b9 ("xfs: remove the mappedbno argument to xfs_da_get_buf")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_attr_inactive.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> index 0e83cab9cdde..e892040eb86b 100644
> --- a/fs/xfs/xfs_attr_inactive.c
> +++ b/fs/xfs/xfs_attr_inactive.c
> @@ -158,6 +158,7 @@ xfs_attr3_node_inactive(
>  	}
>  	child_fsb = be32_to_cpu(ichdr.btree[0].before);
>  	xfs_trans_brelse(*trans, bp);	/* no locks for later trans */
> +	bp = NULL;
>  
>  	/*
>  	 * If this is the node level just above the leaves, simply loop
> @@ -211,7 +212,7 @@ xfs_attr3_node_inactive(
>  				&child_bp);
>  		if (error)
>  			return error;
> -		error = bp->b_error;
> +		error = child_bp->b_error;
>  		if (error) {
>  			xfs_trans_brelse(*trans, child_bp);
>  			return error;

I'd just remove the child_bp error checking altogether - if there
was an IOi or corruption error on it, that shouldn't keep us from
invalidating it to free the underlying space. We're trashing the
contents, so who cares if the contents is already trashed?

Also, you probably need to set bp = NULL after the
xfs_trans_brelse() call at the bottom of the loop....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs: fix use-after-free in xattr node block inactivation
  2022-07-08  4:26 ` Dave Chinner
@ 2022-07-08  4:59   ` Christoph Hellwig
  2022-07-08 15:45     ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2022-07-08  4:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, xfs, hch, oliver.sang

On Fri, Jul 08, 2022 at 02:26:53PM +1000, Dave Chinner wrote:
> > 			child_blkno,
> > 			XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0,
> > 			&child_bp);
> > 	if (error)
> > 		return error;
> > 	error = bp->b_error;
> > 
> > That doesn't look right -- I think this should be dereferencing
> > child_bp, not bp.
> 
> It shouldn't even be there. If xfs_trans_get_buf() returns a buffer,
> it should not have a pending error on it at all. i.e. it's supposed
> to return either an error or a buffer handle that is ready for use.

Agreed. Consumers of the buffer cache API should never look at b_error
because they will not see buffers with b_error set at all.

> Whoever wrote this didn't, for some reason, use the da btree path
> tracking (i.e. a struct xfs_da_state) to keep track of all the
> parent buffers of the current child being invalidated. That would
> make this code a whole lot simpler and neater....

Yeah.  The brelese seems to go back to:

commit 677821a1ab2301629aa0370835babb33bc6c919e
Author: Doug Doucette <doucette@engr.sgi.com>
Date:   Fri Dec 6 22:05:46 1996 +0000

    Fold in ficus changes not yet merged in:
    revision 1.32
    date: 1996/11/21 23:31:08;  author: doucette;  state: Exp;  lines: +69 -205
    Rewrite inactive attribute code to avoid freeing any of the data blocks
    until the very end.  We still walk the on-disk structure, but just
    call xfs_trans_binval on the buffers we get.  Then we call the truncate
    code to get rid of the data blocks.  This means we don't need a block
    reservation.

and the loop іtself is even older.  But the da_state had been around
since 1996, so that isn't really an excuse.

> > +		error = child_bp->b_error;
> >  		if (error) {
> >  			xfs_trans_brelse(*trans, child_bp);
> >  			return error;
> 
> I'd just remove the child_bp error checking altogether - if there
> was an IOi or corruption error on it, that shouldn't keep us from
> invalidating it to free the underlying space. We're trashing the
> contents, so who cares if the contents is already trashed?

Yeah.  I also don't see how a b_error could even magically appear
here without xfs_trans_get_buf returning an error first.

> Also, you probably need to set bp = NULL after the
> xfs_trans_brelse() call at the bottom of the loop....

Yes.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs: fix use-after-free in xattr node block inactivation
  2022-07-08  4:59   ` Christoph Hellwig
@ 2022-07-08 15:45     ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2022-07-08 15:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, xfs, oliver.sang

On Fri, Jul 08, 2022 at 06:59:21AM +0200, Christoph Hellwig wrote:
> On Fri, Jul 08, 2022 at 02:26:53PM +1000, Dave Chinner wrote:
> > > 			child_blkno,
> > > 			XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0,
> > > 			&child_bp);
> > > 	if (error)
> > > 		return error;
> > > 	error = bp->b_error;
> > > 
> > > That doesn't look right -- I think this should be dereferencing
> > > child_bp, not bp.
> > 
> > It shouldn't even be there. If xfs_trans_get_buf() returns a buffer,
> > it should not have a pending error on it at all. i.e. it's supposed
> > to return either an error or a buffer handle that is ready for use.

Ah, right, because flags == 0, so we clear b_error anyway.

> Agreed. Consumers of the buffer cache API should never look at b_error
> because they will not see buffers with b_error set at all.

<nod> Thinking about this further -- if the earlier da3_node_read
reads a corrupt buffer off disk, it'll exit early, and if the AIL
happens to push the buffer and the write verifier fails, the log will be
dead, so there's no value added by probing bp->b_error.  I'll remove the
whole if block entirely.

> > Whoever wrote this didn't, for some reason, use the da btree path
> > tracking (i.e. a struct xfs_da_state) to keep track of all the
> > parent buffers of the current child being invalidated. That would
> > make this code a whole lot simpler and neater....
> 
> Yeah.  The brelese seems to go back to:
> 
> commit 677821a1ab2301629aa0370835babb33bc6c919e
> Author: Doug Doucette <doucette@engr.sgi.com>
> Date:   Fri Dec 6 22:05:46 1996 +0000
> 
>     Fold in ficus changes not yet merged in:
>     revision 1.32
>     date: 1996/11/21 23:31:08;  author: doucette;  state: Exp;  lines: +69 -205
>     Rewrite inactive attribute code to avoid freeing any of the data blocks
>     until the very end.  We still walk the on-disk structure, but just
>     call xfs_trans_binval on the buffers we get.  Then we call the truncate
>     code to get rid of the data blocks.  This means we don't need a block
>     reservation.
> 
> and the loop іtself is even older.  But the da_state had been around
> since 1996, so that isn't really an excuse.
> 
> > > +		error = child_bp->b_error;
> > >  		if (error) {
> > >  			xfs_trans_brelse(*trans, child_bp);
> > >  			return error;
> > 
> > I'd just remove the child_bp error checking altogether - if there
> > was an IOi or corruption error on it, that shouldn't keep us from
> > invalidating it to free the underlying space. We're trashing the
> > contents, so who cares if the contents is already trashed?
> 
> Yeah.  I also don't see how a b_error could even magically appear
> here without xfs_trans_get_buf returning an error first.

xfs_trans_get_buf doesn't check b_error on the buffer it returns.  I
think only the _read_buf variants actually look at that.

> > Also, you probably need to set bp = NULL after the
> > xfs_trans_brelse() call at the bottom of the loop....
> 
> Yes.

Done.

--D

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-08 15:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-07 22:21 [PATCH] xfs: fix use-after-free in xattr node block inactivation Darrick J. Wong
2022-07-08  4:26 ` Dave Chinner
2022-07-08  4:59   ` Christoph Hellwig
2022-07-08 15:45     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.