* [PATCH] xfs: fix use-after-free in xattr node block inactivation @ 2022-07-07 22:21 Darrick J. Wong 2022-07-08 4:26 ` Dave Chinner 0 siblings, 1 reply; 4+ messages in thread From: Darrick J. Wong @ 2022-07-07 22:21 UTC (permalink / raw) To: xfs; +Cc: hch, oliver.sang From: Darrick J. Wong <djwong@kernel.org> The kernel build robot reported a UAF error while running xfs/433 (edited somewhat for brevity): BUG: KASAN: use-after-free in xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs Read of size 4 at addr ffff88820ac2bd44 by task kworker/0:2/139 CPU: 0 PID: 139 Comm: kworker/0:2 Tainted: G S 5.19.0-rc2-00004-g7cf2b0f9611b #1 Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 Workqueue: xfs-inodegc/sdb4 xfs_inodegc_worker [xfs] Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) print_address_description+0x1f/0x200 print_report.cold (mm/kasan/report.c:430) kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs process_one_work worker_thread kthread ret_from_fork </TASK> Allocated by task 139: kasan_save_stack (mm/kasan/common.c:39) __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) kmem_cache_alloc (mm/slab.h:750 mm/slub.c:3214 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239) _xfs_buf_alloc (include/linux/instrumented.h:86 include/linux/atomic/atomic-instrumented.h:41 fs/xfs/xfs_buf.c:232) xfs xfs_buf_get_map (fs/xfs/xfs_buf.c:660) xfs xfs_buf_read_map (fs/xfs/xfs_buf.c:777) xfs xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289) xfs xfs_da_read_buf (fs/xfs/libxfs/xfs_da_btree.c:2652) xfs xfs_da3_node_read (fs/xfs/libxfs/xfs_da_btree.c:392) xfs xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:272) xfs xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs process_one_work worker_thread kthread ret_from_fork Freed by task 139: kasan_save_stack (mm/kasan/common.c:39) kasan_set_track (mm/kasan/common.c:45) kasan_set_free_info (mm/kasan/generic.c:372) __kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328 mm/kasan/common.c:374) kmem_cache_free (mm/slub.c:1753 mm/slub.c:3507 mm/slub.c:3524) xfs_buf_rele (fs/xfs/xfs_buf.c:1040) xfs xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:210) xfs xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs process_one_work worker_thread kthread ret_from_fork I reproduced this for my own satisfaction, and got the same report, along with an extra morsel: The buggy address belongs to the object at ffff88802103a800 which belongs to the cache xfs_buf of size 432 The buggy address is located 396 bytes inside of 432-byte region [ffff88802103a800, ffff88802103a9b0) I tracked this code down to: error = xfs_trans_get_buf(*trans, mp->m_ddev_targp, child_blkno, XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, &child_bp); if (error) return error; error = bp->b_error; That doesn't look right -- I think this should be dereferencing child_bp, not bp. Looking through the codebase history, I think this was added by commit 2911edb653b9 ("xfs: remove the mappedbno argument to xfs_da_get_buf"), which replaced a call to xfs_da_get_buf with the current call to xfs_trans_get_buf. Not sure why we trans_brelse'd @bp earlier in the function, but I'm guessing it's to avoid pinning too many buffers in memory while we inactivate the bottom of the attr tree. Hence we now have to get the buffer back. I /think/ this was supposed to check child_bp->b_error and fail the rest of the invalidation if child_bp had experienced any kind of IO or corruption error. I bet the xfs_da3_node_read earlier in the loop will catch most cases of incoming on-disk corruption which makes this check mostly moot unless someone corrupts the buffer and the AIL or someone happens to try to push it out to disk while the buffer's unlocked. However, this is clearly a UAF bug, so fix this. Cc: hch@lst.de Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 2911edb653b9 ("xfs: remove the mappedbno argument to xfs_da_get_buf") Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_attr_inactive.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 0e83cab9cdde..e892040eb86b 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -158,6 +158,7 @@ xfs_attr3_node_inactive( } child_fsb = be32_to_cpu(ichdr.btree[0].before); xfs_trans_brelse(*trans, bp); /* no locks for later trans */ + bp = NULL; /* * If this is the node level just above the leaves, simply loop @@ -211,7 +212,7 @@ xfs_attr3_node_inactive( &child_bp); if (error) return error; - error = bp->b_error; + error = child_bp->b_error; if (error) { xfs_trans_brelse(*trans, child_bp); return error; ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: fix use-after-free in xattr node block inactivation 2022-07-07 22:21 [PATCH] xfs: fix use-after-free in xattr node block inactivation Darrick J. Wong @ 2022-07-08 4:26 ` Dave Chinner 2022-07-08 4:59 ` Christoph Hellwig 0 siblings, 1 reply; 4+ messages in thread From: Dave Chinner @ 2022-07-08 4:26 UTC (permalink / raw) To: Darrick J. Wong; +Cc: xfs, hch, oliver.sang On Thu, Jul 07, 2022 at 03:21:25PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > The kernel build robot reported a UAF error while running xfs/433 > (edited somewhat for brevity): > > BUG: KASAN: use-after-free in xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs > Read of size 4 at addr ffff88820ac2bd44 by task kworker/0:2/139 > > CPU: 0 PID: 139 Comm: kworker/0:2 Tainted: G S 5.19.0-rc2-00004-g7cf2b0f9611b #1 > Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 > Workqueue: xfs-inodegc/sdb4 xfs_inodegc_worker [xfs] > Call Trace: > <TASK> > dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) > print_address_description+0x1f/0x200 > print_report.cold (mm/kasan/report.c:430) > kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) > xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs > xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs > xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs > xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs > xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs > process_one_work > worker_thread > kthread > ret_from_fork > </TASK> > > Allocated by task 139: > kasan_save_stack (mm/kasan/common.c:39) > __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) > kmem_cache_alloc (mm/slab.h:750 mm/slub.c:3214 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239) > _xfs_buf_alloc (include/linux/instrumented.h:86 include/linux/atomic/atomic-instrumented.h:41 fs/xfs/xfs_buf.c:232) xfs > xfs_buf_get_map (fs/xfs/xfs_buf.c:660) xfs > xfs_buf_read_map (fs/xfs/xfs_buf.c:777) xfs > xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289) xfs > xfs_da_read_buf (fs/xfs/libxfs/xfs_da_btree.c:2652) xfs > xfs_da3_node_read (fs/xfs/libxfs/xfs_da_btree.c:392) xfs > xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:272) xfs > xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs > xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs > xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs > process_one_work > worker_thread > kthread > ret_from_fork > > Freed by task 139: > kasan_save_stack (mm/kasan/common.c:39) > kasan_set_track (mm/kasan/common.c:45) > kasan_set_free_info (mm/kasan/generic.c:372) > __kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328 mm/kasan/common.c:374) > kmem_cache_free (mm/slub.c:1753 mm/slub.c:3507 mm/slub.c:3524) > xfs_buf_rele (fs/xfs/xfs_buf.c:1040) xfs > xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:210) xfs > xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs > xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs > xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs > xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs > process_one_work > worker_thread > kthread > ret_from_fork > > I reproduced this for my own satisfaction, and got the same report, > along with an extra morsel: > > The buggy address belongs to the object at ffff88802103a800 > which belongs to the cache xfs_buf of size 432 > The buggy address is located 396 bytes inside of > 432-byte region [ffff88802103a800, ffff88802103a9b0) > > I tracked this code down to: > > error = xfs_trans_get_buf(*trans, mp->m_ddev_targp, > child_blkno, > XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, > &child_bp); > if (error) > return error; > error = bp->b_error; > > That doesn't look right -- I think this should be dereferencing > child_bp, not bp. It shouldn't even be there. If xfs_trans_get_buf() returns a buffer, it should not have a pending error on it at all. i.e. it's supposed to return either an error or a buffer handle that is ready for use. > Looking through the codebase history, I think this > was added by commit 2911edb653b9 ("xfs: remove the mappedbno argument to > xfs_da_get_buf"), which replaced a call to xfs_da_get_buf with the > current call to xfs_trans_get_buf. Not sure why we trans_brelse'd @bp > earlier in the function, but I'm guessing it's to avoid pinning too many > buffers in memory while we inactivate the bottom of the attr tree. > Hence we now have to get the buffer back. That whole loop just looks wrong. WE do: xfs_attr3_node_inactive(bp) { parent_blkno = xfs_buf_daddr(bp); .... child_fsb = be32_to_cpu(ichdr.btree[0].before); xfs_trans_brelse(*trans, bp); /* no locks for later trans */ for (0 < i < ichdr.count) { xfs_da3_node_read_mapped(child_fsb, &child_bp) child_blkno = xfs_buf_daddr(child_bp); if (child is node) { /* recurse! */ xfs_attr3_node_inactive(child_bp); /* released child_bp */ } .... /* re-get child_bp */ xfs_trans_get_buf(child_blkno, &child_bp); /* whacky bp reference here */ xfs_trans_binval(child_bp) /* read next child_bp */ xfs_da3_node_read_mapped(parent_blkno, &bp) child_fsb = be32_to_cpu(phdr.btree[i + 1].before); xfs_trans_brelse(*trans, bp); } So, this is dropping parent buffers so they aren't held over the recursion down the sub tree. I can't see why this would be done for lock ordering reasons - parent->child is the only ordering that happens in this recursion, and only the direct ancestors of any given child node are locked at any point in time, which is the same a happens with a path walk.... Oh, right. It's the fact that it can only invalidate a single child buffer at a time and it will only do one per transaction. Hence it needs to do a re-read of the parent buffer rather than xfs_trans_bhold() it because the code is depth-first recursive and so it has no idea of how deep the child will need to go (or, for that matter, how deep we already are). Whoever wrote this didn't, for some reason, use the da btree path tracking (i.e. a struct xfs_da_state) to keep track of all the parent buffers of the current child being invalidated. That would make this code a whole lot simpler and neater.... Ugh. Nasty, nasty code. > I /think/ this was supposed to check child_bp->b_error and fail the rest > of the invalidation if child_bp had experienced any kind of IO or > corruption error. I bet the xfs_da3_node_read earlier in the loop will > catch most cases of incoming on-disk corruption which makes this check > mostly moot unless someone corrupts the buffer and the AIL or someone > happens to try to push it out to disk while the buffer's unlocked. In which case, I think we probably already should have shut down and errored out. > However, this is clearly a UAF bug, so fix this. > > Cc: hch@lst.de > Reported-by: kernel test robot <oliver.sang@intel.com> > Fixes: 2911edb653b9 ("xfs: remove the mappedbno argument to xfs_da_get_buf") > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > fs/xfs/xfs_attr_inactive.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c > index 0e83cab9cdde..e892040eb86b 100644 > --- a/fs/xfs/xfs_attr_inactive.c > +++ b/fs/xfs/xfs_attr_inactive.c > @@ -158,6 +158,7 @@ xfs_attr3_node_inactive( > } > child_fsb = be32_to_cpu(ichdr.btree[0].before); > xfs_trans_brelse(*trans, bp); /* no locks for later trans */ > + bp = NULL; > > /* > * If this is the node level just above the leaves, simply loop > @@ -211,7 +212,7 @@ xfs_attr3_node_inactive( > &child_bp); > if (error) > return error; > - error = bp->b_error; > + error = child_bp->b_error; > if (error) { > xfs_trans_brelse(*trans, child_bp); > return error; I'd just remove the child_bp error checking altogether - if there was an IOi or corruption error on it, that shouldn't keep us from invalidating it to free the underlying space. We're trashing the contents, so who cares if the contents is already trashed? Also, you probably need to set bp = NULL after the xfs_trans_brelse() call at the bottom of the loop.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: fix use-after-free in xattr node block inactivation 2022-07-08 4:26 ` Dave Chinner @ 2022-07-08 4:59 ` Christoph Hellwig 2022-07-08 15:45 ` Darrick J. Wong 0 siblings, 1 reply; 4+ messages in thread From: Christoph Hellwig @ 2022-07-08 4:59 UTC (permalink / raw) To: Dave Chinner; +Cc: Darrick J. Wong, xfs, hch, oliver.sang On Fri, Jul 08, 2022 at 02:26:53PM +1000, Dave Chinner wrote: > > child_blkno, > > XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, > > &child_bp); > > if (error) > > return error; > > error = bp->b_error; > > > > That doesn't look right -- I think this should be dereferencing > > child_bp, not bp. > > It shouldn't even be there. If xfs_trans_get_buf() returns a buffer, > it should not have a pending error on it at all. i.e. it's supposed > to return either an error or a buffer handle that is ready for use. Agreed. Consumers of the buffer cache API should never look at b_error because they will not see buffers with b_error set at all. > Whoever wrote this didn't, for some reason, use the da btree path > tracking (i.e. a struct xfs_da_state) to keep track of all the > parent buffers of the current child being invalidated. That would > make this code a whole lot simpler and neater.... Yeah. The brelese seems to go back to: commit 677821a1ab2301629aa0370835babb33bc6c919e Author: Doug Doucette <doucette@engr.sgi.com> Date: Fri Dec 6 22:05:46 1996 +0000 Fold in ficus changes not yet merged in: revision 1.32 date: 1996/11/21 23:31:08; author: doucette; state: Exp; lines: +69 -205 Rewrite inactive attribute code to avoid freeing any of the data blocks until the very end. We still walk the on-disk structure, but just call xfs_trans_binval on the buffers we get. Then we call the truncate code to get rid of the data blocks. This means we don't need a block reservation. and the loop іtself is even older. But the da_state had been around since 1996, so that isn't really an excuse. > > + error = child_bp->b_error; > > if (error) { > > xfs_trans_brelse(*trans, child_bp); > > return error; > > I'd just remove the child_bp error checking altogether - if there > was an IOi or corruption error on it, that shouldn't keep us from > invalidating it to free the underlying space. We're trashing the > contents, so who cares if the contents is already trashed? Yeah. I also don't see how a b_error could even magically appear here without xfs_trans_get_buf returning an error first. > Also, you probably need to set bp = NULL after the > xfs_trans_brelse() call at the bottom of the loop.... Yes. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: fix use-after-free in xattr node block inactivation 2022-07-08 4:59 ` Christoph Hellwig @ 2022-07-08 15:45 ` Darrick J. Wong 0 siblings, 0 replies; 4+ messages in thread From: Darrick J. Wong @ 2022-07-08 15:45 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Dave Chinner, xfs, oliver.sang On Fri, Jul 08, 2022 at 06:59:21AM +0200, Christoph Hellwig wrote: > On Fri, Jul 08, 2022 at 02:26:53PM +1000, Dave Chinner wrote: > > > child_blkno, > > > XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, > > > &child_bp); > > > if (error) > > > return error; > > > error = bp->b_error; > > > > > > That doesn't look right -- I think this should be dereferencing > > > child_bp, not bp. > > > > It shouldn't even be there. If xfs_trans_get_buf() returns a buffer, > > it should not have a pending error on it at all. i.e. it's supposed > > to return either an error or a buffer handle that is ready for use. Ah, right, because flags == 0, so we clear b_error anyway. > Agreed. Consumers of the buffer cache API should never look at b_error > because they will not see buffers with b_error set at all. <nod> Thinking about this further -- if the earlier da3_node_read reads a corrupt buffer off disk, it'll exit early, and if the AIL happens to push the buffer and the write verifier fails, the log will be dead, so there's no value added by probing bp->b_error. I'll remove the whole if block entirely. > > Whoever wrote this didn't, for some reason, use the da btree path > > tracking (i.e. a struct xfs_da_state) to keep track of all the > > parent buffers of the current child being invalidated. That would > > make this code a whole lot simpler and neater.... > > Yeah. The brelese seems to go back to: > > commit 677821a1ab2301629aa0370835babb33bc6c919e > Author: Doug Doucette <doucette@engr.sgi.com> > Date: Fri Dec 6 22:05:46 1996 +0000 > > Fold in ficus changes not yet merged in: > revision 1.32 > date: 1996/11/21 23:31:08; author: doucette; state: Exp; lines: +69 -205 > Rewrite inactive attribute code to avoid freeing any of the data blocks > until the very end. We still walk the on-disk structure, but just > call xfs_trans_binval on the buffers we get. Then we call the truncate > code to get rid of the data blocks. This means we don't need a block > reservation. > > and the loop іtself is even older. But the da_state had been around > since 1996, so that isn't really an excuse. > > > > + error = child_bp->b_error; > > > if (error) { > > > xfs_trans_brelse(*trans, child_bp); > > > return error; > > > > I'd just remove the child_bp error checking altogether - if there > > was an IOi or corruption error on it, that shouldn't keep us from > > invalidating it to free the underlying space. We're trashing the > > contents, so who cares if the contents is already trashed? > > Yeah. I also don't see how a b_error could even magically appear > here without xfs_trans_get_buf returning an error first. xfs_trans_get_buf doesn't check b_error on the buffer it returns. I think only the _read_buf variants actually look at that. > > Also, you probably need to set bp = NULL after the > > xfs_trans_brelse() call at the bottom of the loop.... > > Yes. Done. --D ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-07-08 15:45 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-07-07 22:21 [PATCH] xfs: fix use-after-free in xattr node block inactivation Darrick J. Wong 2022-07-08 4:26 ` Dave Chinner 2022-07-08 4:59 ` Christoph Hellwig 2022-07-08 15:45 ` Darrick J. Wong
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.