* [PATCH 0/2] xfs: shutdown UAF fixes
@ 2022-11-03  8:36 Guo Xuenan
  2022-11-03  8:36 ` [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan
  ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Guo Xuenan @ 2022-11-03  8:36 UTC (permalink / raw)
To: djwong, dchinner
Cc: linux-xfs, guoxuenan, houtao1, jack.qiu, fangwei1, yi.zhang,
    zhengbin13, leo.lilong, zengheng4

Hi xfs folks:

The following patches fix some races during XFS forced shutdown.

Patch 1 fixes a use-after-free in xfs_trans_ail_delete during xlog forced
shutdown. Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters
in xlog_state_shutdown_callbacks") does not seem to be enough to avoid
use-after-free of the AIL before it is torn down at unmount.

Patch 2 fixes a use-after-free of the super block buffer log item during
xlog shutdown. Since xfs buf log items can be relogged, the super block
buffer is the most frequently modified of all xfs_bufs, especially with
the lazy-count feature disabled. During forced shutdown we unpin and
release the log item, and due to the xfs relog mechanism this may release
a log item that has already been inserted into the CIL.

I reproduced both problems; /importantly/, applying the following patches
and disabling the lazy-count feature increases the recurrence probability.

Kernel patch for reproducing the issue fixed by patch 1:
```
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 522d450a94b1..b1221d517c00 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -503,6 +503,9 @@ xfs_buf_item_unpin(
 	 * the AIL properly holds a reference on the bli.
 	 */
 	freed = atomic_dec_and_test(&bip->bli_refcount);
+	if (remove)
+		mdelay(1000);
+
 	if (freed && !stale && remove)
 		xfs_buf_hold(bp);
 	if (atomic_dec_and_test(&bp->b_pin_count))
```

Kernel patch for reproducing the issue fixed by patch 2:
```
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..b1aac4a7576c 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -25,6 +25,9 @@
 #include "xfs_dquot.h"
 #include "xfs_icache.h"
 
+#include <linux/delay.h>
+#include "xfs_log_priv.h"
+
 struct kmem_cache	*xfs_trans_cache;
 
 #if defined(CONFIG_TRACEPOINTS)
@@ -1002,6 +1005,8 @@ __xfs_trans_commit(
 		xfs_trans_apply_sb_deltas(tp);
 	xfs_trans_apply_dquot_deltas(tp);
 
+	if (xlog_is_shutdown(log))
+		mdelay(1000);
 	xlog_cil_commit(log, tp, &commit_seq, regrant);
 
 	xfs_trans_free(tp);
```

Guo Xuenan (2):
  xfs: wait xlog ioend workqueue drained before tearing down AIL
  xfs: fix super block buf log item UAF during force shutdown

 fs/xfs/xfs_buf_item.c  | 8 +++++---
 fs/xfs/xfs_trans_ail.c | 3 +++
 2 files changed, 8 insertions(+), 3 deletions(-)

-- 
2.31.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL
  2022-11-03  8:36 [PATCH 0/2] xfs: shutdown UAF fixes Guo Xuenan
@ 2022-11-03  8:36 ` Guo Xuenan
  2022-11-03 21:16   ` Dave Chinner
  2022-11-03  8:36 ` [PATCH 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan
  2022-11-07 14:27 ` [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan
  2 siblings, 1 reply; 17+ messages in thread
From: Guo Xuenan @ 2022-11-03  8:36 UTC (permalink / raw)
To: djwong, dchinner
Cc: linux-xfs, guoxuenan, houtao1, jack.qiu, fangwei1, yi.zhang,
    zhengbin13, leo.lilong, zengheng4

Fix a use-after-free in xfs_trans_ail_delete during xlog forced shutdown.
Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
xlog_state_shutdown_callbacks") changed the order of running callbacks
and waiting for iclog completion so that the unmount path does not
destroy the AIL prematurely. That alone does not seem sufficient, as
adding an mdelay in `xfs_buf_item_unpin` demonstrates; the reproduction
is below. To tear down the AIL safely, we should wait for all xlog ioend
workers to finish and sync the AIL.
==================================================================
BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0
Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43

CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G        W          6.1.0-rc1-00002-gc28266863c4a #137
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-log/sda xlog_ioend_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report+0x171/0x4a6
 kasan_report+0xb3/0x130
 xfs_trans_ail_delete+0x240/0x2a0
 xfs_buf_item_done+0x7b/0xa0
 xfs_buf_ioend+0x1e9/0x11f0
 xfs_buf_item_unpin+0x4c8/0x860
 xfs_trans_committed_bulk+0x4c2/0x7c0
 xlog_cil_committed+0xab6/0xfb0
 xlog_cil_process_committed+0x117/0x1e0
 xlog_state_shutdown_callbacks+0x208/0x440
 xlog_force_shutdown+0x1b3/0x3a0
 xlog_ioend_work+0xef/0x1d0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 9606:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 __kasan_kmalloc+0x7a/0x90
 __kmalloc+0x59/0x140
 kmem_alloc+0xb2/0x2f0
 xfs_trans_ail_init+0x20/0x320
 xfs_log_mount+0x37e/0x690
 xfs_mountfs+0xe36/0x1b40
 xfs_fs_fill_super+0xc5c/0x1a70
 get_tree_bdev+0x3c5/0x6c0
 vfs_get_tree+0x85/0x250
 path_mount+0xec3/0x1830
 do_mount+0xef/0x110
 __x64_sys_mount+0x150/0x1f0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 9662:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x40
 __kasan_slab_free+0x105/0x1a0
 __kmem_cache_free+0x99/0x2d0
 kvfree+0x3a/0x40
 xfs_log_unmount+0x60/0xf0
 xfs_unmountfs+0xf3/0x1d0
 xfs_fs_put_super+0x78/0x300
 generic_shutdown_super+0x151/0x400
 kill_block_super+0x9a/0xe0
 deactivate_locked_super+0x82/0xe0
 deactivate_super+0x91/0xb0
 cleanup_mnt+0x32a/0x4a0
 task_work_run+0x15f/0x240
 exit_to_user_mode_prepare+0x188/0x190
 syscall_exit_to_user_mode+0x12/0x30
 do_syscall_64+0x42/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888023169400
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
 128-byte region [ffff888023169400, ffff888023169480)

The buggy address belongs to the physical page:
page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888023168f80 pfn:0x23168
head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0
raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks")
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
---
 fs/xfs/xfs_trans_ail.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index f51df7d94ef7..1054adb29907 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -929,6 +929,9 @@ xfs_trans_ail_destroy(
 {
 	struct xfs_ail	*ailp = mp->m_ail;
 
+	drain_workqueue(mp->m_log->l_ioend_workqueue);
+	xfs_ail_push_all_sync(ailp);
+
 	kthread_stop(ailp->ail_task);
 	kmem_free(ailp);
 }
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-03 8:36 ` [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan @ 2022-11-03 21:16 ` Dave Chinner 2022-11-04 7:50 ` Guo Xuenan 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2022-11-03 21:16 UTC (permalink / raw) To: Guo Xuenan Cc: djwong, dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang, zhengbin13, leo.lilong, zengheng4 On Thu, Nov 03, 2022 at 04:36:31PM +0800, Guo Xuenan wrote: > Fix uaf in xfs_trans_ail_delete during xlog force shutdown. > In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in > xlog_state_shutdown_callbacks") changed the order of running callbacks > and wait for iclog completion to avoid unmount path untimely destroy AIL. > But which seems not enough to ensue this, adding mdelay in > `xfs_buf_item_unpin` can prove that. > > The reproduction is as follows. To ensure destroy AIL safely, > we should wait all xlog ioend workers done and sync the AIL. 
> > ================================================================== > BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 > Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 > > CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W > 6.1.0-rc1-00002-gc28266863c4a #137 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.13.0-1ubuntu1.1 04/01/2014 > Workqueue: xfs-log/sda xlog_ioend_work > Call Trace: > <TASK> > dump_stack_lvl+0x4d/0x66 > print_report+0x171/0x4a6 > kasan_report+0xb3/0x130 > xfs_trans_ail_delete+0x240/0x2a0 > xfs_buf_item_done+0x7b/0xa0 > xfs_buf_ioend+0x1e9/0x11f0 > xfs_buf_item_unpin+0x4c8/0x860 > xfs_trans_committed_bulk+0x4c2/0x7c0 > xlog_cil_committed+0xab6/0xfb0 > xlog_cil_process_committed+0x117/0x1e0 > xlog_state_shutdown_callbacks+0x208/0x440 > xlog_force_shutdown+0x1b3/0x3a0 > xlog_ioend_work+0xef/0x1d0 So we are still processing an iclog at this point and have it locked (iclog->ic_sema is held). These aren't cycled to wait for all processing to complete until xlog_dealloc_log() before they are freed. If we cycle through the iclog->ic_sema locks when we quiesce the log (as we should be doing before attempting to write an unmount record) this UAF problem goes away, right? > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > index f51df7d94ef7..1054adb29907 100644 > --- a/fs/xfs/xfs_trans_ail.c > +++ b/fs/xfs/xfs_trans_ail.c > @@ -929,6 +929,9 @@ xfs_trans_ail_destroy( > { > struct xfs_ail *ailp = mp->m_ail; > > + drain_workqueue(mp->m_log->l_ioend_workqueue); > + xfs_ail_push_all_sync(ailp); This isn't the place to be draining the AIL and waiting for IO to complete. As per above, that should have been done long before we get here... -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-03 21:16 ` Dave Chinner @ 2022-11-04 7:50 ` Guo Xuenan 2022-11-04 15:46 ` Darrick J. Wong 0 siblings, 1 reply; 17+ messages in thread From: Guo Xuenan @ 2022-11-04 7:50 UTC (permalink / raw) To: Dave Chinner Cc: djwong, dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang, zhengbin13, leo.lilong, zengheng4 Hi,Dave: On 2022/11/4 5:16, Dave Chinner wrote: > On Thu, Nov 03, 2022 at 04:36:31PM +0800, Guo Xuenan wrote: >> Fix uaf in xfs_trans_ail_delete during xlog force shutdown. >> In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in >> xlog_state_shutdown_callbacks") changed the order of running callbacks >> and wait for iclog completion to avoid unmount path untimely destroy AIL. >> But which seems not enough to ensue this, adding mdelay in >> `xfs_buf_item_unpin` can prove that. >> >> The reproduction is as follows. To ensure destroy AIL safely, >> we should wait all xlog ioend workers done and sync the AIL. 
>> >> ================================================================== >> BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 >> Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 >> >> CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W >> 6.1.0-rc1-00002-gc28266863c4a #137 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.13.0-1ubuntu1.1 04/01/2014 >> Workqueue: xfs-log/sda xlog_ioend_work >> Call Trace: >> <TASK> >> dump_stack_lvl+0x4d/0x66 >> print_report+0x171/0x4a6 >> kasan_report+0xb3/0x130 >> xfs_trans_ail_delete+0x240/0x2a0 >> xfs_buf_item_done+0x7b/0xa0 >> xfs_buf_ioend+0x1e9/0x11f0 >> xfs_buf_item_unpin+0x4c8/0x860 >> xfs_trans_committed_bulk+0x4c2/0x7c0 >> xlog_cil_committed+0xab6/0xfb0 >> xlog_cil_process_committed+0x117/0x1e0 >> xlog_state_shutdown_callbacks+0x208/0x440 >> xlog_force_shutdown+0x1b3/0x3a0 >> xlog_ioend_work+0xef/0x1d0 > So we are still processing an iclog at this point and have it > locked (iclog->ic_sema is held). These aren't cycled to wait for > all processing to complete until xlog_dealloc_log() before they are > freed. > > If we cycle through the iclog->ic_sema locks when we quiesce the log > (as we should be doing before attempting to write an unmount record) > this UAF problem goes away, right? Yes,:) right!According to the method you said, we can also solve this problem. The key to sloving this problem is to make sure that all log IO is done before tearing down AIL. > >> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c >> index f51df7d94ef7..1054adb29907 100644 >> --- a/fs/xfs/xfs_trans_ail.c >> +++ b/fs/xfs/xfs_trans_ail.c >> @@ -929,6 +929,9 @@ xfs_trans_ail_destroy( >> { >> struct xfs_ail *ailp = mp->m_ail; >> >> + drain_workqueue(mp->m_log->l_ioend_workqueue); >> + xfs_ail_push_all_sync(ailp); > This isn't the place to be draining the AIL and waiting for IO to > complete. As per above, that should have been done long before we > get here... 
I agree with you, but I have verified that it does solve the UAF.
I also verified the approach you suggested, and it is equally effective.
However, I have no better idea where to place this check and hope for
your suggestions. Here is one way for reference; would you kindly
consider the following modification? Thanks in advance :)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f02a0dd522b3..4e48cc3ba6da 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1094,8 +1094,22 @@ void
 xfs_log_clean(
 	struct xfs_mount	*mp)
 {
+	struct xlog		*log = mp->m_log;
+	xlog_in_core_t		*iclog = log->l_iclog;
+	int			i;
+
 	xfs_log_quiesce(mp);
 	xfs_log_unmount_write(mp);
+
+	/*
+	 * Cycle all the iclogbuf locks to make sure all log IO completion
+	 * is done before we tear down AIL.
+	 */
+	for (i = 0; i < log->l_iclog_bufs; i++) {
+		down(&iclog->ic_sema);
+		up(&iclog->ic_sema);
+		iclog = iclog->ic_next;
+	}
 }

Best regards
Xuenan
> -Dave.
-- 
Guo Xuenan [OS Kernel Lab]
-----------------------------
Email: guoxuenan@huawei.com

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-04 7:50 ` Guo Xuenan @ 2022-11-04 15:46 ` Darrick J. Wong 2022-11-05 3:32 ` Guo Xuenan 0 siblings, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2022-11-04 15:46 UTC (permalink / raw) To: Guo Xuenan Cc: Dave Chinner, dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang, zhengbin13, leo.lilong, zengheng4 On Fri, Nov 04, 2022 at 03:50:44PM +0800, Guo Xuenan wrote: > Hi,Dave: > On 2022/11/4 5:16, Dave Chinner wrote: > > On Thu, Nov 03, 2022 at 04:36:31PM +0800, Guo Xuenan wrote: > > > Fix uaf in xfs_trans_ail_delete during xlog force shutdown. > > > In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in > > > xlog_state_shutdown_callbacks") changed the order of running callbacks > > > and wait for iclog completion to avoid unmount path untimely destroy AIL. > > > But which seems not enough to ensue this, adding mdelay in > > > `xfs_buf_item_unpin` can prove that. > > > > > > The reproduction is as follows. To ensure destroy AIL safely, > > > we should wait all xlog ioend workers done and sync the AIL. 
> > > > > > ================================================================== > > > BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 > > > Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 > > > > > > CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W > > > 6.1.0-rc1-00002-gc28266863c4a #137 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > 1.13.0-1ubuntu1.1 04/01/2014 > > > Workqueue: xfs-log/sda xlog_ioend_work > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x4d/0x66 > > > print_report+0x171/0x4a6 > > > kasan_report+0xb3/0x130 > > > xfs_trans_ail_delete+0x240/0x2a0 > > > xfs_buf_item_done+0x7b/0xa0 > > > xfs_buf_ioend+0x1e9/0x11f0 > > > xfs_buf_item_unpin+0x4c8/0x860 > > > xfs_trans_committed_bulk+0x4c2/0x7c0 > > > xlog_cil_committed+0xab6/0xfb0 > > > xlog_cil_process_committed+0x117/0x1e0 > > > xlog_state_shutdown_callbacks+0x208/0x440 > > > xlog_force_shutdown+0x1b3/0x3a0 > > > xlog_ioend_work+0xef/0x1d0 > > So we are still processing an iclog at this point and have it > > locked (iclog->ic_sema is held). These aren't cycled to wait for > > all processing to complete until xlog_dealloc_log() before they are > > freed. > > > > If we cycle through the iclog->ic_sema locks when we quiesce the log > > (as we should be doing before attempting to write an unmount record) > > this UAF problem goes away, right? > Yes,:) right!According to the method you said, we can also solve this > problem. > The key to sloving this problem is to make sure that all log IO is done > before > tearing down AIL. 
> > > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > > index f51df7d94ef7..1054adb29907 100644 > > > --- a/fs/xfs/xfs_trans_ail.c > > > +++ b/fs/xfs/xfs_trans_ail.c > > > @@ -929,6 +929,9 @@ xfs_trans_ail_destroy( > > > { > > > struct xfs_ail *ailp = mp->m_ail; > > > + drain_workqueue(mp->m_log->l_ioend_workqueue); > > > + xfs_ail_push_all_sync(ailp); > > This isn't the place to be draining the AIL and waiting for IO to > > complete. As per above, that should have been done long before we > > get here... > I'm agree with your opinion,but, I have verified that it can indeed solve > the UAF. > And, I also verify the way you suggested, it is equally effective. > But, I have no better idea where to place this check, hope for your better > suggestions. > Here I provide a way for reference,would you kindly consider the following > modifications, > thanks in advance :) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index f02a0dd522b3..4e48cc3ba6da 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -1094,8 +1094,22 @@ void > xfs_log_clean( > struct xfs_mount *mp) > { > + struct xlog *log = mp->m_log; > + xlog_in_core_t *iclog = log->l_iclog; > + int i; > + > xfs_log_quiesce(mp); > xfs_log_unmount_write(mp); > + > + /* > + * Cycle all the iclogbuf locks to make sure all log IO completion > + * is done before we tear down AIL. > + */ > + for (i = 0; i < log->l_iclog_bufs; i++) { > + down(&iclog->ic_sema); > + up(&iclog->ic_sema); > + iclog = iclog->ic_next; > + } I'm pretty sure Dave meant *before* xfs_log_unmount_write when he said "as we should be doing before attempting to write an unmount record". Just from looking at function names, I wonder if this shouldn't be a final step of xfs_log_quiesce since a log with active IO completion doesn't really sound empty to me... --D > } > > Best regards > Xuenan > > -Dave. 
> > -- > Guo Xuenan [OS Kernel Lab] > ----------------------------- > Email: guoxuenan@huawei.com > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-04 15:46 ` Darrick J. Wong @ 2022-11-05 3:32 ` Guo Xuenan 0 siblings, 0 replies; 17+ messages in thread From: Guo Xuenan @ 2022-11-05 3:32 UTC (permalink / raw) To: Darrick J. Wong Cc: Dave Chinner, dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang, zhengbin13, leo.lilong, zengheng4 On 2022/11/4 23:46, Darrick J. Wong wrote: > On Fri, Nov 04, 2022 at 03:50:44PM +0800, Guo Xuenan wrote: >> Hi,Dave: >> On 2022/11/4 5:16, Dave Chinner wrote: >>> On Thu, Nov 03, 2022 at 04:36:31PM +0800, Guo Xuenan wrote: >>>> Fix uaf in xfs_trans_ail_delete during xlog force shutdown. >>>> In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in >>>> xlog_state_shutdown_callbacks") changed the order of running callbacks >>>> and wait for iclog completion to avoid unmount path untimely destroy AIL. >>>> But which seems not enough to ensue this, adding mdelay in >>>> `xfs_buf_item_unpin` can prove that. >>>> >>>> The reproduction is as follows. To ensure destroy AIL safely, >>>> we should wait all xlog ioend workers done and sync the AIL. 
>>>> >>>> ================================================================== >>>> BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 >>>> Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 >>>> >>>> CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W >>>> 6.1.0-rc1-00002-gc28266863c4a #137 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >>>> 1.13.0-1ubuntu1.1 04/01/2014 >>>> Workqueue: xfs-log/sda xlog_ioend_work >>>> Call Trace: >>>> <TASK> >>>> dump_stack_lvl+0x4d/0x66 >>>> print_report+0x171/0x4a6 >>>> kasan_report+0xb3/0x130 >>>> xfs_trans_ail_delete+0x240/0x2a0 >>>> xfs_buf_item_done+0x7b/0xa0 >>>> xfs_buf_ioend+0x1e9/0x11f0 >>>> xfs_buf_item_unpin+0x4c8/0x860 >>>> xfs_trans_committed_bulk+0x4c2/0x7c0 >>>> xlog_cil_committed+0xab6/0xfb0 >>>> xlog_cil_process_committed+0x117/0x1e0 >>>> xlog_state_shutdown_callbacks+0x208/0x440 >>>> xlog_force_shutdown+0x1b3/0x3a0 >>>> xlog_ioend_work+0xef/0x1d0 >>> So we are still processing an iclog at this point and have it >>> locked (iclog->ic_sema is held). These aren't cycled to wait for >>> all processing to complete until xlog_dealloc_log() before they are >>> freed. >>> >>> If we cycle through the iclog->ic_sema locks when we quiesce the log >>> (as we should be doing before attempting to write an unmount record) >>> this UAF problem goes away, right? >> Yes,:) right!According to the method you said, we can also solve this >> problem. >> The key to sloving this problem is to make sure that all log IO is done >> before >> tearing down AIL. >>>> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c >>>> index f51df7d94ef7..1054adb29907 100644 >>>> --- a/fs/xfs/xfs_trans_ail.c >>>> +++ b/fs/xfs/xfs_trans_ail.c >>>> @@ -929,6 +929,9 @@ xfs_trans_ail_destroy( >>>> { >>>> struct xfs_ail *ailp = mp->m_ail; >>>> + drain_workqueue(mp->m_log->l_ioend_workqueue); >>>> + xfs_ail_push_all_sync(ailp); >>> This isn't the place to be draining the AIL and waiting for IO to >>> complete. 
As per above, that should have been done long before we >>> get here... >> I'm agree with your opinion,but, I have verified that it can indeed solve >> the UAF. >> And, I also verify the way you suggested, it is equally effective. >> But, I have no better idea where to place this check, hope for your better >> suggestions. >> Here I provide a way for reference,would you kindly consider the following >> modifications, >> thanks in advance :) >> >> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c >> index f02a0dd522b3..4e48cc3ba6da 100644 >> --- a/fs/xfs/xfs_log.c >> +++ b/fs/xfs/xfs_log.c >> @@ -1094,8 +1094,22 @@ void >> xfs_log_clean( >> struct xfs_mount *mp) >> { >> + struct xlog *log = mp->m_log; >> + xlog_in_core_t *iclog = log->l_iclog; >> + int i; >> + >> xfs_log_quiesce(mp); >> xfs_log_unmount_write(mp); >> + >> + /* >> + * Cycle all the iclogbuf locks to make sure all log IO completion >> + * is done before we tear down AIL. >> + */ >> + for (i = 0; i < log->l_iclog_bufs; i++) { >> + down(&iclog->ic_sema); >> + up(&iclog->ic_sema); >> + iclog = iclog->ic_next; >> + } > I'm pretty sure Dave meant *before* xfs_log_unmount_write when he said > "as we should be doing before attempting to write an unmount record". > Just from looking at function names, I wonder if this shouldn't be a > final step of xfs_log_quiesce since a log with active IO completion > doesn't really sound empty to me... Sorry for my poor english,IIUC,you meant put the io compeletion check between xfs_log_quiesce and xfs_log_umount_write ? May we abstract the "cycle iclogbuf wait" into a function named xlog_wait_iodone/xlog_quiesce_done or something more appropriate. 
For example:

@@ -82,6 +82,9 @@
 STATIC int
 xlog_iclogs_empty(
 	struct xlog		*log);
 
+static void
+xlog_wait_iodone(struct xlog *log);
+
 static int
 xfs_log_cover(struct xfs_mount *);
 
@@ -886,6 +889,23 @@ xlog_force_iclog(
 	return xlog_state_release_iclog(iclog->ic_log, iclog, NULL);
 }
 
+/*
+ * Cycle all the iclogbuf locks to make sure all log IO completion
+ * is done before we tear down AIL/CIL.
+ */
+static void
+xlog_wait_iodone(struct xlog *log)
+{
+	int		i;
+	xlog_in_core_t	*iclog = log->l_iclog;
+
+	for (i = 0; i < log->l_iclog_bufs; i++) {
+		down(&iclog->ic_sema);
+		up(&iclog->ic_sema);
+		iclog = iclog->ic_next;
+	}
+}
+
 /*
  * Wait for the iclog and all prior iclogs to be written disk as required by the
  * log force state machine. Waiting on ic_force_wait ensures iclog completions
@@ -1095,6 +1115,7 @@ xfs_log_clean(
 	struct xfs_mount	*mp)
 {
 	xfs_log_quiesce(mp);
+	xlog_wait_iodone(mp->m_log);
 	xfs_log_unmount_write(mp);
 }
 
@@ -2113,17 +2134,7 @@ xlog_dealloc_log(
 	xlog_in_core_t	*iclog, *next_iclog;
 	int		i;
 
-	/*
-	 * Cycle all the iclogbuf locks to make sure all log IO completion
-	 * is done before we tear down these buffers.
-	 */
-	iclog = log->l_iclog;
-	for (i = 0; i < log->l_iclog_bufs; i++) {
-		down(&iclog->ic_sema);
-		up(&iclog->ic_sema);
-		iclog = iclog->ic_next;
-	}
-
+	xlog_wait_iodone(log);

Best regards :)
Xuenan
> --D
>
>> }
>>
>> Best regards
>> Xuenan
>>> -Dave.
>> --
>> Guo Xuenan [OS Kernel Lab]
>> -----------------------------
>> Email: guoxuenan@huawei.com
>>
> .
-- 
Guo Xuenan [OS Kernel Lab]
-----------------------------
Email: guoxuenan@huawei.com

^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/2] xfs: fix super block buf log item UAF during force shutdown
  2022-11-03  8:36 [PATCH 0/2] xfs: shutdown UAF fixes Guo Xuenan
  2022-11-03  8:36 ` [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan
@ 2022-11-03  8:36 ` Guo Xuenan
  2022-11-03 21:44   ` Dave Chinner
  2022-11-07 14:27 ` [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan
  2 siblings, 1 reply; 17+ messages in thread
From: Guo Xuenan @ 2022-11-03  8:36 UTC (permalink / raw)
To: djwong, dchinner
Cc: linux-xfs, guoxuenan, houtao1, jack.qiu, fangwei1, yi.zhang,
    zhengbin13, leo.lilong, zengheng4

An xfs log I/O error triggers an xlog shutdown, and the end_io worker
calls shutdown_callbacks to unpin and release the buf log items. The
race is that a thread may be in the middle of a transaction commit and
happen not to be intercepted by xlog_is_shutdown; its log items are then
inserted into the CIL, so unpinning and releasing these buf log items
causes a use-after-free. BTW, adding a delay before `xlog_cil_commit`
while the xlog is in the shutdown state increases the recurrence
probability.

The following call graph actually encountered this bad situation.

fsstress                            io end worker kworker/0:1H-216
                                    xlog_ioend_work
                                      ->xlog_force_shutdown
                                        ->xlog_state_shutdown_callbacks
                                          ->xlog_cil_process_committed
                                            ->xlog_cil_committed
                                              ->xfs_trans_committed_bulk
->xfs_trans_apply_sb_deltas             ->li_ops->iop_unpin(lip, 1);
  ->xfs_trans_getsb
    ->_xfs_trans_bjoin
      ->xfs_buf_item_init
        ->if (bip) { return 0;} //relog
->xlog_cil_commit
  ->xlog_cil_insert_items //insert into CIL
                                              ->xfs_buf_ioend_fail(bp);
                                                ->xfs_buf_ioend
                                                  ->xfs_buf_item_done
                                                    ->xfs_buf_item_relse
                                                      ->xfs_buf_item_free

The use-after-free then occurs when the CIL push worker gathers the
per-cpu CIL and inserts the super block buf log item into
ctx->log_items.
==================================================================
BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105

CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G        W          6.1.0-rc1-00001-g274115149b42 #136
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-cil/sda xlog_cil_push_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report+0x171/0x4a6
 kasan_report+0xb3/0x130
 xlog_cil_push_work+0x1c8f/0x22f0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 2145:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 __kasan_slab_alloc+0x54/0x60
 kmem_cache_alloc+0x14a/0x510
 xfs_buf_item_init+0x160/0x6d0
 _xfs_trans_bjoin+0x7f/0x2e0
 xfs_trans_getsb+0xb6/0x3f0
 xfs_trans_apply_sb_deltas+0x1f/0x8c0
 __xfs_trans_commit+0xa25/0xe10
 xfs_symlink+0xe23/0x1660
 xfs_vn_symlink+0x157/0x280
 vfs_symlink+0x491/0x790
 do_symlinkat+0x128/0x220
 __x64_sys_symlink+0x7a/0x90
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 216:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x40
 __kasan_slab_free+0x105/0x1a0
 kmem_cache_free+0xb6/0x460
 xfs_buf_ioend+0x1e9/0x11f0
 xfs_buf_item_unpin+0x3d6/0x840
 xfs_trans_committed_bulk+0x4c2/0x7c0
 xlog_cil_committed+0xab6/0xfb0
 xlog_cil_process_committed+0x117/0x1e0
 xlog_state_shutdown_callbacks+0x208/0x440
 xlog_force_shutdown+0x1b3/0x3a0
 xlog_ioend_work+0xef/0x1d0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30

The buggy address belongs to the object at ffff88801800f388
 which belongs to the cache xfs_buf_item of size 272
The buggy address is located 104 bytes inside of
 272-byte region [ffff88801800f388, ffff88801800f498)

The buggy address belongs to the physical page:
page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88801800f208 pfn:0x1800e
head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
---
 fs/xfs/xfs_buf_item.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 522d450a94b1..df1fe6e7e2ba 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1018,6 +1018,8 @@ xfs_buf_item_relse(
 	trace_xfs_buf_item_relse(bp, _RET_IP_);
 
 	ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
+	if (!list_empty(&bip->bli_item.li_cil))
+		return;
 	bp->b_log_item = NULL;
 	xfs_buf_rele(bp);
 	xfs_buf_item_free(bip);
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] xfs: fix super block buf log item UAF during force shutdown 2022-11-03 8:36 ` [PATCH 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan @ 2022-11-03 21:44 ` Dave Chinner 0 siblings, 0 replies; 17+ messages in thread From: Dave Chinner @ 2022-11-03 21:44 UTC (permalink / raw) To: Guo Xuenan Cc: djwong, dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang, zhengbin13, leo.lilong, zengheng4 On Thu, Nov 03, 2022 at 04:36:32PM +0800, Guo Xuenan wrote: > xfs log io error will trigger xlog shut down, and end_io worker call > shutdown_callbacks to unpin and release the buf log item. The race > condition is that when there are some thread doing transaction commit > and happened not to be intercepted by xlog_is_shutdown, then, these > log item will be insert into CIL, when unpin and release these buf log > item, UAF will occur. BTW, add delay before `xlog_cil_commit` while xlog > shutdown status can increase recurrence probability. > > The following call graph actually encountered this bad situation. > fsstress io end worker kworker/0:1H-216 > xlog_ioend_work > ->xlog_force_shutdown > ->xlog_state_shutdown_callbacks > ->xlog_cil_process_committed > ->xlog_cil_committed > ->xfs_trans_committed_bulk > ->xfs_trans_apply_sb_deltas ->li_ops->iop_unpin(lip, 1); > ->xfs_trans_getsb > ->_xfs_trans_bjoin > ->xfs_buf_item_init > ->if (bip) { return 0;} //relog _xfs_trans_bjoin() takes a reference to the bli. > ->xlog_cil_commit > ->xlog_cil_insert_items //insert into CIL > ->xfs_buf_ioend_fail(bp); > ->xfs_buf_ioend > ->xfs_buf_item_done > ->xfs_buf_item_relse > ->xfs_buf_item_free So how is the bli getting freed here if the fstress task has just taken an extra reference and inserted it into the CIL? Ah, the problem is that xfs_buf_item_relse() isn't checking the reference count before it frees the BLI. 
That is, xfs_buf_item_relse() assumes that it is only called at the end of the BLI life cycle and so doesn't check the reference count, when in fact it clearly isn't. Also, should we be inserting new items into the CIL if the log is already marked as shut down? -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-03 8:36 [PATCH 0/2] xfs: shutdown UAF fixes Guo Xuenan 2022-11-03 8:36 ` [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-03 8:36 ` [PATCH 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan @ 2022-11-07 14:27 ` Guo Xuenan 2022-11-07 14:27 ` [PATCH v2 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan 2022-11-07 17:13 ` [PATCH v2 " Darrick J. Wong 2 siblings, 2 replies; 17+ messages in thread
From: Guo Xuenan @ 2022-11-07 14:27 UTC (permalink / raw)
To: dchinner, djwong
Cc: guoxuenan, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13

Fix a use-after-free in xfs_trans_ail_delete during xlog force shutdown.
Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
xlog_state_shutdown_callbacks") changed the order of running callbacks
and waiting for iclog completion so that the unmount path does not tear
down the AIL prematurely. However, that alone is not enough to guarantee
it; adding an mdelay() in xfs_buf_item_unpin() proves the race still
exists. The reproduction is shown below. To tear down the AIL safely, we
must wait until all xlog ioend workers are done and then sync the AIL.
================================================================== BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W 6.1.0-rc1-00002-gc28266863c4a #137 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: xfs-log/sda xlog_ioend_work Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 print_report+0x171/0x4a6 kasan_report+0xb3/0x130 xfs_trans_ail_delete+0x240/0x2a0 xfs_buf_item_done+0x7b/0xa0 xfs_buf_ioend+0x1e9/0x11f0 xfs_buf_item_unpin+0x4c8/0x860 xfs_trans_committed_bulk+0x4c2/0x7c0 xlog_cil_committed+0xab6/0xfb0 xlog_cil_process_committed+0x117/0x1e0 xlog_state_shutdown_callbacks+0x208/0x440 xlog_force_shutdown+0x1b3/0x3a0 xlog_ioend_work+0xef/0x1d0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 9606: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7a/0x90 __kmalloc+0x59/0x140 kmem_alloc+0xb2/0x2f0 xfs_trans_ail_init+0x20/0x320 xfs_log_mount+0x37e/0x690 xfs_mountfs+0xe36/0x1b40 xfs_fs_fill_super+0xc5c/0x1a70 get_tree_bdev+0x3c5/0x6c0 vfs_get_tree+0x85/0x250 path_mount+0xec3/0x1830 do_mount+0xef/0x110 __x64_sys_mount+0x150/0x1f0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 9662: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 __kasan_slab_free+0x105/0x1a0 __kmem_cache_free+0x99/0x2d0 kvfree+0x3a/0x40 xfs_log_unmount+0x60/0xf0 xfs_unmountfs+0xf3/0x1d0 xfs_fs_put_super+0x78/0x300 generic_shutdown_super+0x151/0x400 kill_block_super+0x9a/0xe0 deactivate_locked_super+0x82/0xe0 deactivate_super+0x91/0xb0 cleanup_mnt+0x32a/0x4a0 task_work_run+0x15f/0x240 exit_to_user_mode_prepare+0x188/0x190 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at 
ffff888023169400 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes inside of 128-byte region [ffff888023169400, ffff888023169480) The buggy address belongs to the physical page: page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888023168f80 pfn:0x23168 head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> --- fs/xfs/xfs_log.c | 33 ++++++++++++++++++++++----------- 1 file changed, 22 insertions(+), 11 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index f02a0dd522b3..89fcd2b8cdfc 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -82,6 +82,9 @@ STATIC int xlog_iclogs_empty( struct xlog *log); +static void +xlog_wait_iodone(struct xlog *log); + static int xfs_log_cover(struct xfs_mount *); @@ -886,6 +889,23 @@ xlog_force_iclog( return xlog_state_release_iclog(iclog->ic_log, iclog, NULL); } +/* + * Cycle all the iclogbuf locks to make sure all log IO completion + * is done before we tear down AIL/CIL. 
+ */ +static void +xlog_wait_iodone(struct xlog *log) +{ + int i; + xlog_in_core_t *iclog = log->l_iclog; + + for (i = 0; i < log->l_iclog_bufs; i++) { + down(&iclog->ic_sema); + up(&iclog->ic_sema); + iclog = iclog->ic_next; + } +} + /* * Wait for the iclog and all prior iclogs to be written disk as required by the * log force state machine. Waiting on ic_force_wait ensures iclog completions @@ -1095,6 +1115,7 @@ xfs_log_clean( struct xfs_mount *mp) { xfs_log_quiesce(mp); + xlog_wait_iodone(mp->m_log); xfs_log_unmount_write(mp); } @@ -2113,17 +2134,7 @@ xlog_dealloc_log( xlog_in_core_t *iclog, *next_iclog; int i; - /* - * Cycle all the iclogbuf locks to make sure all log IO completion - * is done before we tear down these buffers. - */ - iclog = log->l_iclog; - for (i = 0; i < log->l_iclog_bufs; i++) { - down(&iclog->ic_sema); - up(&iclog->ic_sema); - iclog = iclog->ic_next; - } - + xlog_wait_iodone(log); /* * Destroy the CIL after waiting for iclog IO completion because an * iclog EIO error will try to shut down the log, which accesses the -- 2.31.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 2/2] xfs: fix super block buf log item UAF during force shutdown 2022-11-07 14:27 ` [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan @ 2022-11-07 14:27 ` Guo Xuenan 2022-11-08 14:06 ` [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-07 17:13 ` [PATCH v2 " Darrick J. Wong 1 sibling, 1 reply; 17+ messages in thread
From: Guo Xuenan @ 2022-11-07 14:27 UTC (permalink / raw)
To: dchinner, djwong
Cc: guoxuenan, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13

An xfs log IO error triggers an xlog shutdown, and the end_io worker
calls xlog_state_shutdown_callbacks to unpin and release the buf log
items. The race condition is that a thread may be in the middle of a
transaction commit and happen not to be intercepted by xlog_is_shutdown;
its log items are then inserted into the CIL, and a use-after-free
occurs when those buf log items are unpinned and released. As an aside,
adding a delay before xlog_cil_commit while the xlog is in the shutdown
state increases the recurrence probability.

The following call graph shows how this situation was actually hit.

fsstress                                io end worker kworker/0:1H-216
                                        xlog_ioend_work
                                          ->xlog_force_shutdown
                                          ->xlog_state_shutdown_callbacks
                                          ->xlog_cil_process_committed
                                          ->xlog_cil_committed
                                          ->xfs_trans_committed_bulk
->xfs_trans_apply_sb_deltas               ->li_ops->iop_unpin(lip, 1);
->xfs_trans_getsb
  ->_xfs_trans_bjoin
    ->xfs_buf_item_init
      ->if (bip) { return 0;} //relog
->xlog_cil_commit
  ->xlog_cil_insert_items //insert into CIL
                                          ->xfs_buf_ioend_fail(bp);
                                          ->xfs_buf_ioend
                                          ->xfs_buf_item_done
                                          ->xfs_buf_item_relse
                                          ->xfs_buf_item_free

The UAF then occurs when the CIL push worker gathers the per-cpu CIL and
inserts the super block buf log item into ctx->log_items.
================================================================== BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0 Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105 CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W 6.1.0-rc1-00001-g274115149b42 #136 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: xfs-cil/sda xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 print_report+0x171/0x4a6 kasan_report+0xb3/0x130 xlog_cil_push_work+0x1c8f/0x22f0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 2145: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_slab_alloc+0x54/0x60 kmem_cache_alloc+0x14a/0x510 xfs_buf_item_init+0x160/0x6d0 _xfs_trans_bjoin+0x7f/0x2e0 xfs_trans_getsb+0xb6/0x3f0 xfs_trans_apply_sb_deltas+0x1f/0x8c0 __xfs_trans_commit+0xa25/0xe10 xfs_symlink+0xe23/0x1660 xfs_vn_symlink+0x157/0x280 vfs_symlink+0x491/0x790 do_symlinkat+0x128/0x220 __x64_sys_symlink+0x7a/0x90 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 216: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 __kasan_slab_free+0x105/0x1a0 kmem_cache_free+0xb6/0x460 xfs_buf_ioend+0x1e9/0x11f0 xfs_buf_item_unpin+0x3d6/0x840 xfs_trans_committed_bulk+0x4c2/0x7c0 xlog_cil_committed+0xab6/0xfb0 xlog_cil_process_committed+0x117/0x1e0 xlog_state_shutdown_callbacks+0x208/0x440 xlog_force_shutdown+0x1b3/0x3a0 xlog_ioend_work+0xef/0x1d0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 The buggy address belongs to the object at ffff88801800f388 which belongs to the cache xfs_buf_item of size 272 The buggy address is located 104 bytes inside of 272-byte region [ffff88801800f388, ffff88801800f498) The buggy address belongs to the physical page: page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000 
index:0xffff88801800f208 pfn:0x1800e head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640 raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> --- fs/xfs/xfs_buf_item.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 522d450a94b1..df7322ed73fa 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -1018,6 +1018,8 @@ xfs_buf_item_relse( trace_xfs_buf_item_relse(bp, _RET_IP_); ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags)); + if (atomic_read(&bip->bli_refcount)) + return; bp->b_log_item = NULL; xfs_buf_rele(bp); xfs_buf_item_free(bip); -- 2.31.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-07 14:27 ` [PATCH v2 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan @ 2022-11-08 14:06 ` Guo Xuenan 2022-11-15 23:23 ` Darrick J. Wong 2022-11-16 6:02 ` Dave Chinner 0 siblings, 2 replies; 17+ messages in thread
From: Guo Xuenan @ 2022-11-08 14:06 UTC (permalink / raw)
To: dchinner, djwong
Cc: guoxuenan, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13

Fix a use-after-free in xfs_trans_ail_delete during xlog force shutdown.
Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
xlog_state_shutdown_callbacks") changed the order of running callbacks
and waiting for iclog completion so that the unmount path does not tear
down the AIL prematurely. However, that alone is not enough to guarantee
it; adding an mdelay() in xfs_buf_item_unpin() proves the race still
exists. The reproduction is shown below. To tear down the AIL safely, we
must wait until all xlog ioend workers are done and then sync the AIL.

==================================================================
BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0
Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43

CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G        W
6.1.0-rc1-00002-gc28266863c4a #137
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-log/sda xlog_ioend_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report+0x171/0x4a6
 kasan_report+0xb3/0x130
 xfs_trans_ail_delete+0x240/0x2a0
 xfs_buf_item_done+0x7b/0xa0
 xfs_buf_ioend+0x1e9/0x11f0
 xfs_buf_item_unpin+0x4c8/0x860
 xfs_trans_committed_bulk+0x4c2/0x7c0
 xlog_cil_committed+0xab6/0xfb0
 xlog_cil_process_committed+0x117/0x1e0
 xlog_state_shutdown_callbacks+0x208/0x440
 xlog_force_shutdown+0x1b3/0x3a0
 xlog_ioend_work+0xef/0x1d0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 9606:
 kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7a/0x90 __kmalloc+0x59/0x140 kmem_alloc+0xb2/0x2f0 xfs_trans_ail_init+0x20/0x320 xfs_log_mount+0x37e/0x690 xfs_mountfs+0xe36/0x1b40 xfs_fs_fill_super+0xc5c/0x1a70 get_tree_bdev+0x3c5/0x6c0 vfs_get_tree+0x85/0x250 path_mount+0xec3/0x1830 do_mount+0xef/0x110 __x64_sys_mount+0x150/0x1f0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 9662: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 __kasan_slab_free+0x105/0x1a0 __kmem_cache_free+0x99/0x2d0 kvfree+0x3a/0x40 xfs_log_unmount+0x60/0xf0 xfs_unmountfs+0xf3/0x1d0 xfs_fs_put_super+0x78/0x300 generic_shutdown_super+0x151/0x400 kill_block_super+0x9a/0xe0 deactivate_locked_super+0x82/0xe0 deactivate_super+0x91/0xb0 cleanup_mnt+0x32a/0x4a0 task_work_run+0x15f/0x240 exit_to_user_mode_prepare+0x188/0x190 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888023169400 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes inside of 128-byte region [ffff888023169400, ffff888023169480) The buggy address belongs to the physical page: page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888023168f80 pfn:0x23168 head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169500: fc fc fc fc 
fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> --- fs/xfs/xfs_log.c | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index f02a0dd522b3..467bac00951c 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -82,6 +82,9 @@ STATIC int xlog_iclogs_empty( struct xlog *log); +static void +xlog_wait_iodone(struct xlog *log); + static int xfs_log_cover(struct xfs_mount *); @@ -886,6 +889,23 @@ xlog_force_iclog( return xlog_state_release_iclog(iclog->ic_log, iclog, NULL); } +/* + * Cycle all the iclogbuf locks to make sure all log IO completion + * is done before we tear down AIL/CIL. + */ +static void +xlog_wait_iodone(struct xlog *log) +{ + int i; + xlog_in_core_t *iclog = log->l_iclog; + + for (i = 0; i < log->l_iclog_bufs; i++) { + down(&iclog->ic_sema); + up(&iclog->ic_sema); + iclog = iclog->ic_next; + } +} + /* * Wait for the iclog and all prior iclogs to be written disk as required by the * log force state machine. Waiting on ic_force_wait ensures iclog completions @@ -1276,6 +1296,12 @@ xfs_log_cover( xfs_ail_push_all_sync(mp->m_ail); } while (xfs_log_need_covered(mp)); + /* + * Cycle all the iclogbuf locks to make sure all log IO completion + * is done before we tear down AIL. + */ + xlog_wait_iodone(mp->m_log); + return error; } @@ -2117,12 +2143,7 @@ xlog_dealloc_log( * Cycle all the iclogbuf locks to make sure all log IO completion * is done before we tear down these buffers. 
*/ - iclog = log->l_iclog; - for (i = 0; i < log->l_iclog_bufs; i++) { - down(&iclog->ic_sema); - up(&iclog->ic_sema); - iclog = iclog->ic_next; - } + xlog_wait_iodone(log); /* * Destroy the CIL after waiting for iclog IO completion because an -- 2.31.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-08 14:06 ` [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan @ 2022-11-15 23:23 ` Darrick J. Wong 2022-11-17 1:18 ` Guo Xuenan 2022-11-16 6:02 ` Dave Chinner 1 sibling, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2022-11-15 23:23 UTC (permalink / raw)
To: Guo Xuenan
Cc: dchinner, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13

On Tue, Nov 08, 2022 at 10:06:05PM +0800, Guo Xuenan wrote:
> Fix uaf in xfs_trans_ail_delete during xlog force shutdown.
> In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
> xlog_state_shutdown_callbacks") changed the order of running callbacks
> and wait for iclog completion to avoid unmount path untimely destroy AIL.
> But which seems not enough to ensue this, adding mdelay in
> `xfs_buf_item_unpin` can prove that.
>
> The reproduction is as follows. To ensure destroy AIL safely,
> we should wait all xlog ioend workers done and sync the AIL.

Er... could you /please/ start a new thread each time you send a new
revision of this patchset?  It's /so/ much easier to pick out patches
for re-review if they're not buried in an existing thread.

I /think/ the code here is ok though I'll probably tweak the comments
before I commit them.  Not sure where the other patch went...?
--D > > ================================================================== > BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 > Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 > > CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W > 6.1.0-rc1-00002-gc28266863c4a #137 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.13.0-1ubuntu1.1 04/01/2014 > Workqueue: xfs-log/sda xlog_ioend_work > Call Trace: > <TASK> > dump_stack_lvl+0x4d/0x66 > print_report+0x171/0x4a6 > kasan_report+0xb3/0x130 > xfs_trans_ail_delete+0x240/0x2a0 > xfs_buf_item_done+0x7b/0xa0 > xfs_buf_ioend+0x1e9/0x11f0 > xfs_buf_item_unpin+0x4c8/0x860 > xfs_trans_committed_bulk+0x4c2/0x7c0 > xlog_cil_committed+0xab6/0xfb0 > xlog_cil_process_committed+0x117/0x1e0 > xlog_state_shutdown_callbacks+0x208/0x440 > xlog_force_shutdown+0x1b3/0x3a0 > xlog_ioend_work+0xef/0x1d0 > process_one_work+0x6f9/0xf70 > worker_thread+0x578/0xf30 > kthread+0x28c/0x330 > ret_from_fork+0x1f/0x30 > </TASK> > > Allocated by task 9606: > kasan_save_stack+0x1e/0x40 > kasan_set_track+0x21/0x30 > __kasan_kmalloc+0x7a/0x90 > __kmalloc+0x59/0x140 > kmem_alloc+0xb2/0x2f0 > xfs_trans_ail_init+0x20/0x320 > xfs_log_mount+0x37e/0x690 > xfs_mountfs+0xe36/0x1b40 > xfs_fs_fill_super+0xc5c/0x1a70 > get_tree_bdev+0x3c5/0x6c0 > vfs_get_tree+0x85/0x250 > path_mount+0xec3/0x1830 > do_mount+0xef/0x110 > __x64_sys_mount+0x150/0x1f0 > do_syscall_64+0x35/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > Freed by task 9662: > kasan_save_stack+0x1e/0x40 > kasan_set_track+0x21/0x30 > kasan_save_free_info+0x2a/0x40 > __kasan_slab_free+0x105/0x1a0 > __kmem_cache_free+0x99/0x2d0 > kvfree+0x3a/0x40 > xfs_log_unmount+0x60/0xf0 > xfs_unmountfs+0xf3/0x1d0 > xfs_fs_put_super+0x78/0x300 > generic_shutdown_super+0x151/0x400 > kill_block_super+0x9a/0xe0 > deactivate_locked_super+0x82/0xe0 > deactivate_super+0x91/0xb0 > cleanup_mnt+0x32a/0x4a0 > task_work_run+0x15f/0x240 > exit_to_user_mode_prepare+0x188/0x190 > 
syscall_exit_to_user_mode+0x12/0x30 > do_syscall_64+0x42/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > The buggy address belongs to the object at ffff888023169400 > which belongs to the cache kmalloc-128 of size 128 > The buggy address is located 0 bytes inside of > 128-byte region [ffff888023169400, ffff888023169480) > > The buggy address belongs to the physical page: > page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 > index:0xffff888023168f80 pfn:0x23168 > head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 > flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) > raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 > raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ================================================================== > Disabling lock debugging due to kernel taint > > Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") > Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> > --- > fs/xfs/xfs_log.c | 33 +++++++++++++++++++++++++++------ > 1 file changed, 27 insertions(+), 6 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index f02a0dd522b3..467bac00951c 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -82,6 +82,9 @@ STATIC int > xlog_iclogs_empty( > struct xlog *log); > > +static void > +xlog_wait_iodone(struct xlog *log); > + > static int > xfs_log_cover(struct xfs_mount *); > > @@ -886,6 +889,23 @@ xlog_force_iclog( > return 
xlog_state_release_iclog(iclog->ic_log, iclog, NULL); > } > > +/* > + * Cycle all the iclogbuf locks to make sure all log IO completion > + * is done before we tear down AIL/CIL. > + */ > +static void > +xlog_wait_iodone(struct xlog *log) > +{ > + int i; > + xlog_in_core_t *iclog = log->l_iclog; > + > + for (i = 0; i < log->l_iclog_bufs; i++) { > + down(&iclog->ic_sema); > + up(&iclog->ic_sema); > + iclog = iclog->ic_next; > + } > +} > + > /* > * Wait for the iclog and all prior iclogs to be written disk as required by the > * log force state machine. Waiting on ic_force_wait ensures iclog completions > @@ -1276,6 +1296,12 @@ xfs_log_cover( > xfs_ail_push_all_sync(mp->m_ail); > } while (xfs_log_need_covered(mp)); > > + /* > + * Cycle all the iclogbuf locks to make sure all log IO completion > + * is done before we tear down AIL. > + */ > + xlog_wait_iodone(mp->m_log); > + > return error; > } > > @@ -2117,12 +2143,7 @@ xlog_dealloc_log( > * Cycle all the iclogbuf locks to make sure all log IO completion > * is done before we tear down these buffers. > */ > - iclog = log->l_iclog; > - for (i = 0; i < log->l_iclog_bufs; i++) { > - down(&iclog->ic_sema); > - up(&iclog->ic_sema); > - iclog = iclog->ic_next; > - } > + xlog_wait_iodone(log); > > /* > * Destroy the CIL after waiting for iclog IO completion because an > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-15 23:23 ` Darrick J. Wong @ 2022-11-17 1:18 ` Guo Xuenan 0 siblings, 0 replies; 17+ messages in thread
From: Guo Xuenan @ 2022-11-17 1:18 UTC (permalink / raw)
To: Darrick J. Wong
Cc: dchinner, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13

On 2022/11/16 7:23, Darrick J. Wong wrote:
> On Tue, Nov 08, 2022 at 10:06:05PM +0800, Guo Xuenan wrote:
>> Fix uaf in xfs_trans_ail_delete during xlog force shutdown.
>> In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
>> xlog_state_shutdown_callbacks") changed the order of running callbacks
>> and wait for iclog completion to avoid unmount path untimely destroy AIL.
>> But which seems not enough to ensue this, adding mdelay in
>> `xfs_buf_item_unpin` can prove that.
>>
>> The reproduction is as follows. To ensure destroy AIL safely,
>> we should wait all xlog ioend workers done and sync the AIL.
> Er... could you /please/ start a new thread each time you send a new
> revision of this patchset?  It's /so/ much easier to pick out patches
> for re-review if they're not buried in an existing thread.
Oh, sorry for my bad habit. I will start a new thread next time.
> I /think/ the code here is ok though I'll probably tweak the comments
> before I commit them.  Not sure where the other patch went...?
I thought the reviewed patch did not need to be sent again. Dave has
some comments on this patch, so I will send both of them in v4.
Thanks, Xuenan > > --D > >> ================================================================== >> BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 >> Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 >> >> CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W >> 6.1.0-rc1-00002-gc28266863c4a #137 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.13.0-1ubuntu1.1 04/01/2014 >> Workqueue: xfs-log/sda xlog_ioend_work >> Call Trace: >> <TASK> >> dump_stack_lvl+0x4d/0x66 >> print_report+0x171/0x4a6 >> kasan_report+0xb3/0x130 >> xfs_trans_ail_delete+0x240/0x2a0 >> xfs_buf_item_done+0x7b/0xa0 >> xfs_buf_ioend+0x1e9/0x11f0 >> xfs_buf_item_unpin+0x4c8/0x860 >> xfs_trans_committed_bulk+0x4c2/0x7c0 >> xlog_cil_committed+0xab6/0xfb0 >> xlog_cil_process_committed+0x117/0x1e0 >> xlog_state_shutdown_callbacks+0x208/0x440 >> xlog_force_shutdown+0x1b3/0x3a0 >> xlog_ioend_work+0xef/0x1d0 >> process_one_work+0x6f9/0xf70 >> worker_thread+0x578/0xf30 >> kthread+0x28c/0x330 >> ret_from_fork+0x1f/0x30 >> </TASK> >> >> Allocated by task 9606: >> kasan_save_stack+0x1e/0x40 >> kasan_set_track+0x21/0x30 >> __kasan_kmalloc+0x7a/0x90 >> __kmalloc+0x59/0x140 >> kmem_alloc+0xb2/0x2f0 >> xfs_trans_ail_init+0x20/0x320 >> xfs_log_mount+0x37e/0x690 >> xfs_mountfs+0xe36/0x1b40 >> xfs_fs_fill_super+0xc5c/0x1a70 >> get_tree_bdev+0x3c5/0x6c0 >> vfs_get_tree+0x85/0x250 >> path_mount+0xec3/0x1830 >> do_mount+0xef/0x110 >> __x64_sys_mount+0x150/0x1f0 >> do_syscall_64+0x35/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> >> Freed by task 9662: >> kasan_save_stack+0x1e/0x40 >> kasan_set_track+0x21/0x30 >> kasan_save_free_info+0x2a/0x40 >> __kasan_slab_free+0x105/0x1a0 >> __kmem_cache_free+0x99/0x2d0 >> kvfree+0x3a/0x40 >> xfs_log_unmount+0x60/0xf0 >> xfs_unmountfs+0xf3/0x1d0 >> xfs_fs_put_super+0x78/0x300 >> generic_shutdown_super+0x151/0x400 >> kill_block_super+0x9a/0xe0 >> deactivate_locked_super+0x82/0xe0 >> deactivate_super+0x91/0xb0 >> cleanup_mnt+0x32a/0x4a0 >> 
task_work_run+0x15f/0x240 >> exit_to_user_mode_prepare+0x188/0x190 >> syscall_exit_to_user_mode+0x12/0x30 >> do_syscall_64+0x42/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> >> The buggy address belongs to the object at ffff888023169400 >> which belongs to the cache kmalloc-128 of size 128 >> The buggy address is located 0 bytes inside of >> 128-byte region [ffff888023169400, ffff888023169480) >> >> The buggy address belongs to the physical page: >> page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 >> index:0xffff888023168f80 pfn:0x23168 >> head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 >> flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) >> raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 >> raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 >> page dumped because: kasan: bad access detected >> >> Memory state around the buggy address: >> ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >>> ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ^ >> ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ================================================================== >> Disabling lock debugging due to kernel taint >> >> Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") >> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> >> --- >> fs/xfs/xfs_log.c | 33 +++++++++++++++++++++++++++------ >> 1 file changed, 27 insertions(+), 6 deletions(-) >> >> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c >> index f02a0dd522b3..467bac00951c 100644 >> --- a/fs/xfs/xfs_log.c >> +++ b/fs/xfs/xfs_log.c >> @@ -82,6 +82,9 @@ STATIC int >> xlog_iclogs_empty( >> struct xlog *log); >> >> +static void >> +xlog_wait_iodone(struct xlog *log); >> + >> 
static int >> xfs_log_cover(struct xfs_mount *); >> >> @@ -886,6 +889,23 @@ xlog_force_iclog( >> return xlog_state_release_iclog(iclog->ic_log, iclog, NULL); >> } >> >> +/* >> + * Cycle all the iclogbuf locks to make sure all log IO completion >> + * is done before we tear down AIL/CIL. >> + */ >> +static void >> +xlog_wait_iodone(struct xlog *log) >> +{ >> + int i; >> + xlog_in_core_t *iclog = log->l_iclog; >> + >> + for (i = 0; i < log->l_iclog_bufs; i++) { >> + down(&iclog->ic_sema); >> + up(&iclog->ic_sema); >> + iclog = iclog->ic_next; >> + } >> +} >> + >> /* >> * Wait for the iclog and all prior iclogs to be written disk as required by the >> * log force state machine. Waiting on ic_force_wait ensures iclog completions >> @@ -1276,6 +1296,12 @@ xfs_log_cover( >> xfs_ail_push_all_sync(mp->m_ail); >> } while (xfs_log_need_covered(mp)); >> >> + /* >> + * Cycle all the iclogbuf locks to make sure all log IO completion >> + * is done before we tear down AIL. >> + */ >> + xlog_wait_iodone(mp->m_log); >> + >> return error; >> } >> >> @@ -2117,12 +2143,7 @@ xlog_dealloc_log( >> * Cycle all the iclogbuf locks to make sure all log IO completion >> * is done before we tear down these buffers. >> */ >> - iclog = log->l_iclog; >> - for (i = 0; i < log->l_iclog_bufs; i++) { >> - down(&iclog->ic_sema); >> - up(&iclog->ic_sema); >> - iclog = iclog->ic_next; >> - } >> + xlog_wait_iodone(log); >> >> /* >> * Destroy the CIL after waiting for iclog IO completion because an >> -- >> 2.31.1 >> > . -- Guo Xuenan [OS Kernel Lab] ----------------------------- Email: guoxuenan@huawei.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-08 14:06 ` [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-15 23:23 ` Darrick J. Wong @ 2022-11-16 6:02 ` Dave Chinner 2022-11-17 2:12 ` Guo Xuenan 1 sibling, 1 reply; 17+ messages in thread From: Dave Chinner @ 2022-11-16 6:02 UTC (permalink / raw) To: Guo Xuenan Cc: dchinner, djwong, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13 On Tue, Nov 08, 2022 at 10:06:05PM +0800, Guo Xuenan wrote: > Fix uaf in xfs_trans_ail_delete during xlog force shutdown. > In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in > xlog_state_shutdown_callbacks") changed the order of running callbacks > and wait for iclog completion to avoid unmount path untimely destroy AIL. > But which seems not enough to ensue this, adding mdelay in > `xfs_buf_item_unpin` can prove that. > > The reproduction is as follows. To ensure destroy AIL safely, > we should wait all xlog ioend workers done and sync the AIL. Like Darrick, I didn't see this either because it's in an old thread.... > fs/xfs/xfs_log.c | 33 +++++++++++++++++++++++++++------ > 1 file changed, 27 insertions(+), 6 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index f02a0dd522b3..467bac00951c 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -82,6 +82,9 @@ STATIC int > xlog_iclogs_empty( > struct xlog *log); > > +static void > +xlog_wait_iodone(struct xlog *log); > + Why do we need a forward prototype definition? > static int > xfs_log_cover(struct xfs_mount *); > > @@ -886,6 +889,23 @@ xlog_force_iclog( > return xlog_state_release_iclog(iclog->ic_log, iclog, NULL); > } > > +/* > + * Cycle all the iclogbuf locks to make sure all log IO completion > + * is done before we tear down AIL/CIL. 
> + */ > +static void > +xlog_wait_iodone(struct xlog *log) > +{ > + int i; > + xlog_in_core_t *iclog = log->l_iclog; > + > + for (i = 0; i < log->l_iclog_bufs; i++) { > + down(&iclog->ic_sema); > + up(&iclog->ic_sema); > + iclog = iclog->ic_next; > + } > +} This doesn't guarantee no iclog IO is in progress, just that it waited for each iclog to finish any IO it had in progress. If we are in a normal runtime condition, xfs_log_force(mp, XFS_LOG_SYNC) performs this "wait for journal IO to complete" integrity operation. Which, in normal runtime unmount operation, we get from xfs_log_cover()..... > + > /* > * Wait for the iclog and all prior iclogs to be written disk as required by the > * log force state machine. Waiting on ic_force_wait ensures iclog completions > @@ -1276,6 +1296,12 @@ xfs_log_cover( > xfs_ail_push_all_sync(mp->m_ail); > } while (xfs_log_need_covered(mp)); > > + /* > + * Cycle all the iclogbuf locks to make sure all log IO completion > + * is done before we tear down AIL. > + */ > + xlog_wait_iodone(mp->m_log); .... and so this call is redundant for normal runtime log quiesce operations. That's because the synchronous transaction run in this loop: do { >>>>>> error = xfs_sync_sb(mp, true); if (error) break; xfs_ail_push_all_sync(mp->m_ail); } while (xfs_log_need_covered(mp)); causes a xfs_log_force(mp, XFS_LOG_SYNC) to be issued to force the transaction to disk and it *waits for the journal IO to complete. Further, it is iclog IO completion that moves the log covered state forwards, so this loop absolutely relies on the journal IO being fully completed and both the CIL and AIL being empty before it will exit. Hence adding xlog_wait_iodone() here would only have an effect if the log had been shut down. however, we never get here if the log has been shut down because the xfs_log_writable(mp) check at the start of xfs_log_cover() will fail. Hence we return without trying to cover the log or wait for iclog completion. 
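[Editorial aside: the down()/up() cycle discussed above can be modelled in user space. The sketch below is purely illustrative, not XFS code: POSIX semaphores stand in for each iclog's ic_sema, a thread stands in for xlog_ioend_work, and all names (NBUFS, ioend_worker, cycle_iclog_locks, run_model) are invented for this sketch. It shows the property Dave describes: the cycle waits out I/O that was already in flight, but proves nothing about I/O issued afterwards.]

```c
/*
 * User-space model of the "cycle all the iclogbuf locks" wait.
 * POSIX semaphores stand in for the kernel's per-iclog ic_sema.
 */
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define NBUFS 4

static sem_t ic_sema[NBUFS];   /* one semaphore per in-core log buffer */
static int io_done[NBUFS];     /* set by the "I/O completion" thread */

/* Models xlog_ioend_work: the submitter already holds the semaphore
 * (count 0); completion finishes its work, then releases it. */
static void *ioend_worker(void *arg)
{
	int i = *(int *)arg;

	usleep(10000);             /* the in-flight "log I/O" */
	io_done[i] = 1;
	sem_post(&ic_sema[i]);     /* up(&iclog->ic_sema) */
	return NULL;
}

/* Models the down()/up() cycle: returns only once every buffer's
 * in-flight completion has dropped its semaphore.  Nothing stops
 * *new* I/O from being issued after this walk returns. */
static void cycle_iclog_locks(void)
{
	for (int i = 0; i < NBUFS; i++) {
		sem_wait(&ic_sema[i]);     /* down(&iclog->ic_sema) */
		sem_post(&ic_sema[i]);     /* up(&iclog->ic_sema) */
	}
}

/* Returns 1 iff every in-flight completion had finished by the time
 * the semaphore cycle returned. */
int run_model(void)
{
	pthread_t tid[NBUFS];
	int idx[NBUFS], ok = 1;

	for (int i = 0; i < NBUFS; i++) {
		sem_init(&ic_sema[i], 0, 0);  /* 0 == "I/O in flight" */
		idx[i] = i;
		pthread_create(&tid[i], NULL, ioend_worker, &idx[i]);
	}

	cycle_iclog_locks();
	for (int i = 0; i < NBUFS; i++)
		ok &= io_done[i];  /* ordered by the sem_wait() above */

	for (int i = 0; i < NBUFS; i++)
		pthread_join(tid[i], NULL);
	return ok;
}
```

Built with -lpthread on Linux (sem_init is assumed available), run_model() returning 1 shows the cycle waited out the in-flight completions. Because a fresh worker could start immediately afterwards, the walk is only a complete barrier once the log is shut down and no new iclog I/O can be issued, which is the point made above about where this check belongs.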
IOWs, I don't see how this code does anything to avoid the problem that has been described - if the iclog callback runs the shutdown, by the time it gets to running shutdown callbacks it's already marked both the log and the mount as shut down, so unmount will definitely not run this code and so will not wait for the iclog running shutdown callbacks during IO completion.... ---- Really, the only place this iclog walk matters is in the unmount path prior to tearing down internal structures, and it only matters when this filesystem has been shut down. That means it needs to be a part of xfs_log_unmount(), not part of the normal runtime freeze/remount readonly/unmount log quiescing. If we are shut down, then we have to guarantee that we've finished processing the iclogs before we tear down the things that use log items - the CIL, the AIL, the iclogs themselves, etc. It *must* also run in the shutdown case, so it can't be put in a function that is conditional on a normal running filesystem. Something like this: void xfs_log_unmount( struct xfs_mount *mp) { xfs_log_clean(mp); + /* + * If shutdown has come from iclog IO context, the log + * cleaning will have been skipped and so we need to wait + * for the iclog to complete shutdown processing before we + * tear anything down. + */ + xfs_iclog_iodone_wait(mp->m_log); xfs_buftarg_drain(mp->m_ddev_targp); And the iclog walk can be removed from xlog_dealloc_log() as we've already done the checks it needs before tearing down the iclogs.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-16 6:02 ` Dave Chinner @ 2022-11-17 2:12 ` Guo Xuenan 0 siblings, 0 replies; 17+ messages in thread From: Guo Xuenan @ 2022-11-17 2:12 UTC (permalink / raw) To: Dave Chinner Cc: dchinner, djwong, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13 On 2022/11/16 14:02, Dave Chinner wrote: > On Tue, Nov 08, 2022 at 10:06:05PM +0800, Guo Xuenan wrote: >> Fix a UAF in xfs_trans_ail_delete during xlog force shutdown. >> Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in >> xlog_state_shutdown_callbacks") changed the order of running callbacks >> and waiting for iclog completion to keep the unmount path from destroying the AIL too early. >> But that does not seem to be enough to ensure this; adding an mdelay in >> `xfs_buf_item_unpin` proves it. >> >> The reproduction is as follows. To destroy the AIL safely, >> we should wait for all xlog ioend workers to finish and sync the AIL. > Like Darrick, I didn't see this either because it's in an old > thread.... Got it, and gladly accepted :) >> fs/xfs/xfs_log.c | 33 +++++++++++++++++++++++++++------ >> 1 file changed, 27 insertions(+), 6 deletions(-) >> >> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c >> index f02a0dd522b3..467bac00951c 100644 >> --- a/fs/xfs/xfs_log.c >> +++ b/fs/xfs/xfs_log.c >> @@ -82,6 +82,9 @@ STATIC int >> xlog_iclogs_empty( >> struct xlog *log); >> >> +static void >> +xlog_wait_iodone(struct xlog *log); >> + > Why do we need a forward prototype definition? I guess you mean the function name is the same as before; if so, that is a coincidence. :) >> static int >> xfs_log_cover(struct xfs_mount *); >> >> @@ -886,6 +889,23 @@ xlog_force_iclog( >> return xlog_state_release_iclog(iclog->ic_log, iclog, NULL); >> } >> >> +/* >> + * Cycle all the iclogbuf locks to make sure all log IO completion >> + * is done before we tear down AIL/CIL.
>> + */ >> +static void >> +xlog_wait_iodone(struct xlog *log) >> +{ >> + int i; >> + xlog_in_core_t *iclog = log->l_iclog; >> + >> + for (i = 0; i < log->l_iclog_bufs; i++) { >> + down(&iclog->ic_sema); >> + up(&iclog->ic_sema); >> + iclog = iclog->ic_next; >> + } >> +} > This doesn't guarantee no iclog IO is in progress, just that it > waited for each iclog to finish any IO it had in progress. If we are > in a normal runtime condition, xfs_log_force(mp, XFS_LOG_SYNC) > performs this "wait for journal IO to complete" integrity operation. > > Which, in normal runtime unmount operation, we get from > xfs_log_cover()..... > >> + >> /* >> * Wait for the iclog and all prior iclogs to be written disk as required by the >> * log force state machine. Waiting on ic_force_wait ensures iclog completions >> @@ -1276,6 +1296,12 @@ xfs_log_cover( >> xfs_ail_push_all_sync(mp->m_ail); >> } while (xfs_log_need_covered(mp)); >> >> + /* >> + * Cycle all the iclogbuf locks to make sure all log IO completion >> + * is done before we tear down AIL. >> + */ >> + xlog_wait_iodone(mp->m_log); > .... and so this call is redundant for normal runtime log quiesce > operations. That's because the synchronous transaction run in this > loop: Yes, it is redundant, per the discussion with Darrick here [1]. [1] https://lore.kernel.org/all/8575089d-bf9b-0383-939b-922f1440abf7@huawei.com/ > do { >>>>>>> error = xfs_sync_sb(mp, true); > if (error) > break; > xfs_ail_push_all_sync(mp->m_ail); > } while (xfs_log_need_covered(mp)); > > causes a xfs_log_force(mp, XFS_LOG_SYNC) to be issued to force the > transaction to disk and it *waits for the journal IO to complete. > Further, it is iclog IO completion that moves the log covered state > forwards, so this loop absolutely relies on the journal IO being > fully completed and both the CIL and AIL being empty before it will > exit. > > Hence adding xlog_wait_iodone() here would only have an effect if > the log had been shut down.
however, we never get here if the log > has been shut down because the xfs_log_writable(mp) check at the > start of xfs_log_cover() will fail. Hence we return without trying > to cover the log or wait for iclog completion. > > IOWs, I don't see how this code does anything to avoid the problem > that has been described - if the iclog callback runs the shutdown, > by the time it gets to running shutdown callbacks it's already > marked both the log and the mount as shut down, so unmount will > definitely not run this code and so will not wait for the iclog > running shutdown callbacks during IO completion.... Thank you very much for your detailed explanation here. > ---- > > Really, the only place this iclog walk matters is in the unmount > path prior to tearing down internal structures, and it only matters > when this filesystem has been shut down. That means it needs > to be a part of xfs_log_unmount(), not part of the normal runtime > freeze/remount readonly/unmount log quiescing. > > If we are shut down, then we have to guarantee that we've > finished processing the iclogs before we tear down the things that > use log items - the CIL, the AIL, the iclogs themselves, etc. It > *must* also run in the shutdown case, so it can't be put in a > function that is conditional on a normal running filesystem. > Something like this: > > void > xfs_log_unmount( > struct xfs_mount *mp) > { > xfs_log_clean(mp); > > + /* > + * If shutdown has come from iclog IO context, the log > + * cleaning will have been skipped and so we need to wait > + * for the iclog to complete shutdown processing before we > + * tear anything down. > + */ > + xfs_iclog_iodone_wait(mp->m_log); > xfs_buftarg_drain(mp->m_ddev_targp); > > And the iclog walk can be removed from xlog_dealloc_log() as we've > already done the checks it needs before tearing down the iclogs.... OK, I will send v4 in a new thread, following your kind and detailed suggestion. Best regards, Xuenan > Cheers, > > Dave.
-- Guo Xuenan [OS Kernel Lab] ----------------------------- Email: guoxuenan@huawei.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-07 14:27 ` [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-07 14:27 ` [PATCH v2 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan @ 2022-11-07 17:13 ` Darrick J. Wong 2022-11-08 2:44 ` Guo Xuenan 1 sibling, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2022-11-07 17:13 UTC (permalink / raw) To: Guo Xuenan Cc: dchinner, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13 On Mon, Nov 07, 2022 at 10:27:15PM +0800, Guo Xuenan wrote: > Fix a UAF in xfs_trans_ail_delete during xlog force shutdown. > Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in > xlog_state_shutdown_callbacks") changed the order of running callbacks > and waiting for iclog completion to keep the unmount path from destroying the AIL too early. > But that does not seem to be enough to ensure this; adding an mdelay in > `xfs_buf_item_unpin` proves it. > > The reproduction is as follows. To destroy the AIL safely, > we should wait for all xlog ioend workers to finish and sync the AIL.
> > ================================================================== > BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 > Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 > > CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W > 6.1.0-rc1-00002-gc28266863c4a #137 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.13.0-1ubuntu1.1 04/01/2014 > Workqueue: xfs-log/sda xlog_ioend_work > Call Trace: > <TASK> > dump_stack_lvl+0x4d/0x66 > print_report+0x171/0x4a6 > kasan_report+0xb3/0x130 > xfs_trans_ail_delete+0x240/0x2a0 > xfs_buf_item_done+0x7b/0xa0 > xfs_buf_ioend+0x1e9/0x11f0 > xfs_buf_item_unpin+0x4c8/0x860 > xfs_trans_committed_bulk+0x4c2/0x7c0 > xlog_cil_committed+0xab6/0xfb0 > xlog_cil_process_committed+0x117/0x1e0 > xlog_state_shutdown_callbacks+0x208/0x440 > xlog_force_shutdown+0x1b3/0x3a0 > xlog_ioend_work+0xef/0x1d0 > process_one_work+0x6f9/0xf70 > worker_thread+0x578/0xf30 > kthread+0x28c/0x330 > ret_from_fork+0x1f/0x30 > </TASK> > > Allocated by task 9606: > kasan_save_stack+0x1e/0x40 > kasan_set_track+0x21/0x30 > __kasan_kmalloc+0x7a/0x90 > __kmalloc+0x59/0x140 > kmem_alloc+0xb2/0x2f0 > xfs_trans_ail_init+0x20/0x320 > xfs_log_mount+0x37e/0x690 > xfs_mountfs+0xe36/0x1b40 > xfs_fs_fill_super+0xc5c/0x1a70 > get_tree_bdev+0x3c5/0x6c0 > vfs_get_tree+0x85/0x250 > path_mount+0xec3/0x1830 > do_mount+0xef/0x110 > __x64_sys_mount+0x150/0x1f0 > do_syscall_64+0x35/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > Freed by task 9662: > kasan_save_stack+0x1e/0x40 > kasan_set_track+0x21/0x30 > kasan_save_free_info+0x2a/0x40 > __kasan_slab_free+0x105/0x1a0 > __kmem_cache_free+0x99/0x2d0 > kvfree+0x3a/0x40 > xfs_log_unmount+0x60/0xf0 > xfs_unmountfs+0xf3/0x1d0 > xfs_fs_put_super+0x78/0x300 > generic_shutdown_super+0x151/0x400 > kill_block_super+0x9a/0xe0 > deactivate_locked_super+0x82/0xe0 > deactivate_super+0x91/0xb0 > cleanup_mnt+0x32a/0x4a0 > task_work_run+0x15f/0x240 > exit_to_user_mode_prepare+0x188/0x190 > 
syscall_exit_to_user_mode+0x12/0x30 > do_syscall_64+0x42/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > The buggy address belongs to the object at ffff888023169400 > which belongs to the cache kmalloc-128 of size 128 > The buggy address is located 0 bytes inside of > 128-byte region [ffff888023169400, ffff888023169480) > > The buggy address belongs to the physical page: > page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 > index:0xffff888023168f80 pfn:0x23168 > head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 > flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) > raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 > raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ================================================================== > Disabling lock debugging due to kernel taint > > Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") > Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> > --- > fs/xfs/xfs_log.c | 33 ++++++++++++++++++++++----------- > 1 file changed, 22 insertions(+), 11 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index f02a0dd522b3..89fcd2b8cdfc 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -82,6 +82,9 @@ STATIC int > xlog_iclogs_empty( > struct xlog *log); > > +static void > +xlog_wait_iodone(struct xlog *log); > + > static int > xfs_log_cover(struct xfs_mount *); > > @@ -886,6 +889,23 @@ xlog_force_iclog( > return 
xlog_state_release_iclog(iclog->ic_log, iclog, NULL); > } > > +/* > + * Cycle all the iclogbuf locks to make sure all log IO completion > + * is done before we tear down AIL/CIL. > + */ > +static void > +xlog_wait_iodone(struct xlog *log) It's ok to have pulled this into a helper function, but to repeat a question from v1: why shouldn't this be done as part of xfs_log_quiesce? Is there a particular reason why we *don't* want to wait for log IO completions as part of a freeze? I guess that's because a frozen dead filesystem still has all the incore state, so log IO completion work racing with the freeze isn't a big deal? > +{ > + int i; > + xlog_in_core_t *iclog = log->l_iclog; Nit: Please use 'struct xlog_in_core', not the typedefs. > + > + for (i = 0; i < log->l_iclog_bufs; i++) { > + down(&iclog->ic_sema); > + up(&iclog->ic_sema); > + iclog = iclog->ic_next; > + } > +} > + > /* > * Wait for the iclog and all prior iclogs to be written disk as required by the > * log force state machine. Waiting on ic_force_wait ensures iclog completions > @@ -1095,6 +1115,7 @@ xfs_log_clean( > struct xfs_mount *mp) > { > xfs_log_quiesce(mp); > + xlog_wait_iodone(mp->m_log); Wherever you inject this callsite, it needs a comment stating /why/ it's necessary to wait for the iclog completions even after a log force and xfs_ail_push_all_sync call. --D > xfs_log_unmount_write(mp); > } > > @@ -2113,17 +2134,7 @@ xlog_dealloc_log( > xlog_in_core_t *iclog, *next_iclog; > int i; > > - /* > - * Cycle all the iclogbuf locks to make sure all log IO completion > - * is done before we tear down these buffers. 
> - */ > - iclog = log->l_iclog; > - for (i = 0; i < log->l_iclog_bufs; i++) { > - down(&iclog->ic_sema); > - up(&iclog->ic_sema); > - iclog = iclog->ic_next; > - } > - > + xlog_wait_iodone(log); > /* > * Destroy the CIL after waiting for iclog IO completion because an > * iclog EIO error will try to shut down the log, which accesses the > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL 2022-11-07 17:13 ` [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Darrick J. Wong @ 2022-11-08 2:44 ` Guo Xuenan 0 siblings, 0 replies; 17+ messages in thread From: Guo Xuenan @ 2022-11-08 2:44 UTC (permalink / raw) To: Darrick J. Wong Cc: dchinner, fangwei1, houtao1, jack.qiu, leo.lilong, linux-xfs, yi.zhang, zengheng4, zhengbin13 On 2022/11/8 1:13, Darrick J. Wong wrote: > On Mon, Nov 07, 2022 at 10:27:15PM +0800, Guo Xuenan wrote: >> Fix a UAF in xfs_trans_ail_delete during xlog force shutdown. >> Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in >> xlog_state_shutdown_callbacks") changed the order of running callbacks >> and waiting for iclog completion to keep the unmount path from destroying the AIL too early. >> But that does not seem to be enough to ensure this; adding an mdelay in >> `xfs_buf_item_unpin` proves it. >> >> The reproduction is as follows. To destroy the AIL safely, >> we should wait for all xlog ioend workers to finish and sync the AIL.
>> >> ================================================================== >> BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 >> Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 >> >> CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W >> 6.1.0-rc1-00002-gc28266863c4a #137 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.13.0-1ubuntu1.1 04/01/2014 >> Workqueue: xfs-log/sda xlog_ioend_work >> Call Trace: >> <TASK> >> dump_stack_lvl+0x4d/0x66 >> print_report+0x171/0x4a6 >> kasan_report+0xb3/0x130 >> xfs_trans_ail_delete+0x240/0x2a0 >> xfs_buf_item_done+0x7b/0xa0 >> xfs_buf_ioend+0x1e9/0x11f0 >> xfs_buf_item_unpin+0x4c8/0x860 >> xfs_trans_committed_bulk+0x4c2/0x7c0 >> xlog_cil_committed+0xab6/0xfb0 >> xlog_cil_process_committed+0x117/0x1e0 >> xlog_state_shutdown_callbacks+0x208/0x440 >> xlog_force_shutdown+0x1b3/0x3a0 >> xlog_ioend_work+0xef/0x1d0 >> process_one_work+0x6f9/0xf70 >> worker_thread+0x578/0xf30 >> kthread+0x28c/0x330 >> ret_from_fork+0x1f/0x30 >> </TASK> >> >> Allocated by task 9606: >> kasan_save_stack+0x1e/0x40 >> kasan_set_track+0x21/0x30 >> __kasan_kmalloc+0x7a/0x90 >> __kmalloc+0x59/0x140 >> kmem_alloc+0xb2/0x2f0 >> xfs_trans_ail_init+0x20/0x320 >> xfs_log_mount+0x37e/0x690 >> xfs_mountfs+0xe36/0x1b40 >> xfs_fs_fill_super+0xc5c/0x1a70 >> get_tree_bdev+0x3c5/0x6c0 >> vfs_get_tree+0x85/0x250 >> path_mount+0xec3/0x1830 >> do_mount+0xef/0x110 >> __x64_sys_mount+0x150/0x1f0 >> do_syscall_64+0x35/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> >> Freed by task 9662: >> kasan_save_stack+0x1e/0x40 >> kasan_set_track+0x21/0x30 >> kasan_save_free_info+0x2a/0x40 >> __kasan_slab_free+0x105/0x1a0 >> __kmem_cache_free+0x99/0x2d0 >> kvfree+0x3a/0x40 >> xfs_log_unmount+0x60/0xf0 >> xfs_unmountfs+0xf3/0x1d0 >> xfs_fs_put_super+0x78/0x300 >> generic_shutdown_super+0x151/0x400 >> kill_block_super+0x9a/0xe0 >> deactivate_locked_super+0x82/0xe0 >> deactivate_super+0x91/0xb0 >> cleanup_mnt+0x32a/0x4a0 >> 
task_work_run+0x15f/0x240 >> exit_to_user_mode_prepare+0x188/0x190 >> syscall_exit_to_user_mode+0x12/0x30 >> do_syscall_64+0x42/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> >> The buggy address belongs to the object at ffff888023169400 >> which belongs to the cache kmalloc-128 of size 128 >> The buggy address is located 0 bytes inside of >> 128-byte region [ffff888023169400, ffff888023169480) >> >> The buggy address belongs to the physical page: >> page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 >> index:0xffff888023168f80 pfn:0x23168 >> head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 >> flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) >> raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 >> raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 >> page dumped because: kasan: bad access detected >> >> Memory state around the buggy address: >> ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >>> ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >> ^ >> ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ================================================================== >> Disabling lock debugging due to kernel taint >> >> Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") >> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> >> --- >> fs/xfs/xfs_log.c | 33 ++++++++++++++++++++++----------- >> 1 file changed, 22 insertions(+), 11 deletions(-) >> >> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c >> index f02a0dd522b3..89fcd2b8cdfc 100644 >> --- a/fs/xfs/xfs_log.c >> +++ b/fs/xfs/xfs_log.c >> @@ -82,6 +82,9 @@ STATIC int >> xlog_iclogs_empty( >> struct xlog *log); >> >> +static void >> +xlog_wait_iodone(struct xlog *log); >> + >> 
static int >> xfs_log_cover(struct xfs_mount *); >> >> @@ -886,6 +889,23 @@ xlog_force_iclog( >> return xlog_state_release_iclog(iclog->ic_log, iclog, NULL); >> } >> >> +/* >> + * Cycle all the iclogbuf locks to make sure all log IO completion >> + * is done before we tear down AIL/CIL. >> + */ >> +static void >> +xlog_wait_iodone(struct xlog *log) > It's ok to have pulled this into a helper function, but to repeat a > question from v1: why shouldn't this be done as part of xfs_log_quiesce? > Is there a particular reason why we *don't* want to wait for log IO > completions as part of a freeze? I guess that's because a frozen dead > filesystem still has all the incore state, so log IO completion work > racing with the freeze isn't a big deal? Yes, you're right. As you guessed, I thought only the umount process needed to wait for xlog IO completion, and wanted the fix not to affect other paths (xfs freeze). But, as you pointed out, I think putting this into xfs_log_quiesce is fine. I will resend v3 later. >> +{ >> + int i; >> + xlog_in_core_t *iclog = log->l_iclog; > Nit: Please use 'struct xlog_in_core', not the typedefs. OK, will fix it. >> + >> + for (i = 0; i < log->l_iclog_bufs; i++) { >> + down(&iclog->ic_sema); >> + up(&iclog->ic_sema); >> + iclog = iclog->ic_next; >> + } >> +} >> + >> /* >> * Wait for the iclog and all prior iclogs to be written disk as required by the >> * log force state machine. Waiting on ic_force_wait ensures iclog completions >> @@ -1095,6 +1115,7 @@ xfs_log_clean( >> struct xfs_mount *mp) >> { >> xfs_log_quiesce(mp); >> + xlog_wait_iodone(mp->m_log); > Wherever you inject this callsite, it needs a comment stating /why/ it's > necessary to wait for the iclog completions even after a log force and > xfs_ail_push_all_sync call. OK, comments will be added. Best regards, Xuenan.
> --D > >> xfs_log_unmount_write(mp); >> } >> >> @@ -2113,17 +2134,7 @@ xlog_dealloc_log( >> xlog_in_core_t *iclog, *next_iclog; >> int i; >> >> - /* >> - * Cycle all the iclogbuf locks to make sure all log IO completion >> - * is done before we tear down these buffers. >> - */ >> - iclog = log->l_iclog; >> - for (i = 0; i < log->l_iclog_bufs; i++) { >> - down(&iclog->ic_sema); >> - up(&iclog->ic_sema); >> - iclog = iclog->ic_next; >> - } >> - >> + xlog_wait_iodone(log); >> /* >> * Destroy the CIL after waiting for iclog IO completion because an >> * iclog EIO error will try to shut down the log, which accesses the >> -- >> 2.31.1 >> > . -- Guo Xuenan [OS Kernel Lab] ----------------------------- Email: guoxuenan@huawei.com ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2022-11-17 2:13 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-03 8:36 [PATCH 0/2] xfs: shutdown UAF fixes Guo Xuenan 2022-11-03 8:36 ` [PATCH 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-03 21:16 ` Dave Chinner 2022-11-04 7:50 ` Guo Xuenan 2022-11-04 15:46 ` Darrick J. Wong 2022-11-05 3:32 ` Guo Xuenan 2022-11-03 8:36 ` [PATCH 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan 2022-11-03 21:44 ` Dave Chinner 2022-11-07 14:27 ` [PATCH v2 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-07 14:27 ` [PATCH v2 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan 2022-11-08 14:06 ` [PATCH v3 1/2] xfs: wait xlog ioend workqueue drained before tearing down AIL Guo Xuenan 2022-11-15 23:23 ` Darrick J. Wong 2022-11-17 1:18 ` Guo Xuenan 2022-11-16 6:02 ` Dave Chinner 2022-11-17 2:12 ` Guo Xuenan 2022-11-07 17:13 ` [PATCH v2 " Darrick J. Wong 2022-11-08 2:44 ` Guo Xuenan