All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] xfs: shutdown UAF fixes
@ 2022-11-18 12:11 Guo Xuenan
  2022-11-18 12:11 ` [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL Guo Xuenan
  2022-11-18 12:11 ` [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan
  0 siblings, 2 replies; 8+ messages in thread
From: Guo Xuenan @ 2022-11-18 12:11 UTC (permalink / raw)
  To: djwong, dchinner
  Cc: linux-xfs, guoxuenan, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

Hi xfs folks:

The following patches fix some race of xfs force shutdown. 

Patch 1 fix uaf in xfs_trans_ail_delete during xlog force shutdown.
In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
xlog_state_shutdown_callbacks"), seems not enough to avoid UAF of AIL
before it being tear down in umount.

Patch 2 fix uaf of super block buffer log item during xlog shut down,
since xfs buf log item can be reloged, super block buffer is most
frequently modified of all xfs_buf, especially when disable lazy-count
feature, during force shutdown we will unpin and release log item, due
to xfs relog mechanism, which may release the log item alread inserted
in CIL.

I reproduce the two problems, /importantly/, adding following patches and
disable lazy-count feature to increase recurrence probability.

Kernel patch for reproduce the issue of patch 1:
```
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 522d450a94b1..b1221d517c00 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -503,6 +503,9 @@ xfs_buf_item_unpin(
         * the AIL properly holds a reference on the bli.
         */
        freed = atomic_dec_and_test(&bip->bli_refcount);
+       if (remove)
+               mdelay(1000);
+
        if (freed && !stale && remove)
                xfs_buf_hold(bp);
        if (atomic_dec_and_test(&bp->b_pin_count))

```

Kernel patch for reproduce the issue of patch 2:
```
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..b1aac4a7576c 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -25,6 +25,9 @@
 #include "xfs_dquot.h"
 #include "xfs_icache.h"
 
+#include <linux/delay.h>
+#include "xfs_log_priv.h"
+
 struct kmem_cache      *xfs_trans_cache;
 
 #if defined(CONFIG_TRACEPOINTS)
@@ -1002,6 +1005,8 @@ __xfs_trans_commit(
                xfs_trans_apply_sb_deltas(tp);
        xfs_trans_apply_dquot_deltas(tp);
 
+       if (xlog_is_shutdown(log))
+               mdelay(1000);
        xlog_cil_commit(log, tp, &commit_seq, regrant);
 
        xfs_trans_free(tp);
```

Guo Xuenan (2):
  xfs: wait xlog ioend workqueue drained before tearing down AIL
  xfs: fix super block buf log item UAF during force shutdown

 fs/xfs/xfs_buf_item.c  | 8 +++++---
 fs/xfs/xfs_trans_ail.c | 3 +++
 2 files changed, 8 insertions(+), 3 deletions(-)

-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL
  2022-11-18 12:11 [PATCH v5 0/2] xfs: shutdown UAF fixes Guo Xuenan
@ 2022-11-18 12:11 ` Guo Xuenan
  2022-11-21 18:11   ` Darrick J. Wong
  2022-11-18 12:11 ` [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan
  1 sibling, 1 reply; 8+ messages in thread
From: Guo Xuenan @ 2022-11-18 12:11 UTC (permalink / raw)
  To: djwong, dchinner
  Cc: linux-xfs, guoxuenan, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

Fix uaf in xfs_trans_ail_delete during xlog force shutdown.
In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
xlog_state_shutdown_callbacks") changed the order of running callbacks
and wait for iclog completion to avoid unmount path untimely destroy AIL.
But which seems not enough to ensue this, adding mdelay in
`xfs_buf_item_unpin` can prove that.

The reproduction is as follows. To ensure destroy AIL safely,
we should wait all xlog ioend workers done and sync the AIL.

==================================================================
BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0
Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43

CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G        W
6.1.0-rc1-00002-gc28266863c4a #137
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-log/sda xlog_ioend_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report+0x171/0x4a6
 kasan_report+0xb3/0x130
 xfs_trans_ail_delete+0x240/0x2a0
 xfs_buf_item_done+0x7b/0xa0
 xfs_buf_ioend+0x1e9/0x11f0
 xfs_buf_item_unpin+0x4c8/0x860
 xfs_trans_committed_bulk+0x4c2/0x7c0
 xlog_cil_committed+0xab6/0xfb0
 xlog_cil_process_committed+0x117/0x1e0
 xlog_state_shutdown_callbacks+0x208/0x440
 xlog_force_shutdown+0x1b3/0x3a0
 xlog_ioend_work+0xef/0x1d0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 9606:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 __kasan_kmalloc+0x7a/0x90
 __kmalloc+0x59/0x140
 kmem_alloc+0xb2/0x2f0
 xfs_trans_ail_init+0x20/0x320
 xfs_log_mount+0x37e/0x690
 xfs_mountfs+0xe36/0x1b40
 xfs_fs_fill_super+0xc5c/0x1a70
 get_tree_bdev+0x3c5/0x6c0
 vfs_get_tree+0x85/0x250
 path_mount+0xec3/0x1830
 do_mount+0xef/0x110
 __x64_sys_mount+0x150/0x1f0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 9662:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x40
 __kasan_slab_free+0x105/0x1a0
 __kmem_cache_free+0x99/0x2d0
 kvfree+0x3a/0x40
 xfs_log_unmount+0x60/0xf0
 xfs_unmountfs+0xf3/0x1d0
 xfs_fs_put_super+0x78/0x300
 generic_shutdown_super+0x151/0x400
 kill_block_super+0x9a/0xe0
 deactivate_locked_super+0x82/0xe0
 deactivate_super+0x91/0xb0
 cleanup_mnt+0x32a/0x4a0
 task_work_run+0x15f/0x240
 exit_to_user_mode_prepare+0x188/0x190
 syscall_exit_to_user_mode+0x12/0x30
 do_syscall_64+0x42/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888023169400
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
 128-byte region [ffff888023169400, ffff888023169480)

The buggy address belongs to the physical page:
page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000
index:0xffff888023168f80 pfn:0x23168
head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0
raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks")
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
---
 fs/xfs/xfs_log.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 0141d9907d31..fc61cc024023 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -888,6 +888,23 @@ xlog_force_iclog(
 	return xlog_state_release_iclog(iclog->ic_log, iclog, NULL);
 }
 
+/*
+ * Cycle all the iclogbuf locks to make sure all log IO completion
+ * is done before we tear down these buffers.
+ */
+static void
+xlog_wait_iclog_completion(struct xlog *log)
+{
+	int		i;
+	struct xlog_in_core	*iclog = log->l_iclog;
+
+	for (i = 0; i < log->l_iclog_bufs; i++) {
+		down(&iclog->ic_sema);
+		up(&iclog->ic_sema);
+		iclog = iclog->ic_next;
+	}
+}
+
 /*
  * Wait for the iclog and all prior iclogs to be written disk as required by the
  * log force state machine. Waiting on ic_force_wait ensures iclog completions
@@ -1113,6 +1130,14 @@ xfs_log_unmount(
 {
 	xfs_log_clean(mp);
 
+	/*
+	 * If shutdown has come from iclog IO context, the log
+	 * cleaning will have been skipped and so we need to wait
+	 * for the iclog to complete shutdown processing before we
+	 * tear anything down.
+	 */
+	xlog_wait_iclog_completion(mp->m_log);
+
 	xfs_buftarg_drain(mp->m_ddev_targp);
 
 	xfs_trans_ail_destroy(mp);
@@ -2115,17 +2140,6 @@ xlog_dealloc_log(
 	xlog_in_core_t	*iclog, *next_iclog;
 	int		i;
 
-	/*
-	 * Cycle all the iclogbuf locks to make sure all log IO completion
-	 * is done before we tear down these buffers.
-	 */
-	iclog = log->l_iclog;
-	for (i = 0; i < log->l_iclog_bufs; i++) {
-		down(&iclog->ic_sema);
-		up(&iclog->ic_sema);
-		iclog = iclog->ic_next;
-	}
-
 	/*
 	 * Destroy the CIL after waiting for iclog IO completion because an
 	 * iclog EIO error will try to shut down the log, which accesses the
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown
  2022-11-18 12:11 [PATCH v5 0/2] xfs: shutdown UAF fixes Guo Xuenan
  2022-11-18 12:11 ` [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL Guo Xuenan
@ 2022-11-18 12:11 ` Guo Xuenan
  2022-11-21 18:14   ` Darrick J. Wong
  1 sibling, 1 reply; 8+ messages in thread
From: Guo Xuenan @ 2022-11-18 12:11 UTC (permalink / raw)
  To: djwong, dchinner
  Cc: linux-xfs, guoxuenan, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

xfs log io error will trigger xlog shut down, and end_io worker call
xlog_state_shutdown_callbacks to unpin and release the buf log item.
The race condition is that when there are some thread doing transaction
commit and happened not to be intercepted by xlog_is_shutdown, then,
these log item will be insert into CIL, when unpin and release these
buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit`
can increase recurrence probability.

The following call graph actually encountered this bad situation.
fsstress                    io end worker kworker/0:1H-216
                            xlog_ioend_work
                              ->xlog_force_shutdown
                                ->xlog_state_shutdown_callbacks
                                  ->xlog_cil_process_committed
                                    ->xlog_cil_committed
                                      ->xfs_trans_committed_bulk
->xfs_trans_apply_sb_deltas             ->li_ops->iop_unpin(lip, 1);
  ->xfs_trans_getsb
    ->_xfs_trans_bjoin
      ->xfs_buf_item_init
        ->if (bip) { return 0;} //relog
->xlog_cil_commit
  ->xlog_cil_insert_items //insert into CIL
                                           ->xfs_buf_ioend_fail(bp);
                                             ->xfs_buf_ioend
                                               ->xfs_buf_item_done
                                                 ->xfs_buf_item_relse
                                                   ->xfs_buf_item_free

when cil push worker gather percpu cil and insert super block buf log item
into ctx->log_items then uaf occurs.

==================================================================
BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105

CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
6.1.0-rc1-00001-g274115149b42 #136
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-cil/sda xlog_cil_push_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x4d/0x66
 print_report+0x171/0x4a6
 kasan_report+0xb3/0x130
 xlog_cil_push_work+0x1c8f/0x22f0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 2145:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 __kasan_slab_alloc+0x54/0x60
 kmem_cache_alloc+0x14a/0x510
 xfs_buf_item_init+0x160/0x6d0
 _xfs_trans_bjoin+0x7f/0x2e0
 xfs_trans_getsb+0xb6/0x3f0
 xfs_trans_apply_sb_deltas+0x1f/0x8c0
 __xfs_trans_commit+0xa25/0xe10
 xfs_symlink+0xe23/0x1660
 xfs_vn_symlink+0x157/0x280
 vfs_symlink+0x491/0x790
 do_symlinkat+0x128/0x220
 __x64_sys_symlink+0x7a/0x90
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 216:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x40
 __kasan_slab_free+0x105/0x1a0
 kmem_cache_free+0xb6/0x460
 xfs_buf_ioend+0x1e9/0x11f0
 xfs_buf_item_unpin+0x3d6/0x840
 xfs_trans_committed_bulk+0x4c2/0x7c0
 xlog_cil_committed+0xab6/0xfb0
 xlog_cil_process_committed+0x117/0x1e0
 xlog_state_shutdown_callbacks+0x208/0x440
 xlog_force_shutdown+0x1b3/0x3a0
 xlog_ioend_work+0xef/0x1d0
 process_one_work+0x6f9/0xf70
 worker_thread+0x578/0xf30
 kthread+0x28c/0x330
 ret_from_fork+0x1f/0x30

The buggy address belongs to the object at ffff88801800f388
 which belongs to the cache xfs_buf_item of size 272
The buggy address is located 104 bytes inside of
 272-byte region [ffff88801800f388, ffff88801800f498)

The buggy address belongs to the physical page:
page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
index:0xffff88801800f208 pfn:0x1800e
head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
---
 fs/xfs/xfs_buf_item.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 522d450a94b1..df7322ed73fa 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1018,6 +1018,8 @@ xfs_buf_item_relse(
 	trace_xfs_buf_item_relse(bp, _RET_IP_);
 	ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
 
+	if (atomic_read(&bip->bli_refcount))
+		return;
 	bp->b_log_item = NULL;
 	xfs_buf_rele(bp);
 	xfs_buf_item_free(bip);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL
  2022-11-18 12:11 ` [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL Guo Xuenan
@ 2022-11-21 18:11   ` Darrick J. Wong
  2022-11-22  2:13     ` Guo Xuenan
  0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2022-11-21 18:11 UTC (permalink / raw)
  To: Guo Xuenan
  Cc: dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

On Fri, Nov 18, 2022 at 08:11:42PM +0800, Guo Xuenan wrote:
> Fix uaf in xfs_trans_ail_delete during xlog force shutdown.
> In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
> xlog_state_shutdown_callbacks") changed the order of running callbacks
> and wait for iclog completion to avoid unmount path untimely destroy AIL.
> But which seems not enough to ensue this, adding mdelay in
> `xfs_buf_item_unpin` can prove that.
> 
> The reproduction is as follows. To ensure destroy AIL safely,
> we should wait all xlog ioend workers done and sync the AIL.
> 
> ==================================================================
> BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0
> Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43
> 
> CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G        W
> 6.1.0-rc1-00002-gc28266863c4a #137
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Workqueue: xfs-log/sda xlog_ioend_work
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x4d/0x66
>  print_report+0x171/0x4a6
>  kasan_report+0xb3/0x130
>  xfs_trans_ail_delete+0x240/0x2a0
>  xfs_buf_item_done+0x7b/0xa0
>  xfs_buf_ioend+0x1e9/0x11f0
>  xfs_buf_item_unpin+0x4c8/0x860
>  xfs_trans_committed_bulk+0x4c2/0x7c0
>  xlog_cil_committed+0xab6/0xfb0
>  xlog_cil_process_committed+0x117/0x1e0
>  xlog_state_shutdown_callbacks+0x208/0x440
>  xlog_force_shutdown+0x1b3/0x3a0
>  xlog_ioend_work+0xef/0x1d0
>  process_one_work+0x6f9/0xf70
>  worker_thread+0x578/0xf30
>  kthread+0x28c/0x330
>  ret_from_fork+0x1f/0x30
>  </TASK>
> 
> Allocated by task 9606:
>  kasan_save_stack+0x1e/0x40
>  kasan_set_track+0x21/0x30
>  __kasan_kmalloc+0x7a/0x90
>  __kmalloc+0x59/0x140
>  kmem_alloc+0xb2/0x2f0
>  xfs_trans_ail_init+0x20/0x320
>  xfs_log_mount+0x37e/0x690
>  xfs_mountfs+0xe36/0x1b40
>  xfs_fs_fill_super+0xc5c/0x1a70
>  get_tree_bdev+0x3c5/0x6c0
>  vfs_get_tree+0x85/0x250
>  path_mount+0xec3/0x1830
>  do_mount+0xef/0x110
>  __x64_sys_mount+0x150/0x1f0
>  do_syscall_64+0x35/0x80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Freed by task 9662:
>  kasan_save_stack+0x1e/0x40
>  kasan_set_track+0x21/0x30
>  kasan_save_free_info+0x2a/0x40
>  __kasan_slab_free+0x105/0x1a0
>  __kmem_cache_free+0x99/0x2d0
>  kvfree+0x3a/0x40
>  xfs_log_unmount+0x60/0xf0
>  xfs_unmountfs+0xf3/0x1d0
>  xfs_fs_put_super+0x78/0x300
>  generic_shutdown_super+0x151/0x400
>  kill_block_super+0x9a/0xe0
>  deactivate_locked_super+0x82/0xe0
>  deactivate_super+0x91/0xb0
>  cleanup_mnt+0x32a/0x4a0
>  task_work_run+0x15f/0x240
>  exit_to_user_mode_prepare+0x188/0x190
>  syscall_exit_to_user_mode+0x12/0x30
>  do_syscall_64+0x42/0x80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> The buggy address belongs to the object at ffff888023169400
>  which belongs to the cache kmalloc-128 of size 128
> The buggy address is located 0 bytes inside of
>  128-byte region [ffff888023169400, ffff888023169480)
> 
> The buggy address belongs to the physical page:
> page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000
> index:0xffff888023168f80 pfn:0x23168
> head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0
> flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
> raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0
> raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                    ^
>  ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
> Disabling lock debugging due to kernel taint
> 
> Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks")
> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
> ---
>  fs/xfs/xfs_log.c | 36 +++++++++++++++++++++++++-----------
>  1 file changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 0141d9907d31..fc61cc024023 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -888,6 +888,23 @@ xlog_force_iclog(
>  	return xlog_state_release_iclog(iclog->ic_log, iclog, NULL);
>  }
>  
> +/*
> + * Cycle all the iclogbuf locks to make sure all log IO completion
> + * is done before we tear down these buffers.
> + */
> +static void
> +xlog_wait_iclog_completion(struct xlog *log)
> +{
> +	int		i;
> +	struct xlog_in_core	*iclog = log->l_iclog;
> +
> +	for (i = 0; i < log->l_iclog_bufs; i++) {
> +		down(&iclog->ic_sema);
> +		up(&iclog->ic_sema);
> +		iclog = iclog->ic_next;
> +	}
> +}
> +
>  /*
>   * Wait for the iclog and all prior iclogs to be written disk as required by the
>   * log force state machine. Waiting on ic_force_wait ensures iclog completions
> @@ -1113,6 +1130,14 @@ xfs_log_unmount(
>  {
>  	xfs_log_clean(mp);
>  
> +	/*
> +	 * If shutdown has come from iclog IO context, the log
> +	 * cleaning will have been skipped and so we need to wait
> +	 * for the iclog to complete shutdown processing before we
> +	 * tear anything down.
> +	 */
> +	xlog_wait_iclog_completion(mp->m_log);
> +
>  	xfs_buftarg_drain(mp->m_ddev_targp);
>  
>  	xfs_trans_ail_destroy(mp);
> @@ -2115,17 +2140,6 @@ xlog_dealloc_log(
>  	xlog_in_core_t	*iclog, *next_iclog;
>  	int		i;
>  
> -	/*
> -	 * Cycle all the iclogbuf locks to make sure all log IO completion
> -	 * is done before we tear down these buffers.
> -	 */
> -	iclog = log->l_iclog;
> -	for (i = 0; i < log->l_iclog_bufs; i++) {
> -		down(&iclog->ic_sema);
> -		up(&iclog->ic_sema);
> -		iclog = iclog->ic_next;
> -	}

Why is it ok to remove this loop here?  If the xlog_recover() call in
xfs_log_mount shuts down the log, don't we still need to cycle the iclog
locks to make sure the completions have all run before we free them?

--D

> -
>  	/*
>  	 * Destroy the CIL after waiting for iclog IO completion because an
>  	 * iclog EIO error will try to shut down the log, which accesses the
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown
  2022-11-18 12:11 ` [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan
@ 2022-11-21 18:14   ` Darrick J. Wong
  2022-11-22  1:56     ` Guo Xuenan
       [not found]     ` <acd05d3a-9df0-9e64-6846-f24626b80e74@huawei.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2022-11-21 18:14 UTC (permalink / raw)
  To: Guo Xuenan
  Cc: dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

On Fri, Nov 18, 2022 at 08:11:43PM +0800, Guo Xuenan wrote:
> xfs log io error will trigger xlog shut down, and end_io worker call
> xlog_state_shutdown_callbacks to unpin and release the buf log item.
> The race condition is that when there are some thread doing transaction
> commit and happened not to be intercepted by xlog_is_shutdown, then,
> these log item will be insert into CIL, when unpin and release these
> buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit`
> can increase recurrence probability.
> 
> The following call graph actually encountered this bad situation.
> fsstress                    io end worker kworker/0:1H-216
>                             xlog_ioend_work
>                               ->xlog_force_shutdown
>                                 ->xlog_state_shutdown_callbacks
>                                   ->xlog_cil_process_committed
>                                     ->xlog_cil_committed
>                                       ->xfs_trans_committed_bulk
> ->xfs_trans_apply_sb_deltas             ->li_ops->iop_unpin(lip, 1);
>   ->xfs_trans_getsb
>     ->_xfs_trans_bjoin
>       ->xfs_buf_item_init
>         ->if (bip) { return 0;} //relog
> ->xlog_cil_commit
>   ->xlog_cil_insert_items //insert into CIL
>                                            ->xfs_buf_ioend_fail(bp);
>                                              ->xfs_buf_ioend
>                                                ->xfs_buf_item_done
>                                                  ->xfs_buf_item_relse
>                                                    ->xfs_buf_item_free
> 
> when cil push worker gather percpu cil and insert super block buf log item
> into ctx->log_items then uaf occurs.
> 
> ==================================================================
> BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
> Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105
> 
> CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
> 6.1.0-rc1-00001-g274115149b42 #136
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Workqueue: xfs-cil/sda xlog_cil_push_work
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x4d/0x66
>  print_report+0x171/0x4a6
>  kasan_report+0xb3/0x130
>  xlog_cil_push_work+0x1c8f/0x22f0
>  process_one_work+0x6f9/0xf70
>  worker_thread+0x578/0xf30
>  kthread+0x28c/0x330
>  ret_from_fork+0x1f/0x30
>  </TASK>
> 
> Allocated by task 2145:
>  kasan_save_stack+0x1e/0x40
>  kasan_set_track+0x21/0x30
>  __kasan_slab_alloc+0x54/0x60
>  kmem_cache_alloc+0x14a/0x510
>  xfs_buf_item_init+0x160/0x6d0
>  _xfs_trans_bjoin+0x7f/0x2e0
>  xfs_trans_getsb+0xb6/0x3f0
>  xfs_trans_apply_sb_deltas+0x1f/0x8c0
>  __xfs_trans_commit+0xa25/0xe10
>  xfs_symlink+0xe23/0x1660
>  xfs_vn_symlink+0x157/0x280
>  vfs_symlink+0x491/0x790
>  do_symlinkat+0x128/0x220
>  __x64_sys_symlink+0x7a/0x90
>  do_syscall_64+0x35/0x80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Freed by task 216:
>  kasan_save_stack+0x1e/0x40
>  kasan_set_track+0x21/0x30
>  kasan_save_free_info+0x2a/0x40
>  __kasan_slab_free+0x105/0x1a0
>  kmem_cache_free+0xb6/0x460
>  xfs_buf_ioend+0x1e9/0x11f0
>  xfs_buf_item_unpin+0x3d6/0x840
>  xfs_trans_committed_bulk+0x4c2/0x7c0
>  xlog_cil_committed+0xab6/0xfb0
>  xlog_cil_process_committed+0x117/0x1e0
>  xlog_state_shutdown_callbacks+0x208/0x440
>  xlog_force_shutdown+0x1b3/0x3a0
>  xlog_ioend_work+0xef/0x1d0
>  process_one_work+0x6f9/0xf70
>  worker_thread+0x578/0xf30
>  kthread+0x28c/0x330
>  ret_from_fork+0x1f/0x30
> 
> The buggy address belongs to the object at ffff88801800f388
>  which belongs to the cache xfs_buf_item of size 272
> The buggy address is located 104 bytes inside of
>  272-byte region [ffff88801800f388, ffff88801800f498)
> 
> The buggy address belongs to the physical page:
> page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
> index:0xffff88801800f208 pfn:0x1800e
> head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
> flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
> raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
> raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                              ^
>  ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
> Disabling lock debugging due to kernel taint
> 
> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
> ---
>  fs/xfs/xfs_buf_item.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 522d450a94b1..df7322ed73fa 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1018,6 +1018,8 @@ xfs_buf_item_relse(
>  	trace_xfs_buf_item_relse(bp, _RET_IP_);
>  	ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
>  
> +	if (atomic_read(&bip->bli_refcount))
> +		return;

...and the answers to the questions posed here[1] and here[2] are...?

[1] https://lore.kernel.org/linux-xfs/Y3aLWgGStNPEo2z4@magnolia/
[2] https://lore.kernel.org/linux-xfs/20221103214408.GI3600936@dread.disaster.area/

--D

>  	bp->b_log_item = NULL;
>  	xfs_buf_rele(bp);
>  	xfs_buf_item_free(bip);
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown
  2022-11-21 18:14   ` Darrick J. Wong
@ 2022-11-22  1:56     ` Guo Xuenan
       [not found]     ` <acd05d3a-9df0-9e64-6846-f24626b80e74@huawei.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Guo Xuenan @ 2022-11-22  1:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

Hi Darrick,

On 2022/11/22 2:14, Darrick J. Wong wrote:
> On Fri, Nov 18, 2022 at 08:11:43PM +0800, Guo Xuenan wrote:
>> xfs log io error will trigger xlog shut down, and end_io worker call
>> xlog_state_shutdown_callbacks to unpin and release the buf log item.
>> The race condition is that when there are some thread doing transaction
>> commit and happened not to be intercepted by xlog_is_shutdown, then,
>> these log item will be insert into CIL, when unpin and release these
>> buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit`
>> can increase recurrence probability.
>>
>> The following call graph actually encountered this bad situation.
>> fsstress                    io end worker kworker/0:1H-216
>>                              xlog_ioend_work
>>                                ->xlog_force_shutdown
>>                                  ->xlog_state_shutdown_callbacks
>>                                    ->xlog_cil_process_committed
>>                                      ->xlog_cil_committed
>>                                        ->xfs_trans_committed_bulk
>> ->xfs_trans_apply_sb_deltas             ->li_ops->iop_unpin(lip, 1);
>>    ->xfs_trans_getsb
>>      ->_xfs_trans_bjoin
>>        ->xfs_buf_item_init
>>          ->if (bip) { return 0;} //relog
>> ->xlog_cil_commit
>>    ->xlog_cil_insert_items //insert into CIL
>>                                             ->xfs_buf_ioend_fail(bp);
>>                                               ->xfs_buf_ioend
>>                                                 ->xfs_buf_item_done
>>                                                   ->xfs_buf_item_relse
>>                                                     ->xfs_buf_item_free
>>
>> when cil push worker gather percpu cil and insert super block buf log item
>> into ctx->log_items then uaf occurs.
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
>> Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105
>>
>> CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
>> 6.1.0-rc1-00001-g274115149b42 #136
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.13.0-1ubuntu1.1 04/01/2014
>> Workqueue: xfs-cil/sda xlog_cil_push_work
>> Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x4d/0x66
>>   print_report+0x171/0x4a6
>>   kasan_report+0xb3/0x130
>>   xlog_cil_push_work+0x1c8f/0x22f0
>>   process_one_work+0x6f9/0xf70
>>   worker_thread+0x578/0xf30
>>   kthread+0x28c/0x330
>>   ret_from_fork+0x1f/0x30
>>   </TASK>
>>
>> Allocated by task 2145:
>>   kasan_save_stack+0x1e/0x40
>>   kasan_set_track+0x21/0x30
>>   __kasan_slab_alloc+0x54/0x60
>>   kmem_cache_alloc+0x14a/0x510
>>   xfs_buf_item_init+0x160/0x6d0
>>   _xfs_trans_bjoin+0x7f/0x2e0
>>   xfs_trans_getsb+0xb6/0x3f0
>>   xfs_trans_apply_sb_deltas+0x1f/0x8c0
>>   __xfs_trans_commit+0xa25/0xe10
>>   xfs_symlink+0xe23/0x1660
>>   xfs_vn_symlink+0x157/0x280
>>   vfs_symlink+0x491/0x790
>>   do_symlinkat+0x128/0x220
>>   __x64_sys_symlink+0x7a/0x90
>>   do_syscall_64+0x35/0x80
>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> Freed by task 216:
>>   kasan_save_stack+0x1e/0x40
>>   kasan_set_track+0x21/0x30
>>   kasan_save_free_info+0x2a/0x40
>>   __kasan_slab_free+0x105/0x1a0
>>   kmem_cache_free+0xb6/0x460
>>   xfs_buf_ioend+0x1e9/0x11f0
>>   xfs_buf_item_unpin+0x3d6/0x840
>>   xfs_trans_committed_bulk+0x4c2/0x7c0
>>   xlog_cil_committed+0xab6/0xfb0
>>   xlog_cil_process_committed+0x117/0x1e0
>>   xlog_state_shutdown_callbacks+0x208/0x440
>>   xlog_force_shutdown+0x1b3/0x3a0
>>   xlog_ioend_work+0xef/0x1d0
>>   process_one_work+0x6f9/0xf70
>>   worker_thread+0x578/0xf30
>>   kthread+0x28c/0x330
>>   ret_from_fork+0x1f/0x30
>>
>> The buggy address belongs to the object at ffff88801800f388
>>   which belongs to the cache xfs_buf_item of size 272
>> The buggy address is located 104 bytes inside of
>>   272-byte region [ffff88801800f388, ffff88801800f498)
>>
>> The buggy address belongs to the physical page:
>> page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
>> index:0xffff88801800f208 pfn:0x1800e
>> head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
>> flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
>> raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
>> raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>   ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                               ^
>>   ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>> ==================================================================
>> Disabling lock debugging due to kernel taint
>>
>> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
>> ---
>>   fs/xfs/xfs_buf_item.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
>> index 522d450a94b1..df7322ed73fa 100644
>> --- a/fs/xfs/xfs_buf_item.c
>> +++ b/fs/xfs/xfs_buf_item.c
>> @@ -1018,6 +1018,8 @@ xfs_buf_item_relse(
>>   	trace_xfs_buf_item_relse(bp, _RET_IP_);
>>   	ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
>>   
>> +	if (atomic_read(&bip->bli_refcount))
>> +		return;
> ...and the answers to the questions posed here[1] and here[2] are...?
>
> [1] https://lore.kernel.org/linux-xfs/Y3aLWgGStNPEo2z4@magnolia/
> [2] https://lore.kernel.org/linux-xfs/20221103214408.GI3600936@dread.disaster.area/
I'm so sorry about that, I have replied the mails appears in my sent box,
there must be something wrong with my email software.
(Sorry again, I will resend, .....hope you can get this.)
> --D
>
>>   	bp->b_log_item = NULL;
>>   	xfs_buf_rele(bp);
>>   	xfs_buf_item_free(bip);
>> -- 
>> 2.31.1
>>
> .

-- 
Guo Xuenan [OS Kernel Lab]
-----------------------------
Email: guoxuenan@huawei.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL
  2022-11-21 18:11   ` Darrick J. Wong
@ 2022-11-22  2:13     ` Guo Xuenan
  0 siblings, 0 replies; 8+ messages in thread
From: Guo Xuenan @ 2022-11-22  2:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

On 2022/11/22 2:11, Darrick J. Wong wrote:
> On Fri, Nov 18, 2022 at 08:11:42PM +0800, Guo Xuenan wrote:
>> Fix uaf in xfs_trans_ail_delete during xlog force shutdown.
>> In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
>> xlog_state_shutdown_callbacks") changed the order of running callbacks
>> and wait for iclog completion to avoid unmount path untimely destroy AIL.
>> But which seems not enough to ensue this, adding mdelay in
>> `xfs_buf_item_unpin` can prove that.
>>
>> The reproduction is as follows. To ensure destroy AIL safely,
>> we should wait all xlog ioend workers done and sync the AIL.
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0
>> Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43
>>
>> CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G        W
>> 6.1.0-rc1-00002-gc28266863c4a #137
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.13.0-1ubuntu1.1 04/01/2014
>> Workqueue: xfs-log/sda xlog_ioend_work
>> Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x4d/0x66
>>   print_report+0x171/0x4a6
>>   kasan_report+0xb3/0x130
>>   xfs_trans_ail_delete+0x240/0x2a0
>>   xfs_buf_item_done+0x7b/0xa0
>>   xfs_buf_ioend+0x1e9/0x11f0
>>   xfs_buf_item_unpin+0x4c8/0x860
>>   xfs_trans_committed_bulk+0x4c2/0x7c0
>>   xlog_cil_committed+0xab6/0xfb0
>>   xlog_cil_process_committed+0x117/0x1e0
>>   xlog_state_shutdown_callbacks+0x208/0x440
>>   xlog_force_shutdown+0x1b3/0x3a0
>>   xlog_ioend_work+0xef/0x1d0
>>   process_one_work+0x6f9/0xf70
>>   worker_thread+0x578/0xf30
>>   kthread+0x28c/0x330
>>   ret_from_fork+0x1f/0x30
>>   </TASK>
>>
>> Allocated by task 9606:
>>   kasan_save_stack+0x1e/0x40
>>   kasan_set_track+0x21/0x30
>>   __kasan_kmalloc+0x7a/0x90
>>   __kmalloc+0x59/0x140
>>   kmem_alloc+0xb2/0x2f0
>>   xfs_trans_ail_init+0x20/0x320
>>   xfs_log_mount+0x37e/0x690
>>   xfs_mountfs+0xe36/0x1b40
>>   xfs_fs_fill_super+0xc5c/0x1a70
>>   get_tree_bdev+0x3c5/0x6c0
>>   vfs_get_tree+0x85/0x250
>>   path_mount+0xec3/0x1830
>>   do_mount+0xef/0x110
>>   __x64_sys_mount+0x150/0x1f0
>>   do_syscall_64+0x35/0x80
>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> Freed by task 9662:
>>   kasan_save_stack+0x1e/0x40
>>   kasan_set_track+0x21/0x30
>>   kasan_save_free_info+0x2a/0x40
>>   __kasan_slab_free+0x105/0x1a0
>>   __kmem_cache_free+0x99/0x2d0
>>   kvfree+0x3a/0x40
>>   xfs_log_unmount+0x60/0xf0
>>   xfs_unmountfs+0xf3/0x1d0
>>   xfs_fs_put_super+0x78/0x300
>>   generic_shutdown_super+0x151/0x400
>>   kill_block_super+0x9a/0xe0
>>   deactivate_locked_super+0x82/0xe0
>>   deactivate_super+0x91/0xb0
>>   cleanup_mnt+0x32a/0x4a0
>>   task_work_run+0x15f/0x240
>>   exit_to_user_mode_prepare+0x188/0x190
>>   syscall_exit_to_user_mode+0x12/0x30
>>   do_syscall_64+0x42/0x80
>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> The buggy address belongs to the object at ffff888023169400
>>   which belongs to the cache kmalloc-128 of size 128
>> The buggy address is located 0 bytes inside of
>>   128-byte region [ffff888023169400, ffff888023169480)
>>
>> The buggy address belongs to the physical page:
>> page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000
>> index:0xffff888023168f80 pfn:0x23168
>> head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0
>> flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
>> raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0
>> raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>   ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>   ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                     ^
>>   ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>   ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>> ==================================================================
>> Disabling lock debugging due to kernel taint
>>
>> Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks")
>> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
>> ---
>>   fs/xfs/xfs_log.c | 36 +++++++++++++++++++++++++-----------
>>   1 file changed, 25 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index 0141d9907d31..fc61cc024023 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -888,6 +888,23 @@ xlog_force_iclog(
>>   	return xlog_state_release_iclog(iclog->ic_log, iclog, NULL);
>>   }
>>   
>> +/*
>> + * Cycle all the iclogbuf locks to make sure all log IO completion
>> + * is done before we tear down these buffers.
>> + */
>> +static void
>> +xlog_wait_iclog_completion(struct xlog *log)
>> +{
>> +	int		i;
>> +	struct xlog_in_core	*iclog = log->l_iclog;
>> +
>> +	for (i = 0; i < log->l_iclog_bufs; i++) {
>> +		down(&iclog->ic_sema);
>> +		up(&iclog->ic_sema);
>> +		iclog = iclog->ic_next;
>> +	}
>> +}
>> +
>>   /*
>>    * Wait for the iclog and all prior iclogs to be written disk as required by the
>>    * log force state machine. Waiting on ic_force_wait ensures iclog completions
>> @@ -1113,6 +1130,14 @@ xfs_log_unmount(
>>   {
>>   	xfs_log_clean(mp);
>>   
>> +	/*
>> +	 * If shutdown has come from iclog IO context, the log
>> +	 * cleaning will have been skipped and so we need to wait
>> +	 * for the iclog to complete shutdown processing before we
>> +	 * tear anything down.
>> +	 */
>> +	xlog_wait_iclog_completion(mp->m_log);
>> +
>>   	xfs_buftarg_drain(mp->m_ddev_targp);
>>   
>>   	xfs_trans_ail_destroy(mp);
>> @@ -2115,17 +2140,6 @@ xlog_dealloc_log(
>>   	xlog_in_core_t	*iclog, *next_iclog;
>>   	int		i;
>>   
>> -	/*
>> -	 * Cycle all the iclogbuf locks to make sure all log IO completion
>> -	 * is done before we tear down these buffers.
>> -	 */
>> -	iclog = log->l_iclog;
>> -	for (i = 0; i < log->l_iclog_bufs; i++) {
>> -		down(&iclog->ic_sema);
>> -		up(&iclog->ic_sema);
>> -		iclog = iclog->ic_next;
>> -	}
> Why is it ok to remove this loop here?  If the xlog_recover() call in
> xfs_log_mount shuts down the log, don't we still need to cycle the iclog
> locks to make sure the completions have all run before we free them?

Dave provided some opinions here [1].

 > "And the iclog walk can be removed from xlog_dealloc_log() as we've
already done the checks it needs before tearing down the iclogs...."

[1]https://lore.kernel.org/all/20221116060229.GC3600936@dread.disaster.area/ 

> --D
>
>> -
>>   	/*
>>   	 * Destroy the CIL after waiting for iclog IO completion because an
>>   	 * iclog EIO error will try to shut down the log, which accesses the
>> -- 
>> 2.31.1
>>
> .

-- 
Guo Xuenan [OS Kernel Lab]
-----------------------------
Email: guoxuenan@huawei.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown
       [not found]     ` <acd05d3a-9df0-9e64-6846-f24626b80e74@huawei.com>
@ 2022-11-28 22:59       ` Darrick J. Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2022-11-28 22:59 UTC (permalink / raw)
  To: Guo Xuenan
  Cc: dchinner, linux-xfs, houtao1, jack.qiu, fangwei1, yi.zhang,
	zhengbin13, leo.lilong, zengheng4

On Mon, Nov 28, 2022 at 10:11:52AM +0800, Guo Xuenan wrote:
> Hi Darrick:
> 
> On 2022/11/22 2:14, Darrick J. Wong wrote:
> > On Fri, Nov 18, 2022 at 08:11:43PM +0800, Guo Xuenan wrote:
> > > xfs log io error will trigger xlog shut down, and end_io worker call
> > > xlog_state_shutdown_callbacks to unpin and release the buf log item.
> > > The race condition is that when there are some thread doing transaction
> > > commit and happened not to be intercepted by xlog_is_shutdown, then,
> > > these log item will be insert into CIL, when unpin and release these
> > > buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit`
> > > can increase recurrence probability.
> > > 
> > > The following call graph actually encountered this bad situation.
> > > fsstress                    io end worker kworker/0:1H-216
> > >                              xlog_ioend_work
> > >                                ->xlog_force_shutdown
> > >                                  ->xlog_state_shutdown_callbacks
> > >                                    ->xlog_cil_process_committed
> > >                                      ->xlog_cil_committed
> > >                                        ->xfs_trans_committed_bulk
> > > ->xfs_trans_apply_sb_deltas             ->li_ops->iop_unpin(lip, 1);
> > >    ->xfs_trans_getsb
> > >      ->_xfs_trans_bjoin
> > >        ->xfs_buf_item_init
> > >          ->if (bip) { return 0;} //relog
> > > ->xlog_cil_commit
> > >    ->xlog_cil_insert_items //insert into CIL
> > >                                             ->xfs_buf_ioend_fail(bp);
> > >                                               ->xfs_buf_ioend
> > >                                                 ->xfs_buf_item_done
> > >                                                   ->xfs_buf_item_relse
> > >                                                     ->xfs_buf_item_free
> > > 
> > > when cil push worker gather percpu cil and insert super block buf log item
> > > into ctx->log_items then uaf occurs.
> > > 
> > > ==================================================================
> > > BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
> > > Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105
> > > 
> > > CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
> > > 6.1.0-rc1-00001-g274115149b42 #136
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > 1.13.0-1ubuntu1.1 04/01/2014
> > > Workqueue: xfs-cil/sda xlog_cil_push_work
> > > Call Trace:
> > >   <TASK>
> > >   dump_stack_lvl+0x4d/0x66
> > >   print_report+0x171/0x4a6
> > >   kasan_report+0xb3/0x130
> > >   xlog_cil_push_work+0x1c8f/0x22f0
> > >   process_one_work+0x6f9/0xf70
> > >   worker_thread+0x578/0xf30
> > >   kthread+0x28c/0x330
> > >   ret_from_fork+0x1f/0x30
> > >   </TASK>
> > > 
> > > Allocated by task 2145:
> > >   kasan_save_stack+0x1e/0x40
> > >   kasan_set_track+0x21/0x30
> > >   __kasan_slab_alloc+0x54/0x60
> > >   kmem_cache_alloc+0x14a/0x510
> > >   xfs_buf_item_init+0x160/0x6d0
> > >   _xfs_trans_bjoin+0x7f/0x2e0
> > >   xfs_trans_getsb+0xb6/0x3f0
> > >   xfs_trans_apply_sb_deltas+0x1f/0x8c0
> > >   __xfs_trans_commit+0xa25/0xe10
> > >   xfs_symlink+0xe23/0x1660
> > >   xfs_vn_symlink+0x157/0x280
> > >   vfs_symlink+0x491/0x790
> > >   do_symlinkat+0x128/0x220
> > >   __x64_sys_symlink+0x7a/0x90
> > >   do_syscall_64+0x35/0x80
> > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > 
> > > Freed by task 216:
> > >   kasan_save_stack+0x1e/0x40
> > >   kasan_set_track+0x21/0x30
> > >   kasan_save_free_info+0x2a/0x40
> > >   __kasan_slab_free+0x105/0x1a0
> > >   kmem_cache_free+0xb6/0x460
> > >   xfs_buf_ioend+0x1e9/0x11f0
> > >   xfs_buf_item_unpin+0x3d6/0x840
> > >   xfs_trans_committed_bulk+0x4c2/0x7c0
> > >   xlog_cil_committed+0xab6/0xfb0
> > >   xlog_cil_process_committed+0x117/0x1e0
> > >   xlog_state_shutdown_callbacks+0x208/0x440
> > >   xlog_force_shutdown+0x1b3/0x3a0
> > >   xlog_ioend_work+0xef/0x1d0
> > >   process_one_work+0x6f9/0xf70
> > >   worker_thread+0x578/0xf30
> > >   kthread+0x28c/0x330
> > >   ret_from_fork+0x1f/0x30
> > > 
> > > The buggy address belongs to the object at ffff88801800f388
> > >   which belongs to the cache xfs_buf_item of size 272
> > > The buggy address is located 104 bytes inside of
> > >   272-byte region [ffff88801800f388, ffff88801800f498)
> > > 
> > > The buggy address belongs to the physical page:
> > > page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
> > > index:0xffff88801800f208 pfn:0x1800e
> > > head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
> > > flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
> > > raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
> > > raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
> > > page dumped because: kasan: bad access detected
> > > 
> > > Memory state around the buggy address:
> > >   ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >   ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >                                                               ^
> > >   ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >   ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > ==================================================================
> > > Disabling lock debugging due to kernel taint
> > > 
> > > Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
> > > ---
> > >   fs/xfs/xfs_buf_item.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> > > index 522d450a94b1..df7322ed73fa 100644
> > > --- a/fs/xfs/xfs_buf_item.c
> > > +++ b/fs/xfs/xfs_buf_item.c
> > > @@ -1018,6 +1018,8 @@ xfs_buf_item_relse(
> > >   	trace_xfs_buf_item_relse(bp, _RET_IP_);
> > >   	ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
> > > +	if (atomic_read(&bip->bli_refcount))
> > > +		return;
> > ...and the answers to the questions posed here[1] and here[2] are...?
> > 
> > [1] https://lore.kernel.org/linux-xfs/Y3aLWgGStNPEo2z4@magnolia/
> > [2] https://lore.kernel.org/linux-xfs/20221103214408.GI3600936@dread.disaster.area/
> I have replied [1] [2] at the old thread, but I found that I didn't find my
> reply
> in the xfs mailing list.Sorry for bothering you, if you've recieved it. I
> will reply
> here again, since I find my email replay to PATCH v5 display on the
> website,but I did
> not find my replay to [1] [2]. (I'm so sorry if I bothered you.)

AH, that's what happened.  I'm not sure what's up with huawei vs. vger,
but I guess something's screwed up with your corporate mail that makes
sending patches a big pain.

(...and it's not limited to huawei, most major US companies seem to have
rolled out the brave new world of email systems where listservs are
broken and nobody cares because Microsoft and Google sell teamware.)

> > Q1. Odd switch from ^^^^^^^^^^^^^ spaces to tabs here.
> 
> For better alignment I have changed all of them to spaces in PATCHv5.

Ok, I'll look for them on the list then.

> > Q2. Why are we still inserting things into the CIL of a dead log?
> 
> Because,dead log condition is set before xlog_cil_commit.
> Current code as follows:
> __xfs_trans_commit(
>     struct xfs_trans *tp,
>     bool  regrant)
> {    ...
> 
>     if (xlog_is_shutdown(log)) {  <= HERE is the dead log condition.
>         error = -EIO;
>         goto out_unreserve;
>     }
> 
>     ASSERT(tp->t_ticket != NULL);
> 
>     /*
>      * If we need to update the superblock, then do it now.
>      */
>     if (tp->t_flags & XFS_TRANS_SB_DIRTY)
>         xfs_trans_apply_sb_deltas(tp);
>     xfs_trans_apply_dquot_deltas(tp);
>     //----HERE--to reporduce the problem, add delay here, so it will------
>     + if (xlog_is_shutdown(log))
>     +     mdelay(1000);
>     xlog_cil_commit(log, tp, &commit_seq, regrant);
>     ...
> }
> In my opinion,due to concurrentcy of transaction commit and the status
> change of xlog,
> I think xlog_is_shutdown can not suddenly prevent /all/ log items from
> inserting into
> CIL list.

Ah, ok.  Thanks for pointing out that race condition more clearly.

> > Q3.If we're releasing the buf item, shouldn't this be an
> atomic_dec_and_test or something?
> 
> I have do some test,I am sure it should not. because which will lead to
> bli_refcount < 0
> and ASSERT(atomic_read(&bip->bli_refcount) > 0) will be trigger.
> The reason is that function xfs_buf_item_unpin have decreased the refcount,
> so we should not
> 
> decrease it again in xfs_buf_item_relse.
> 
> As Dave said here[2],"xfs_buf_item_relse only called at end of BLI life
> cycle, so dosen't check
> the reference count,when in fact it clearly isn't."
> (In [2] Dave have same question as Q2, I also replied, seem sent email
> failed, really sorry)

Ok, I'll run both through the nightly tests tonight.

--D

> Thanks
> Xuenan
> > --D
> > 
> > >   	bp->b_log_item = NULL;
> > >   	xfs_buf_rele(bp);
> > >   	xfs_buf_item_free(bip);
> > > -- 
> > > 2.31.1
> > > 
> > .
> 
> -- 
> Guo Xuenan [OS Kernel Lab]
> -----------------------------
> Email: guoxuenan@huawei.com
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-11-28 23:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-18 12:11 [PATCH v5 0/2] xfs: shutdown UAF fixes Guo Xuenan
2022-11-18 12:11 ` [PATCH v5 1/2] xfs: wait iclog complete before tearing down AIL Guo Xuenan
2022-11-21 18:11   ` Darrick J. Wong
2022-11-22  2:13     ` Guo Xuenan
2022-11-18 12:11 ` [PATCH v5 2/2] xfs: fix super block buf log item UAF during force shutdown Guo Xuenan
2022-11-21 18:14   ` Darrick J. Wong
2022-11-22  1:56     ` Guo Xuenan
     [not found]     ` <acd05d3a-9df0-9e64-6846-f24626b80e74@huawei.com>
2022-11-28 22:59       ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.