linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] fix use after free in xlog_wait()
@ 2020-06-11  1:39 Yu Kuai
  2020-06-11  2:28 ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2020-06-11  1:39 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, linux-kernel, yukuai3, yi.zhang

I recently got UAF by running generic/019 in qemu:

==================================================================
  BUG: KASAN: use-after-free in __lock_acquire+0x4508/0x68c0
  Read of size 8 at addr ffff88811327f080 by task fio/11147

  CPU: 6 PID: 11147 Comm: fio Tainted: G        W         5.7.0-next-20200602+ #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
  Call Trace:
   dump_stack+0xf6/0x16e
   ? __lock_acquire+0x4508/0x68c0
   ? __lock_acquire+0x4508/0x68c0
   print_address_description.constprop.0+0x1a/0x210
   ? __lock_acquire+0x4508/0x68c0
   kasan_report.cold+0x1f/0x37
   ? lockdep_hardirqs_on_prepare+0x480/0x550
   ? __lock_acquire+0x4508/0x68c0
   __lock_acquire+0x4508/0x68c0
   ? print_usage_bug+0x1f0/0x1f0
   ? finish_task_switch+0x126/0x5e0
   ? lockdep_hardirqs_on_prepare+0x550/0x550
   ? mark_held_locks+0x9e/0xe0
   ? __schedule+0x801/0x1d90
   ? _raw_spin_unlock_irq+0x1f/0x30
   lock_acquire+0x182/0x790
   ? remove_wait_queue+0x1d/0x180
   ? __switch_to_asm+0x42/0x70
   ? lock_release+0x710/0x710
   ? __schedule+0x85c/0x1d90
   ? xfs_log_commit_cil+0x1d8e/0x2a50
   ? __sched_text_start+0x8/0x8
   _raw_spin_lock_irqsave+0x32/0x50
   ? remove_wait_queue+0x1d/0x180
   remove_wait_queue+0x1d/0x180
   xfs_log_commit_cil+0x1d9e/0x2a50
   ? xlog_cil_empty+0x90/0x90
   ? wake_up_q+0x140/0x140
   ? rcu_read_lock_sched_held+0x9c/0xd0
   ? rcu_read_lock_bh_held+0xb0/0xb0
   __xfs_trans_commit+0x292/0xec0
   ? xfs_trans_unreserve_and_mod_sb+0xab0/0xab0
   ? rcu_read_lock_bh_held+0xb0/0xb0
   ? xfs_isilocked+0x87/0x2e0
   ? xfs_trans_log_inode+0x1ad/0x480
   xfs_vn_update_time+0x3eb/0x6d0
   ? xfs_setattr_mode.isra.0+0xa0/0xa0
   ? current_time+0xa8/0x110
   ? timestamp_truncate+0x2f0/0x2f0
   ? xfs_setattr_mode.isra.0+0xa0/0xa0
   update_time+0x70/0xc0
   file_update_time+0x2b7/0x490
   ? update_time+0xc0/0xc0
   ? __sb_start_write+0x197/0x3e0
   __xfs_filemap_fault.constprop.0+0x1b7/0x480
   do_page_mkwrite+0x1ac/0x470
   do_wp_page+0x9e2/0x1b10
   ? do_raw_spin_lock+0x121/0x290
   ? finish_mkwrite_fault+0x4a0/0x4a0
   ? rwlock_bug.part.0+0x90/0x90
   ? handle_mm_fault+0xa81/0x3570
   handle_mm_fault+0x1c65/0x3570
   ? __pmd_alloc+0x4c0/0x4c0
   ? vmacache_find+0x55/0x2a0
   do_user_addr_fault+0x635/0xd42
   exc_page_fault+0xdd/0x5b0
   ? asm_common_interrupt+0x8/0x40
   ? asm_exc_page_fault+0x8/0x30
   asm_exc_page_fault+0x1e/0x30
  RIP: 0033:0x7f40e022336a
  Code: Bad RIP value.
  RSP: 002b:00007ffedefb0218 EFLAGS: 00010206
  RAX: 00007f40b7a5a000 RBX: 0000000002562280 RCX: 00000000025633d0
  RDX: 0000000000000fc0 RSI: 0000000002562420 RDI: 00007f40b7a5a000
  RBP: 00007f40b8620190 R08: 0000000000000000 R09: 00007f40b7a5aff0
  R10: 00007ffedeff8000 R11: 00007f40b7a5aff0 R12: 0000000000000001
  R13: 0000000000001000 R14: 00000000025622a8 R15: 00007f40b8620198

  Allocated by task 6826:
   save_stack+0x1b/0x40
   __kasan_kmalloc.constprop.0+0xc2/0xd0
   kmem_alloc+0x154/0x450
   xlog_cil_push_work+0xff/0x1250
   process_one_work+0xa3e/0x17a0
   worker_thread+0x8e2/0x1050
   kthread+0x355/0x470
   ret_from_fork+0x22/0x30

   Freed by task 6826:
   save_stack+0x1b/0x40
   __kasan_slab_free+0x12c/0x170
   kfree+0xd6/0x300
   kvfree+0x42/0x50
   xlog_cil_committed+0xa9c/0xf30
   xlog_cil_push_work+0xa8c/0x1250
   process_one_work+0xa3e/0x17a0
   worker_thread+0x8e2/0x1050
   kthread+0x355/0x470
   ret_from_fork+0x22/0x30

  The buggy address belongs to the object at ffff88811327f000
   which belongs to the cache kmalloc-256 of size 256
  The buggy address is located 128 bytes inside of
   256-byte region [ffff88811327f000, ffff88811327f100)
  The buggy address belongs to the page:
  page:ffffea00044c9f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea00044c9f00 order:2 compound_mapcount:0 compound_pincount:0
  flags: 0x200000000010200(slab|head)
  raw: 0200000000010200 dead000000000100 dead000000000122 ffff88811a40e800
  raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff88811327ef80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffff88811327f000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  >ffff88811327f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                     ^
   ffff88811327f100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffff88811327f180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==================================================================

I think the reason is that when 'ctx' is freed in xlog_cil_committed(),
a previous call to xlog_wait(&ctx->xc_ctx->push_wait, ...) hasn't finished
yet. Thus when remove_wait_queue() is called, UAF will be triggered
since 'ctx' was freed:

thread1		    thread2             thread3

__xfs_trans_commit
 xfs_log_commit_cil
  xlog_wait
   schedule
                    xlog_cil_push_work
		     wake_up_all
		                        xlog_cil_committed
					 kmem_free
   remove_wait_queue
    spin_lock_irqsave --> UAF

I tried to fix the problem by using autoremove_wake_function() in
xlog_wait(), however, soft lockup will be triggered this way.

Instead, make sure waitqueue_active(&ctx->push_wait) return false before
freeing 'ctx'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/xfs/xfs_log_cil.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b43f0e8f43f2..59b21485b0fc 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -607,7 +607,7 @@ xlog_cil_committed(
 
 	if (!list_empty(&ctx->busy_extents))
 		xlog_discard_busy_extents(mp, ctx);
-	else
+	else if (!waitqueue_active(&ctx->push_wait))
 		kmem_free(ctx);
 }
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] fix use after free in xlog_wait()
  2020-06-11  1:39 [RFC PATCH] fix use after free in xlog_wait() Yu Kuai
@ 2020-06-11  2:28 ` Dave Chinner
  2020-06-11  2:45   ` [PATCH] xfs: fix use-after-free on CIL context on shutdown Dave Chinner
  2020-06-11  3:01   ` [RFC PATCH] fix use after free in xlog_wait() yukuai (C)
  0 siblings, 2 replies; 9+ messages in thread
From: Dave Chinner @ 2020-06-11  2:28 UTC (permalink / raw)
  To: Yu Kuai; +Cc: darrick.wong, linux-xfs, linux-kernel, yi.zhang

On Thu, Jun 11, 2020 at 09:39:52AM +0800, Yu Kuai wrote:
> I recently got UAF by running generic/019 in qemu:
> 
> ==================================================================
>   BUG: KASAN: use-after-free in __lock_acquire+0x4508/0x68c0
>   Read of size 8 at addr ffff88811327f080 by task fio/11147
....
>    remove_wait_queue+0x1d/0x180
>    xfs_log_commit_cil+0x1d9e/0x2a50
>    __xfs_trans_commit+0x292/0xec0

Ok, so this is waking up from a the CIL context overrunning the hard
size limit....

>    Freed by task 6826:
>    save_stack+0x1b/0x40
>    __kasan_slab_free+0x12c/0x170
>    kfree+0xd6/0x300
>    kvfree+0x42/0x50
>    xlog_cil_committed+0xa9c/0xf30
>    xlog_cil_push_work+0xa8c/0x1250
>    process_one_work+0xa3e/0x17a0
>    worker_thread+0x8e2/0x1050
>    kthread+0x355/0x470
>    ret_from_fork+0x22/0x30

Hmmmm. The CIL push work freed the context which means somethign
went wrong somewhere - we must be in CIL commit error path here...

/me checks generic/019

Oh, it's a repeated shutdown test. Right, so we're getting a
shutdown in the middle of a CIL push when the CIL is hard throttling
callers and the CIL context gets freed before the throttled tasks
can be woken.

Gotcha. Yup, that's a real issue, thanks for reporting it!

> I think the reason is that when 'ctx' is freed in xlog_cil_committed(),
> a previous call to xlog_wait(&ctx->xc_ctx->push_wait, ...) hasn't finished
> yet. Thus when remove_wait_queue() is called, UAF will be triggered
> since 'ctx' was freed:
> 
> thread1		    thread2             thread3
> 
> __xfs_trans_commit
>  xfs_log_commit_cil
>   xlog_wait
>    schedule
>                     xlog_cil_push_work
> 		     wake_up_all
> 		                        xlog_cil_committed
> 					 kmem_free
>    remove_wait_queue
>     spin_lock_irqsave --> UAF

Actually, it's a lot simpler:

thread1			thread2

__xfs_trans_commit
 xfs_log_commit_cil
  xlog_wait
   schedule
			xlog_cil_push_work
			wake_up_all
			<shutdown aborts commit>
			xlog_cil_committed
			kmem_free

   remove_wait_queue
    spin_lock_irqsave --> UAF

> Instead, make sure waitqueue_active(&ctx->push_wait) return false before
> freeing 'ctx'.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  fs/xfs/xfs_log_cil.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index b43f0e8f43f2..59b21485b0fc 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -607,7 +607,7 @@ xlog_cil_committed(
>  
>  	if (!list_empty(&ctx->busy_extents))
>  		xlog_discard_busy_extents(mp, ctx);
> -	else
> +	else if (!waitqueue_active(&ctx->push_wait))
>  		kmem_free(ctx);

That will just leak the memory instead, which is no better.

Let me go write a patch to fix this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] xfs: fix use-after-free on CIL context on shutdown
  2020-06-11  2:28 ` Dave Chinner
@ 2020-06-11  2:45   ` Dave Chinner
  2020-06-11 15:11     ` Brian Foster
                       ` (2 more replies)
  2020-06-11  3:01   ` [RFC PATCH] fix use after free in xlog_wait() yukuai (C)
  1 sibling, 3 replies; 9+ messages in thread
From: Dave Chinner @ 2020-06-11  2:45 UTC (permalink / raw)
  To: Yu Kuai; +Cc: darrick.wong, linux-xfs, linux-kernel, yi.zhang


From: Dave Chinner <dchinner@redhat.com>

xlog_wait() on the CIL context can reference a freed context if the
waiter doesn't get scheduled before the CIL context is freed. This
can happen when a task is on the hard throttle and the CIL push
aborts due to a shutdown. This was detected by generic/019:

thread 1			thread 2

__xfs_trans_commit
 xfs_log_commit_cil
  <CIL size over hard throttle limit>
  xlog_wait
   schedule
				xlog_cil_push_work
				wake_up_all
				<shutdown aborts commit>
				xlog_cil_committed
				kmem_free

   remove_wait_queue
    spin_lock_irqsave --> UAF

Fix it by moving the wait queue to the CIL rather than keeping it in
in the CIL context that gets freed on push completion. Because the
wait queue is now independent of the CIL context and we might have
multiple contexts in flight at once, only wake the waiters on the
push throttle when the context we are pushing is over the hard
throttle size threshold.

Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c  | 10 +++++-----
 fs/xfs/xfs_log_priv.h |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b43f0e8f43f2e..9ed90368ab311 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -671,7 +671,8 @@ xlog_cil_push_work(
 	/*
 	 * Wake up any background push waiters now this context is being pushed.
 	 */
-	wake_up_all(&ctx->push_wait);
+	if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
+		wake_up_all(&cil->xc_push_wait);
 
 	/*
 	 * Check if we've anything to push. If there is nothing, then we don't
@@ -743,13 +744,12 @@ xlog_cil_push_work(
 
 	/*
 	 * initialise the new context and attach it to the CIL. Then attach
-	 * the current context to the CIL committing lsit so it can be found
+	 * the current context to the CIL committing list so it can be found
 	 * during log forces to extract the commit lsn of the sequence that
 	 * needs to be forced.
 	 */
 	INIT_LIST_HEAD(&new_ctx->committing);
 	INIT_LIST_HEAD(&new_ctx->busy_extents);
-	init_waitqueue_head(&new_ctx->push_wait);
 	new_ctx->sequence = ctx->sequence + 1;
 	new_ctx->cil = cil;
 	cil->xc_ctx = new_ctx;
@@ -937,7 +937,7 @@ xlog_cil_push_background(
 	if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
 		trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
 		ASSERT(cil->xc_ctx->space_used < log->l_logsize);
-		xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock);
+		xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
 		return;
 	}
 
@@ -1216,12 +1216,12 @@ xlog_cil_init(
 	INIT_LIST_HEAD(&cil->xc_committing);
 	spin_lock_init(&cil->xc_cil_lock);
 	spin_lock_init(&cil->xc_push_lock);
+	init_waitqueue_head(&cil->xc_push_wait);
 	init_rwsem(&cil->xc_ctx_lock);
 	init_waitqueue_head(&cil->xc_commit_wait);
 
 	INIT_LIST_HEAD(&ctx->committing);
 	INIT_LIST_HEAD(&ctx->busy_extents);
-	init_waitqueue_head(&ctx->push_wait);
 	ctx->sequence = 1;
 	ctx->cil = cil;
 	cil->xc_ctx = ctx;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index ec22c7a3867f1..75a62870b63af 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -240,7 +240,6 @@ struct xfs_cil_ctx {
 	struct xfs_log_vec	*lv_chain;	/* logvecs being pushed */
 	struct list_head	iclog_entry;
 	struct list_head	committing;	/* ctx committing list */
-	wait_queue_head_t	push_wait;	/* background push throttle */
 	struct work_struct	discard_endio_work;
 };
 
@@ -274,6 +273,7 @@ struct xfs_cil {
 	wait_queue_head_t	xc_commit_wait;
 	xfs_lsn_t		xc_current_sequence;
 	struct work_struct	xc_push_work;
+	wait_queue_head_t	xc_push_wait;	/* background push throttle */
 } ____cacheline_aligned_in_smp;
 
 /*

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] fix use after free in xlog_wait()
  2020-06-11  2:28 ` Dave Chinner
  2020-06-11  2:45   ` [PATCH] xfs: fix use-after-free on CIL context on shutdown Dave Chinner
@ 2020-06-11  3:01   ` yukuai (C)
  2020-06-11  5:05     ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: yukuai (C) @ 2020-06-11  3:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: darrick.wong, linux-xfs, linux-kernel, yi.zhang

On 2020/6/11 10:28, Dave Chinner wrote
> Actually, it's a lot simpler:
> 
> thread1			thread2
> 
> __xfs_trans_commit
>   xfs_log_commit_cil
>    xlog_wait
>     schedule
> 			xlog_cil_push_work
> 			wake_up_all
> 			<shutdown aborts commit>
> 			xlog_cil_committed
> 			kmem_free
> 
>     remove_wait_queue
>      spin_lock_irqsave --> UAF
> 

It's ture in this case, however, I got another result when I
tried to reporduce it, which seems 'ctx' can be freed in a
different path:

[   64.975996] run fstests generic/019 at 2020-06-10 16:13:44
[   69.126208] xfs filesystem being mounted at /tmp/scratch supports 
timestamps until 2038 (0x7fffffff)
[   99.166846] XFS (sdb): log I/O error -5
[   99.170111] XFS (sdb): Log I/O Error Detected. Shutting down filesystem
[   99.171071] XFS (sdb): Please unmount the filesystem and rectify the 
problem(s)
[   99.179569] ------------[ cut here ]------------
[   99.180432] WARNING: CPU: 7 PID: 2705 at fs/iomap/buffered-io.c:1030 
iomap_page_mkwrite_actor+0x17d/0x1b0
[   99.181273] 
==================================================================
[   99.181758] Modules linked in:
[   99.182806] BUG: KASAN: use-after-free in do_raw_spin_trylock+0x67/0x180
[   99.183255] CPU: 7 PID: 2705 Comm: fio Not tainted 
5.7.0-next-20200602+ #29
[   99.184166] Read of size 4 at addr ffff888115f28868 by task fio/2704
[   99.184171]
[   99.185142] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 
04/01/2014
[   99.185995] CPU: 6 PID: 2704 Comm: fio Not tainted 
5.7.0-next-20200602+ #29
[   99.186227] RIP: 0010:iomap_page_mkwrite_actor+0x17d/0x1b0
[   99.188199] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 
04/01/2014
[   99.189178] Code: 89 ef e8 a6 d4 c7 ff e9 3f ff ff ff e8 fc 64 ad ff 
89 da 31 f6 48 89 ef e8 b0 1e f2 ff 49 89 dc e9 26 ff ff ff e8 e3 64 ad 
ff <0f> 0b eb be e8 da 64 ad ff 4d 8d 67 ffe
[   99.189899] Call Trace:
[   99.191748] RSP: 0000:ffff88810599fa18 EFLAGS: 00010293
[   99.194218]  dump_stack+0xf6/0x16e
[   99.194574] RAX: ffff888106df3680 RBX: 0000000000001000 RCX: 
ffffffff94338cad
[   99.195301]  ? do_raw_spin_trylock+0x67/0x180
[   99.195777] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000001
[   99.195786] RBP: ffffea0002544800 R08: ffff888106df3680 R09: 
fffff940004a8901
[   99.196764]  ? do_raw_spin_trylock+0x67/0x180
[   99.196778]  print_address_description.constprop.0.cold+0xd3/0x415
[   99.197393] R10: ffffea0002544807 R11: fffff940004a8900 R12: 
0000000000000000
[   99.198380]  ? do_raw_spin_trylock+0x67/0x180
[   99.199349] R13: 0000000000000000 R14: ffff8880b6ee7280 R15: 
ffffea00025447c8
[   99.199939]  kasan_report.cold+0x1f/0x37
[   99.199949]  ? __switch_to+0x510/0xef0
[   99.200812] FS:  00007ff7d8562740(0000) GS:ffff88811a400000(0000) 
knlGS:0000000000000000
[   99.201755]  ? do_raw_spin_trylock+0x67/0x180
[   99.201765]  check_memory_region+0x14e/0x1b0
[   99.202378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   99.203319]  do_raw_spin_trylock+0x67/0x180
[   99.203858] CR2: 00007ff7ae3e4ff0 CR3: 00000001028be000 CR4: 
00000000000006e0
[   99.204374]  ? do_raw_spin_lock+0x290/0x290
[   99.205470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[   99.206069]  _raw_spin_lock_irqsave+0x44/0x80
[   99.206641] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[   99.206647] Call Trace:
[   99.207420]  ? remove_wait_queue+0x22/0x190
[   99.208014]  iomap_apply+0x2d7/0xc00
[   99.208949]  remove_wait_queue+0x22/0x190
[   99.209539]  ? iomap_page_create+0x350/0x350
[   99.210510]  xfs_log_commit_cil+0x1c7e/0x2940
[   99.211115]  ? trace_event_raw_event_iomap_apply+0x430/0x430
[   99.212073]  ? xlog_cil_empty+0x90/0x90
[   99.212421]  ? down_write_trylock+0x2f0/0x2f0
[   99.212979]  ? check_flags.part.0+0x430/0x430
[   99.213486]  ? update_time+0xc0/0xc0
[   99.214025]  ? wake_up_q+0x140/0x140
[   99.214610]  iomap_page_mkwrite+0x26a/0x3b0
[   99.215201]  ? xlog_ticket_alloc+0x3e0/0x3e0
[   99.215966]  ? iomap_page_create+0x350/0x350
[   99.216487]  ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[   99.217097]  __xfs_filemap_fault.constprop.0+0x32f/0x4e0
[   99.217675]  __xfs_trans_commit+0x2b3/0xf20
[   99.218206]  do_page_mkwrite+0x1b1/0x470
[   99.218681]  ? xfs_trans_unreserve_and_mod_sb+0xab0/0xab0
[   99.219259]  do_wp_page+0x9e7/0x1b10
[   99.219838]  ? xfs_isilocked+0x8c/0x2f0
[   99.220439]  ? finish_mkwrite_fault+0x4b0/0x4b0
[   99.221102]  ? xfs_trans_log_inode+0x1b2/0x480
[   99.221817]  ? do_user_addr_fault+0x2fd/0xd42
[   99.222382]  xfs_vn_update_time+0x40a/0x730
[   99.222922]  handle_mm_fault+0x1c9f/0x3600
[   99.223634]  ? xfs_setattr_mode.isra.0+0xb0/0xb0
[   99.224140]  ? __pmd_alloc+0x390/0x390
[   99.224653]  ? current_time+0xad/0x110
[   99.225278]  ? vmacache_find+0x5a/0x2a0
[   99.225861]  ? timestamp_truncate+0x2f0/0x2f0
[   99.226461]  do_user_addr_fault+0x635/0xd42
[   99.227019]  ? xfs_setattr_mode.isra.0+0xb0/0xb0
[   99.227029]  update_time+0x75/0xc0
[   99.227760]  exc_page_fault+0x12a/0x6f0
[   99.228377]  file_update_time+0x2bc/0x490
[   99.228889]  ? asm_exc_page_fault+0x8/0x30
[   99.229414]  ? update_time+0xc0/0xc0
[   99.229933]  asm_exc_page_fault+0x1e/0x30
[   99.230515]  ? __sb_start_write+0x225/0x3f0
[   99.231099] RIP: 0033:0x7ff7d709636a
[   99.231743]  ? __sb_start_write+0x1a3/0x3f0
[   99.231755]  __xfs_filemap_fault.constprop.0+0x313/0x4e0
[   99.232230] Code: Bad RIP value.
[   99.232747]  do_page_mkwrite+0x1b1/0x470
[   99.233296] RSP: 002b:00007ffd89c6e2a8 EFLAGS: 00010206
[   99.233841]  do_wp_page+0x9e7/0x1b10
[   99.234333] RAX: 00007ff7ae3e4000 RBX: 000000000159c280 RCX: 
000000000159d3d0
[   99.234863]  ? finish_mkwrite_fault+0x4b0/0x4b0
[   99.235432] RDX: 0000000000000fc0 RSI: 000000000159c420 RDI: 
00007ff7ae3e4000
[   99.235906]  ? _raw_spin_unlock_irq+0x24/0x30
[   99.236485] RBP: 00007ff7af493190 R08: 0000000000000000 R09: 
00007ff7ae3e4ff0
[   99.237198]  handle_mm_fault+0x1c9f/0x3600
[   99.237638] R10: 00007ffd89d04000 R11: 00007ff7ae3e4ff0 R12: 
0000000000000001
[   99.238163]  ? __pmd_alloc+0x390/0x390
[   99.238861] R13: 0000000000001000 R14: 000000000159c2a8 R15: 
00007ff7af493198
[   99.238874] irq event stamp: 0
[   99.239368]  ? vmacache_find+0x5a/0x2a0
[   99.240324] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[   99.240946]  do_user_addr_fault+0x635/0xd42
[   99.241903] hardirqs last disabled at (0): [<ffffffff93b3715c>] 
copy_process+0x182c/0x64b0
[   99.242488]  exc_page_fault+0x12a/0x6f0
[   99.243447] softirqs last  enabled at (0): [<ffffffff93b371fd>] 
copy_process+0x18cd/0x64b0
[   99.244001]  ? asm_exc_page_fault+0x8/0x30
[   99.244959] softirqs last disabled at (0): [<0000000000000000>] 0x0
[   99.244965] ---[ end trace a1aa182be3fbd861 ]---
[   99.245482]  asm_exc_page_fault+0x1e/0x30
[   99.254277] RIP: 0033:0x7ff7d709636a
[   99.254749] Code: Bad RIP value.
[   99.255182] RSP: 002b:00007ffd89c6e2a8 EFLAGS: 00010206
[   99.255879] RAX: 00007ff7ae7f5000 RBX: 000000000159c280 RCX: 
000000000159d3d0
[   99.256815] RDX: 0000000000000fc0 RSI: 000000000159c420 RDI: 
00007ff7ae7f5000
[   99.257760] RBP: 00007ff7af47a3a0 R08: 0000000000000000 R09: 
00007ff7ae7f5ff0
[   99.258720] R10: 00007ffd89d04000 R11: 00007ff7ae7f5ff0 R12: 
0000000000000001
[   99.259727] R13: 0000000000001000 R14: 000000000159c2a8 R15: 
00007ff7af47a3a8
[   99.260722]
[   99.260934] Allocated by task 986:
[   99.261403]  save_stack+0x1b/0x40
[   99.261843]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   99.262483]  __kmalloc+0x179/0x3e0
[   99.262937]  kmem_alloc+0x175/0x4a0
[   99.263416]  xlog_cil_push_work+0x104/0x1250
[   99.264038]  process_one_work+0xa62/0x1840
[   99.264583]  worker_thread+0xa3/0x1050
[   99.265087]  kthread+0x35a/0x470
[   99.265520]  ret_from_fork+0x22/0x30
[   99.266021]
[   99.266238] Freed by task 1123:
[   99.266658]  save_stack+0x1b/0x40
[   99.267095]  __kasan_slab_free+0x117/0x160
[   99.267652]  kfree+0xdf/0x320
[   99.268070]  kvfree+0x47/0x50
[   99.268500]  xlog_cil_committed+0xaf0/0xf80
[   99.269104]  xlog_cil_process_committed+0xfe/0x1e0
[   99.269743]  xlog_state_do_callback+0x513/0x9b0
[   99.270335]  xfs_log_force_umount+0x2d6/0x4a0
[   99.270925]  xfs_do_force_shutdown+0xa2/0x220
[   99.271514]  xlog_ioend_work+0x13a/0x240
[   99.272042]  process_one_work+0xa62/0x1840
[   99.272582]  worker_thread+0xa3/0x1050
[   99.273104]  kthread+0x35a/0x470
[   99.273527]  ret_from_fork+0x22/0x30
[   99.274008]
[   99.274221] The buggy address belongs to the object at ffff888115f28800
[   99.274221]  which belongs to the cache kmalloc-512 of size 512
[   99.275821] The buggy address is located 104 bytes inside of
[   99.275821]  512-byte region [ffff888115f28800, ffff888115f28a00)
[   99.277477] The buggy address belongs to the page:
[   99.278121] page:ffffea000457ca00 refcount:1 mapcount:0 
mapping:0000000000000000 index:0x0 head:ffffea000457ca00 order:3 
compound_mapcount:0 compound_pincount:0
[   99.279988] flags: 0x200000000010200(slab|head)
[   99.280582] raw: 0200000000010200 ffffea0004156c00 0000000400000004 
ffff888117c0e580
[   99.281615] raw: 0000000000000000 0000000080200020 00000001ffffffff 
0000000000000000
[   99.282615] page dumped because: kasan: bad access detected
[   99.283364]
[   99.283585] Memory state around the buggy address:
[   99.284227]  ffff888115f28700: fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc fc fc
[   99.285206]  ffff888115f28780: fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc fc fc
[   99.286164] >ffff888115f28800: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   99.287116]                                                           ^
[   99.287962]  ffff888115f28880: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   99.288915]  ffff888115f28900: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   99.289874] 
==================================================================

Thanks!
Yu Kuai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] fix use after free in xlog_wait()
  2020-06-11  3:01   ` [RFC PATCH] fix use after free in xlog_wait() yukuai (C)
@ 2020-06-11  5:05     ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2020-06-11  5:05 UTC (permalink / raw)
  To: yukuai (C); +Cc: darrick.wong, linux-xfs, linux-kernel, yi.zhang

On Thu, Jun 11, 2020 at 11:01:38AM +0800, yukuai (C) wrote:
> On 2020/6/11 10:28, Dave Chinner wrote
> > Actually, it's a lot simpler:
> > 
> > thread1			thread2
> > 
> > __xfs_trans_commit
> >   xfs_log_commit_cil
> >    xlog_wait
> >     schedule
> > 			xlog_cil_push_work
> > 			wake_up_all
> > 			<shutdown aborts commit>
> > 			xlog_cil_committed
> > 			kmem_free
> > 
> >     remove_wait_queue
> >      spin_lock_irqsave --> UAF
> > 
> 
> It's ture in this case, however, I got another result when I
> tried to reporduce it, which seems 'ctx' can be freed in a
> different path:

Yup, it's effectively the same thing because of the nature of the IO
failures (generated at submit time) and scheduler behaviour of
workqueues. THis means the IO completion that processes the error is
is queued to a workqueue on the same CPU. When thread 2 finishes
running (it hasn't seen an error yet) the completion work will get
get scheduled ahead of thread1 (cpu bound kernel task vs unbound
user task).  The completion work then runs the shutdown because it
saw a log IO error and because it's the commit record bio it runs
the journal checkpoint completion to abort all the items attached to
it and free the CIL context. Then thread 1 runs again.

The only difference between the two cases is which IO of the CIL
commit the request was failed on....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: fix use-after-free on CIL context on shutdown
  2020-06-11  2:45   ` [PATCH] xfs: fix use-after-free on CIL context on shutdown Dave Chinner
@ 2020-06-11 15:11     ` Brian Foster
  2020-06-16  1:16     ` yukuai (C)
  2020-06-19 13:46     ` Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2020-06-11 15:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Yu Kuai, darrick.wong, linux-xfs, linux-kernel, yi.zhang

On Thu, Jun 11, 2020 at 12:45:03PM +1000, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> xlog_wait() on the CIL context can reference a freed context if the
> waiter doesn't get scheduled before the CIL context is freed. This
> can happen when a task is on the hard throttle and the CIL push
> aborts due to a shutdown. This was detected by generic/019:
> 
> thread 1			thread 2
> 
> __xfs_trans_commit
>  xfs_log_commit_cil
>   <CIL size over hard throttle limit>
>   xlog_wait
>    schedule
> 				xlog_cil_push_work
> 				wake_up_all
> 				<shutdown aborts commit>
> 				xlog_cil_committed
> 				kmem_free
> 
>    remove_wait_queue
>     spin_lock_irqsave --> UAF
> 
> Fix it by moving the wait queue to the CIL rather than keeping it in
> in the CIL context that gets freed on push completion. Because the
> wait queue is now independent of the CIL context and we might have
> multiple contexts in flight at once, only wake the waiters on the
> push throttle when the context we are pushing is over the hard
> throttle size threshold.
> 
> Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
> Reported-by: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks reasonable:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_log_cil.c  | 10 +++++-----
>  fs/xfs/xfs_log_priv.h |  2 +-
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index b43f0e8f43f2e..9ed90368ab311 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -671,7 +671,8 @@ xlog_cil_push_work(
>  	/*
>  	 * Wake up any background push waiters now this context is being pushed.
>  	 */
> -	wake_up_all(&ctx->push_wait);
> +	if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
> +		wake_up_all(&cil->xc_push_wait);
>  
>  	/*
>  	 * Check if we've anything to push. If there is nothing, then we don't
> @@ -743,13 +744,12 @@ xlog_cil_push_work(
>  
>  	/*
>  	 * initialise the new context and attach it to the CIL. Then attach
> -	 * the current context to the CIL committing lsit so it can be found
> +	 * the current context to the CIL committing list so it can be found
>  	 * during log forces to extract the commit lsn of the sequence that
>  	 * needs to be forced.
>  	 */
>  	INIT_LIST_HEAD(&new_ctx->committing);
>  	INIT_LIST_HEAD(&new_ctx->busy_extents);
> -	init_waitqueue_head(&new_ctx->push_wait);
>  	new_ctx->sequence = ctx->sequence + 1;
>  	new_ctx->cil = cil;
>  	cil->xc_ctx = new_ctx;
> @@ -937,7 +937,7 @@ xlog_cil_push_background(
>  	if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
>  		trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
>  		ASSERT(cil->xc_ctx->space_used < log->l_logsize);
> -		xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock);
> +		xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
>  		return;
>  	}
>  
> @@ -1216,12 +1216,12 @@ xlog_cil_init(
>  	INIT_LIST_HEAD(&cil->xc_committing);
>  	spin_lock_init(&cil->xc_cil_lock);
>  	spin_lock_init(&cil->xc_push_lock);
> +	init_waitqueue_head(&cil->xc_push_wait);
>  	init_rwsem(&cil->xc_ctx_lock);
>  	init_waitqueue_head(&cil->xc_commit_wait);
>  
>  	INIT_LIST_HEAD(&ctx->committing);
>  	INIT_LIST_HEAD(&ctx->busy_extents);
> -	init_waitqueue_head(&ctx->push_wait);
>  	ctx->sequence = 1;
>  	ctx->cil = cil;
>  	cil->xc_ctx = ctx;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index ec22c7a3867f1..75a62870b63af 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -240,7 +240,6 @@ struct xfs_cil_ctx {
>  	struct xfs_log_vec	*lv_chain;	/* logvecs being pushed */
>  	struct list_head	iclog_entry;
>  	struct list_head	committing;	/* ctx committing list */
> -	wait_queue_head_t	push_wait;	/* background push throttle */
>  	struct work_struct	discard_endio_work;
>  };
>  
> @@ -274,6 +273,7 @@ struct xfs_cil {
>  	wait_queue_head_t	xc_commit_wait;
>  	xfs_lsn_t		xc_current_sequence;
>  	struct work_struct	xc_push_work;
> +	wait_queue_head_t	xc_push_wait;	/* background push throttle */
>  } ____cacheline_aligned_in_smp;
>  
>  /*
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: fix use-after-free on CIL context on shutdown
  2020-06-11  2:45   ` [PATCH] xfs: fix use-after-free on CIL context on shutdown Dave Chinner
  2020-06-11 15:11     ` Brian Foster
@ 2020-06-16  1:16     ` yukuai (C)
  2020-06-16  2:38       ` Dave Chinner
  2020-06-19 13:46     ` Christoph Hellwig
  2 siblings, 1 reply; 9+ messages in thread
From: yukuai (C) @ 2020-06-16  1:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: darrick.wong, linux-xfs, linux-kernel, yi.zhang

On 2020/6/11 10:45, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> xlog_wait() on the CIL context can reference a freed context if the
> waiter doesn't get scheduled before the CIL context is freed. This
> can happen when a task is on the hard throttle and the CIL push
> aborts due to a shutdown. This was detected by generic/019:
> 
> thread 1			thread 2
> 
> __xfs_trans_commit
>   xfs_log_commit_cil
>    <CIL size over hard throttle limit>
>    xlog_wait
>     schedule
> 				xlog_cil_push_work
> 				wake_up_all
> 				<shutdown aborts commit>
> 				xlog_cil_committed
> 				kmem_free
> 
>     remove_wait_queue
>      spin_lock_irqsave --> UAF
> 
> Fix it by moving the wait queue to the CIL rather than keeping it in
> in the CIL context that gets freed on push completion. Because the
> wait queue is now independent of the CIL context and we might have
> multiple contexts in flight at once, only wake the waiters on the
> push throttle when the context we are pushing is over the hard
> throttle size threshold.

Hi, Dave,

How do you think about the following fix:

1. use autoremove_wake_func(), and remove remove_wait_queue() to
avoid UAF.
2. add finish_wait().

@@ -576,12 +576,13 @@ xlog_wait(
                 __releases(lock)
  {
         DECLARE_WAITQUEUE(wait, current);
+       wait.func = autoremove_wake_function;

         add_wait_queue_exclusive(wq, &wait);
         __set_current_state(TASK_UNINTERRUPTIBLE);
         spin_unlock(lock);
         schedule();
-       remove_wait_queue(wq, &wait);
+       finish_wait(wq, &wait);
  }

Best regards!
Yu Kuai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: fix use-after-free on CIL context on shutdown
  2020-06-16  1:16     ` yukuai (C)
@ 2020-06-16  2:38       ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2020-06-16  2:38 UTC (permalink / raw)
  To: yukuai (C); +Cc: darrick.wong, linux-xfs, linux-kernel, yi.zhang

On Tue, Jun 16, 2020 at 09:16:09AM +0800, yukuai (C) wrote:
> On 2020/6/11 10:45, Dave Chinner wrote:
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xlog_wait() on the CIL context can reference a freed context if the
> > waiter doesn't get scheduled before the CIL context is freed. This
> > can happen when a task is on the hard throttle and the CIL push
> > aborts due to a shutdown. This was detected by generic/019:
> > 
> > thread 1			thread 2
> > 
> > __xfs_trans_commit
> >   xfs_log_commit_cil
> >    <CIL size over hard throttle limit>
> >    xlog_wait
> >     schedule
> > 				xlog_cil_push_work
> > 				wake_up_all
> > 				<shutdown aborts commit>
> > 				xlog_cil_committed
> > 				kmem_free
> > 
> >     remove_wait_queue
> >      spin_lock_irqsave --> UAF
> > 
> > Fix it by moving the wait queue to the CIL rather than keeping it in
> > in the CIL context that gets freed on push completion. Because the
> > wait queue is now independent of the CIL context and we might have
> > multiple contexts in flight at once, only wake the waiters on the
> > push throttle when the context we are pushing is over the hard
> > throttle size threshold.
> 
> Hi, Dave,
> 
> How do you think about the following fix:
> 
> 1. use autoremove_wake_func(), and remove remove_wait_queue() to
> avoid UAF.
> 2. add finish_wait().
> 
> @@ -576,12 +576,13 @@ xlog_wait(
>                 __releases(lock)
>  {
>         DECLARE_WAITQUEUE(wait, current);
> +       wait.func = autoremove_wake_function;
> 
>         add_wait_queue_exclusive(wq, &wait);
>         __set_current_state(TASK_UNINTERRUPTIBLE);
>         spin_unlock(lock);
>         schedule();
> -       remove_wait_queue(wq, &wait);
> +       finish_wait(wq, &wait);
>  }

Yes, that would address this specific symptom of the problem, but it
doesn't fix the problem root cause: that the wq can be freed while
this function sleeps. IMO, this sort of change leaves a trap for
future modifications - all the code calling xlog_wait() assumes the
embedded wq the task is sleeping on still exists after waiting so we
really should be fixing the problem the incorrect existence
guarantee in the CIL code that you tripped over.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: fix use-after-free on CIL context on shutdown
  2020-06-11  2:45   ` [PATCH] xfs: fix use-after-free on CIL context on shutdown Dave Chinner
  2020-06-11 15:11     ` Brian Foster
  2020-06-16  1:16     ` yukuai (C)
@ 2020-06-19 13:46     ` Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2020-06-19 13:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Yu Kuai, darrick.wong, linux-xfs, linux-kernel, yi.zhang

On Thu, Jun 11, 2020 at 12:45:03PM +1000, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> xlog_wait() on the CIL context can reference a freed context if the
> waiter doesn't get scheduled before the CIL context is freed. This
> can happen when a task is on the hard throttle and the CIL push
> aborts due to a shutdown. This was detected by generic/019:
> 
> thread 1			thread 2
> 
> __xfs_trans_commit
>  xfs_log_commit_cil
>   <CIL size over hard throttle limit>
>   xlog_wait
>    schedule
> 				xlog_cil_push_work
> 				wake_up_all
> 				<shutdown aborts commit>
> 				xlog_cil_committed
> 				kmem_free
> 
>    remove_wait_queue
>     spin_lock_irqsave --> UAF
> 
> Fix it by moving the wait queue to the CIL rather than keeping it in
> in the CIL context that gets freed on push completion. Because the
> wait queue is now independent of the CIL context and we might have
> multiple contexts in flight at once, only wake the waiters on the
> push throttle when the context we are pushing is over the hard
> throttle size threshold.
> 
> Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
> Reported-by: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-06-19 13:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11  1:39 [RFC PATCH] fix use after free in xlog_wait() Yu Kuai
2020-06-11  2:28 ` Dave Chinner
2020-06-11  2:45   ` [PATCH] xfs: fix use-after-free on CIL context on shutdown Dave Chinner
2020-06-11 15:11     ` Brian Foster
2020-06-16  1:16     ` yukuai (C)
2020-06-16  2:38       ` Dave Chinner
2020-06-19 13:46     ` Christoph Hellwig
2020-06-11  3:01   ` [RFC PATCH] fix use after free in xlog_wait() yukuai (C)
2020-06-11  5:05     ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).