All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
@ 2022-04-12 16:04 Lin Ma
  2022-04-13  6:57 ` Krzysztof Kozlowski
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Lin Ma @ 2022-04-12 16:04 UTC (permalink / raw)
  To: krzk, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd; +Cc: Lin Ma

Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).

The race can be demonstrated below:

Thread-1                           Thread-2
                                 | nci_dev_up()
                                 |   nci_open_device()
                                 |     __nci_request(nci_reset_req)
                                 |       nci_send_cmd
                                 |         queue_work(cmd_work)
nci_unregister_device()          |
  nci_close_device()             | ...
    del_timer_sync(cmd_timer)[1] |
...                              | Worker
nci_free_device()                | nci_cmd_work()
  kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]

In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.

This UAF is easy to trigger, crash trace by POC is like below

[   66.703713] ==================================================================
[   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[   66.703974]
[   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[   66.703974] Call Trace:
[   66.703974]  <TASK>
[   66.703974]  dump_stack_lvl+0x57/0x7d
[   66.703974]  print_report.cold+0x5e/0x5db
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  kasan_report+0xbe/0x1c0
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  enqueue_timer+0x448/0x490
[   66.703974]  __mod_timer+0x5e6/0xb80
[   66.703974]  ? mark_held_locks+0x9e/0xe0
[   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
[   66.703974]  ? queue_work_on+0x61/0x80
[   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
[   66.703974]  process_one_work+0x8bb/0x1510
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
[   66.703974]  ? rwlock_bug.part.0+0x90/0x90
[   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
[   66.703974]  worker_thread+0x575/0x1190
[   66.703974]  ? process_one_work+0x1510/0x1510
[   66.703974]  kthread+0x2a0/0x340
[   66.703974]  ? kthread_complete_and_exit+0x20/0x20
[   66.703974]  ret_from_fork+0x22/0x30
[   66.703974]  </TASK>
[   66.703974]
[   66.703974] Allocated by task 267:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  __kasan_kmalloc+0x81/0xa0
[   66.703974]  nci_allocate_device+0xd3/0x390
[   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
[   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   66.703974]  tty_ioctl+0x764/0x1310
[   66.703974]  __x64_sys_ioctl+0x122/0x190
[   66.703974]  do_syscall_64+0x3b/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   66.703974]
[   66.703974] Freed by task 406:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  kasan_set_track+0x21/0x30
[   66.703974]  kasan_set_free_info+0x20/0x30
[   66.703974]  __kasan_slab_free+0x108/0x170
[   66.703974]  kfree+0xb0/0x330
[   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
[   66.703974]  nci_uart_tty_close+0xdf/0x180
[   66.703974]  tty_ldisc_kill+0x73/0x110
[   66.703974]  tty_ldisc_hangup+0x281/0x5b0
[   66.703974]  __tty_hangup.part.0+0x431/0x890
[   66.703974]  tty_release+0x3a8/0xc80
[   66.703974]  __fput+0x1f0/0x8c0
[   66.703974]  task_work_run+0xc9/0x170
[   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
[   66.703974]  syscall_exit_to_user_mode+0x19/0x50
[   66.703974]  do_syscall_64+0x48/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae

To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
 net/nfc/nci/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
index d2537383a3e8..0d7763c322b5 100644
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev)
 	mutex_lock(&ndev->req_lock);
 
 	if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
+		/* Need to flush the cmd wq in case
+		 * there is a queued/running cmd_work
+		 */
+		flush_workqueue(ndev->cmd_wq);
 		del_timer_sync(&ndev->cmd_timer);
 		del_timer_sync(&ndev->data_timer);
 		mutex_unlock(&ndev->req_lock);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
  2022-04-12 16:04 [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf Lin Ma
@ 2022-04-13  6:57 ` Krzysztof Kozlowski
  2022-04-13 13:50 ` patchwork-bot+netdevbpf
  2022-04-18 13:41 ` Guenter Roeck
  2 siblings, 0 replies; 6+ messages in thread
From: Krzysztof Kozlowski @ 2022-04-13  6:57 UTC (permalink / raw)
  To: Lin Ma, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd

On 12/04/2022 18:04, Lin Ma wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
> 
> The race can be demonstrated below:
> 

Thanks!

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
  2022-04-12 16:04 [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf Lin Ma
  2022-04-13  6:57 ` Krzysztof Kozlowski
@ 2022-04-13 13:50 ` patchwork-bot+netdevbpf
  2022-04-18 13:41 ` Guenter Roeck
  2 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-04-13 13:50 UTC (permalink / raw)
  To: Lin Ma; +Cc: krzk, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 13 Apr 2022 00:04:30 +0800 you wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
> 
> The race can be demonstrated below:
> 
> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     del_timer_sync(cmd_timer)[1] |
> ...                              | Worker
> nci_free_device()                | nci_cmd_work()
>   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> [...]

Here is the summary with links:
  - [v0] nfc: nci: add flush_workqueue to prevent uaf
    https://git.kernel.org/netdev/net/c/ef27324e2cb7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
  2022-04-12 16:04 [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf Lin Ma
  2022-04-13  6:57 ` Krzysztof Kozlowski
  2022-04-13 13:50 ` patchwork-bot+netdevbpf
@ 2022-04-18 13:41 ` Guenter Roeck
  2022-04-18 13:59   ` Lin Ma
  2 siblings, 1 reply; 6+ messages in thread
From: Guenter Roeck @ 2022-04-18 13:41 UTC (permalink / raw)
  To: Lin Ma; +Cc: krzk, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd

On Wed, Apr 13, 2022 at 12:04:30AM +0800, Lin Ma wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
> 
> The race can be demonstrated below:
> 
> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     del_timer_sync(cmd_timer)[1] |
> ...                              | Worker
> nci_free_device()                | nci_cmd_work()
>   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> In short, the cleanup routine thought that the cmd_timer has already
> been detached by [1] but the mod_timer can re-attach the timer [2], even
> it is already released [3], resulting in UAF.
> 
> This UAF is easy to trigger, crash trace by POC is like below
> 
> [   66.703713] ==================================================================
> [   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
> [   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
> [   66.703974]
> [   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
> [   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
> [   66.703974] Call Trace:
> [   66.703974]  <TASK>
> [   66.703974]  dump_stack_lvl+0x57/0x7d
> [   66.703974]  print_report.cold+0x5e/0x5db
> [   66.703974]  ? enqueue_timer+0x448/0x490
> [   66.703974]  kasan_report+0xbe/0x1c0
> [   66.703974]  ? enqueue_timer+0x448/0x490
> [   66.703974]  enqueue_timer+0x448/0x490
> [   66.703974]  __mod_timer+0x5e6/0xb80
> [   66.703974]  ? mark_held_locks+0x9e/0xe0
> [   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
> [   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
> [   66.703974]  ? queue_work_on+0x61/0x80
> [   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
> [   66.703974]  process_one_work+0x8bb/0x1510
> [   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
> [   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
> [   66.703974]  ? rwlock_bug.part.0+0x90/0x90
> [   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
> [   66.703974]  worker_thread+0x575/0x1190
> [   66.703974]  ? process_one_work+0x1510/0x1510
> [   66.703974]  kthread+0x2a0/0x340
> [   66.703974]  ? kthread_complete_and_exit+0x20/0x20
> [   66.703974]  ret_from_fork+0x22/0x30
> [   66.703974]  </TASK>
> [   66.703974]
> [   66.703974] Allocated by task 267:
> [   66.703974]  kasan_save_stack+0x1e/0x40
> [   66.703974]  __kasan_kmalloc+0x81/0xa0
> [   66.703974]  nci_allocate_device+0xd3/0x390
> [   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
> [   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
> [   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
> [   66.703974]  tty_ioctl+0x764/0x1310
> [   66.703974]  __x64_sys_ioctl+0x122/0x190
> [   66.703974]  do_syscall_64+0x3b/0x90
> [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   66.703974]
> [   66.703974] Freed by task 406:
> [   66.703974]  kasan_save_stack+0x1e/0x40
> [   66.703974]  kasan_set_track+0x21/0x30
> [   66.703974]  kasan_set_free_info+0x20/0x30
> [   66.703974]  __kasan_slab_free+0x108/0x170
> [   66.703974]  kfree+0xb0/0x330
> [   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
> [   66.703974]  nci_uart_tty_close+0xdf/0x180
> [   66.703974]  tty_ldisc_kill+0x73/0x110
> [   66.703974]  tty_ldisc_hangup+0x281/0x5b0
> [   66.703974]  __tty_hangup.part.0+0x431/0x890
> [   66.703974]  tty_release+0x3a8/0xc80
> [   66.703974]  __fput+0x1f0/0x8c0
> [   66.703974]  task_work_run+0xc9/0x170
> [   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
> [   66.703974]  syscall_exit_to_user_mode+0x19/0x50
> [   66.703974]  do_syscall_64+0x48/0x90
> [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> To fix the UAF, this patch adds flush_workqueue() to ensure the
> nci_cmd_work is finished before the following del_timer_sync.
> This combination will promise the timer is actually detached.
> 
> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
> Signed-off-by: Lin Ma <linma@zju.edu.cn>
> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
>  net/nfc/nci/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
> index d2537383a3e8..0d7763c322b5 100644
> --- a/net/nfc/nci/core.c
> +++ b/net/nfc/nci/core.c
> @@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev)
>  	mutex_lock(&ndev->req_lock);
>  
>  	if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
> +		/* Need to flush the cmd wq in case
> +		 * there is a queued/running cmd_work
> +		 */
> +		flush_workqueue(ndev->cmd_wq);
>  		del_timer_sync(&ndev->cmd_timer);

I have been wondering about this and the same code further below.
What prevents the command timer from firing after the call to
flush_workqueue() ?

Thanks,
Guenter

>  		del_timer_sync(&ndev->data_timer);
>  		mutex_unlock(&ndev->req_lock);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
  2022-04-18 13:41 ` Guenter Roeck
@ 2022-04-18 13:59   ` Lin Ma
  2022-04-23 13:52     ` Guenter Roeck
  0 siblings, 1 reply; 6+ messages in thread
From: Lin Ma @ 2022-04-18 13:59 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: krzk, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd

Hello Guenter,

> I have been wondering about this and the same code further below.
> What prevents the command timer from firing after the call to
> flush_workqueue() ?
> 
> Thanks,
> Guenter
> 

From my understanding, once the flush_workqueue() is executed, the work that queued in
ndev->cmd_wq will be taken the care of.

That is, once the flush_workqueue() is finished, it promises there is no executing or 
pending nci_cmd_work() ever.

static void nci_cmd_work(struct work_struct *work)
{
    // ...
		mod_timer(&ndev->cmd_timer,
			  jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
    // ...
}

The command timer is still able be fired because the mod_timer() here. That is why the
del_timer_sync() is necessary after the flush_workqueue().

One very puzzling part is that you may find out the timer queue the work again

/* NCI command timer function */
static void nci_cmd_timer(struct timer_list *t)
{
    // ...
	queue_work(ndev->cmd_wq, &ndev->cmd_work);
}

But I found that this is okay because there is no packets in ndev->cmd_q buffers hence 
even there is a queued nci_cmd_work(), it simply checks the queue and returns.

That is, the old race picture as below

> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     del_timer_sync(cmd_timer)[1] |
> ...                              | Worker
> nci_free_device()                | nci_cmd_work()
>   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]

is impossible now because the patched flush_workqueue() make the race like below

> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     flush_workqueue()[patch]     | Worker
>                                  | nci_cmd_work()
>                                  |   mod_timer(cmd_timer)[2]
>     // work over then return
>     del_timer_sync(cmd_timer)[1] |
>                                  | Timer
>                                  | nci_cmd_timer()
>                                  | 
>     // timer over then return    |
> ...                              |
> nci_free_device()                | 
>   kfree(ndev)[3]                 | 


With above thinkings and the given fact that my POC didn't raise the UAF, I think the 
flush_workqueue() + del_timer_sync() combination is okay to hinder this race.

Tell me if there is anything wrong.

Regards
Lin Ma



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
  2022-04-18 13:59   ` Lin Ma
@ 2022-04-23 13:52     ` Guenter Roeck
  0 siblings, 0 replies; 6+ messages in thread
From: Guenter Roeck @ 2022-04-23 13:52 UTC (permalink / raw)
  To: Lin Ma; +Cc: krzk, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd

On Mon, Apr 18, 2022 at 09:59:10PM +0800, Lin Ma wrote:
> Hello Guenter,
> 
> > I have been wondering about this and the same code further below.
> > What prevents the command timer from firing after the call to
> > flush_workqueue() ?
> > 
> > Thanks,
> > Guenter
> > 
> 
> From my understanding, once the flush_workqueue() is executed, the work that queued in
> ndev->cmd_wq will be taken the care of.
> 
> That is, once the flush_workqueue() is finished, it promises there is no executing or 
> pending nci_cmd_work() ever.
> 
> static void nci_cmd_work(struct work_struct *work)
> {
>     // ...
> 		mod_timer(&ndev->cmd_timer,
> 			  jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
>     // ...
> }
> 
> The command timer is still able be fired because the mod_timer() here. That is why the
> del_timer_sync() is necessary after the flush_workqueue().
> 
> One very puzzling part is that you may find out the timer queue the work again
> 
> /* NCI command timer function */
> static void nci_cmd_timer(struct timer_list *t)
> {
>     // ...
> 	queue_work(ndev->cmd_wq, &ndev->cmd_work);
> }
> 
> But I found that this is okay because there is no packets in ndev->cmd_q buffers hence 
> even there is a queued nci_cmd_work(), it simply checks the queue and returns.
> 
> That is, the old race picture as below
> 
> > Thread-1                           Thread-2
> >                                  | nci_dev_up()
> >                                  |   nci_open_device()
> >                                  |     __nci_request(nci_reset_req)
> >                                  |       nci_send_cmd
> >                                  |         queue_work(cmd_work)
> > nci_unregister_device()          |
> >   nci_close_device()             | ...
> >     del_timer_sync(cmd_timer)[1] |
> > ...                              | Worker
> > nci_free_device()                | nci_cmd_work()
> >   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> is impossible now because the patched flush_workqueue() make the race like below
> 
> > Thread-1                           Thread-2
> >                                  | nci_dev_up()
> >                                  |   nci_open_device()
> >                                  |     __nci_request(nci_reset_req)
> >                                  |       nci_send_cmd
> >                                  |         queue_work(cmd_work)
> > nci_unregister_device()          |
> >   nci_close_device()             | ...
> >     flush_workqueue()[patch]     | Worker
> >                                  | nci_cmd_work()
> >                                  |   mod_timer(cmd_timer)[2]
> >     // work over then return
> >     del_timer_sync(cmd_timer)[1] |
> >                                  | Timer
> >                                  | nci_cmd_timer()
> >                                  | 
> >     // timer over then return    |
> > ...                              |
> > nci_free_device()                | 
> >   kfree(ndev)[3]                 | 
> 
> 
> With above thinkings and the given fact that my POC didn't raise the UAF, I think the 
> flush_workqueue() + del_timer_sync() combination is okay to hinder this race.
> 
> Tell me if there is anything wrong.
> 

Thanks a lot for the detailed explanation and analysis.
I agree with your conclusion.

Guenter

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-23 13:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12 16:04 [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf Lin Ma
2022-04-13  6:57 ` Krzysztof Kozlowski
2022-04-13 13:50 ` patchwork-bot+netdevbpf
2022-04-18 13:41 ` Guenter Roeck
2022-04-18 13:59   ` Lin Ma
2022-04-23 13:52     ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.