linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
@ 2022-04-25  9:58 Duoming Zhou
  2022-04-26 11:17 ` Paolo Abeni
  0 siblings, 1 reply; 3+ messages in thread
From: Duoming Zhou @ 2022-04-25  9:58 UTC (permalink / raw)
  To: krzysztof.kozlowski, linux-kernel
  Cc: davem, gregkh, alexander.deucher, broonie, akpm, netdev, linma,
	Duoming Zhou

There are destructive operations such as nfcmrvl_fw_dnld_abort and
gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
gpio and so on could be destructed while the upper layer functions such as
nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
to double-free, use-after-free and null-ptr-deref bugs.

There are three situations that could lead to double-free bugs.

The first situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |  nfcmrvl_nci_unregister_dev
 release_firmware()           |   nfcmrvl_fw_dnld_abort
  kfree(fw) //(1)             |    fw_dnld_over
                              |     release_firmware
  ...                         |      kfree(fw) //(2)
                              |     ...

The second situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |
 mod_timer                    |
 (wait a time)                |
 fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
   fw_dnld_over               |   nfcmrvl_fw_dnld_abort
    release_firmware          |    fw_dnld_over
     kfree(fw) //(1)          |     release_firmware
     ...                      |      kfree(fw) //(2)

The third situation is shown below:

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_recv_frame          |
 if(..->fw_download_in_progress)|
  nfcmrvl_fw_dnld_recv_frame    |
   queue_work                   |
                                |
fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
 fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
  release_firmware              |   fw_dnld_over
   kfree(fw) //(1)              |    release_firmware
                                |     kfree(fw) //(2)

The firmware struct is deallocated in position (1) and deallocated
in position (2) again.

The crash trace triggered by POC is like below:

[  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
[  122.640457] Call Trace:
[  122.640457]  <TASK>
[  122.640457]  kfree+0xb0/0x330
[  122.640457]  fw_dnld_over+0x28/0xf0
[  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
[  122.640457]  nci_uart_tty_close+0x87/0xd0
[  122.640457]  tty_ldisc_kill+0x3e/0x80
[  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
[  122.640457]  __tty_hangup.part.0+0x316/0x520
[  122.640457]  tty_release+0x200/0x670
[  122.640457]  __fput+0x110/0x410
[  122.640457]  task_work_run+0x86/0xd0
[  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
[  122.640457]  syscall_exit_to_user_mode+0x19/0x50
[  122.640457]  do_syscall_64+0x48/0x90
[  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  122.640457] RIP: 0033:0x7f68433f6beb

What's more, there are also use-after-free and null-ptr-deref bugs
in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
then, we dereference firmware, gpio or the members of priv->fw_dnld in
nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.

This patch reorders destructive operations after nci_unregister_device
to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
is well synchronized and won't return if there is a running routine.
This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
nfc_{un,}register_device").

Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in V2:
  - Correct the subject prefix.

 drivers/nfc/nfcmrvl/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
 	struct nci_dev *ndev = priv->ndev;
 
+	nci_unregister_device(ndev);
 	if (priv->ndev->nfc_dev->fw_download_in_progress)
 		nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 	if (gpio_is_valid(priv->config.reset_n_io))
 		gpio_free(priv->config.reset_n_io);
 
-	nci_unregister_device(ndev);
 	nci_free_device(ndev);
 	kfree(priv);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-04-25  9:58 [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
@ 2022-04-26 11:17 ` Paolo Abeni
  2022-04-26 15:55   ` duoming
  0 siblings, 1 reply; 3+ messages in thread
From: Paolo Abeni @ 2022-04-26 11:17 UTC (permalink / raw)
  To: Duoming Zhou, krzysztof.kozlowski, linux-kernel
  Cc: davem, gregkh, alexander.deucher, broonie, akpm, netdev, linma

On Mon, 2022-04-25 at 17:58 +0800, Duoming Zhou wrote:
> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> gpio and so on could be destructed while the upper layer functions such as
> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> to double-free, use-after-free and null-ptr-deref bugs.
> 
> There are three situations that could lead to double-free bugs.
> 
> The first situation is shown below:
> 
>    (Thread 1)                 |      (Thread 2)
> nfcmrvl_fw_dnld_start         |
>  ...                          |  nfcmrvl_nci_unregister_dev
>  release_firmware()           |   nfcmrvl_fw_dnld_abort
>   kfree(fw) //(1)             |    fw_dnld_over
>                               |     release_firmware
>   ...                         |      kfree(fw) //(2)
>                               |     ...
> 
> The second situation is shown below:
> 
>    (Thread 1)                 |      (Thread 2)
> nfcmrvl_fw_dnld_start         |
>  ...                          |
>  mod_timer                    |
>  (wait a time)                |
>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
>     release_firmware          |    fw_dnld_over
>      kfree(fw) //(1)          |     release_firmware
>      ...                      |      kfree(fw) //(2)
> 
> The third situation is shown below:
> 
>        (Thread 1)               |       (Thread 2)
> nfcmrvl_nci_recv_frame          |
>  if(..->fw_download_in_progress)|
>   nfcmrvl_fw_dnld_recv_frame    |
>    queue_work                   |
>                                 |
> fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
>  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
>   release_firmware              |   fw_dnld_over
>    kfree(fw) //(1)              |    release_firmware
>                                 |     kfree(fw) //(2)
> 
> The firmware struct is deallocated in position (1) and deallocated
> in position (2) again.
> 
> The crash trace triggered by POC is like below:
> 
> [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> [  122.640457] Call Trace:
> [  122.640457]  <TASK>
> [  122.640457]  kfree+0xb0/0x330
> [  122.640457]  fw_dnld_over+0x28/0xf0
> [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> [  122.640457]  nci_uart_tty_close+0x87/0xd0
> [  122.640457]  tty_ldisc_kill+0x3e/0x80
> [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> [  122.640457]  __tty_hangup.part.0+0x316/0x520
> [  122.640457]  tty_release+0x200/0x670
> [  122.640457]  __fput+0x110/0x410
> [  122.640457]  task_work_run+0x86/0xd0
> [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> [  122.640457]  do_syscall_64+0x48/0x90
> [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  122.640457] RIP: 0033:0x7f68433f6beb
> 
> What's more, there are also use-after-free and null-ptr-deref bugs
> in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> then, we dereference firmware, gpio or the members of priv->fw_dnld in
> nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> 
> This patch reorders destructive operations after nci_unregister_device
> to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
> is well synchronized and won't return if there is a running routine.
> This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
> nfc_{un,}register_device").

It looks like the above is not enough to close all the possible races,
specifically it looks like fw_dnld_timeout() and fw_dnld_rx_work() may
still race one vs another. 

I *think* that the approach you already suggested here:

https://lore.kernel.org/netdev/1d34425a0ea8a553a66dcf4f22ca55cc920dbb42.1649913521.git.duoming@zju.edu.cn/

should be safer - but you have to protect with the same spinlock even
every fw_dnld->fw modification.

@Lin Ma: I see you don't like the spinlock solution, but this other
option looks racing. Do you have other suggestions? (and/or would you
reconsider the spinlock?)

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Re: [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-04-26 11:17 ` Paolo Abeni
@ 2022-04-26 15:55   ` duoming
  0 siblings, 0 replies; 3+ messages in thread
From: duoming @ 2022-04-26 15:55 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: krzysztof.kozlowski, linux-kernel, davem, gregkh,
	alexander.deucher, broonie, akpm, netdev, linma

hello,

On Mon, Tue, 26 Apr 2022 13:17:21 +0200 Paolo wrote:

> > There are destructive operations such as nfcmrvl_fw_dnld_abort and
> > gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> > gpio and so on could be destructed while the upper layer functions such as
> > nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> > to double-free, use-after-free and null-ptr-deref bugs.
> > 
> > There are three situations that could lead to double-free bugs.
> > 
> > The first situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |  nfcmrvl_nci_unregister_dev
> >  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >   kfree(fw) //(1)             |    fw_dnld_over
> >                               |     release_firmware
> >   ...                         |      kfree(fw) //(2)
> >                               |     ...
> > 
> > The second situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |
> >  mod_timer                    |
> >  (wait a time)                |
> >  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >     release_firmware          |    fw_dnld_over
> >      kfree(fw) //(1)          |     release_firmware
> >      ...                      |      kfree(fw) //(2)
> > 
> > The third situation is shown below:
> > 
> >        (Thread 1)               |       (Thread 2)
> > nfcmrvl_nci_recv_frame          |
> >  if(..->fw_download_in_progress)|
> >   nfcmrvl_fw_dnld_recv_frame    |
> >    queue_work                   |
> >                                 |
> > fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
> >  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
> >   release_firmware              |   fw_dnld_over
> >    kfree(fw) //(1)              |    release_firmware
> >                                 |     kfree(fw) //(2)
> > 
> > The firmware struct is deallocated in position (1) and deallocated
> > in position (2) again.
> > 
> > The crash trace triggered by POC is like below:
> > 
> > [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> > [  122.640457] Call Trace:
> > [  122.640457]  <TASK>
> > [  122.640457]  kfree+0xb0/0x330
> > [  122.640457]  fw_dnld_over+0x28/0xf0
> > [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> > [  122.640457]  nci_uart_tty_close+0x87/0xd0
> > [  122.640457]  tty_ldisc_kill+0x3e/0x80
> > [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> > [  122.640457]  __tty_hangup.part.0+0x316/0x520
> > [  122.640457]  tty_release+0x200/0x670
> > [  122.640457]  __fput+0x110/0x410
> > [  122.640457]  task_work_run+0x86/0xd0
> > [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> > [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> > [  122.640457]  do_syscall_64+0x48/0x90
> > [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  122.640457] RIP: 0033:0x7f68433f6beb
> > 
> > What's more, there are also use-after-free and null-ptr-deref bugs
> > in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> > set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> > then, we dereference firmware, gpio or the members of priv->fw_dnld in
> > nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> > 
> > This patch reorders destructive operations after nci_unregister_device
> > to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
> > is well synchronized and won't return if there is a running routine.
> > This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
> > nfc_{un,}register_device").

> It looks like the above is not enough to close all the possible races,
> specifically it looks like fw_dnld_timeout() and fw_dnld_rx_work() may
> still race one vs another. 

The fw_dnld_timeout() and fw_dnld_rx_work() could not race with each other.
Because fw_dnld_rx_work() is used to receive packets from firmware download
process, the release_firmware() in fw_dnld_rx_work() will not execute unless it
has received firmware packets, the code is shown below. 

fw_dnld_rx_work                                   
  while ((skb = skb_dequeue(&fw_dnld->rx_q))) {
    ...  
    fw_dnld_over
      release_firmware
  }

The release_firmware() is only called when firmware download process failed, and
fw_dnld_rx_work() will not receive any packets from firmware download process.
As a result, the race is non-existent. The process is shown below:

fw_dnld_rx_work          | nfcmrvl_fw_dnld_start
  ...                    |   ...
                         |   release_firmware
  (wait for skb)         |   (the following will not send any skb)
  while (skb = ...) {    |   
    ...                  | 
    fw_dnld_over         |
      release_firmware   |
  }                      |
  (receive no packet)    |

> I *think* that the approach you already suggested here:
> 
> https://lore.kernel.org/netdev/1d34425a0ea8a553a66dcf4f22ca55cc920dbb42.1649913521.git.duoming@zju.edu.cn/
> 
> should be safer - but you have to protect with the same spinlock even
> every fw_dnld->fw modification.

The spinlock should not be used to protect release_firmware(), because there is vfree() in release_firmware().
The vfree() is a function that may sleep. If we use spinlock, the sleep in atomic bug will happen. The process
is shown below:

spin_lock()
release_firmware()
  firmware_free_data()
    vfree()  //sleep in atomic
spin_unlock()

We reorders destructive operations after nci_unregister_device(), 
using nci_unregister_device() to synchonize with firmware download routine.
But I found this is not enough, when nci_unregister_device() is return, there 
are still firmware download routine is running.

[   65.835462] BUG: KASAN: use-after-free in nci_fw_download+0x26/0x60
[   65.840236] Read of size 8 at addr ffff88800c2f5008
...
[   65.845755] Call Trace:
[   65.856235]  nci_fw_download+0x26/0x60
[   65.856235]  nfc_fw_download+0x99/0xe0
[   65.856235]  nfc_genl_fw_download+0x10b/0x170
...

In order to solve this problem, I propose the following solutions:

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
        struct nci_dev *ndev = priv->ndev;
 
+       nci_unregister_device(ndev);
        if (priv->ndev->nfc_dev->fw_download_in_progress)
                nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
        if (gpio_is_valid(priv->config.reset_n_io))
                gpio_free(priv->config.reset_n_io);
 
-       nci_unregister_device(ndev);
        nci_free_device(ndev);
        kfree(priv);
 }

diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..da8199f67d4 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,7 @@
 #define NFC_CHECK_PRES_FREQ_MS 2000
 
 int nfc_devlist_generation;
+bool nfc_download;
 DEFINE_MUTEX(nfc_devlist_mutex);
 
 /* NFC device ID bitmap */
@@ -38,7 +39,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
 
        device_lock(&dev->dev);
 
-       if (!device_is_registered(&dev->dev)) {
+       if (!device_is_registered(&dev->dev) || !nfc_download) {
                rc = -ENODEV;
                goto error;
        }
@@ -1134,6 +1135,7 @@ int nfc_register_device(struct nfc_dev *dev)
                        dev->rfkill = NULL;
                }
        }
+       nfc_download = true;
        device_unlock(&dev->dev);
 
        rc = nfc_genl_device_added(dev);
@@ -1166,6 +1168,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
                rfkill_unregister(dev->rfkill);
                rfkill_destroy(dev->rfkill);
        }
+       nfc_download = false;
        device_unlock(&dev->dev);
 
        if (dev->ops->check_presence) {
-- 

The new solution not only reorders destructive operations after nci_unregister_device,
but also adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_unregister_dev      |
  nci_unregister_device         |
    nfc_unregister_device       | nfc_fw_download
      device_lock()             |
      ...                       |
      nfc_download = false;//(1)|   ...
      device_unlock()           |
  ...                           |   device_lock()
  (destructive operations)      |   if(.. || !nfc_download)
                                |     goto error; //(2)
                                |   rc = dev->ops->fw_download();
                                | error:
                                |   device_unlock()

If the device is detaching, nfc_download will be set to false in position (1) and 
the download function will goto error.

       (Thread 1)                  |       (Thread 2)
nfcmrvl_nci_unregister_dev         |
  nci_unregister_device            |
    nfc_unregister_device          | nfc_fw_download
                                   |   device_lock()
                                   |   rc = dev->ops->fw_download();
                                   |   dev->fw_download_in_progress = false;
                                   |   device_unlock()
      device_lock()                |   ...
      ...                          |   
      nfc_download = false;        |   ...
      device_unlock()              |
  ...                              |
  if (..->fw_download_in_progress) |
    nfcmrvl_fw_dnld_abort          |
      fw_dnld_over                 |
        release_firmware           |

If the download function is executing, nci_unregister_device() will wait until
firmware download routine is finished. What`s more, when the firmware download
routine is finished, the fw_download_in_progress() in nfc_dev struct will be set
to false. the release_firmware() in nfcmrvl_nci_unregister_dev() will not execute,
which could synchonize with fw_dnld_rx_work() and fw_dnld_timeout().

I think the new solution is good, it has been tested enough. I sent "[PATCH net v3]
nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs"
just now.

Best regards,
Duoming Zhou

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-04-26 15:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-25  9:58 [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
2022-04-26 11:17 ` Paolo Abeni
2022-04-26 15:55   ` duoming

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).