All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v5 0/2] Replace improper checks and fix bugs in nfc subsystem
@ 2022-04-29  1:14 Duoming Zhou
  2022-04-29  1:14 ` [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions Duoming Zhou
  2022-04-29  1:14 ` [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
  0 siblings, 2 replies; 12+ messages in thread
From: Duoming Zhou @ 2022-04-29  1:14 UTC (permalink / raw)
  To: linux-kernel, kuba, krzysztof.kozlowski, gregkh
  Cc: davem, edumazet, pabeni, alexander.deucher, akpm, broonie,
	netdev, linma, Duoming Zhou

The first patch is used to replace improper checks in netlink related
functions of nfc core, the second patch is used to fix bugs in
nfcmrvl driver.

Duoming Zhou (2):
  nfc: replace improper check device_is_registered() in netlink related
    functions
  nfc: nfcmrvl: main: reorder destructive operations in
    nfcmrvl_nci_unregister_dev to avoid bugs

 drivers/nfc/nfcmrvl/main.c |  2 +-
 include/net/nfc/nfc.h      |  1 +
 net/nfc/core.c             | 26 ++++++++++++++------------
 3 files changed, 16 insertions(+), 13 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions
  2022-04-29  1:14 [PATCH net v5 0/2] Replace improper checks and fix bugs in nfc subsystem Duoming Zhou
@ 2022-04-29  1:14 ` Duoming Zhou
  2022-04-29  7:03   ` Greg KH
  2022-04-29  7:19   ` Krzysztof Kozlowski
  2022-04-29  1:14 ` [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
  1 sibling, 2 replies; 12+ messages in thread
From: Duoming Zhou @ 2022-04-29  1:14 UTC (permalink / raw)
  To: linux-kernel, kuba, krzysztof.kozlowski, gregkh
  Cc: davem, edumazet, pabeni, alexander.deucher, akpm, broonie,
	netdev, linma, Duoming Zhou

The device_is_registered() in nfc core is used to check whether
nfc device is registered in netlink related functions such as
nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
is protected by device_lock, there is still a race condition between
device_del() and device_is_registered(). The root cause is that
kobject_del() in device_del() is not protected by device_lock.

   (cleanup task)         |     (netlink task)
                          |
nfc_unregister_device     | nfc_fw_download
 device_del               |  device_lock
  ...                     |   if (!device_is_registered)//(1)
  kobject_del//(2)        |   ...
 ...                      |  device_unlock

The device_is_registered() returns the value of state_in_sysfs and
the state_in_sysfs is set to zero in kobject_del(). If we pass check in
position (1), then set zero in position (2). As a result, the check
in position (1) is useless.

This patch uses bool variable instead of device_is_registered() to judge
whether the nfc device is registered, which is well synchronized.

Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v5:
  - Replace device_is_registered() to bool variable.

 include/net/nfc/nfc.h |  1 +
 net/nfc/core.c        | 26 ++++++++++++++------------
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/net/nfc/nfc.h b/include/net/nfc/nfc.h
index 5dee575fbe8..7bb4ccb1830 100644
--- a/include/net/nfc/nfc.h
+++ b/include/net/nfc/nfc.h
@@ -167,6 +167,7 @@ struct nfc_dev {
 	int n_targets;
 	int targets_generation;
 	struct device dev;
+	bool dev_register;
 	bool dev_up;
 	bool fw_download_in_progress;
 	u8 rf_mode;
diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..52147da2286 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -38,7 +38,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -94,7 +94,7 @@ int nfc_dev_up(struct nfc_dev *dev)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -142,7 +142,7 @@ int nfc_dev_down(struct nfc_dev *dev)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -207,7 +207,7 @@ int nfc_start_poll(struct nfc_dev *dev, u32 im_protocols, u32 tm_protocols)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -246,7 +246,7 @@ int nfc_stop_poll(struct nfc_dev *dev)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -291,7 +291,7 @@ int nfc_dep_link_up(struct nfc_dev *dev, int target_index, u8 comm_mode)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -335,7 +335,7 @@ int nfc_dep_link_down(struct nfc_dev *dev)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -401,7 +401,7 @@ int nfc_activate_target(struct nfc_dev *dev, u32 target_idx, u32 protocol)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -448,7 +448,7 @@ int nfc_deactivate_target(struct nfc_dev *dev, u32 target_idx, u8 mode)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -495,7 +495,7 @@ int nfc_data_exchange(struct nfc_dev *dev, u32 target_idx, struct sk_buff *skb,
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		kfree_skb(skb);
 		goto error;
@@ -552,7 +552,7 @@ int nfc_enable_se(struct nfc_dev *dev, u32 se_idx)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -601,7 +601,7 @@ int nfc_disable_se(struct nfc_dev *dev, u32 se_idx)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!dev->dev_register) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -1134,6 +1134,7 @@ int nfc_register_device(struct nfc_dev *dev)
 			dev->rfkill = NULL;
 		}
 	}
+	dev->dev_register = true;
 	device_unlock(&dev->dev);
 
 	rc = nfc_genl_device_added(dev);
@@ -1166,6 +1167,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
 		rfkill_unregister(dev->rfkill);
 		rfkill_destroy(dev->rfkill);
 	}
+	dev->dev_register = false;
 	device_unlock(&dev->dev);
 
 	if (dev->ops->check_presence) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-04-29  1:14 [PATCH net v5 0/2] Replace improper checks and fix bugs in nfc subsystem Duoming Zhou
  2022-04-29  1:14 ` [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions Duoming Zhou
@ 2022-04-29  1:14 ` Duoming Zhou
  2022-04-29  7:27   ` Krzysztof Kozlowski
  1 sibling, 1 reply; 12+ messages in thread
From: Duoming Zhou @ 2022-04-29  1:14 UTC (permalink / raw)
  To: linux-kernel, kuba, krzysztof.kozlowski, gregkh
  Cc: davem, edumazet, pabeni, alexander.deucher, akpm, broonie,
	netdev, linma, Duoming Zhou

There are destructive operations such as nfcmrvl_fw_dnld_abort and
gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
gpio and so on could be destructed while the upper layer functions such as
nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
to double-free, use-after-free and null-ptr-deref bugs.

There are three situations that could lead to double-free bugs.

The first situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |  nfcmrvl_nci_unregister_dev
 release_firmware()           |   nfcmrvl_fw_dnld_abort
  kfree(fw) //(1)             |    fw_dnld_over
                              |     release_firmware
  ...                         |      kfree(fw) //(2)
                              |     ...

The second situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |
 mod_timer                    |
 (wait a time)                |
 fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
   fw_dnld_over               |   nfcmrvl_fw_dnld_abort
    release_firmware          |    fw_dnld_over
     kfree(fw) //(1)          |     release_firmware
     ...                      |      kfree(fw) //(2)

The third situation is shown below:

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_recv_frame          |
 if(..->fw_download_in_progress)|
  nfcmrvl_fw_dnld_recv_frame    |
   queue_work                   |
                                |
fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
 fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
  release_firmware              |   fw_dnld_over
   kfree(fw) //(1)              |    release_firmware
                                |     kfree(fw) //(2)

The firmware struct is deallocated in position (1) and deallocated
in position (2) again.

The crash trace triggered by POC is like below:

[  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
[  122.640457] Call Trace:
[  122.640457]  <TASK>
[  122.640457]  kfree+0xb0/0x330
[  122.640457]  fw_dnld_over+0x28/0xf0
[  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
[  122.640457]  nci_uart_tty_close+0x87/0xd0
[  122.640457]  tty_ldisc_kill+0x3e/0x80
[  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
[  122.640457]  __tty_hangup.part.0+0x316/0x520
[  122.640457]  tty_release+0x200/0x670
[  122.640457]  __fput+0x110/0x410
[  122.640457]  task_work_run+0x86/0xd0
[  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
[  122.640457]  syscall_exit_to_user_mode+0x19/0x50
[  122.640457]  do_syscall_64+0x48/0x90
[  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  122.640457] RIP: 0033:0x7f68433f6beb

What's more, there are also use-after-free and null-ptr-deref bugs
in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
then, we dereference firmware, gpio or the members of priv->fw_dnld in
nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.

This patch reorders destructive operations after nci_unregister_device
and uses bool variable in nfc_dev to synchronize between cleanup routine
and firmware download routine. The process is shown below.

       (Thread 1)                 |       (Thread 2)
nfcmrvl_nci_unregister_dev        |
  nci_unregister_device           |
    nfc_unregister_device         | nfc_fw_download
      device_lock()               |
      ...                         |
      dev->dev_register = false;  |   ...
      device_unlock()             |
  ...                             |   device_lock()
  (destructive operations)        |   if(!dev->dev_register)
                                  |     goto error;
                                  | error:
                                  |   device_unlock()

If the device is detaching, the download function will goto error.
If the download function is executing, nci_unregister_device will
wait until download function is finished.

Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v5:
  - Use bool variable added in patch 1/2 to synchronize.

 drivers/nfc/nfcmrvl/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
 	struct nci_dev *ndev = priv->ndev;
 
+	nci_unregister_device(ndev);
 	if (priv->ndev->nfc_dev->fw_download_in_progress)
 		nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 	if (gpio_is_valid(priv->config.reset_n_io))
 		gpio_free(priv->config.reset_n_io);
 
-	nci_unregister_device(ndev);
 	nci_free_device(ndev);
 	kfree(priv);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions
  2022-04-29  1:14 ` [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions Duoming Zhou
@ 2022-04-29  7:03   ` Greg KH
  2022-04-29  7:19   ` Krzysztof Kozlowski
  1 sibling, 0 replies; 12+ messages in thread
From: Greg KH @ 2022-04-29  7:03 UTC (permalink / raw)
  To: Duoming Zhou
  Cc: linux-kernel, kuba, krzysztof.kozlowski, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma

On Fri, Apr 29, 2022 at 09:14:32AM +0800, Duoming Zhou wrote:
> The device_is_registered() in nfc core is used to check whether
> nfc device is registered in netlink related functions such as
> nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
> is protected by device_lock, there is still a race condition between
> device_del() and device_is_registered(). The root cause is that
> kobject_del() in device_del() is not protected by device_lock.
> 
>    (cleanup task)         |     (netlink task)
>                           |
> nfc_unregister_device     | nfc_fw_download
>  device_del               |  device_lock
>   ...                     |   if (!device_is_registered)//(1)
>   kobject_del//(2)        |   ...
>  ...                      |  device_unlock
> 
> The device_is_registered() returns the value of state_in_sysfs and
> the state_in_sysfs is set to zero in kobject_del(). If we pass check in
> position (1), then set zero in position (2). As a result, the check
> in position (1) is useless.
> 
> This patch uses bool variable instead of device_is_registered() to judge
> whether the nfc device is registered, which is well synchronized.
> 
> Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
> Changes in v5:
>   - Replace device_is_registered() to bool variable.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions
  2022-04-29  1:14 ` [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions Duoming Zhou
  2022-04-29  7:03   ` Greg KH
@ 2022-04-29  7:19   ` Krzysztof Kozlowski
  2022-04-29  9:18     ` duoming
  1 sibling, 1 reply; 12+ messages in thread
From: Krzysztof Kozlowski @ 2022-04-29  7:19 UTC (permalink / raw)
  To: Duoming Zhou, linux-kernel, kuba, gregkh
  Cc: davem, edumazet, pabeni, alexander.deucher, akpm, broonie, netdev, linma

On 29/04/2022 03:14, Duoming Zhou wrote:
> The device_is_registered() in nfc core is used to check whether
> nfc device is registered in netlink related functions such as
> nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
> is protected by device_lock, there is still a race condition between
> device_del() and device_is_registered(). The root cause is that
> kobject_del() in device_del() is not protected by device_lock.
> 
>    (cleanup task)         |     (netlink task)
>                           |
> nfc_unregister_device     | nfc_fw_download
>  device_del               |  device_lock
>   ...                     |   if (!device_is_registered)//(1)
>   kobject_del//(2)        |   ...
>  ...                      |  device_unlock
> 
> The device_is_registered() returns the value of state_in_sysfs and
> the state_in_sysfs is set to zero in kobject_del(). If we pass check in
> position (1), then set zero in position (2). As a result, the check
> in position (1) is useless.
> 
> This patch uses bool variable instead of device_is_registered() to judge
> whether the nfc device is registered, which is well synchronized.
> 
> Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
> Changes in v5:
>   - Replace device_is_registered() to bool variable.
> 
>  include/net/nfc/nfc.h |  1 +
>  net/nfc/core.c        | 26 ++++++++++++++------------
>  2 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/nfc/nfc.h b/include/net/nfc/nfc.h
> index 5dee575fbe8..7bb4ccb1830 100644
> --- a/include/net/nfc/nfc.h
> +++ b/include/net/nfc/nfc.h
> @@ -167,6 +167,7 @@ struct nfc_dev {
>  	int n_targets;
>  	int targets_generation;
>  	struct device dev;
> +	bool dev_register;
>  	bool dev_up;
>  	bool fw_download_in_progress;
>  	u8 rf_mode;
> diff --git a/net/nfc/core.c b/net/nfc/core.c
> index dc7a2404efd..52147da2286 100644
> --- a/net/nfc/core.c
> +++ b/net/nfc/core.c
> @@ -38,7 +38,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -94,7 +94,7 @@ int nfc_dev_up(struct nfc_dev *dev)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -142,7 +142,7 @@ int nfc_dev_down(struct nfc_dev *dev)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -207,7 +207,7 @@ int nfc_start_poll(struct nfc_dev *dev, u32 im_protocols, u32 tm_protocols)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -246,7 +246,7 @@ int nfc_stop_poll(struct nfc_dev *dev)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -291,7 +291,7 @@ int nfc_dep_link_up(struct nfc_dev *dev, int target_index, u8 comm_mode)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -335,7 +335,7 @@ int nfc_dep_link_down(struct nfc_dev *dev)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -401,7 +401,7 @@ int nfc_activate_target(struct nfc_dev *dev, u32 target_idx, u32 protocol)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -448,7 +448,7 @@ int nfc_deactivate_target(struct nfc_dev *dev, u32 target_idx, u8 mode)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -495,7 +495,7 @@ int nfc_data_exchange(struct nfc_dev *dev, u32 target_idx, struct sk_buff *skb,
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		kfree_skb(skb);
>  		goto error;
> @@ -552,7 +552,7 @@ int nfc_enable_se(struct nfc_dev *dev, u32 se_idx)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -601,7 +601,7 @@ int nfc_disable_se(struct nfc_dev *dev, u32 se_idx)
>  
>  	device_lock(&dev->dev);
>  
> -	if (!device_is_registered(&dev->dev)) {
> +	if (!dev->dev_register) {
>  		rc = -ENODEV;
>  		goto error;
>  	}
> @@ -1134,6 +1134,7 @@ int nfc_register_device(struct nfc_dev *dev)
>  			dev->rfkill = NULL;
>  		}
>  	}
> +	dev->dev_register = true;
>  	device_unlock(&dev->dev);
>  
>  	rc = nfc_genl_device_added(dev);
> @@ -1166,6 +1167,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
>  		rfkill_unregister(dev->rfkill);
>  		rfkill_destroy(dev->rfkill);
>  	}
> +	dev->dev_register = false;

We already have flag for it - dev->shutting_down. Currently it is used
only in if device implements check_presence but I think it can be easily
moved to common path.

Having multiple fields for similar, but slightly different cases, is
getting us closer and closer to spaghetti code.


Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-04-29  1:14 ` [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
@ 2022-04-29  7:27   ` Krzysztof Kozlowski
  2022-04-29  9:13     ` duoming
  0 siblings, 1 reply; 12+ messages in thread
From: Krzysztof Kozlowski @ 2022-04-29  7:27 UTC (permalink / raw)
  To: Duoming Zhou, linux-kernel, kuba, gregkh
  Cc: davem, edumazet, pabeni, alexander.deucher, akpm, broonie, netdev, linma

On 29/04/2022 03:14, Duoming Zhou wrote:
> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> gpio and so on could be destructed while the upper layer functions such as
> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> to double-free, use-after-free and null-ptr-deref bugs.
> 
> There are three situations that could lead to double-free bugs.
> 
> The first situation is shown below:
> 
>    (Thread 1)                 |      (Thread 2)
> nfcmrvl_fw_dnld_start         |
>  ...                          |  nfcmrvl_nci_unregister_dev
>  release_firmware()           |   nfcmrvl_fw_dnld_abort
>   kfree(fw) //(1)             |    fw_dnld_over
>                               |     release_firmware
>   ...                         |      kfree(fw) //(2)
>                               |     ...
> 
> The second situation is shown below:
> 
>    (Thread 1)                 |      (Thread 2)
> nfcmrvl_fw_dnld_start         |
>  ...                          |
>  mod_timer                    |
>  (wait a time)                |
>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
>     release_firmware          |    fw_dnld_over
>      kfree(fw) //(1)          |     release_firmware
>      ...                      |      kfree(fw) //(2)

How exactly the case here is being prevented?

If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?


> 
> The third situation is shown below:
> 
>        (Thread 1)               |       (Thread 2)
> nfcmrvl_nci_recv_frame          |
>  if(..->fw_download_in_progress)|
>   nfcmrvl_fw_dnld_recv_frame    |
>    queue_work                   |
>                                 |
> fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
>  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
>   release_firmware              |   fw_dnld_over
>    kfree(fw) //(1)              |    release_firmware
>                                 |     kfree(fw) //(2)
> 
> The firmware struct is deallocated in position (1) and deallocated
> in position (2) again.
> 
> The crash trace triggered by POC is like below:
> 
> [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> [  122.640457] Call Trace:
> [  122.640457]  <TASK>
> [  122.640457]  kfree+0xb0/0x330
> [  122.640457]  fw_dnld_over+0x28/0xf0
> [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> [  122.640457]  nci_uart_tty_close+0x87/0xd0
> [  122.640457]  tty_ldisc_kill+0x3e/0x80
> [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> [  122.640457]  __tty_hangup.part.0+0x316/0x520
> [  122.640457]  tty_release+0x200/0x670
> [  122.640457]  __fput+0x110/0x410
> [  122.640457]  task_work_run+0x86/0xd0
> [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> [  122.640457]  do_syscall_64+0x48/0x90
> [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  122.640457] RIP: 0033:0x7f68433f6beb

Please always strip unrelated parts of oops, so minimum the timing. The
addresses could be removed as well.

> 
> What's more, there are also use-after-free and null-ptr-deref bugs
> in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> then, we dereference firmware, gpio or the members of priv->fw_dnld in
> nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> 
> This patch reorders destructive operations after nci_unregister_device
> and uses bool variable in nfc_dev to synchronize between cleanup routine
> and firmware download routine. The process is shown below.
> 
>        (Thread 1)                 |       (Thread 2)
> nfcmrvl_nci_unregister_dev        |
>   nci_unregister_device           |
>     nfc_unregister_device         | nfc_fw_download
>       device_lock()               |
>       ...                         |
>       dev->dev_register = false;  |   ...
>       device_unlock()             |
>   ...                             |   device_lock()
>   (destructive operations)        |   if(!dev->dev_register)
>                                   |     goto error;
>                                   | error:
>                                   |   device_unlock()

How T2 calls appeared here? They were not present in any of your
previous process-examples...

> 
> If the device is detaching, the download function will goto error.
> If the download function is executing, nci_unregister_device will
> wait until download function is finished.
> 
> Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
> Changes in v5:
>   - Use bool variable added in patch 1/2 to synchronize.
> 
>  drivers/nfc/nfcmrvl/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
> index 2fcf545012b..1a5284de434 100644
> --- a/drivers/nfc/nfcmrvl/main.c
> +++ b/drivers/nfc/nfcmrvl/main.c
> @@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
>  {
>  	struct nci_dev *ndev = priv->ndev;
>  
> +	nci_unregister_device(ndev);
>  	if (priv->ndev->nfc_dev->fw_download_in_progress)
>  		nfcmrvl_fw_dnld_abort(priv);
>  
> @@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
>  	if (gpio_is_valid(priv->config.reset_n_io))
>  		gpio_free(priv->config.reset_n_io);
>  
> -	nci_unregister_device(ndev);
>  	nci_free_device(ndev);
>  	kfree(priv);
>  }


Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-04-29  7:27   ` Krzysztof Kozlowski
@ 2022-04-29  9:13     ` duoming
  2022-05-02  6:34       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 12+ messages in thread
From: duoming @ 2022-04-29  9:13 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-kernel, kuba, gregkh, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma

Hello,

On Fri, 29 Apr 2022 09:27:48 +0200 Krzysztof wrote:

> > There are destructive operations such as nfcmrvl_fw_dnld_abort and
> > gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> > gpio and so on could be destructed while the upper layer functions such as
> > nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> > to double-free, use-after-free and null-ptr-deref bugs.
> > 
> > There are three situations that could lead to double-free bugs.
> > 
> > The first situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |  nfcmrvl_nci_unregister_dev
> >  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >   kfree(fw) //(1)             |    fw_dnld_over
> >                               |     release_firmware
> >   ...                         |      kfree(fw) //(2)
> >                               |     ...
> > 
> > The second situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |
> >  mod_timer                    |
> >  (wait a time)                |
> >  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >     release_firmware          |    fw_dnld_over
> >      kfree(fw) //(1)          |     release_firmware
> >      ...                      |      kfree(fw) //(2)
> 
> How exactly the case here is being prevented?
> 
> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?

I think it could be prevented. We use nci_unregister_device() to synchronize, if the
firmware download routine is running, the cleanup routine will wait it to finish. 
The flag "fw_download_in_progress" will be set to false, if the the firmware download
routine is finished. 

Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
in nfcmrvl_nci_unregister_dev() will not execute.

> > 
> > The third situation is shown below:
> > 
> >        (Thread 1)               |       (Thread 2)
> > nfcmrvl_nci_recv_frame          |
> >  if(..->fw_download_in_progress)|
> >   nfcmrvl_fw_dnld_recv_frame    |
> >    queue_work                   |
> >                                 |
> > fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
> >  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
> >   release_firmware              |   fw_dnld_over
> >    kfree(fw) //(1)              |    release_firmware
> >                                 |     kfree(fw) //(2)
> > 
> > The firmware struct is deallocated in position (1) and deallocated
> > in position (2) again.
> > 
> > The crash trace triggered by POC is like below:
> > 
> > [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> > [  122.640457] Call Trace:
> > [  122.640457]  <TASK>
> > [  122.640457]  kfree+0xb0/0x330
> > [  122.640457]  fw_dnld_over+0x28/0xf0
> > [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> > [  122.640457]  nci_uart_tty_close+0x87/0xd0
> > [  122.640457]  tty_ldisc_kill+0x3e/0x80
> > [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> > [  122.640457]  __tty_hangup.part.0+0x316/0x520
> > [  122.640457]  tty_release+0x200/0x670
> > [  122.640457]  __fput+0x110/0x410
> > [  122.640457]  task_work_run+0x86/0xd0
> > [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> > [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> > [  122.640457]  do_syscall_64+0x48/0x90
> > [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  122.640457] RIP: 0033:0x7f68433f6beb
> 
> Please always strip unrelated parts of oops, so minimum the timing. The
> addresses could be removed as well.

Thanks, I will strip unrelated parts of oops.

> > 
> > What's more, there are also use-after-free and null-ptr-deref bugs
> > in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> > set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> > then, we dereference firmware, gpio or the members of priv->fw_dnld in
> > nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> > 
> > This patch reorders destructive operations after nci_unregister_device
> > and uses bool variable in nfc_dev to synchronize between cleanup routine
> > and firmware download routine. The process is shown below.
> > 
> >        (Thread 1)                 |       (Thread 2)
> > nfcmrvl_nci_unregister_dev        |
> >   nci_unregister_device           |
> >     nfc_unregister_device         | nfc_fw_download
> >       device_lock()               |
> >       ...                         |
> >       dev->dev_register = false;  |   ...
> >       device_unlock()             |
> >   ...                             |   device_lock()
> >   (destructive operations)        |   if(!dev->dev_register)
> >                                   |     goto error;
> >                                   | error:
> >                                   |   device_unlock()
> 
> How T2 calls appeared here? They were not present in any of your
> previous process-examples...

The firmware download process is shown below:
 
nfc_genl_fw_download()->nfc_fw_download()->nci_fw_download()->nfcmrvl_nci_fw_download()
->nfcmrvl_fw_dnld_start()

So T2 calls is a part of firmware download routine in nfc core layer.

We use bool variable and device_lock() to synchronize between cleanup routine and
firmware download routine in nfc core layer. The above picture is attempt to show 
this synchronized process.

> > 
> > If the device is detaching, the download function will goto error.
> > If the download function is executing, nci_unregister_device will
> > wait until download function is finished.
> > 
> > Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> > Changes in v5:
> >   - Use bool variable added in patch 1/2 to synchronize.
> > 
> >  drivers/nfc/nfcmrvl/main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
> > index 2fcf545012b..1a5284de434 100644
> > --- a/drivers/nfc/nfcmrvl/main.c
> > +++ b/drivers/nfc/nfcmrvl/main.c
> > @@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
> >  {
> >  	struct nci_dev *ndev = priv->ndev;
> >  
> > +	nci_unregister_device(ndev);
> >  	if (priv->ndev->nfc_dev->fw_download_in_progress)
> >  		nfcmrvl_fw_dnld_abort(priv);
> >  
> > @@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
> >  	if (gpio_is_valid(priv->config.reset_n_io))
> >  		gpio_free(priv->config.reset_n_io);
> >  
> > -	nci_unregister_device(ndev);
> >  	nci_free_device(ndev);
> >  	kfree(priv);
> >  }


Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions
  2022-04-29  7:19   ` Krzysztof Kozlowski
@ 2022-04-29  9:18     ` duoming
  0 siblings, 0 replies; 12+ messages in thread
From: duoming @ 2022-04-29  9:18 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-kernel, kuba, gregkh, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma

Hello,

On Fri, 29 Apr 2022 09:19:36 +0200 Krzysztof wrote:

> > The device_is_registered() in nfc core is used to check whether
> > nfc device is registered in netlink related functions such as
> > nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
> > is protected by device_lock, there is still a race condition between
> > device_del() and device_is_registered(). The root cause is that
> > kobject_del() in device_del() is not protected by device_lock.
> > 
> >    (cleanup task)         |     (netlink task)
> >                           |
> > nfc_unregister_device     | nfc_fw_download
> >  device_del               |  device_lock
> >   ...                     |   if (!device_is_registered)//(1)
> >   kobject_del//(2)        |   ...
> >  ...                      |  device_unlock
> > 
> > The device_is_registered() returns the value of state_in_sysfs and
> > the state_in_sysfs is set to zero in kobject_del(). If we pass check in
> > position (1), then set zero in position (2). As a result, the check
> > in position (1) is useless.
> > 
> > This patch uses bool variable instead of device_is_registered() to judge
> > whether the nfc device is registered, which is well synchronized.
> > 
> > Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core")
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> > Changes in v5:
> >   - Replace device_is_registered() to bool variable.
> > 
> >  include/net/nfc/nfc.h |  1 +
> >  net/nfc/core.c        | 26 ++++++++++++++------------
> >  2 files changed, 15 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/net/nfc/nfc.h b/include/net/nfc/nfc.h
> > index 5dee575fbe8..7bb4ccb1830 100644
> > --- a/include/net/nfc/nfc.h
> > +++ b/include/net/nfc/nfc.h
> > @@ -167,6 +167,7 @@ struct nfc_dev {
> >  	int n_targets;
> >  	int targets_generation;
> >  	struct device dev;
> > +	bool dev_register;
> >  	bool dev_up;
> >  	bool fw_download_in_progress;
> >  	u8 rf_mode;
> > diff --git a/net/nfc/core.c b/net/nfc/core.c
> > index dc7a2404efd..52147da2286 100644
> > --- a/net/nfc/core.c
> > +++ b/net/nfc/core.c
> > @@ -38,7 +38,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -94,7 +94,7 @@ int nfc_dev_up(struct nfc_dev *dev)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -142,7 +142,7 @@ int nfc_dev_down(struct nfc_dev *dev)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -207,7 +207,7 @@ int nfc_start_poll(struct nfc_dev *dev, u32 im_protocols, u32 tm_protocols)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -246,7 +246,7 @@ int nfc_stop_poll(struct nfc_dev *dev)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -291,7 +291,7 @@ int nfc_dep_link_up(struct nfc_dev *dev, int target_index, u8 comm_mode)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -335,7 +335,7 @@ int nfc_dep_link_down(struct nfc_dev *dev)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -401,7 +401,7 @@ int nfc_activate_target(struct nfc_dev *dev, u32 target_idx, u32 protocol)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -448,7 +448,7 @@ int nfc_deactivate_target(struct nfc_dev *dev, u32 target_idx, u8 mode)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -495,7 +495,7 @@ int nfc_data_exchange(struct nfc_dev *dev, u32 target_idx, struct sk_buff *skb,
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		kfree_skb(skb);
> >  		goto error;
> > @@ -552,7 +552,7 @@ int nfc_enable_se(struct nfc_dev *dev, u32 se_idx)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -601,7 +601,7 @@ int nfc_disable_se(struct nfc_dev *dev, u32 se_idx)
> >  
> >  	device_lock(&dev->dev);
> >  
> > -	if (!device_is_registered(&dev->dev)) {
> > +	if (!dev->dev_register) {
> >  		rc = -ENODEV;
> >  		goto error;
> >  	}
> > @@ -1134,6 +1134,7 @@ int nfc_register_device(struct nfc_dev *dev)
> >  			dev->rfkill = NULL;
> >  		}
> >  	}
> > +	dev->dev_register = true;
> >  	device_unlock(&dev->dev);
> >  
> >  	rc = nfc_genl_device_added(dev);
> > @@ -1166,6 +1167,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
> >  		rfkill_unregister(dev->rfkill);
> >  		rfkill_destroy(dev->rfkill);
> >  	}
> > +	dev->dev_register = false;
> 
> We already have flag for it - dev->shutting_down. Currently it is used
> only in if device implements check_presence but I think it can be easily
> moved to common path.
> 
> Having multiple fields for similar, but slightly different cases, is
> getting us closer and closer to spaghetti code.

Thanks a lot for your suggestion, I will move dev->shutting_down to
common path.

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-04-29  9:13     ` duoming
@ 2022-05-02  6:34       ` Krzysztof Kozlowski
  2022-05-02  7:25         ` duoming
  0 siblings, 1 reply; 12+ messages in thread
From: Krzysztof Kozlowski @ 2022-05-02  6:34 UTC (permalink / raw)
  To: duoming
  Cc: linux-kernel, kuba, gregkh, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma

On 29/04/2022 11:13, duoming@zju.edu.cn wrote:
> Hello,
> 
> On Fri, 29 Apr 2022 09:27:48 +0200 Krzysztof wrote:
> 
>>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
>>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
>>> gpio and so on could be destructed while the upper layer functions such as
>>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
>>> to double-free, use-after-free and null-ptr-deref bugs.
>>>
>>> There are three situations that could lead to double-free bugs.
>>>
>>> The first situation is shown below:
>>>
>>>    (Thread 1)                 |      (Thread 2)
>>> nfcmrvl_fw_dnld_start         |
>>>  ...                          |  nfcmrvl_nci_unregister_dev
>>>  release_firmware()           |   nfcmrvl_fw_dnld_abort
>>>   kfree(fw) //(1)             |    fw_dnld_over
>>>                               |     release_firmware
>>>   ...                         |      kfree(fw) //(2)
>>>                               |     ...
>>>
>>> The second situation is shown below:
>>>
>>>    (Thread 1)                 |      (Thread 2)
>>> nfcmrvl_fw_dnld_start         |
>>>  ...                          |
>>>  mod_timer                    |
>>>  (wait a time)                |
>>>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
>>>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
>>>     release_firmware          |    fw_dnld_over
>>>      kfree(fw) //(1)          |     release_firmware
>>>      ...                      |      kfree(fw) //(2)
>>
>> How exactly the case here is being prevented?
>>
>> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
>> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
> 
> I think it could be prevented. We use nci_unregister_device() to synchronize, if the
> firmware download routine is running, the cleanup routine will wait it to finish. 
> The flag "fw_download_in_progress" will be set to false, if the the firmware download
> routine is finished. 

fw_download_in_progress is not synchronized in
nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
see updated fw_download_in_progress.

> 
> Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
> will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
> in nfcmrvl_nci_unregister_dev() will not execute.

I am sorry, but you cannot move code around hoping it will by itself
solve synchronization issues.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-05-02  6:34       ` Krzysztof Kozlowski
@ 2022-05-02  7:25         ` duoming
  2022-05-04  6:32           ` Krzysztof Kozlowski
  0 siblings, 1 reply; 12+ messages in thread
From: duoming @ 2022-05-02  7:25 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-kernel, kuba, gregkh, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma




> -----原始邮件-----
> 发件人: "Krzysztof Kozlowski" <krzysztof.kozlowski@linaro.org>
> 发送时间: 2022-05-02 14:34:07 (星期一)
> 收件人: duoming@zju.edu.cn
> 抄送: linux-kernel@vger.kernel.org, kuba@kernel.org, gregkh@linuxfoundation.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, alexander.deucher@amd.com, akpm@linux-foundation.org, broonie@kernel.org, netdev@vger.kernel.org, linma@zju.edu.cn
> 主题: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
> 
> On 29/04/2022 11:13, duoming@zju.edu.cn wrote:
> > Hello,
> > 
> > On Fri, 29 Apr 2022 09:27:48 +0200 Krzysztof wrote:
> > 
> >>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> >>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> >>> gpio and so on could be destructed while the upper layer functions such as
> >>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> >>> to double-free, use-after-free and null-ptr-deref bugs.
> >>>
> >>> There are three situations that could lead to double-free bugs.
> >>>
> >>> The first situation is shown below:
> >>>
> >>>    (Thread 1)                 |      (Thread 2)
> >>> nfcmrvl_fw_dnld_start         |
> >>>  ...                          |  nfcmrvl_nci_unregister_dev
> >>>  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >>>   kfree(fw) //(1)             |    fw_dnld_over
> >>>                               |     release_firmware
> >>>   ...                         |      kfree(fw) //(2)
> >>>                               |     ...
> >>>
> >>> The second situation is shown below:
> >>>
> >>>    (Thread 1)                 |      (Thread 2)
> >>> nfcmrvl_fw_dnld_start         |
> >>>  ...                          |
> >>>  mod_timer                    |
> >>>  (wait a time)                |
> >>>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >>>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >>>     release_firmware          |    fw_dnld_over
> >>>      kfree(fw) //(1)          |     release_firmware
> >>>      ...                      |      kfree(fw) //(2)
> >>
> >> How exactly the case here is being prevented?
> >>
> >> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
> >> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
> > 
> > I think it could be prevented. We use nci_unregister_device() to synchronize, if the
> > firmware download routine is running, the cleanup routine will wait it to finish. 
> > The flag "fw_download_in_progress" will be set to false, if the the firmware download
> > routine is finished. 
> 
> fw_download_in_progress is not synchronized in
> nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
> true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
> see updated fw_download_in_progress.

The fw_download_in_progress is set to false in nfc_fw_download(). The nfc_fw_download() is
synchronized with nfc_unregister_device(). If nfc_fw_download() is running, nfc_unregister_device()
will wait nfc_fw_download() to finish. So the nfcmrvl_nci_unregister_dev() could see the updated
fw_download_in_progress. The process is shown below:

        (Thread 1)                                         |       (Thread 2)
 nfcmrvl_nci_unregister_dev                                | nfc_fw_download
   nci_unregister_device                                   |  ...
                                                           |  device_lock()
     ...                                                   |  dev->fw_download_in_progress = false; //(1)
                                                           |  device_unlock()
     nfc_unregister_device                                 | 
   if (priv->ndev->nfc_dev->fw_download_in_progress) //(2) | 
     nfcmrvl_fw_dnld_abort(priv); //not execute            |   

We set fw_download_in_progress to false in position (1) and the check in position (2) will fail,
the nfcmrvl_fw_dnld_abort() in nfcmrvl_nci_unregister_dev() will not execute. So the double-free
bugs could be prevented.

> > Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
> > will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
> > in nfcmrvl_nci_unregister_dev() will not execute.
> 
> I am sorry, but you cannot move code around hoping it will by itself
> solve synchronization issues.

I think this solution sove synchronization issues. If you still have any questions welcome to ask me.

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-05-02  7:25         ` duoming
@ 2022-05-04  6:32           ` Krzysztof Kozlowski
  2022-05-04  9:07             ` duoming
  0 siblings, 1 reply; 12+ messages in thread
From: Krzysztof Kozlowski @ 2022-05-04  6:32 UTC (permalink / raw)
  To: duoming
  Cc: linux-kernel, kuba, gregkh, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma

On 02/05/2022 09:25, duoming@zju.edu.cn wrote:
> 
> 
> 
>> -----原始邮件-----
>> 发件人: "Krzysztof Kozlowski" <krzysztof.kozlowski@linaro.org>
>> 发送时间: 2022-05-02 14:34:07 (星期一)
>> 收件人: duoming@zju.edu.cn
>> 抄送: linux-kernel@vger.kernel.org, kuba@kernel.org, gregkh@linuxfoundation.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, alexander.deucher@amd.com, akpm@linux-foundation.org, broonie@kernel.org, netdev@vger.kernel.org, linma@zju.edu.cn
>> 主题: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
>>
>> On 29/04/2022 11:13, duoming@zju.edu.cn wrote:
>>> Hello,
>>>
>>> On Fri, 29 Apr 2022 09:27:48 +0200 Krzysztof wrote:
>>>
>>>>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
>>>>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
>>>>> gpio and so on could be destructed while the upper layer functions such as
>>>>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
>>>>> to double-free, use-after-free and null-ptr-deref bugs.
>>>>>
>>>>> There are three situations that could lead to double-free bugs.
>>>>>
>>>>> The first situation is shown below:
>>>>>
>>>>>    (Thread 1)                 |      (Thread 2)
>>>>> nfcmrvl_fw_dnld_start         |
>>>>>  ...                          |  nfcmrvl_nci_unregister_dev
>>>>>  release_firmware()           |   nfcmrvl_fw_dnld_abort
>>>>>   kfree(fw) //(1)             |    fw_dnld_over
>>>>>                               |     release_firmware
>>>>>   ...                         |      kfree(fw) //(2)
>>>>>                               |     ...
>>>>>
>>>>> The second situation is shown below:
>>>>>
>>>>>    (Thread 1)                 |      (Thread 2)
>>>>> nfcmrvl_fw_dnld_start         |
>>>>>  ...                          |
>>>>>  mod_timer                    |
>>>>>  (wait a time)                |
>>>>>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
>>>>>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
>>>>>     release_firmware          |    fw_dnld_over
>>>>>      kfree(fw) //(1)          |     release_firmware
>>>>>      ...                      |      kfree(fw) //(2)
>>>>
>>>> How exactly the case here is being prevented?
>>>>
>>>> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
>>>> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
>>>
>>> I think it could be prevented. We use nci_unregister_device() to synchronize, if the
>>> firmware download routine is running, the cleanup routine will wait it to finish. 
>>> The flag "fw_download_in_progress" will be set to false, if the the firmware download
>>> routine is finished. 
>>
>> fw_download_in_progress is not synchronized in
>> nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
>> true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
>> see updated fw_download_in_progress.
> 
> The fw_download_in_progress is set to false in nfc_fw_download(). The nfc_fw_download() is
> synchronized with nfc_unregister_device().

No, it is not. There is no synchronization primitive in
nfc_unregister_device(), at least explicitly.

> If nfc_fw_download() is running, nfc_unregister_device()
> will wait nfc_fw_download() to finish. So the nfcmrvl_nci_unregister_dev() could see the updated
> fw_download_in_progress. The process is shown below:
> 
>         (Thread 1)                                         |       (Thread 2)
>  nfcmrvl_nci_unregister_dev                                | nfc_fw_download
>    nci_unregister_device                                   |  ...
>                                                            |  device_lock()
>      ...                                                   |  dev->fw_download_in_progress = false; //(1)
>                                                            |  device_unlock()
>      nfc_unregister_device                                 | 
>    if (priv->ndev->nfc_dev->fw_download_in_progress) //(2) | 
>      nfcmrvl_fw_dnld_abort(priv); //not execute            |   
> 
> We set fw_download_in_progress to false in position (1) and the check in position (2) will fail,
> the nfcmrvl_fw_dnld_abort() in nfcmrvl_nci_unregister_dev() will not execute. So the double-free
> bugs could be prevented.

You just repeated the same not answering the question. The
fw_download_in_progress at point (2) can be still true, on that CPU. I
explain it third time so let me rephrase it - the
fw_download_in_progress can be reordered by compiler or CPU to:

T1                                          | T2
nfcmrvl_nci_unregister_dev()
  nci_unregister_device()
    var = fw_download_in_progress; (true)
                                            | nfc_fw_download
                                            | device_lock
                                            | dev->fw_download = false;
                                            | device_unlock
    if (var)                                |
      nfcmrvl_fw_dnld_abort(priv);          |

Every write barrier must be paired with read barrier. Every lock on one
access to variable, must be paired with same lock on other access to
variable .

> 
>>> Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
>>> will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
>>> in nfcmrvl_nci_unregister_dev() will not execute.
>>
>> I am sorry, but you cannot move code around hoping it will by itself
>> solve synchronization issues.
> 
> I think this solution sove synchronization issues. If you still have any questions welcome to ask me.

No, you still do not get the pint. You cannot move code around because
this itself does not solve missing synchronization primitives and
related issues.


Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
  2022-05-04  6:32           ` Krzysztof Kozlowski
@ 2022-05-04  9:07             ` duoming
  0 siblings, 0 replies; 12+ messages in thread
From: duoming @ 2022-05-04  9:07 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-kernel, kuba, gregkh, davem, edumazet, pabeni,
	alexander.deucher, akpm, broonie, netdev, linma

Hello,

On Wed, 4 May 2022 08:32:15 +0200 Krzysztof wrote:

> >>>>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> >>>>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> >>>>> gpio and so on could be destructed while the upper layer functions such as
> >>>>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> >>>>> to double-free, use-after-free and null-ptr-deref bugs.
> >>>>>
> >>>>> There are three situations that could lead to double-free bugs.
> >>>>>
> >>>>> The first situation is shown below:
> >>>>>
> >>>>>    (Thread 1)                 |      (Thread 2)
> >>>>> nfcmrvl_fw_dnld_start         |
> >>>>>  ...                          |  nfcmrvl_nci_unregister_dev
> >>>>>  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >>>>>   kfree(fw) //(1)             |    fw_dnld_over
> >>>>>                               |     release_firmware
> >>>>>   ...                         |      kfree(fw) //(2)
> >>>>>                               |     ...
> >>>>>
> >>>>> The second situation is shown below:
> >>>>>
> >>>>>    (Thread 1)                 |      (Thread 2)
> >>>>> nfcmrvl_fw_dnld_start         |
> >>>>>  ...                          |
> >>>>>  mod_timer                    |
> >>>>>  (wait a time)                |
> >>>>>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >>>>>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >>>>>     release_firmware          |    fw_dnld_over
> >>>>>      kfree(fw) //(1)          |     release_firmware
> >>>>>      ...                      |      kfree(fw) //(2)
> >>>>
> >>>> How exactly the case here is being prevented?
> >>>>
> >>>> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
> >>>> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
> >>>
> >>> I think it could be prevented. We use nci_unregister_device() to synchronize, if the
> >>> firmware download routine is running, the cleanup routine will wait it to finish. 
> >>> The flag "fw_download_in_progress" will be set to false, if the the firmware download
> >>> routine is finished. 
> >>
> >> fw_download_in_progress is not synchronized in
> >> nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
> >> true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
> >> see updated fw_download_in_progress.
> > 
> > The fw_download_in_progress is set to false in nfc_fw_download(). The nfc_fw_download() is
> > synchronized with nfc_unregister_device().
> 
> No, it is not. There is no synchronization primitive in
> nfc_unregister_device(), at least explicitly.

There is device_lock() in nfc_fw_download() which could synchronize with nfc_unregister_device().
The code in nfc_unregister_device() is shown below:

nfc_unregister_device()
  ...
  device_lock(&dev->dev);
  ...
  dev->shutting_down = true;
  device_unlock(&dev->dev);

> > If nfc_fw_download() is running, nfc_unregister_device()
> > will wait nfc_fw_download() to finish. So the nfcmrvl_nci_unregister_dev() could see the updated
> > fw_download_in_progress. The process is shown below:
> > 
> >         (Thread 1)                                         |       (Thread 2)
> >  nfcmrvl_nci_unregister_dev                                | nfc_fw_download
> >    nci_unregister_device                                   |  ...
> >                                                            |  device_lock()
> >      ...                                                   |  dev->fw_download_in_progress = false; //(1)
> >                                                            |  device_unlock()
> >      nfc_unregister_device                                 | 
> >    if (priv->ndev->nfc_dev->fw_download_in_progress) //(2) | 
> >      nfcmrvl_fw_dnld_abort(priv); //not execute            |   
> > 
> > We set fw_download_in_progress to false in position (1) and the check in position (2) will fail,
> > the nfcmrvl_fw_dnld_abort() in nfcmrvl_nci_unregister_dev() will not execute. So the double-free
> > bugs could be prevented.
> 
> You just repeated the same not answering the question. The
> fw_download_in_progress at point (2) can be still true, on that CPU. I
> explain it third time so let me rephrase it - the
> fw_download_in_progress can be reordered by compiler or CPU to:
> 
> T1                                          | T2
> nfcmrvl_nci_unregister_dev()
>   nci_unregister_device()
>     var = fw_download_in_progress; (true)
>                                             | nfc_fw_download
>                                             | device_lock
>                                             | dev->fw_download = false;
>                                             | device_unlock
>     if (var)                                |
>       nfcmrvl_fw_dnld_abort(priv);          |
> 
> Every write barrier must be paired with read barrier. Every lock on one
> access to variable, must be paired with same lock on other access to
> variable .

There is paired write barrier and read barrier "device_lock(&dev->dev)" between 
nfc_unregister_device() and nfc_fw_download(). The shutting_down flag of nfc_dev
protected by device_lock() is set to true in position(2) and check in position(1).

         (Thread 1)                                       |       (Thread 2)
nfcmrvl_nci_unregister_dev                                | nfc_fw_download
  nci_unregister_device                                   |  ...
                                                          |  device_lock(&dev->dev)
    ...                                                   |  if (dev->shutting_down) //(1)
                                                          |    goto error;
                                                          |  ...
                                                          |  dev->fw_download_in_progress = false; //(3)
                                                          |  device_unlock(&dev->dev)
    nfc_unregister_device                                 |
      device_lock(&dev->dev);                             |
      dev->shutting_down = true; //(2)                    | 
      device_unlock(&dev->dev);                           | 
    if (priv->ndev->nfc_dev->fw_download_in_progress)//(4)|   
      nfcmrvl_fw_dnld_abort(priv);                        |

If nfc_fw_download() is running, nfc_unregister_device() will wait nfc_fw_download() to finish,
and the nfc_dev->fw_download_in_progress is set to false in position (3). The check in position(4)
is false, because we have set nfc_dev->fw_download_in_progress to false in position(3).

If we set shutting_down to true in position(2) before check in position(1) the nfc_fw_download()
will goto error.

> >>> Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
> >>> will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
> >>> in nfcmrvl_nci_unregister_dev() will not execute.
> >>
> >> I am sorry, but you cannot move code around hoping it will by itself
> >> solve synchronization issues.
> > 
> > I think this solution sove synchronization issues. If you still have any questions welcome to ask me.
> 
> No, you still do not get the pint. You cannot move code around because
> this itself does not solve missing synchronization primitives and
> related issues.

I think this solution could solve synchronization issues. If you still have any questions welcome to ask me.

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-04  9:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29  1:14 [PATCH net v5 0/2] Replace improper checks and fix bugs in nfc subsystem Duoming Zhou
2022-04-29  1:14 ` [PATCH net v5 1/2] nfc: replace improper check device_is_registered() in netlink related functions Duoming Zhou
2022-04-29  7:03   ` Greg KH
2022-04-29  7:19   ` Krzysztof Kozlowski
2022-04-29  9:18     ` duoming
2022-04-29  1:14 ` [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Duoming Zhou
2022-04-29  7:27   ` Krzysztof Kozlowski
2022-04-29  9:13     ` duoming
2022-05-02  6:34       ` Krzysztof Kozlowski
2022-05-02  7:25         ` duoming
2022-05-04  6:32           ` Krzysztof Kozlowski
2022-05-04  9:07             ` duoming

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.