All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fix races between nbd setup and module removal
@ 2021-09-03  2:48 Hou Tao
  2021-09-03  2:48 ` [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
  2021-09-03  2:48 ` [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
  0 siblings, 2 replies; 3+ messages in thread
From: Hou Tao @ 2021-09-03  2:48 UTC (permalink / raw)
  To: Josef Bacik, Jens Axboe; +Cc: Christoph Hellwig, linux-block, nbd, houtao1

Hi,

The patch series aims to fix the races between nbd setup and module
removal which will lead to oops. Patch #1 serializes the concurrently
calling of nbd_genl_connect() and nbd_cleanup(), and patch #2 fixes
race between nbd_alloc_config() and nbd_cleanup().

Any comments are welcome.

Regards,
Tao

Hou Tao (2):
  nbd: call genl_unregister_family() first in nbd_cleanup()
  nbd: fix race between nbd_alloc_config() and module removal

 drivers/block/nbd.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup()
  2021-09-03  2:48 [PATCH 0/2] fix races between nbd setup and module removal Hou Tao
@ 2021-09-03  2:48 ` Hou Tao
  2021-09-03  2:48 ` [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
  1 sibling, 0 replies; 3+ messages in thread
From: Hou Tao @ 2021-09-03  2:48 UTC (permalink / raw)
  To: Josef Bacik, Jens Axboe; +Cc: Christoph Hellwig, linux-block, nbd, houtao1

Otherwise there may be race between module removal and the handling of
netlink command, which can lead to the oops as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000098
  Oops: 0002 [#1] SMP PTI
  CPU: 1 PID: 31299 Comm: nbd-client Tainted: G            E     5.14.0-rc4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:down_write+0x1a/0x50
  Call Trace:
   start_creating+0x89/0x130
   debugfs_create_dir+0x1b/0x130
   nbd_start_device+0x13d/0x390 [nbd]
   nbd_genl_connect+0x42f/0x748 [nbd]
   genl_family_rcv_msg_doit.isra.0+0xec/0x150
   genl_rcv_msg+0xe5/0x1e0
   netlink_rcv_skb+0x55/0x100
   genl_rcv+0x29/0x40
   netlink_unicast+0x1a8/0x250
   netlink_sendmsg+0x21b/0x430
   ____sys_sendmsg+0x2a4/0x2d0
   ___sys_sendmsg+0x81/0xc0
   __sys_sendmsg+0x62/0xb0
   __x64_sys_sendmsg+0x1f/0x30
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  Modules linked in: nbd(E-)

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
v2:
 * update comment and commit message (suggested by eblake@redhat.com)
---
 drivers/block/nbd.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..6d78d7d0c15f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2468,6 +2468,12 @@ static void __exit nbd_cleanup(void)
 	struct nbd_device *nbd;
 	LIST_HEAD(del_list);
 
+	/*
+	 * Unregister netlink interface prior to waiting
+	 * for the completion of netlink commands.
+	 */
+	genl_unregister_family(&nbd_genl_family);
+
 	nbd_dbg_close();
 
 	mutex_lock(&nbd_index_mutex);
@@ -2486,7 +2492,6 @@ static void __exit nbd_cleanup(void)
 	destroy_workqueue(nbd_del_wq);
 
 	idr_destroy(&nbd_index_idr);
-	genl_unregister_family(&nbd_genl_family);
 	unregister_blkdev(NBD_MAJOR, "nbd");
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal
  2021-09-03  2:48 [PATCH 0/2] fix races between nbd setup and module removal Hou Tao
  2021-09-03  2:48 ` [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
@ 2021-09-03  2:48 ` Hou Tao
  1 sibling, 0 replies; 3+ messages in thread
From: Hou Tao @ 2021-09-03  2:48 UTC (permalink / raw)
  To: Josef Bacik, Jens Axboe; +Cc: Christoph Hellwig, linux-block, nbd, houtao1

When nbd module is being removing, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(), although try_module_get()
will return false, but nbd_alloc_config() doesn't handle it.

The race may lead to the leak of nbd_config and its related
resources (e.g, recv_workq) and oops in nbd_read_stat() due
to the unload of nbd module as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  Oops: 0000 [#1] SMP PTI
  CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: knbd16-recv recv_work [nbd]
  RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
  Call Trace:
   recv_work+0x3b/0xb0 [nbd]
   process_one_work+0x1ed/0x390
   worker_thread+0x4a/0x3d0
   kthread+0x12a/0x150
   ret_from_fork+0x22/0x30

Fixing it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds to ensure
the value of nbd->config is binary (valid or NULL).

Also adding a debug message to check the reference counter
of nbd_config during module removal.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 drivers/block/nbd.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 6d78d7d0c15f..2e2eea3d4512 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1470,15 +1470,20 @@ static struct nbd_config *nbd_alloc_config(void)
 {
 	struct nbd_config *config;
 
+	if (!try_module_get(THIS_MODULE))
+		return ERR_PTR(-ENODEV);
+
 	config = kzalloc(sizeof(struct nbd_config), GFP_NOFS);
-	if (!config)
-		return NULL;
+	if (!config) {
+		module_put(THIS_MODULE);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	atomic_set(&config->recv_threads, 0);
 	init_waitqueue_head(&config->recv_wq);
 	init_waitqueue_head(&config->conn_wait);
 	config->blksize = NBD_DEF_BLKSIZE;
 	atomic_set(&config->live_connections, 0);
-	try_module_get(THIS_MODULE);
 	return config;
 }
 
@@ -1505,12 +1510,13 @@ static int nbd_open(struct block_device *bdev, fmode_t mode)
 			mutex_unlock(&nbd->config_lock);
 			goto out;
 		}
-		config = nbd->config = nbd_alloc_config();
-		if (!config) {
-			ret = -ENOMEM;
+		config = nbd_alloc_config();
+		if (IS_ERR(config)) {
+			ret = PTR_ERR(config);
 			mutex_unlock(&nbd->config_lock);
 			goto out;
 		}
+		nbd->config = config;
 		refcount_set(&nbd->config_refs, 1);
 		refcount_inc(&nbd->refs);
 		mutex_unlock(&nbd->config_lock);
@@ -1902,13 +1908,14 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 		nbd_put(nbd);
 		return -EINVAL;
 	}
-	config = nbd->config = nbd_alloc_config();
-	if (!nbd->config) {
+	config = nbd_alloc_config();
+	if (IS_ERR(config)) {
 		mutex_unlock(&nbd->config_lock);
 		nbd_put(nbd);
 		printk(KERN_ERR "nbd: couldn't allocate config\n");
-		return -ENOMEM;
+		return PTR_ERR(config);
 	}
+	nbd->config = config;
 	refcount_set(&nbd->config_refs, 1);
 	set_bit(NBD_RT_BOUND, &config->runtime_flags);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-09-03  2:35 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03  2:48 [PATCH 0/2] fix races between nbd setup and module removal Hou Tao
2021-09-03  2:48 ` [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
2021-09-03  2:48 ` [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal Hou Tao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.