* [PATCH 0/2] fix races between nbd setup and module removal
@ 2021-09-03 2:48 Hou Tao
2021-09-03 2:48 ` [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
2021-09-03 2:48 ` [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
0 siblings, 2 replies; 3+ messages in thread
From: Hou Tao @ 2021-09-03 2:48 UTC (permalink / raw)
To: Josef Bacik, Jens Axboe; +Cc: Christoph Hellwig, linux-block, nbd, houtao1
Hi,
The patch series aims to fix the races between nbd setup and module
removal which will lead to oops. Patch #1 serializes the concurrently
calling of nbd_genl_connect() and nbd_cleanup(), and patch #2 fixes
race between nbd_alloc_config() and nbd_cleanup().
Any comments are welcome.
Regards,
Tao
Hou Tao (2):
nbd: call genl_unregister_family() first in nbd_cleanup()
nbd: fix race between nbd_alloc_config() and module removal
drivers/block/nbd.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
--
2.29.2
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup()
2021-09-03 2:48 [PATCH 0/2] fix races between nbd setup and module removal Hou Tao
@ 2021-09-03 2:48 ` Hou Tao
2021-09-03 2:48 ` [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
1 sibling, 0 replies; 3+ messages in thread
From: Hou Tao @ 2021-09-03 2:48 UTC (permalink / raw)
To: Josef Bacik, Jens Axboe; +Cc: Christoph Hellwig, linux-block, nbd, houtao1
Otherwise there may be race between module removal and the handling of
netlink command, which can lead to the oops as shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000098
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:down_write+0x1a/0x50
Call Trace:
start_creating+0x89/0x130
debugfs_create_dir+0x1b/0x130
nbd_start_device+0x13d/0x390 [nbd]
nbd_genl_connect+0x42f/0x748 [nbd]
genl_family_rcv_msg_doit.isra.0+0xec/0x150
genl_rcv_msg+0xe5/0x1e0
netlink_rcv_skb+0x55/0x100
genl_rcv+0x29/0x40
netlink_unicast+0x1a8/0x250
netlink_sendmsg+0x21b/0x430
____sys_sendmsg+0x2a4/0x2d0
___sys_sendmsg+0x81/0xc0
__sys_sendmsg+0x62/0xb0
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Modules linked in: nbd(E-)
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
v2:
* update comment and commit message (suggested by eblake@redhat.com)
---
drivers/block/nbd.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..6d78d7d0c15f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2468,6 +2468,12 @@ static void __exit nbd_cleanup(void)
struct nbd_device *nbd;
LIST_HEAD(del_list);
+ /*
+ * Unregister netlink interface prior to waiting
+ * for the completion of netlink commands.
+ */
+ genl_unregister_family(&nbd_genl_family);
+
nbd_dbg_close();
mutex_lock(&nbd_index_mutex);
@@ -2486,7 +2492,6 @@ static void __exit nbd_cleanup(void)
destroy_workqueue(nbd_del_wq);
idr_destroy(&nbd_index_idr);
- genl_unregister_family(&nbd_genl_family);
unregister_blkdev(NBD_MAJOR, "nbd");
}
--
2.29.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal
2021-09-03 2:48 [PATCH 0/2] fix races between nbd setup and module removal Hou Tao
2021-09-03 2:48 ` [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
@ 2021-09-03 2:48 ` Hou Tao
1 sibling, 0 replies; 3+ messages in thread
From: Hou Tao @ 2021-09-03 2:48 UTC (permalink / raw)
To: Josef Bacik, Jens Axboe; +Cc: Christoph Hellwig, linux-block, nbd, houtao1
When nbd module is being removing, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(), although try_module_get()
will return false, but nbd_alloc_config() doesn't handle it.
The race may lead to the leak of nbd_config and its related
resources (e.g, recv_workq) and oops in nbd_read_stat() due
to the unload of nbd module as shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000040
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: knbd16-recv recv_work [nbd]
RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
Call Trace:
recv_work+0x3b/0xb0 [nbd]
process_one_work+0x1ed/0x390
worker_thread+0x4a/0x3d0
kthread+0x12a/0x150
ret_from_fork+0x22/0x30
Fixing it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds to ensure
the value of nbd->config is binary (valid or NULL).
Also adding a debug message to check the reference counter
of nbd_config during module removal.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
drivers/block/nbd.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 6d78d7d0c15f..2e2eea3d4512 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1470,15 +1470,20 @@ static struct nbd_config *nbd_alloc_config(void)
{
struct nbd_config *config;
+ if (!try_module_get(THIS_MODULE))
+ return ERR_PTR(-ENODEV);
+
config = kzalloc(sizeof(struct nbd_config), GFP_NOFS);
- if (!config)
- return NULL;
+ if (!config) {
+ module_put(THIS_MODULE);
+ return ERR_PTR(-ENOMEM);
+ }
+
atomic_set(&config->recv_threads, 0);
init_waitqueue_head(&config->recv_wq);
init_waitqueue_head(&config->conn_wait);
config->blksize = NBD_DEF_BLKSIZE;
atomic_set(&config->live_connections, 0);
- try_module_get(THIS_MODULE);
return config;
}
@@ -1505,12 +1510,13 @@ static int nbd_open(struct block_device *bdev, fmode_t mode)
mutex_unlock(&nbd->config_lock);
goto out;
}
- config = nbd->config = nbd_alloc_config();
- if (!config) {
- ret = -ENOMEM;
+ config = nbd_alloc_config();
+ if (IS_ERR(config)) {
+ ret = PTR_ERR(config);
mutex_unlock(&nbd->config_lock);
goto out;
}
+ nbd->config = config;
refcount_set(&nbd->config_refs, 1);
refcount_inc(&nbd->refs);
mutex_unlock(&nbd->config_lock);
@@ -1902,13 +1908,14 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
nbd_put(nbd);
return -EINVAL;
}
- config = nbd->config = nbd_alloc_config();
- if (!nbd->config) {
+ config = nbd_alloc_config();
+ if (IS_ERR(config)) {
mutex_unlock(&nbd->config_lock);
nbd_put(nbd);
printk(KERN_ERR "nbd: couldn't allocate config\n");
- return -ENOMEM;
+ return PTR_ERR(config);
}
+ nbd->config = config;
refcount_set(&nbd->config_refs, 1);
set_bit(NBD_RT_BOUND, &config->runtime_flags);
--
2.29.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-09-03 2:35 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03 2:48 [PATCH 0/2] fix races between nbd setup and module removal Hou Tao
2021-09-03 2:48 ` [PATCH 1/2] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
2021-09-03 2:48 ` [PATCH 2/2] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.