* [PATCH -next v3 1/6] nbd: call genl_unregister_family() first in nbd_cleanup()
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
@ 2022-05-21 7:37 ` Yu Kuai
2022-05-23 14:13 ` Josef Bacik
2022-05-21 7:37 ` [PATCH -next v3 2/6] nbd: fix race between nbd_alloc_config() and module removal Yu Kuai
` (5 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-21 7:37 UTC (permalink / raw)
To: josef, axboe, yukuai3, ming.lei; +Cc: linux-block, nbd, linux-kernel, yi.zhang
Otherwise there may be a race between module removal and the handling of
a netlink command, which can lead to the oops shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000098
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:down_write+0x1a/0x50
Call Trace:
start_creating+0x89/0x130
debugfs_create_dir+0x1b/0x130
nbd_start_device+0x13d/0x390 [nbd]
nbd_genl_connect+0x42f/0x748 [nbd]
genl_family_rcv_msg_doit.isra.0+0xec/0x150
genl_rcv_msg+0xe5/0x1e0
netlink_rcv_skb+0x55/0x100
genl_rcv+0x29/0x40
netlink_unicast+0x1a8/0x250
netlink_sendmsg+0x21b/0x430
____sys_sendmsg+0x2a4/0x2d0
___sys_sendmsg+0x81/0xc0
__sys_sendmsg+0x62/0xb0
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Modules linked in: nbd(E-)
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/block/nbd.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index ac8b045c777c..a73e853f5833 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2528,6 +2528,12 @@ static void __exit nbd_cleanup(void)
struct nbd_device *nbd;
LIST_HEAD(del_list);
+ /*
+ * Unregister netlink interface prior to waiting
+ * for the completion of netlink commands.
+ */
+ genl_unregister_family(&nbd_genl_family);
+
nbd_dbg_close();
mutex_lock(&nbd_index_mutex);
@@ -2546,7 +2552,6 @@ static void __exit nbd_cleanup(void)
destroy_workqueue(nbd_del_wq);
idr_destroy(&nbd_index_idr);
- genl_unregister_family(&nbd_genl_family);
unregister_blkdev(NBD_MAJOR, "nbd");
}
--
2.31.1
* Re: [PATCH -next v3 1/6] nbd: call genl_unregister_family() first in nbd_cleanup()
2022-05-21 7:37 ` [PATCH -next v3 1/6] nbd: call genl_unregister_family() first in nbd_cleanup() Yu Kuai
@ 2022-05-23 14:13 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2022-05-23 14:13 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On Sat, May 21, 2022 at 03:37:44PM +0800, Yu Kuai wrote:
> Otherwise there may be a race between module removal and the handling of
> a netlink command, which can lead to the oops shown below:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000098
> Oops: 0002 [#1] SMP PTI
> CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> RIP: 0010:down_write+0x1a/0x50
> Call Trace:
> start_creating+0x89/0x130
> debugfs_create_dir+0x1b/0x130
> nbd_start_device+0x13d/0x390 [nbd]
> nbd_genl_connect+0x42f/0x748 [nbd]
> genl_family_rcv_msg_doit.isra.0+0xec/0x150
> genl_rcv_msg+0xe5/0x1e0
> netlink_rcv_skb+0x55/0x100
> genl_rcv+0x29/0x40
> netlink_unicast+0x1a8/0x250
> netlink_sendmsg+0x21b/0x430
> ____sys_sendmsg+0x2a4/0x2d0
> ___sys_sendmsg+0x81/0xc0
> __sys_sendmsg+0x62/0xb0
> __x64_sys_sendmsg+0x1f/0x30
> do_syscall_64+0x3b/0xc0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> Modules linked in: nbd(E-)
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
* [PATCH -next v3 2/6] nbd: fix race between nbd_alloc_config() and module removal
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
2022-05-21 7:37 ` [PATCH -next v3 1/6] nbd: call genl_unregister_family() first in nbd_cleanup() Yu Kuai
@ 2022-05-21 7:37 ` Yu Kuai
2022-05-23 14:14 ` Josef Bacik
2022-05-21 7:37 ` [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed Yu Kuai
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-21 7:37 UTC (permalink / raw)
To: josef, axboe, yukuai3, ming.lei; +Cc: linux-block, nbd, linux-kernel, yi.zhang
When the nbd module is being removed, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(). try_module_get() will
return false in that case, but nbd_alloc_config() doesn't handle it.

The race may lead to a leak of nbd_config and its related
resources (e.g., recv_workq) and to an oops in nbd_read_stat() after
the nbd module has been unloaded, as shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000040
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: knbd16-recv recv_work [nbd]
RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
Call Trace:
recv_work+0x3b/0xb0 [nbd]
process_one_work+0x1ed/0x390
worker_thread+0x4a/0x3d0
kthread+0x12a/0x150
ret_from_fork+0x22/0x30
Fix it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may now return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds, so that
nbd->config is always either valid or NULL.

Also add a debug message to check the reference counter
of nbd_config during module removal.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/block/nbd.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a73e853f5833..2ee1e376d5c4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1518,15 +1518,20 @@ static struct nbd_config *nbd_alloc_config(void)
{
struct nbd_config *config;
+ if (!try_module_get(THIS_MODULE))
+ return ERR_PTR(-ENODEV);
+
config = kzalloc(sizeof(struct nbd_config), GFP_NOFS);
- if (!config)
- return NULL;
+ if (!config) {
+ module_put(THIS_MODULE);
+ return ERR_PTR(-ENOMEM);
+ }
+
atomic_set(&config->recv_threads, 0);
init_waitqueue_head(&config->recv_wq);
init_waitqueue_head(&config->conn_wait);
config->blksize_bits = NBD_DEF_BLKSIZE_BITS;
atomic_set(&config->live_connections, 0);
- try_module_get(THIS_MODULE);
return config;
}
@@ -1553,12 +1558,13 @@ static int nbd_open(struct block_device *bdev, fmode_t mode)
mutex_unlock(&nbd->config_lock);
goto out;
}
- config = nbd->config = nbd_alloc_config();
- if (!config) {
- ret = -ENOMEM;
+ config = nbd_alloc_config();
+ if (IS_ERR(config)) {
+ ret = PTR_ERR(config);
mutex_unlock(&nbd->config_lock);
goto out;
}
+ nbd->config = config;
refcount_set(&nbd->config_refs, 1);
refcount_inc(&nbd->refs);
mutex_unlock(&nbd->config_lock);
@@ -1964,13 +1970,14 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
nbd_put(nbd);
return -EINVAL;
}
- config = nbd->config = nbd_alloc_config();
- if (!nbd->config) {
+ config = nbd_alloc_config();
+ if (IS_ERR(config)) {
mutex_unlock(&nbd->config_lock);
nbd_put(nbd);
printk(KERN_ERR "nbd: couldn't allocate config\n");
- return -ENOMEM;
+ return PTR_ERR(config);
}
+ nbd->config = config;
refcount_set(&nbd->config_refs, 1);
set_bit(NBD_RT_BOUND, &config->runtime_flags);
@@ -2543,6 +2550,9 @@ static void __exit nbd_cleanup(void)
while (!list_empty(&del_list)) {
nbd = list_first_entry(&del_list, struct nbd_device, list);
list_del_init(&nbd->list);
+ if (refcount_read(&nbd->config_refs))
+ printk(KERN_ERR "nbd: possibly leaking nbd_config (ref %d)\n",
+ refcount_read(&nbd->config_refs));
if (refcount_read(&nbd->refs) != 1)
printk(KERN_ERR "nbd: possibly leaking a device\n");
nbd_put(nbd);
--
2.31.1
* Re: [PATCH -next v3 2/6] nbd: fix race between nbd_alloc_config() and module removal
2022-05-21 7:37 ` [PATCH -next v3 2/6] nbd: fix race between nbd_alloc_config() and module removal Yu Kuai
@ 2022-05-23 14:14 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2022-05-23 14:14 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On Sat, May 21, 2022 at 03:37:45PM +0800, Yu Kuai wrote:
> When the nbd module is being removed, nbd_alloc_config() may be
> called concurrently by nbd_genl_connect(). try_module_get() will
> return false in that case, but nbd_alloc_config() doesn't handle it.
>
> The race may lead to a leak of nbd_config and its related
> resources (e.g., recv_workq) and to an oops in nbd_read_stat() after
> the nbd module has been unloaded, as shown below:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000040
> Oops: 0000 [#1] SMP PTI
> CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> Workqueue: knbd16-recv recv_work [nbd]
> RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
> Call Trace:
> recv_work+0x3b/0xb0 [nbd]
> process_one_work+0x1ed/0x390
> worker_thread+0x4a/0x3d0
> kthread+0x12a/0x150
> ret_from_fork+0x22/0x30
>
> Fix it by checking the return value of try_module_get()
> in nbd_alloc_config(). As nbd_alloc_config() may now return ERR_PTR(-ENODEV),
> assign nbd->config only when nbd_alloc_config() succeeds, so that
> nbd->config is always either valid or NULL.
>
> Also add a debug message to check the reference counter
> of nbd_config during module removal.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
* [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
2022-05-21 7:37 ` [PATCH -next v3 1/6] nbd: call genl_unregister_family() first in nbd_cleanup() Yu Kuai
2022-05-21 7:37 ` [PATCH -next v3 2/6] nbd: fix race between nbd_alloc_config() and module removal Yu Kuai
@ 2022-05-21 7:37 ` Yu Kuai
2022-05-23 14:12 ` Josef Bacik
2022-05-21 7:37 ` [PATCH -next v3 4/6] nbd: fix io hung while disconnecting device Yu Kuai
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-21 7:37 UTC (permalink / raw)
To: josef, axboe, yukuai3, ming.lei; +Cc: linux-block, nbd, linux-kernel, yi.zhang
Otherwise io will hang, because a request is only completed if its
cmd has the 'NBD_CMD_INFLIGHT' flag set.
Fixes: 07175cb1baf4 ("nbd: make sure request completion won't concurrent")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/block/nbd.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 2ee1e376d5c4..a0d0910dae2a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -403,13 +403,14 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
if (!mutex_trylock(&cmd->lock))
return BLK_EH_RESET_TIMER;
- if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+ if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
mutex_unlock(&cmd->lock);
return BLK_EH_DONE;
}
if (!refcount_inc_not_zero(&nbd->config_refs)) {
cmd->status = BLK_STS_TIMEOUT;
+ __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
mutex_unlock(&cmd->lock);
goto done;
}
@@ -478,6 +479,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
dev_err_ratelimited(nbd_to_dev(nbd), "Connection timed out\n");
set_bit(NBD_RT_TIMEDOUT, &config->runtime_flags);
cmd->status = BLK_STS_IOERR;
+ __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
mutex_unlock(&cmd->lock);
sock_shutdown(nbd);
nbd_config_put(nbd);
@@ -745,7 +747,7 @@ static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
cmd = blk_mq_rq_to_pdu(req);
mutex_lock(&cmd->lock);
- if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+ if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d (status %u flags %lu)",
tag, cmd->status, cmd->flags);
ret = -ENOENT;
@@ -854,8 +856,16 @@ static void recv_work(struct work_struct *work)
}
rq = blk_mq_rq_from_pdu(cmd);
- if (likely(!blk_should_fake_timeout(rq->q)))
- blk_mq_complete_request(rq);
+ if (likely(!blk_should_fake_timeout(rq->q))) {
+ bool complete;
+
+ mutex_lock(&cmd->lock);
+ complete = __test_and_clear_bit(NBD_CMD_INFLIGHT,
+ &cmd->flags);
+ mutex_unlock(&cmd->lock);
+ if (complete)
+ blk_mq_complete_request(rq);
+ }
percpu_ref_put(&q->q_usage_counter);
}
--
2.31.1
* Re: [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
2022-05-21 7:37 ` [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed Yu Kuai
@ 2022-05-23 14:12 ` Josef Bacik
2022-05-24 1:07 ` Yu Kuai
0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2022-05-23 14:12 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On Sat, May 21, 2022 at 03:37:46PM +0800, Yu Kuai wrote:
> Otherwise io will hang, because a request is only completed if its
> cmd has the 'NBD_CMD_INFLIGHT' flag set.
>
> Fixes: 07175cb1baf4 ("nbd: make sure request completion won't concurrent")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> drivers/block/nbd.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 2ee1e376d5c4..a0d0910dae2a 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -403,13 +403,14 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
> if (!mutex_trylock(&cmd->lock))
> return BLK_EH_RESET_TIMER;
>
> - if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
> + if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
> mutex_unlock(&cmd->lock);
> return BLK_EH_DONE;
> }
>
> if (!refcount_inc_not_zero(&nbd->config_refs)) {
> cmd->status = BLK_STS_TIMEOUT;
> + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
> mutex_unlock(&cmd->lock);
> goto done;
> }
> @@ -478,6 +479,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
> dev_err_ratelimited(nbd_to_dev(nbd), "Connection timed out\n");
> set_bit(NBD_RT_TIMEDOUT, &config->runtime_flags);
> cmd->status = BLK_STS_IOERR;
> + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
> mutex_unlock(&cmd->lock);
> sock_shutdown(nbd);
> nbd_config_put(nbd);
> @@ -745,7 +747,7 @@ static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
> cmd = blk_mq_rq_to_pdu(req);
>
> mutex_lock(&cmd->lock);
> - if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
> + if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
> dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d (status %u flags %lu)",
> tag, cmd->status, cmd->flags);
> ret = -ENOENT;
> @@ -854,8 +856,16 @@ static void recv_work(struct work_struct *work)
> }
>
> rq = blk_mq_rq_from_pdu(cmd);
> - if (likely(!blk_should_fake_timeout(rq->q)))
> - blk_mq_complete_request(rq);
> + if (likely(!blk_should_fake_timeout(rq->q))) {
> + bool complete;
> +
> + mutex_lock(&cmd->lock);
> + complete = __test_and_clear_bit(NBD_CMD_INFLIGHT,
> + &cmd->flags);
> + mutex_unlock(&cmd->lock);
> + if (complete)
> + blk_mq_complete_request(rq);
> + }
I'd rather this be handled in nbd_handle_reply. We should return with it
cleared if it's ready to be completed. Thanks,
Josef
* Re: [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
2022-05-23 14:12 ` Josef Bacik
@ 2022-05-24 1:07 ` Yu Kuai
2022-05-24 1:51 ` Yu Kuai
0 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-24 1:07 UTC (permalink / raw)
To: Josef Bacik; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On 2022/05/23 22:12, Josef Bacik wrote:
> On Sat, May 21, 2022 at 03:37:46PM +0800, Yu Kuai wrote:
>> Otherwise io will hang, because a request is only completed if its
>> cmd has the 'NBD_CMD_INFLIGHT' flag set.
>>
>> Fixes: 07175cb1baf4 ("nbd: make sure request completion won't concurrent")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>> drivers/block/nbd.c | 18 ++++++++++++++----
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index 2ee1e376d5c4..a0d0910dae2a 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -403,13 +403,14 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
>> if (!mutex_trylock(&cmd->lock))
>> return BLK_EH_RESET_TIMER;
>>
>> - if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>> + if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>> mutex_unlock(&cmd->lock);
>> return BLK_EH_DONE;
>> }
>>
>> if (!refcount_inc_not_zero(&nbd->config_refs)) {
>> cmd->status = BLK_STS_TIMEOUT;
>> + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
>> mutex_unlock(&cmd->lock);
>> goto done;
>> }
>> @@ -478,6 +479,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
>> dev_err_ratelimited(nbd_to_dev(nbd), "Connection timed out\n");
>> set_bit(NBD_RT_TIMEDOUT, &config->runtime_flags);
>> cmd->status = BLK_STS_IOERR;
>> + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
>> mutex_unlock(&cmd->lock);
>> sock_shutdown(nbd);
>> nbd_config_put(nbd);
>> @@ -745,7 +747,7 @@ static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
>> cmd = blk_mq_rq_to_pdu(req);
>>
>> mutex_lock(&cmd->lock);
>> - if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>> + if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>> dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d (status %u flags %lu)",
>> tag, cmd->status, cmd->flags);
>> ret = -ENOENT;
>> @@ -854,8 +856,16 @@ static void recv_work(struct work_struct *work)
>> }
>>
>> rq = blk_mq_rq_from_pdu(cmd);
>> - if (likely(!blk_should_fake_timeout(rq->q)))
>> - blk_mq_complete_request(rq);
>> + if (likely(!blk_should_fake_timeout(rq->q))) {
>> + bool complete;
>> +
>> + mutex_lock(&cmd->lock);
>> + complete = __test_and_clear_bit(NBD_CMD_INFLIGHT,
>> + &cmd->flags);
>> + mutex_unlock(&cmd->lock);
>> + if (complete)
>> + blk_mq_complete_request(rq);
>> + }
>
> I'd rather this be handled in nbd_handle_reply. We should return with it
> cleared if it's ready to be completed. Thanks,
Hi,

Thanks for your advice, I'll do that in the next version. I'll still have to
hold the lock to set the bit again in case blk_should_fake_timeout()
returns true...
Thanks,
Kuai
>
> Josef
> .
>
* Re: [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
2022-05-24 1:07 ` Yu Kuai
@ 2022-05-24 1:51 ` Yu Kuai
0 siblings, 0 replies; 16+ messages in thread
From: Yu Kuai @ 2022-05-24 1:51 UTC (permalink / raw)
To: Josef Bacik; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On 2022/05/24 9:07, Yu Kuai wrote:
> On 2022/05/23 22:12, Josef Bacik wrote:
>> On Sat, May 21, 2022 at 03:37:46PM +0800, Yu Kuai wrote:
>>> Otherwise io will hang, because a request is only completed if its
>>> cmd has the 'NBD_CMD_INFLIGHT' flag set.
>>>
>>> Fixes: 07175cb1baf4 ("nbd: make sure request completion won't
>>> concurrent")
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>> drivers/block/nbd.c | 18 ++++++++++++++----
>>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>>> index 2ee1e376d5c4..a0d0910dae2a 100644
>>> --- a/drivers/block/nbd.c
>>> +++ b/drivers/block/nbd.c
>>> @@ -403,13 +403,14 @@ static enum blk_eh_timer_return
>>> nbd_xmit_timeout(struct request *req,
>>> if (!mutex_trylock(&cmd->lock))
>>> return BLK_EH_RESET_TIMER;
>>> - if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>>> + if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>>> mutex_unlock(&cmd->lock);
>>> return BLK_EH_DONE;
>>> }
>>> if (!refcount_inc_not_zero(&nbd->config_refs)) {
>>> cmd->status = BLK_STS_TIMEOUT;
>>> + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
>>> mutex_unlock(&cmd->lock);
>>> goto done;
>>> }
>>> @@ -478,6 +479,7 @@ static enum blk_eh_timer_return
>>> nbd_xmit_timeout(struct request *req,
>>> dev_err_ratelimited(nbd_to_dev(nbd), "Connection timed out\n");
>>> set_bit(NBD_RT_TIMEDOUT, &config->runtime_flags);
>>> cmd->status = BLK_STS_IOERR;
>>> + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
>>> mutex_unlock(&cmd->lock);
>>> sock_shutdown(nbd);
>>> nbd_config_put(nbd);
>>> @@ -745,7 +747,7 @@ static struct nbd_cmd *nbd_handle_reply(struct
>>> nbd_device *nbd, int index,
>>> cmd = blk_mq_rq_to_pdu(req);
>>> mutex_lock(&cmd->lock);
>>> - if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>>> + if (!test_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
>>> dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d
>>> (status %u flags %lu)",
>>> tag, cmd->status, cmd->flags);
>>> ret = -ENOENT;
>>> @@ -854,8 +856,16 @@ static void recv_work(struct work_struct *work)
>>> }
>>> rq = blk_mq_rq_from_pdu(cmd);
>>> - if (likely(!blk_should_fake_timeout(rq->q)))
>>> - blk_mq_complete_request(rq);
>>> + if (likely(!blk_should_fake_timeout(rq->q))) {
>>> + bool complete;
>>> +
>>> + mutex_lock(&cmd->lock);
>>> + complete = __test_and_clear_bit(NBD_CMD_INFLIGHT,
>>> + &cmd->flags);
>>> + mutex_unlock(&cmd->lock);
>>> + if (complete)
>>> + blk_mq_complete_request(rq);
>>> + }
>>
>> I'd rather this be handled in nbd_handle_reply. We should return with it
>> cleared if it's ready to be completed. Thanks,
> Hi,
>
> Thanks for your advice, I'll do that in the next version. I'll still have to
> hold the lock to set the bit again in case blk_should_fake_timeout()
> returns true...
Hi, Josef
I just found out that this approach is problematic:

t1:                              t2:
recv_work
 nbd_handle_reply
  __clear_bit
                                 nbd_xmit_timeout
                                  test_bit(NBD_CMD_INFLIGHT, &cmd->flags) -> fail
                                  return BLK_EH_DONE -> rq can't complete
 blk_should_fake_timeout -> true
 __set_bit

The __clear_bit followed by __set_bit in recv_work leaves a window. If
nbd_xmit_timeout() runs concurrently inside that window, the request can
be completed neither through the timeout nor through recv_work().
Do you think it's ok to keep the current implementation with some
comments to explain the above scenario?
Thanks,
Kuai
* [PATCH -next v3 4/6] nbd: fix io hung while disconnecting device
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
` (2 preceding siblings ...)
2022-05-21 7:37 ` [PATCH -next v3 3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed Yu Kuai
@ 2022-05-21 7:37 ` Yu Kuai
2022-05-23 14:15 ` Josef Bacik
2022-05-21 7:37 ` [PATCH -next v3 5/6] nbd: fix possible overflow on 'first_minor' in nbd_dev_add() Yu Kuai
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-21 7:37 UTC (permalink / raw)
To: josef, axboe, yukuai3, ming.lei; +Cc: linux-block, nbd, linux-kernel, yi.zhang
In our tests, "qemu-nbd" triggers an io hang:
INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x480/0x1050
? _raw_spin_lock_irqsave+0x3e/0xb0
schedule+0x9c/0x1b0
blk_mq_freeze_queue_wait+0x9d/0xf0
? ipi_rseq+0x70/0x70
blk_mq_freeze_queue+0x2b/0x40
nbd_add_socket+0x6b/0x270 [nbd]
nbd_ioctl+0x383/0x510 [nbd]
blkdev_ioctl+0x18e/0x3e0
__x64_sys_ioctl+0xac/0x120
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0
"qemu-nbd -d" will call ioctl 'NBD_DISCONNECT' first, however, the following
message was found:

block nbd0: Send disconnect failed -32

This error (-32 is -EPIPE) indicates that something is wrong with the
server. Then, "qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however the
ioctl can't clear requests after commit 2516ab1543fd ("nbd: only clear the
queue on device teardown"). In the meantime, requests can't complete through
timeout because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER',
which means such requests will never be completed in this situation.

Now that the flag 'NBD_CMD_INFLIGHT' makes sure requests won't
complete multiple times, switch back to calling nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a0d0910dae2a..ec736cc52134 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1429,7 +1429,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd)
static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
struct block_device *bdev)
{
- sock_shutdown(nbd);
+ nbd_clear_sock(nbd);
__invalidate_device(bdev, true);
nbd_bdev_reset(nbd);
if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
--
2.31.1
* Re: [PATCH -next v3 4/6] nbd: fix io hung while disconnecting device
2022-05-21 7:37 ` [PATCH -next v3 4/6] nbd: fix io hung while disconnecting device Yu Kuai
@ 2022-05-23 14:15 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2022-05-23 14:15 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On Sat, May 21, 2022 at 03:37:47PM +0800, Yu Kuai wrote:
> In our tests, "qemu-nbd" triggers an io hang:
>
> INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
> Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000
> Call Trace:
> <TASK>
> __schedule+0x480/0x1050
> ? _raw_spin_lock_irqsave+0x3e/0xb0
> schedule+0x9c/0x1b0
> blk_mq_freeze_queue_wait+0x9d/0xf0
> ? ipi_rseq+0x70/0x70
> blk_mq_freeze_queue+0x2b/0x40
> nbd_add_socket+0x6b/0x270 [nbd]
> nbd_ioctl+0x383/0x510 [nbd]
> blkdev_ioctl+0x18e/0x3e0
> __x64_sys_ioctl+0xac/0x120
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fd8ff706577
> RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
> RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
> RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
> R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
> R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0
>
> "qemu-nbd -d" will call ioctl 'NBD_DISCONNECT' first, however, the following
> message was found:
>
> block nbd0: Send disconnect failed -32
>
> This indicates that something is wrong with the server. Then,
> "qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however the ioctl can't clear
> requests after commit 2516ab1543fd ("nbd: only clear the queue on device
> teardown"). In the meantime, requests can't complete through timeout
> because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
> means such requests will never be completed in this situation.
>
> Now that the flag 'NBD_CMD_INFLIGHT' makes sure requests won't
> complete multiple times, switch back to calling nbd_clear_sock() in
> nbd_clear_sock_ioctl(), so that inflight requests can be cleared.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
* [PATCH -next v3 5/6] nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
` (3 preceding siblings ...)
2022-05-21 7:37 ` [PATCH -next v3 4/6] nbd: fix io hung while disconnecting device Yu Kuai
@ 2022-05-21 7:37 ` Yu Kuai
2022-05-23 14:15 ` Josef Bacik
2022-05-21 7:37 ` [PATCH -next v3 6/6] nbd: use pr_err to output error message Yu Kuai
2022-05-28 12:20 ` [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Jens Axboe
6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-21 7:37 UTC (permalink / raw)
To: josef, axboe, yukuai3, ming.lei; +Cc: linux-block, nbd, linux-kernel, yi.zhang
From: Zhang Wensheng <zhangwensheng5@huawei.com>
When 'index' is a big number, it may become negative when converted
to 'int'. Then 'index << part_shift' might overflow to a positive
value that is not greater than '0xfffff', and sysfs might complain
about duplicate creation. To fix this, move the 'index' check
earlier, into nbd_genl_connect(), before the shift is performed.
Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices")
Fixes: 940c264984fd ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/block/nbd.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index ec736cc52134..349bc3da878d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1814,17 +1814,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
refcount_set(&nbd->refs, 0);
INIT_LIST_HEAD(&nbd->list);
disk->major = NBD_MAJOR;
-
- /* Too big first_minor can cause duplicate creation of
- * sysfs files/links, since index << part_shift might overflow, or
- * MKDEV() expect that the max bits of first_minor is 20.
- */
disk->first_minor = index << part_shift;
- if (disk->first_minor < index || disk->first_minor > MINORMASK) {
- err = -EINVAL;
- goto out_free_work;
- }
-
disk->minors = 1 << part_shift;
disk->fops = &nbd_fops;
disk->private_data = nbd;
@@ -1929,8 +1919,19 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
if (!netlink_capable(skb, CAP_SYS_ADMIN))
return -EPERM;
- if (info->attrs[NBD_ATTR_INDEX])
+ if (info->attrs[NBD_ATTR_INDEX]) {
index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
+
+ /*
+ * Too big first_minor can cause duplicate creation of
+ * sysfs files/links, since index << part_shift might overflow, or
+ * MKDEV() expect that the max bits of first_minor is 20.
+ */
+ if (index < 0 || index > MINORMASK >> part_shift) {
+ printk(KERN_ERR "nbd: illegal input index %d\n", index);
+ return -EINVAL;
+ }
+ }
if (!info->attrs[NBD_ATTR_SOCKETS]) {
printk(KERN_ERR "nbd: must specify at least one socket\n");
return -EINVAL;
--
2.31.1
* Re: [PATCH -next v3 5/6] nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
2022-05-21 7:37 ` [PATCH -next v3 5/6] nbd: fix possible overflow on 'first_minor' in nbd_dev_add() Yu Kuai
@ 2022-05-23 14:15 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2022-05-23 14:15 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On Sat, May 21, 2022 at 03:37:48PM +0800, Yu Kuai wrote:
> From: Zhang Wensheng <zhangwensheng5@huawei.com>
>
> When 'index' is a big number, it may become negative when converted
> to 'int'. Then 'index << part_shift' might overflow to a positive
> value that is not greater than '0xfffff', and sysfs might complain
> about duplicate creation. To fix this, move the 'index' check
> earlier, into nbd_genl_connect(), before the shift is performed.
>
> Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices")
> Fixes: 940c264984fd ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()")
> Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
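
For readers following the overflow argument in patch 5, here is a minimal userspace sketch of the two checks. It is not the kernel code: the helper names nbd_index_is_valid() and old_check_passes() are hypothetical, MINORBITS/MINORMASK are copied from the kernel's definitions, and the bound on part_shift in the assert is an assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the kernel's MINORBITS/MINORMASK. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)

/*
 * Hypothetical helper modelling the new check: reject the index before
 * it is ever shifted, so 'index << part_shift' cannot overflow.
 */
static bool nbd_index_is_valid(int index, int part_shift)
{
	/* Assumption for this sketch: part_shift is a small shift count. */
	assert(part_shift >= 0 && part_shift < MINORBITS);
	return index >= 0 && (unsigned int)index <= (MINORMASK >> part_shift);
}

/*
 * Hypothetical helper modelling the old check, which ran after the
 * shift. The shift is done on uint32_t so the sketch itself avoids
 * undefined behaviour while still showing the truncated result.
 */
static bool old_check_passes(int index, int part_shift)
{
	int first_minor = (int)((uint32_t)index << part_shift);

	return !(first_minor < index || (unsigned int)first_minor > MINORMASK);
}
```

With part_shift = 5, a netlink value of 0x80000001 is negative once stored in an 'int', yet shifting it leaves first_minor = 32, the same first_minor as index 1. The old post-shift check accepts that value, while the pre-shift check rejects it.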
* [PATCH -next v3 6/6] nbd: use pr_err to output error message
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
` (4 preceding siblings ...)
2022-05-21 7:37 ` [PATCH -next v3 5/6] nbd: fix possible overflow on 'first_minor' in nbd_dev_add() Yu Kuai
@ 2022-05-21 7:37 ` Yu Kuai
2022-05-23 14:16 ` Josef Bacik
2022-05-28 12:20 ` [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Jens Axboe
6 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2022-05-21 7:37 UTC (permalink / raw)
To: josef, axboe, yukuai3, ming.lei; +Cc: linux-block, nbd, linux-kernel, yi.zhang
Instead of using the long printk(KERN_ERR "nbd: ...") to
output error messages, define pr_fmt and use the short
pr_err("") instead. The replacement is done with the
following command:
sed -i 's/printk(KERN_ERR "nbd: /pr_err("/g' \
drivers/block/nbd.c
This patch also rewraps lines to 80 columns where possible.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/block/nbd.c | 40 ++++++++++++++++++----------------------
1 file changed, 18 insertions(+), 22 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 349bc3da878d..07f3c139a3d7 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1928,16 +1928,16 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
* MKDEV() expect that the max bits of first_minor is 20.
*/
if (index < 0 || index > MINORMASK >> part_shift) {
- printk(KERN_ERR "nbd: illegal input index %d\n", index);
+ pr_err("illegal input index %d\n", index);
return -EINVAL;
}
}
if (!info->attrs[NBD_ATTR_SOCKETS]) {
- printk(KERN_ERR "nbd: must specify at least one socket\n");
+ pr_err("must specify at least one socket\n");
return -EINVAL;
}
if (!info->attrs[NBD_ATTR_SIZE_BYTES]) {
- printk(KERN_ERR "nbd: must specify a size in bytes for the device\n");
+ pr_err("must specify a size in bytes for the device\n");
return -EINVAL;
}
again:
@@ -1973,7 +1973,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
nbd_put(nbd);
if (index == -1)
goto again;
- printk(KERN_ERR "nbd: nbd%d already in use\n", index);
+ pr_err("nbd%d already in use\n", index);
return -EBUSY;
}
if (WARN_ON(nbd->config)) {
@@ -1985,7 +1985,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(config)) {
mutex_unlock(&nbd->config_lock);
nbd_put(nbd);
- printk(KERN_ERR "nbd: couldn't allocate config\n");
+ pr_err("couldn't allocate config\n");
return PTR_ERR(config);
}
nbd->config = config;
@@ -2041,7 +2041,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
struct nlattr *socks[NBD_SOCK_MAX+1];
if (nla_type(attr) != NBD_SOCK_ITEM) {
- printk(KERN_ERR "nbd: socks must be embedded in a SOCK_ITEM attr\n");
+ pr_err("socks must be embedded in a SOCK_ITEM attr\n");
ret = -EINVAL;
goto out;
}
@@ -2050,7 +2050,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
nbd_sock_policy,
info->extack);
if (ret != 0) {
- printk(KERN_ERR "nbd: error processing sock list\n");
+ pr_err("error processing sock list\n");
ret = -EINVAL;
goto out;
}
@@ -2122,7 +2122,7 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
return -EPERM;
if (!info->attrs[NBD_ATTR_INDEX]) {
- printk(KERN_ERR "nbd: must specify an index to disconnect\n");
+ pr_err("must specify an index to disconnect\n");
return -EINVAL;
}
index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
@@ -2130,14 +2130,12 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
nbd = idr_find(&nbd_index_idr, index);
if (!nbd) {
mutex_unlock(&nbd_index_mutex);
- printk(KERN_ERR "nbd: couldn't find device at index %d\n",
- index);
+ pr_err("couldn't find device at index %d\n", index);
return -EINVAL;
}
if (!refcount_inc_not_zero(&nbd->refs)) {
mutex_unlock(&nbd_index_mutex);
- printk(KERN_ERR "nbd: device at index %d is going down\n",
- index);
+ pr_err("device at index %d is going down\n", index);
return -EINVAL;
}
mutex_unlock(&nbd_index_mutex);
@@ -2162,7 +2160,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
return -EPERM;
if (!info->attrs[NBD_ATTR_INDEX]) {
- printk(KERN_ERR "nbd: must specify a device to reconfigure\n");
+ pr_err("must specify a device to reconfigure\n");
return -EINVAL;
}
index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
@@ -2170,8 +2168,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
nbd = idr_find(&nbd_index_idr, index);
if (!nbd) {
mutex_unlock(&nbd_index_mutex);
- printk(KERN_ERR "nbd: couldn't find a device at index %d\n",
- index);
+ pr_err("couldn't find a device at index %d\n", index);
return -EINVAL;
}
if (nbd->backend) {
@@ -2192,8 +2189,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
}
if (!refcount_inc_not_zero(&nbd->refs)) {
mutex_unlock(&nbd_index_mutex);
- printk(KERN_ERR "nbd: device at index %d is going down\n",
- index);
+ pr_err("device at index %d is going down\n", index);
return -EINVAL;
}
mutex_unlock(&nbd_index_mutex);
@@ -2257,7 +2253,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
struct nlattr *socks[NBD_SOCK_MAX+1];
if (nla_type(attr) != NBD_SOCK_ITEM) {
- printk(KERN_ERR "nbd: socks must be embedded in a SOCK_ITEM attr\n");
+ pr_err("socks must be embedded in a SOCK_ITEM attr\n");
ret = -EINVAL;
goto out;
}
@@ -2266,7 +2262,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
nbd_sock_policy,
info->extack);
if (ret != 0) {
- printk(KERN_ERR "nbd: error processing sock list\n");
+ pr_err("error processing sock list\n");
ret = -EINVAL;
goto out;
}
@@ -2483,7 +2479,7 @@ static int __init nbd_init(void)
BUILD_BUG_ON(sizeof(struct nbd_request) != 28);
if (max_part < 0) {
- printk(KERN_ERR "nbd: max_part must be >= 0\n");
+ pr_err("max_part must be >= 0\n");
return -EINVAL;
}
@@ -2562,10 +2558,10 @@ static void __exit nbd_cleanup(void)
nbd = list_first_entry(&del_list, struct nbd_device, list);
list_del_init(&nbd->list);
if (refcount_read(&nbd->config_refs))
- printk(KERN_ERR "nbd: possibly leaking nbd_config (ref %d)\n",
+ pr_err("possibly leaking nbd_config (ref %d)\n",
refcount_read(&nbd->config_refs));
if (refcount_read(&nbd->refs) != 1)
- printk(KERN_ERR "nbd: possibly leaking a device\n");
+ pr_err("possibly leaking a device\n");
nbd_put(nbd);
}
--
2.31.1
* Re: [PATCH -next v3 6/6] nbd: use pr_err to output error message
2022-05-21 7:37 ` [PATCH -next v3 6/6] nbd: use pr_err to output error message Yu Kuai
@ 2022-05-23 14:16 ` Josef Bacik
0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2022-05-23 14:16 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, ming.lei, linux-block, nbd, linux-kernel, yi.zhang
On Sat, May 21, 2022 at 03:37:49PM +0800, Yu Kuai wrote:
> Instead of using the long printk(KERN_ERR "nbd: ...") to
> output error messages, define pr_fmt and use the short
> pr_err("") instead. The replacement is done with the
> following command:
>
> sed -i 's/printk(KERN_ERR "nbd: /pr_err("/g' \
> drivers/block/nbd.c
>
> This patch also rewraps lines to 80 columns where possible.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
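
As a rough illustration of why the "nbd: " prefix can be dropped from every message in patch 6, the sketch below models the pr_fmt/pr_err pairing in userspace. The pr_fmt definition is an assumption (the hunks in this patch do not show it), and log_buf, log_index_error() and log_is() are hypothetical names used only here. The pr_err stand-in writes to a buffer rather than the kernel log, and relies on the GNU `##__VA_ARGS__` extension, as the kernel macro does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix definition; not part of the hunks shown in this patch. */
#define pr_fmt(fmt) "nbd: " fmt

/* Hypothetical buffer standing in for the kernel log. */
static char log_buf[128];

/*
 * Userspace stand-in for pr_err(): the format string is passed through
 * pr_fmt(), so the prefix is glued on by string literal concatenation.
 */
#define pr_err(fmt, ...) \
	snprintf(log_buf, sizeof(log_buf), pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical caller mirroring one of the converted messages. */
static void log_index_error(int index)
{
	int n = pr_err("illegal input index %d\n", index);

	/* The formatted message must have fit in the buffer. */
	assert(n > 0 && (size_t)n < sizeof(log_buf));
}

/* Hypothetical helper: compare the last logged message with a string. */
static int log_is(const char *expected)
{
	return strcmp(log_buf, expected) == 0;
}
```

Calling log_index_error(-5) leaves "nbd: illegal input index -5\n" in the buffer, so the call sites only carry the message text and the prefix lives in one place.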
* Re: [PATCH -next v3 0/6] nbd: bugfix and cleanup patches
2022-05-21 7:37 [PATCH -next v3 0/6] nbd: bugfix and cleanup patches Yu Kuai
` (5 preceding siblings ...)
2022-05-21 7:37 ` [PATCH -next v3 6/6] nbd: use pr_err to output error message Yu Kuai
@ 2022-05-28 12:20 ` Jens Axboe
6 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2022-05-28 12:20 UTC (permalink / raw)
To: yukuai3, ming.lei, josef; +Cc: linux-block, nbd, linux-kernel, yi.zhang
On Sat, 21 May 2022 15:37:43 +0800, Yu Kuai wrote:
> Changes in v3:
> - rewrap to 80 columns where possible in patch 6
> Changes in v2:
> - in patch 3, instead of clearing and then resetting the flag if rq is
> not completed, test first and clear only if rq is going to complete.
>
> patches 1-2 fix races between nbd setup and module removal.
> patch 3 fixes io that can't be completed in some error paths.
> patch 4 fixes an io hang when disconnecting fails.
> patch 5 fixes a sysfs warning about duplicate creation.
> patch 6 uses pr_err to output error messages.
>
> [...]
Applied, thanks!
[1/6] nbd: call genl_unregister_family() first in nbd_cleanup()
commit: 06c4da89c24e7023ea448cadf8e9daf06a0aae6e
[2/6] nbd: fix race between nbd_alloc_config() and module removal
commit: c55b2b983b0fa012942c3eb16384b2b722caa810
[3/6] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
commit: 2895f1831e911ca87d4efdf43e35eb72a0c7e66e
[4/6] nbd: fix io hung while disconnecting device
commit: 09dadb5985023e27d4740ebd17e6fea4640110e5
[5/6] nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
commit: 858f1bf65d3d9c00b5e2d8ca87dc79ed88267c98
[6/6] nbd: use pr_err to output error message
commit: 1243172d5894e2d8f277ee3c278180792de5c521
Best regards,
--
Jens Axboe