linux-block.vger.kernel.org archive mirror
From: Hou Tao <houtao1@huawei.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	<linux-block@vger.kernel.org>, <nbd@other.debian.org>
Subject: Re: [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal
Date: Mon, 6 Sep 2021 18:08:54 +0800
Message-ID: <ce3e1ea8-ebda-4372-42ce-e8a4b2d12514@huawei.com>
In-Reply-To: <20210906093051.GC30790@lst.de>

Hi,

On 9/6/2021 5:30 PM, Christoph Hellwig wrote:
> On Sat, Sep 04, 2021 at 08:25:19PM +0800, Hou Tao wrote:
>> When the nbd module is being removed, nbd_alloc_config() may be
>> called concurrently by nbd_genl_connect(). try_module_get() will
>> return false in that case, but nbd_alloc_config() doesn't handle it.
>>
>> The race may lead to a leak of nbd_config and its related
>> resources (e.g., recv_workq) and an oops in nbd_read_stat() due
>> to the unloading of the nbd module, as shown below:
>>
>>   BUG: kernel NULL pointer dereference, address: 0000000000000040
>>   Oops: 0000 [#1] SMP PTI
>>   CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>>   Workqueue: knbd16-recv recv_work [nbd]
>>   RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
>>   Call Trace:
>>    recv_work+0x3b/0xb0 [nbd]
>>    process_one_work+0x1ed/0x390
>>    worker_thread+0x4a/0x3d0
>>    kthread+0x12a/0x150
>>    ret_from_fork+0x22/0x30
>>
>> Fix it by checking the return value of try_module_get()
>> in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
>> assign nbd->config only when nbd_alloc_config() succeeds to ensure
>> the value of nbd->config is binary (valid or NULL).
>>
>> Also add a debug message to check the reference counter
>> of nbd_config during module removal.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  drivers/block/nbd.c | 28 +++++++++++++++++++---------
>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index cedd3648e1a7..fa6c069b79dc 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -1473,15 +1473,20 @@ static struct nbd_config *nbd_alloc_config(void)
>>  {
>>  	struct nbd_config *config;
>>  
>> +	if (!try_module_get(THIS_MODULE))
>> +		return ERR_PTR(-ENODEV);
> try_module_get(THIS_MODULE) is an indicator for an unsafe pattern.  If
> we don't already have a reference it could never close the race.
>
> Looking at the callers:
>
>  - nbd_open like all block device operations must have a reference
>    already.
Yes. nbd_open() has already taken a reference in dentry_open().
>  - for nbd_genl_connect I'm not an expert, but given that struct
>    nbd_genl_family has a module member I suspect the networking
>    code already takes a reference.

That was my original thought, but in fact the netlink code doesn't take a module
reference in genl_family_rcv_msg_doit(). Netlink uses genl_lock_all() to serialize
module removal against calls into nbd_connect_genl_ops, so I think using
try_module_get() is OK here.
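
To make the distinction concrete, here is a rough userspace model of the two
helpers under discussion. It is only a sketch: struct fake_module,
struct fake_config and every fake_* function are made-up stand-ins, not the
kernel's struct module or the real nbd code, and it returns NULL where the
patch returns ERR_PTR(-ENODEV). It assumes that a module being removed simply
refuses new references, which is the behaviour the patch relies on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct module; not the kernel type. */
enum fake_state { FAKE_LIVE, FAKE_GOING };

struct fake_module {
	enum fake_state state;
	int refcnt;
};

/* Stand-in for struct nbd_config; the contents do not matter here. */
struct fake_config {
	int dummy;
};

/* Models try_module_get(): refuse a new reference once removal has started. */
static bool fake_try_module_get(struct fake_module *mod)
{
	if (mod->state != FAKE_LIVE)
		return false;
	mod->refcnt++;
	return true;
}

/*
 * Models __module_get(): take a reference unconditionally. This is only
 * safe when the caller already holds a reference, so removal cannot be
 * in progress.
 */
static void fake_module_get(struct fake_module *mod)
{
	mod->refcnt++;
}

/* Models module_put(): drop one reference. */
static void fake_module_put(struct fake_module *mod)
{
	mod->refcnt--;
}

/*
 * Models the patched nbd_alloc_config(): bail out before allocating
 * anything if the module is already going away, so nothing is leaked.
 */
static struct fake_config *fake_alloc_config(struct fake_module *mod)
{
	struct fake_config *config;

	if (!fake_try_module_get(mod))
		return NULL;

	config = calloc(1, sizeof(*config));
	if (!config) {
		/* Undo the reference taken above on allocation failure. */
		fake_module_put(mod);
		return NULL;
	}
	return config;
}
```

In this model a caller on the netlink path, which holds no reference of its
own, has to go through the fake_try_module_get() check, while a caller that is
already pinned (as nbd_open() is) could use fake_module_get() directly.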

Regards,

Tao


> So this should be able to use __module_get.


Thread overview: 15+ messages
2021-09-04 12:25 [PATCH v2 0/3] fix races between nbd setup and module removal Hou Tao
2021-09-04 12:25 ` [PATCH v2 1/3] nbd: use pr_err to output error message Hou Tao
2021-09-06  9:27   ` Christoph Hellwig
2021-09-04 12:25 ` [PATCH v2 2/3] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
2021-09-06  9:27   ` Christoph Hellwig
2021-09-04 12:25 ` [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
2021-09-06  9:30   ` Christoph Hellwig
2021-09-06 10:08     ` Hou Tao [this message]
2021-09-06 10:25       ` Christoph Hellwig
2021-09-07  3:04         ` Hou Tao
2021-09-08 13:03           ` Hou Tao
2021-09-09  6:40           ` Christoph Hellwig
2021-09-13  4:32             ` Hou Tao
2021-09-13 15:25               ` Christoph Hellwig
2021-09-14 11:42               ` Wouter Verhelst
