From: Hou Tao <houtao1@huawei.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	<linux-block@vger.kernel.org>, <nbd@other.debian.org>
Subject: Re: [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal
Date: Wed, 8 Sep 2021 21:03:28 +0800
Message-ID: <319b5ef6-3d73-8795-e252-3c35fbe1b5bc@huawei.com>
In-Reply-To: <730dae5e-5af8-3554-18bf-e22ff576e2b1@huawei.com>

Hi Christoph,

Any comments on this patch?

On 9/7/2021 11:04 AM, Hou Tao wrote:
> Hi,
>
> On 9/6/2021 6:25 PM, Christoph Hellwig wrote:
>> On Mon, Sep 06, 2021 at 06:08:54PM +0800, Hou Tao wrote:
>>>>> +	if (!try_module_get(THIS_MODULE))
>>>>> +		return ERR_PTR(-ENODEV);
>>>> try_module_get(THIS_MODULE) is an indicator for an unsafe pattern.  If
>>>> we don't already have a reference it could never close the race.
>>>>
>>>> Looking at the callers:
>>>>
>>>>  - nbd_open like all block device operations must have a reference
>>>>    already.
>>> Yes. nbd_open() has already taken a reference in dentry_open().
>>>>  - for nbd_genl_connect I'm not an expert, but given that struct
>>>>    nbd_genl_family has a module member I suspect the networking
>>>>    code already takes a reference.
>>> That was my original thought, but in fact the netlink code doesn't take a
>>> module reference in genl_family_rcv_msg_doit(). Instead, netlink uses
>>> genl_lock_all() to serialize module removal against calls into
>>> nbd_connect_genl_ops, so I think using try_module_get() is OK here.
>> How is this going to work?  If there was a race you just shortened it,
>> but it can still happen before you call try_module_get.  So I think we
>> need to look into how the netlink calling conventions are supposed to
>> look and understand the issues there first.
> Let me explain first. It works because of genl_lock_all() in the netlink code.
>
> If module removal happens before try_module_get() is called, nbd_cleanup()
> calls genl_unregister_family() first, which in turn calls genl_lock_all().
> genl_lock_all() prevents the ops in nbd_connect_genl_ops from being called,
> because nbd ops are invoked from genl_rcv(), which must acquire cb_lock
> first.
>
> process A                                       process B
>
> module removal
> genl_unregister_family()
>   genl_lock_all()
>     down_write(&cb_lock)
>                                                 receive a new netlink message
>                                                 genl_rcv()
>                                                   // will wait for the removal of nbd ops
>                                                   down_read(&cb_lock)
>
> If nbd_alloc_config() runs before the module removal, genl_rcv() must
> already have acquired cb_lock and genl_mutex, so nbd_cleanup() will block in
> genl_unregister_family(). When nbd_alloc_config() calls try_module_get(),
> it finds that the module is dying, so nbd_genl_connect() fails.
>
> process A                                     process B
>
> a new netlink message
> genl_rcv()
>   down_read(&cb_lock)
>     mutex_lock(&genl_mutex)
>       nbd_genl_connect()
>         nbd_alloc_config()
>                                                module removal
>                                                genl_unregister_family()
>           // module is dying, so fail
>           try_module_get()
>                                                  genl_lock_all()
>                                                    // wait for the completion of nbd ops
>                                                    down_write(&cb_lock)
>
> I have checked several other genl_ops. It seems these genl_ops don't need
> try_module_get() because they don't create new objects through genl_ops;
> they only control existing ones. genl_family_rcv_msg_dumpit(), however,
> does end up calling try_module_get(), but according to the history
> (6dc878a8ca39 "netlink: add reference of module in netlink_dump_start"),
> that is because inet_diag_handler_cmd() calls __netlink_dump_start().
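The dump case plausibly differs because, as I understand it, later dump callbacks run from recvmsg() after genl_rcv() has dropped cb_lock, so genl_lock_all() cannot serialize against them and only a module reference keeps the callback code alive. Below is a minimal userspace sketch of that refcount pattern; the names and the plain int refcount are hypothetical simplifications, not the kernel's struct module API.

```c
/*
 * Userspace model (not kernel code) of why a dump pins its module: the dump
 * outlives the request that started it, so it holds a reference until done.
 * All names are hypothetical simplifications.
 */
#include <assert.h>
#include <stdbool.h>

struct model_module {
	int refcnt;
	bool going;
};

/* Refuse new references once the module is being removed. */
static bool model_try_module_get(struct model_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

static void model_module_put(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Removal only proceeds when nobody holds a reference. */
static bool model_try_stop_module(struct model_module *m)
{
	if (m->refcnt != 0)
		return false;
	m->going = true;
	return true;
}

/* State for a dump that continues after the starting request returns. */
struct model_dump {
	struct model_module *owner;
	bool active;
};

static bool model_dump_start(struct model_dump *d, struct model_module *m)
{
	if (!model_try_module_get(m))   /* module dying: do not start */
		return false;
	d->owner = m;
	d->active = true;
	return true;
}

/* Called when the last dump chunk has been delivered. */
static void model_dump_done(struct model_dump *d)
{
	if (d->active) {
		d->active = false;
		model_module_put(d->owner);
	}
}
```

While a dump is active, model_try_stop_module() fails; once model_dump_done() drops the reference, removal succeeds and new dumps are refused.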
>
> Regards,
> Tao

Thread overview: 15+ messages
2021-09-04 12:25 [PATCH v2 0/3] fix races between nbd setup and module removal Hou Tao
2021-09-04 12:25 ` [PATCH v2 1/3] nbd: use pr_err to output error message Hou Tao
2021-09-06  9:27   ` Christoph Hellwig
2021-09-04 12:25 ` [PATCH v2 2/3] nbd: call genl_unregister_family() first in nbd_cleanup() Hou Tao
2021-09-06  9:27   ` Christoph Hellwig
2021-09-04 12:25 ` [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal Hou Tao
2021-09-06  9:30   ` Christoph Hellwig
2021-09-06 10:08     ` Hou Tao
2021-09-06 10:25       ` Christoph Hellwig
2021-09-07  3:04         ` Hou Tao
2021-09-08 13:03           ` Hou Tao [this message]
2021-09-09  6:40           ` Christoph Hellwig
2021-09-13  4:32             ` Hou Tao
2021-09-13 15:25               ` Christoph Hellwig
2021-09-14 11:42               ` Wouter Verhelst
