From: Markus Pargmann
To: Pranay Srivastava
Cc: nbd-general@lists.sourceforge.net, linux-kernel@vger.kernel.org, Wouter Verhelst
Subject: Re: [PATCH v2 4/5]nbd: make nbd device wait for its users.
Date: Wed, 15 Jun 2016 08:30:45 +0200
Message-ID: <11733279.HlKjcK63G4@adelgunde>

Hi Pranay,

On Tuesday 14 June 2016 15:03:40 Pranay Srivastava wrote:
> Hi Markus,
>
> On Tue, Jun 14, 2016 at 2:29 PM, Markus Pargmann wrote:
> >
> > On Thursday 02 June 2016 13:25:00 Pranay Kr. Srivastava wrote:
> > > When a timeout occurs or a recv fails, then
> > > instead of abruptly killing the nbd block device,
> > > wait for its users to finish.
> > >
> > > This is especially needed because filesystems like
> > > ext2 or ext3 don't expect their buffer heads to
> > > disappear while the filesystem is mounted.
> > >
> > > Each open of an nbd device is refcounted, and
> > > the userland program [nbd-client] doing the
> > > NBD_DO_IT ioctl now waits for any other users
> > > of this device before invalidating the nbd device.
> > >
> > > Signed-off-by: Pranay Kr. Srivastava
> > > ---
> > >  drivers/block/nbd.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 58 insertions(+)
> > >
> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > index d1d898d..4da40dc 100644
> > > --- a/drivers/block/nbd.c
> > > +++ b/drivers/block/nbd.c
> > > @@ -70,10 +70,13 @@ struct nbd_device {
> > >  #if IS_ENABLED(CONFIG_DEBUG_FS)
> > >  	struct dentry *dbg_dir;
> > >  #endif
> > > +	atomic_t inuse;
> > >  	/*
> > >  	 *This is specifically for calling sock_shutdown, for now.
> > >  	 */
> > >  	struct work_struct ws_shutdown;
> > > +	struct kref users;
> > > +	struct completion user_completion;
> > >  };
> > >
> > >  #if IS_ENABLED(CONFIG_DEBUG_FS)
> > > @@ -104,6 +107,7 @@ static DEFINE_SPINLOCK(nbd_lock);
> > >   * Shutdown function for nbd_dev work struct.
> > >   */
> > >  static void nbd_ws_func_shutdown(struct work_struct *);
> > > +static void nbd_kref_release(struct kref *);
> > >
> > >  static inline struct device *nbd_to_dev(struct nbd_device *nbd)
> > >  {
> > > @@ -682,6 +686,8 @@ static void nbd_reset(struct nbd_device *nbd)
> > >  	nbd->flags = 0;
> > >  	nbd->xmit_timeout = 0;
> > >  	INIT_WORK(&nbd->ws_shutdown, nbd_ws_func_shutdown);
> > > +	init_completion(&nbd->user_completion);
> > > +	kref_init(&nbd->users);
> > >  	queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
> > >  	del_timer_sync(&nbd->timeout_timer);
> > >  }
> > > @@ -815,6 +821,14 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
> > >  		kthread_stop(thread);
> > >
> > >  		sock_shutdown(nbd);
> > > +		/*
> > > +		 * kref_init initializes with ref count as 1,
> > > +		 * nbd_client, or the user-land program executing
> > > +		 * this ioctl will make the refcount to 2[at least]
> > > +		 * so subtracting 2 from refcount.
> > > +		 */
> > > +		kref_sub(&nbd->users, 2, nbd_kref_release);
> >
> > Why don't you use a kref_put?
>
> OK, I'll try to explain the problem as I understand it.
>
> When the module is loaded, the kref is initialized to 1.
>
> Suppose someone has now started nbd-client [nbdC-1]. This nbd-client
> increases the refcount to 2 when it opens the device. So far so good...
>
> Now let's say this device is being shut down via a second
> nbd-client [nbdC-2].
>
> nbdC-1 subtracts two from the refcount. It has to do this in NBD_DO_IT,
> because its device file will not be closed until after the ioctl is
> over. It then waits in wait_for_completion().
>
> nbdC-2 now closes its device file. This drops the refcount to zero,
> which triggers the completion, so nbdC-1 continues.
>
> We don't want to trigger another kref_put when nbdC-1 closes the device
> file, so kref_put has to be conditional; that is what inuse is for.
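The counting in the walkthrough above can be checked with a small userspace model. This is a sketch under stated assumptions and is not kernel code. `struct model_nbd`, `model_open()`, `model_close()`, `model_sub()` and `replay_disconnect()` are hypothetical names. A C11 `atomic_int` stands in for `struct kref`, and a `bool` stands in for `struct completion`. The model replays the nbdC-1/nbdC-2 sequence and records the count after each step. It assumes that nbdC-2 also opened the device through `nbd_open()` before sending its disconnect.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the v2 patch's bookkeeping; not kernel
 * code. atomic_int stands in for struct kref, bool for struct completion.
 */
struct model_nbd {
	atomic_int users;	/* struct kref users */
	atomic_int inuse;	/* atomic_t inuse */
	bool completed;		/* struct completion user_completion */
};

/* Mirrors nbd_kref_release(): clear inuse, then complete(). */
static void model_release(struct model_nbd *nbd)
{
	atomic_store(&nbd->inuse, 0);
	nbd->completed = true;
}

/* Mirrors kref_sub()/kref_put(): drop n refs, release on reaching zero. */
static void model_sub(struct model_nbd *nbd, int n)
{
	int old = atomic_fetch_sub(&nbd->users, n);

	assert(old >= n);	/* abort the model on underflow */
	if (old == n)
		model_release(nbd);
}

/* Mirrors nbd_open(): kref_get_unless_zero(), then set inuse. */
static void model_open(struct model_nbd *nbd)
{
	int old = atomic_load(&nbd->users);

	/* On failure, the CAS reloads 'old' with the current count. */
	while (old != 0 &&
	       !atomic_compare_exchange_weak(&nbd->users, &old, old + 1))
		;
	if (old != 0)
		atomic_store(&nbd->inuse, 1);
}

/* Mirrors nbd_release(): put only while inuse is still set. */
static void model_close(struct model_nbd *nbd)
{
	if (atomic_load(&nbd->inuse))
		model_sub(nbd, 1);
}

struct snapshot {
	int users;
	bool completed;
};

static struct snapshot snap(struct model_nbd *nbd)
{
	struct snapshot s = { atomic_load(&nbd->users), nbd->completed };

	return s;
}

/* Replays the nbdC-1/nbdC-2 disconnect; fills steps[0..3]. */
void replay_disconnect(struct snapshot steps[4])
{
	struct model_nbd nbd;

	atomic_init(&nbd.users, 1);	/* kref_init() in nbd_reset() */
	atomic_init(&nbd.inuse, 0);
	nbd.completed = false;

	model_open(&nbd);		/* nbdC-1 opens the device */
	model_open(&nbd);		/* nbdC-2 opens it to disconnect (assumed) */
	steps[0] = snap(&nbd);
	model_sub(&nbd, 2);		/* nbdC-1 leaves NBD_DO_IT: kref_sub(2) */
	steps[1] = snap(&nbd);		/* nbdC-1 now waits for the completion */
	model_close(&nbd);		/* nbdC-2 closes: count reaches zero */
	steps[2] = snap(&nbd);
	model_close(&nbd);		/* nbdC-1 closes: inuse is 0, no put */
	steps[3] = snap(&nbd);
}
```

In this model, `replay_disconnect()` records counts of 3, 1, 0 and 0. The completion fires on nbdC-2's close. nbdC-1's later close is skipped only because `model_release()` cleared `inuse`. Without that flag, that close would call `model_sub()` on a count that is already zero and trip its `assert()`.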
>
>
> >
> > > +		wait_for_completion(&nbd->user_completion);
> > >  		mutex_lock(&nbd->tx_lock);
> > >  		nbd_clear_que(nbd);
> > >  		kill_bdev(bdev);
> > > @@ -865,13 +879,56 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
> > >
> > >  	return error;
> > >  }
> > > +static void nbd_kref_release(struct kref *kref_users)
> > > +{
> > > +	struct nbd_device *nbd = container_of(kref_users, struct nbd_device,
> > > +				users);
> >
> > Not indented to opening bracket.
> >
> > > +	pr_debug("Releasing kref [%s]\n", __func__);
> > > +	atomic_set(&nbd->inuse, 0);
> > > +	complete(&nbd->user_completion);
> > > +
> > > +}
> > > +
> > > +static int nbd_open(struct block_device *bdev, fmode_t mode)
> > > +{
> > > +	struct nbd_device *nbd_dev = bdev->bd_disk->private_data;
> > > +
> > > +	if (kref_get_unless_zero(&nbd_dev->users))
> > > +		atomic_set(&nbd_dev->inuse, 1);
> > > +
> > > +	pr_debug("Opening nbd_dev %s. Active users = %u\n",
> > > +			bdev->bd_disk->disk_name,
> > > +			atomic_read(&nbd_dev->users.refcount) - 1);
> >
> > Indent to opening bracket.
> >
> > > +	return 0;
> > > +}
> > > +
> > > +static void nbd_release(struct gendisk *disk, fmode_t mode)
> > > +{
> > > +	struct nbd_device *nbd_dev = disk->private_data;
> > > +	/*
> > > +	 *kref_init initializes ref count to 1, so we
> > > +	 *we check for refcount to be 2 for a final put.
> > > +	 *
> > > +	 *kref needs to be re-initialized just here as the
> > > +	 *other process holding it must see the ref count as 2.
> > > +	 */
> > > +	if (atomic_read(&nbd_dev->inuse))
> > > +		kref_put(&nbd_dev->users, nbd_kref_release);
> > >
> > What is this inuse atomic for? Everyone that releases the nbd device
> > will need to execute a kref_put().
>
> To do away with inuse, perhaps we can do a kref_get just before
> leaving NBD_DO_IT, so that everyone does a kref_put when the device
> file is closed?
> However, there's a small race window if another process [not just
> nbd-client] tries to open the device while the kref is being
> initialized.
>
> Do you think it's better to do this by introducing a spin_lock instead
> of the atomic?
>
> Let me know if my understanding is correct.

Thanks for the explanations. I think my understanding was off by one ;).
I didn't realize that the DO_IT thread from userspace has the block
device open as well.

I thought a bit about this. Does it make sense to delay the essential
cleanup steps until all open file handles have really been closed?
Then the block device is still there even if the DO_IT thread exits,
and everything is cleaned up only when the file is closed. Maybe this
makes the code simpler and lets us use krefs directly without any
strange constructs. What do you think?

This would also allow the client to set up a new socket as long as it
does not close the nbd file handle. Could this behavior be problematic
for any client implementation? Does it solve our other issue with
setting up new sockets for an existing nbd block device? Cc Wouter

Best Regards,

Markus

>
>
> >
> > Best Regards,
> >
> > Markus
> >
> > > +
> > > +	pr_debug("Closing nbd_dev %s. Active users = %u\n",
> > > +			disk->disk_name,
> > > +			atomic_read(&nbd_dev->users.refcount) - 1);
> > > +}
> > >
> > >  static const struct block_device_operations nbd_fops = {
> > >  	.owner =	THIS_MODULE,
> > >  	.ioctl =	nbd_ioctl,
> > >  	.compat_ioctl =	nbd_ioctl,
> > > +	.open =		nbd_open,
> > > +	.release =	nbd_release
> > >  };
> > >
> > > +
> > >  static void nbd_ws_func_shutdown(struct work_struct *ws_nbd)
> > >  {
> > >  	struct nbd_device *nbd_dev = container_of(ws_nbd, struct nbd_device,
> > > @@ -1107,6 +1164,7 @@ static int __init nbd_init(void)
> > >  		disk->fops = &nbd_fops;
> > >  		disk->private_data = &nbd_dev[i];
> > >  		sprintf(disk->disk_name, "nbd%d", i);
> > > +		atomic_set(&nbd_dev[i].inuse, 0);
> > >  		nbd_reset(&nbd_dev[i]);
> > >  		add_disk(disk);
> > >  	}
> > >
> >
> > --
> > Pengutronix e.K.                           |                             |
> > Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> > Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
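Markus's alternative keeps the device alive until the last open file handle is closed. It can be sketched as a plain userspace model. Everything here is hypothetical. `struct model_dev`, `dev_open()`, `dev_do_it_exit()`, `dev_set_sock()` and `dev_close()` are illustrative names. The model assumes that open and release calls are serialized, because it takes no lock of its own. `torn_down` only marks the point to which the patch's `nbd_clear_que()`/`kill_bdev()` cleanup would move.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of "clean up on the last close". Assumes open and
 * release calls are serialized by the caller; no locking is done here.
 */
struct model_dev {
	int openers;		/* file handles open on /dev/nbdX */
	bool has_socket;	/* a socket is currently attached */
	bool torn_down;		/* the patch's cleanup would run here */
};

/* Every opener (nbd-client, a mount, ...) takes one reference. */
void dev_open(struct model_dev *d)
{
	d->openers++;
	d->torn_down = false;	/* a new open starts from a reset device */
}

/* NBD_DO_IT returns (disconnect or error): drop the socket only. */
void dev_do_it_exit(struct model_dev *d)
{
	d->has_socket = false;
}

/* A new socket can be attached while someone still holds the device. */
bool dev_set_sock(struct model_dev *d)
{
	if (d->openers == 0 || d->has_socket)
		return false;
	d->has_socket = true;
	return true;
}

/* The last close performs the teardown that v2 does inside NBD_DO_IT. */
void dev_close(struct model_dev *d)
{
	assert(d->openers > 0);
	if (--d->openers == 0) {
		d->has_socket = false;
		d->torn_down = true;	/* nbd_clear_que(), kill_bdev(), ... */
	}
}
```

With this shape, a return from NBD_DO_IT only drops the socket. A client that keeps its file descriptor open can attach a new socket, and the cleanup runs once, on the final close. A plain open counter is enough here, so neither a conditional put nor a separate `inuse` flag is needed. Whether existing clients cope with the device outliving NBD_DO_IT is the open question raised above.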