From: Dan Williams <dan.j.williams@intel.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-block@vger.kernel.org,
	linux-scsi <linux-scsi@vger.kernel.org>,
	Jens Axboe <axboe@fb.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Christoph Hellwig <hch@lst.de>, Tejun Heo <tj@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>
Subject: Re: Time to make dynamically allocated devt the default for scsi disks?
Date: Sat, 13 Aug 2016 09:29:07 -0700
Message-ID: <CAPcyv4gDw49gZBO=Y+K9krgth68OUHcs8urGDNXiZzLNa6kotg@mail.gmail.com>
In-Reply-To: <1471101800.2397.9.camel@HansenPartnership.com>

[-- Attachment #1: Type: text/plain, Size: 4341 bytes --]

On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote:
>> On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <
>> dan.j.williams@intel.com> wrote:
>> > On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
>> > <James.Bottomley@hansenpartnership.com> wrote:
>> > > On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
>> > > > Before spending effort trying to flush the destruction of old
>> > > > bdi
>> > > > instances before new ones are registered, is it rather time to
>> > > > complete the conversion of sd to only use dynamically allocated
>> > > > devt?
>> > >
>> > > Do we have to go that far?  Surely your fix is extensible: the
>> > > only
>> > > reason it doesn't work for us is that the gendisk holds the
>> > > parent
>> > > without a reference, so we can free the SCSI device before its
>> > > child
>> > > gendisk (good job no-one actually uses gendisk->parent after
>> > > we've
>> > > released it ...).  If we fix that it would mean SCSI can't
>> > > release the
>> > > sdev until after the queue is dead and the bdi namespace
>> > > released, so
>> > > isn't something like this the easy fix?
>> > >
>> > > James
>> > >
>> > > ---
>> > >
>> > > diff --git a/block/genhd.c b/block/genhd.c
>> > > index fcd6d4f..54ae4ae 100644
>> > > --- a/block/genhd.c
>> > > +++ b/block/genhd.c
>> > > @@ -514,7 +514,7 @@ static void register_disk(struct device
>> > > *parent, struct gendisk *disk)
>> > >         struct hd_struct *part;
>> > >         int err;
>> > >
>> > > -       ddev->parent = parent;
>> > > +       ddev->parent = get_device(parent);
>> > >
>> > >         dev_set_name(ddev, "%s", disk->disk_name);
>> > >
>> > > @@ -1144,6 +1144,7 @@ static void disk_release(struct device
>> > > *dev)
>> > >         hd_free_part(&disk->part0);
>> > >         if (disk->queue)
>> > >                 blk_put_queue(disk->queue);
>> > > +       put_device(dev->parent);
>> > >         kfree(disk);
>> > >  }
>> > >  struct class block_class = {
>> >
>> > Looks ok at first glance to me.
>> >
>> > We do hold a reference on the parent device, but it gets dropped at
>> > device_unregister() time and this moves it out to the final put.
>
> We do?  Where?

Yes, register_disk() does "ddev->parent = parent" and then
"device_add(ddev)".  device_add() takes the parent reference.

>
>> > However, this does leave static devt block-device-drivers that
>> > register a disk without a parent device susceptible to the race...
>> > I think those exist given all the drivers still using add_disk()
>> > after commit 52c44d93c26f "block: remove ->driverfs_dev".
>
> It does?  The race is the fact that the parent can be removed before
> the child meaning if the parent name is re-registered before the child
> dies we get a duplicate name in bdi space.

No, the race is that the *name* of the parent isn't released until the
child is both unregistered and put.  The device core is already
ensuring that the parent is not released until all descendants have
been removed.
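
To make the ordering concrete, here is a minimal user-space sketch of
the reference counting described above.  It is a toy model, not kernel
code: struct toy_dev and the toy_*() helpers are hypothetical names, and
it assumes that releasing the parent is what makes its name reusable.
toy_add() and toy_del() mirror the parent reference that device_add()
takes and that is dropped again at device_unregister() time;
pins_parent models an extra parent reference held until the disk is
released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; these are not the kernel's types or functions. */
struct toy_dev {
	int refs;                /* reference count, released at 0 */
	struct toy_dev *parent;  /* NULL for a root device */
	bool pins_parent;        /* extra parent ref held until release */
	bool name_busy;          /* true until this device is released */
};

static void toy_init(struct toy_dev *d, struct toy_dev *parent, bool pin)
{
	d->refs = 1;             /* the creator's reference */
	d->parent = parent;
	d->pins_parent = pin;
	d->name_busy = true;
}

static void toy_get(struct toy_dev *d)
{
	if (d)
		d->refs++;
}

static void toy_put(struct toy_dev *d)
{
	if (!d)
		return;
	assert(d->refs > 0);
	if (--d->refs > 0)
		return;
	/* release: the name becomes reusable, then drop a pinned parent */
	d->name_busy = false;
	if (d->pins_parent)
		toy_put(d->parent);
}

/* Registration takes a parent reference, as described for device_add(). */
static void toy_add(struct toy_dev *d)
{
	toy_get(d->parent);
	if (d->pins_parent)
		toy_get(d->parent);  /* extra reference, dropped at release */
}

/* Unregistration drops only the reference taken at registration time. */
static void toy_del(struct toy_dev *d)
{
	toy_put(d->parent);
}

/*
 * Returns true if the parent's name became reusable while the disk was
 * unregistered but still referenced (for example by an open file).
 */
static bool parent_name_reusable_early(bool pin_parent)
{
	struct toy_dev parent, disk;
	bool early;

	toy_init(&parent, NULL, false);
	toy_init(&disk, &parent, pin_parent);
	toy_add(&disk);

	toy_get(&disk);     /* a lingering user keeps the disk alive */
	toy_del(&disk);     /* hot-unplug: the disk is unregistered ... */
	toy_put(&disk);     /* ... and the driver drops its disk reference */
	toy_put(&parent);   /* the driver drops its parent reference */

	early = !parent.name_busy;   /* disk.refs is still 1 here */

	toy_put(&disk);     /* the lingering user finally lets go */
	return early;
}
```

In this model parent_name_reusable_early(false) returns true, i.e. the
parent's name is free while the disk is still referenced, and
parent_name_reusable_early(true) returns false because the name is only
freed from the disk's own release.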

>
>> So I tried the attached and it makes the libnvdimm unit tests start
>> crashing.
>
> Well, the attached is clearly buggy, isn't it?  You're trying to do a
> get on the parent before the parent is actually set.

Ah, yes, thank you.  A fixed-up v2 is attached; it passes my tests.

> Why don't you
> just try the incremental patch I sent instead of trying to rework it?

I reworked it because it is the bdi that holds this extra dependency
on the disk's parent, not the disk itself.

>
>>   A couple crash logs attached.  Not yet sure what assumption
>> is getting violated, but how about that conversion of scsi to use
>> dynamic devt? ;-)
>
> It's completely orthogonal.  The problem is in hierarchy lifetimes:
> switching from static to dynamic allocation won't change that at all.
>  You don't see this problem in nvme because the parent control device's
> lifetime belongs to the controller not the disk.  In SCSI the parent is
> our representation of the SCSI device whose lifetime is governed at the
> SCSI level and effectively represents the disk.
>

No, it's only the name.  We could achieve the same by teaching the
block core to manage the "sd_index_ida" instead of the sd driver
itself, but the v2-patch attached works and does not introduce that
layering violation.
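
As a rough illustration of why the name is what matters: the bdi is
registered as "%u:%u" from the disk's major and minor, so a driver that
hands out static indices reproduces the same bdi name as soon as an
index is reused.  The sketch below is a user-space toy, not the sd
driver: toy_index_get() and toy_index_put() are hypothetical stand-ins
for an ida-style lowest-free allocator, and the major of 8 with 16
minors per disk is an assumption that only covers the first 16 sd disks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DISKS       16  /* assumption: only the first 16 disks */
#define TOY_SD_MAJOR        8   /* assumption: first sd major */
#define TOY_MINORS_PER_DISK 16  /* assumption: 16 minors per sd disk */

static bool toy_index_used[TOY_MAX_DISKS];

/* Hand out the lowest free index, or -1 if none is left. */
static int toy_index_get(void)
{
	for (int i = 0; i < TOY_MAX_DISKS; i++) {
		if (!toy_index_used[i]) {
			toy_index_used[i] = true;
			return i;
		}
	}
	return -1;
}

static void toy_index_put(int idx)
{
	assert(idx >= 0 && idx < TOY_MAX_DISKS);
	toy_index_used[idx] = false;
}

/* Format the name as "major:minor", the same shape the bdi name uses. */
static void toy_bdi_name(int idx, char *buf, size_t len)
{
	snprintf(buf, len, "%u:%u", (unsigned)TOY_SD_MAJOR,
		 (unsigned)(idx * TOY_MINORS_PER_DISK));
}

/* True if a disk unplugged and a disk plugged right after share a name. */
static bool toy_replug_reuses_name(void)
{
	char old_name[16], new_name[16];
	int idx;

	idx = toy_index_get();
	toy_bdi_name(idx, old_name, sizeof(old_name));
	toy_index_put(idx);      /* index freed before the old bdi is gone */

	idx = toy_index_get();   /* the new disk gets the same index back */
	toy_bdi_name(idx, new_name, sizeof(new_name));
	toy_index_put(idx);

	return strcmp(old_name, new_name) == 0;
}
```

Whichever layer owns the allocator, the point of the toy is the same:
the index must not be handed out again until the old disk's bdi name
has been torn down.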

[-- Attachment #2: patch-v2 --]
[-- Type: application/octet-stream, Size: 4786 bytes --]

block: extend the lifetime of disk parent devices

From: Dan Williams <dan.j.williams@intel.com>

Commit df08c32ce3be "block: fix bdi vs gendisk lifetime mismatch" fixed
the case where the bdi is named from a gendisk with a dynamically
allocated devt.  However, this leaves the bug in place for drivers
using a static devt like sd.  James observes that for scsi we can
address the issue by building on the initial fix to extend the lifetime
of the disk device until the bdi is released.  This effectively makes
the lifetime of a statically allocated devt identical to that of a
dynamically allocated devt.

However, this leaves a hole for block device drivers that use a
statically allocated devt, but do not specify a parent device. Those
drivers, if they exist, should move to dynamically allocated devt, or
register a parent device if they are on a hotplug capable bus.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Suggested-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c               |   12 +++++++-----
 include/linux/backing-dev.h |    2 +-
 mm/backing-dev.c            |   13 +++++++++++--
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4fae657..845c15d7357a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -506,7 +506,7 @@ static int exact_lock(dev_t devt, void *data)
 	return 0;
 }
 
-static void register_disk(struct device *parent, struct gendisk *disk)
+static void register_disk(struct gendisk *disk)
 {
 	struct device *ddev = disk_to_dev(disk);
 	struct block_device *bdev;
@@ -514,8 +514,6 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	struct hd_struct *part;
 	int err;
 
-	ddev->parent = parent;
-
 	dev_set_name(ddev, "%s", disk->disk_name);
 
 	/* delay uevents, until we scanned partition table */
@@ -602,6 +600,7 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 		WARN_ON(1);
 		return;
 	}
+	disk_to_dev(disk)->parent = parent;
 	disk_to_dev(disk)->devt = devt;
 
 	/* ->major and ->first_minor aren't supposed to be
@@ -614,11 +613,11 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 
 	/* Register BDI before referencing it from bdev */
 	bdi = &disk->queue->backing_dev_info;
-	bdi_register_owner(bdi, disk_to_dev(disk));
+	bdi_register_disk(bdi, disk);
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
 			    exact_match, exact_lock, disk);
-	register_disk(parent, disk);
+	register_disk(disk);
 	blk_register_queue(disk);
 
 	/*
@@ -1144,6 +1143,9 @@ static void disk_release(struct device *dev)
 	hd_free_part(&disk->part0);
 	if (disk->queue)
 		blk_put_queue(disk->queue);
+
+	/* see bdi_register_disk() */
+	put_device(dev->parent);
 	kfree(disk);
 }
 struct class block_class = {
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 43b93a947e61..df9e1a766157 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,7 +24,7 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner);
+int bdi_register_disk(struct backing_dev_info *bdi, struct gendisk *disk);
 void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8fde443f36d7..b621c8e8cd68 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -825,8 +825,9 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
 }
 EXPORT_SYMBOL(bdi_register_dev);
 
-int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
+int bdi_register_disk(struct backing_dev_info *bdi, struct gendisk *disk)
 {
+	struct device *owner = disk_to_dev(disk);
 	int rc;
 
 	rc = bdi_register(bdi, NULL, "%u:%u", MAJOR(owner->devt),
@@ -835,9 +836,17 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
 		return rc;
 	bdi->owner = owner;
 	get_device(owner);
+
+	/*
+	 * For statically allocated devt disks, like scsi, the disk's
+	 * parent holds the lifetime for the devt.  Prevent the parent
+	 * from releasing the devt for reuse until the disk is released.
+	 */
+	get_device(owner->parent);
+
 	return 0;
 }
-EXPORT_SYMBOL(bdi_register_owner);
+EXPORT_SYMBOL(bdi_register_disk);
 
 /*
  * Remove bdi from bdi_list, and ensure that it is no longer visible

