From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-block@vger.kernel.org,
	linux-scsi <linux-scsi@vger.kernel.org>,
	Jens Axboe <axboe@fb.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Christoph Hellwig <hch@lst.de>, Tejun Heo <tj@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>
Subject: Re: Time to make dynamically allocated devt the default for scsi disks?
Date: Sun, 14 Aug 2016 10:20:52 -0700
Message-ID: <1471195252.2355.18.camel@HansenPartnership.com>
In-Reply-To: <CAPcyv4gHq=RraHgcJGx_txk7YJHC8YQ3gCT4CHL0hvzsGpU1wg@mail.gmail.com>

On Sat, 2016-08-13 at 11:27 -0700, Dan Williams wrote:
> On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote:
> > > On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
> > > <James.Bottomley@hansenpartnership.com> wrote:
> > > > It does?  The race is the fact that the parent can be removed
> > > > before the child meaning if the parent name is re-registered 
> > > > before the child dies we get a duplicate name in bdi space.
> > > 
> > > No, the race is that the *name* of the parent isn't released 
> > > until the child is both unregistered and put.  The device core is
> > > already ensuring that the parent is not released until all 
> > > descendants have been removed.
> > 
> > We're both saying the same thing: the problem is that, with
> > df08c32ce3be the bdi name lifetime is tied to the lifetime of the
> > gendisk.  However, the parent of the gendisk currently is only tied 
> > to the visibility lifetime of the gendisk, not the final put 
> > lifetime, so it doesn't see this.
> > 
> > > > 
> > > > > So I tried the attached and it makes the libnvdimm unit tests
> > > > > start crashing.
> > > > 
> > > > Well, the attached is clearly buggy, isn't it?  You're trying 
> > > > to do a get on the parent before the parent is actually set.
> > > 
> > > Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
> > > 
> > > > Why don't you just try the incremental patch I sent instead of
> > > > trying to rework it?
> > > 
> > > I reworked it because it is the bdi that holds this extra 
> > > dependency on the disk's parent, not the disk itself.
> > 
> > Philosophically I don't like this approach.  The dependency goes
> > 
> > bdi->gendisk->parent
> 
> I'm arguing that there is no bdi->gendisk dependency.

You created one with your bdi->owner field.  Just because you didn't
call it a parent doesn't mean it wasn't one.  Arguably the whole bdi
thing is strangely done: gendisk treats it like a class, and that's how
it behaves, but it doesn't have a proper class structure (which is why
gendisk creates the link that would otherwise be done by the class
infrastructure).

> The dependency is:
> 
> bdi->devt

devt isn't a device (in the struct device sense).  It exists as
effectively an embedded component of the bdi.  As far as I can tell
there's no reason for it to be separately allocated; it could be
properly embedded, as is the normal pattern.
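
To illustrate what "embedded" means here, a minimal userspace sketch
follows.  Every toy_* name is hypothetical and none of this is the
kernel's bdi code; it only shows the general pattern, in which the
component lives inside the owning structure, shares its lifetime, and
the owner is recovered with a container_of-style macro:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct toy_devt {
	unsigned int major;
	unsigned int minor;
};

struct toy_bdi {
	const char *name;
	struct toy_devt devt;	/* embedded: freed together with the bdi */
};

/* Userspace equivalent of the kernel's container_of() idea. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the owning bdi from a pointer to its embedded devt. */
static struct toy_bdi *toy_devt_to_bdi(struct toy_devt *d)
{
	assert(d != NULL);
	return toy_container_of(d, struct toy_bdi, devt);
}

/* Returns 1 if the owner recovered from the member is the original. */
static int toy_embedding_demo(void)
{
	struct toy_bdi bdi = { .name = "8:0", .devt = { 8, 0 } };

	return toy_devt_to_bdi(&bdi.devt) == &bdi;
}
```

With this layout there is no second allocation whose lifetime has to be
tracked separately from the owner.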

> It just so happens that block-dynamic devt is released in
> disk_release().
> 
> > Making the bdi manage the parent lifetime is an unusual pattern.
> >  Making the parent stay around until the last reference to gendisk 
> > is put is the usual one.
> 
> What's unusual is the bdi's dependency on the allocated name, not the
> gendisk itself.

A name is just a resource belonging to an object.  The object it
belongs to is the bdi and the bdi is parented by the owner field (and a
hokey link) to the gendisk.

> > > > >   A couple crash logs attached.  Not yet sure what assumption
> > > > > is getting violated, but how about that conversion of scsi to 
> > > > > use dynamic devt? ;-)
> > > > 
> > > > It's completely orthogonal.  The problem is in hierarchy 
> > > > lifetimes: switching from static to dynamic allocation won't 
> > > > change that at all.  You don't see this problem in nvme because 
> > > > the parent control device's lifetime belongs to the controller 
> > > > not the disk.  In SCSI the parent is our representation of the 
> > > > SCSI device whose lifetime is governed at the SCSI level and 
> > > > effectively represents the disk.
> > > > 
> > > 
> > > No, it's only the name.  We could achieve the same by teaching 
> > > the block core to manage the "sd_index_ida" instead of the sd 
> > > driver itself, but the v2-patch attached works and does not 
> > > introduce that layering violation.
> > 
> > Um, so this patch doesn't fix the problem. It merely makes the 
> > lifetime rules correct so the problem can then be fixed at the scsi
> > level.
> 
> You're right that this patch does not fix the problem, I missed that
> the scsi_disk is not the parent of the gendisk, so this patch does
> nothing to delay scsi_disk_release.  What I think is the real fix is
> to make the devt properly reference counted and prevent
> ida_remove(&sd_index_ida, sdkp->index); from being called until all
> objects derived from that allocation are done with it.

OK, this is another philosophical difference, I suppose: since bdi is
already so complex and non-standard, I really don't think adding more
non-standard stuff is a good idea.  The simplest way to fix it is:

   1. The two-line patch I already sent to make the bdi hold the
      owner->parent until release
   2. Parent the gendisk to scsi_disk->dev.  The name release is already
      in the correct place, so this is a three-line patch.

These are established patterns, so they're both understandable to
anyone who reads the code.  The answer to any other BDI lifetime
problem is to free the name in the parent release.
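
As a rough userspace model of that pattern (all toy_* names are
hypothetical, and plain integer counters stand in for kobject
reference counting), the sketch below has a child pin its parent until
the child's final put, and frees the name index only in the parent's
release.  It is a simplification under those assumptions, not the
kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_INDEX 8

/* Hypothetical stand-in for an index allocator such as sd_index_ida. */
static bool toy_index_used[TOY_MAX_INDEX];

static int toy_index_alloc(void)
{
	for (int i = 0; i < TOY_MAX_INDEX; i++) {
		if (!toy_index_used[i]) {
			toy_index_used[i] = true;
			return i;
		}
	}
	return -1;
}

static void toy_index_free(int idx)
{
	if (idx >= 0 && idx < TOY_MAX_INDEX)
		toy_index_used[idx] = false;
}

struct toy_obj {
	int refcount;
	struct toy_obj *parent;
	void (*release)(struct toy_obj *obj);
	int index;		/* name index, or -1 if none */
};

static void toy_get(struct toy_obj *obj)
{
	obj->refcount++;
}

/* On the final put, run release, then drop the reference on the parent. */
static void toy_put(struct toy_obj *obj)
{
	struct toy_obj *parent = obj->parent;

	assert(obj->refcount > 0);
	if (--obj->refcount > 0)
		return;
	if (obj->release)
		obj->release(obj);
	if (parent)
		toy_put(parent);
}

/* A child takes a reference on its parent for its whole lifetime. */
static void toy_init(struct toy_obj *obj, struct toy_obj *parent,
		     void (*release)(struct toy_obj *obj))
{
	obj->refcount = 1;
	obj->parent = parent;
	obj->release = release;
	obj->index = -1;
	if (parent)
		toy_get(parent);
}

/* The name is freed here and nowhere else. */
static void toy_parent_release(struct toy_obj *obj)
{
	toy_index_free(obj->index);
	obj->index = -1;
}

/* Returns 1 if a re-probe during the child's lifetime got a new index. */
static int toy_demo(void)
{
	struct toy_obj parent, child;
	int first, reprobe;

	toy_init(&parent, NULL, toy_parent_release);
	first = parent.index = toy_index_alloc();
	toy_init(&child, &parent, NULL);

	toy_put(&parent);		/* owner goes away; child still pins it */
	reprobe = toy_index_alloc();	/* old index is still reserved */
	toy_put(&child);		/* final put releases parent and index */

	toy_index_free(reprobe);
	return reprobe != first;
}
```

In toy_demo() the allocation made between the two puts cannot receive
the old index, because the index only goes back to the allocator once
the child's last reference is dropped.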

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..222771d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 	}
 
 	blk_pm_runtime_init(sdp->request_queue, dev);
-	device_add_disk(dev, gd);
+	/*
+	 * previously the parent of the gendisk was the scsi device.  It
+	 * was moved to fix lifetime rules, so now we install a symlink
+	 * to the new location of the block class directory
+	 */
+	device_add_disk(&sdkp->dev, gd);
+	WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block"));
 	if (sdkp->capacity)
 		sd_dif_config_host(sdkp);
 
@@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
+	sysfs_remove_link(&dev->kobj, "block");
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
