All of lore.kernel.org
 help / color / mirror / Atom feed
* Time to make dynamically allocated devt the default for scsi disks?
@ 2016-08-12 21:29 Dan Williams
  2016-08-12 21:35   ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Dan Williams @ 2016-08-12 21:29 UTC (permalink / raw)
  To: linux-block, linux-scsi
  Cc: Jens Axboe, Martin K. Petersen, Christoph Hellwig, Tejun Heo,
	Dave Hansen, Jej B

...or, for that matter, any block device driver on a bus that supports hotplug?

In 4.8 Jens merged the following fix for a crash that was triggered by
repeatedly reconfiguring a libnvdimm namespace causing it to destroy
and create disks (rapid hotplug).

    df08c32ce3be block: fix bdi vs gendisk lifetime mismatch

At the time I realized that the fix only addressed block device
drivers that use a dynamic devt and theorized that we still have a
problem with static devt drivers like sd.  Lo and behold Dave was
bitten by this exact scenario:

 WARNING: CPU: 23 PID: 9672 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:16'
 [..]
 CPU: 23 PID: 9672 Comm: kworker/u66:2 Tainted: G          I
4.6.3-300.fc24.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 3.0.0 01/31/2011
 Workqueue: events_unbound async_run_entry_fn
  0000000000000286 000000009447a7a4 ffff88004b187a50 ffffffff813dbcff
  ffff88004b187aa0 0000000000000000 ffff88004b187a90 ffffffff810a740b
  0000001f0aa24000 ffff88080aa24000 ffff880809de2260 ffff880e131cbac8
 Call Trace:
  [<ffffffff813dbcff>] dump_stack+0x63/0x84
  [<ffffffff810a740b>] __warn+0xcb/0xf0
  [<ffffffff810a748f>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff812cd462>] sysfs_warn_dup+0x62/0x80
  [<ffffffff812cd547>] sysfs_create_dir_ns+0x77/0x90
  [<ffffffff813de677>] kobject_add_internal+0xa7/0x330
  [<ffffffff813e7fca>] ? vsnprintf+0x20a/0x500
  [<ffffffff813dee25>] kobject_add+0x75/0xd0
  [<ffffffff815174a3>] ? device_private_init+0x23/0x70
  [<ffffffff817db102>] ? mutex_lock+0x12/0x30
  [<ffffffff8151777b>] device_add+0x28b/0x670
  [<ffffffff81517d50>] device_create_groups_vargs+0xe0/0xf0
  [<ffffffff81517d7c>] device_create_vargs+0x1c/0x20
  [<ffffffff811dd82c>] bdi_register+0x8c/0x180
  [<ffffffff811dd945>] bdi_register_dev+0x25/0x30
  [<ffffffff813bd3b1>] add_disk+0x171/0x490
  [<ffffffff815270b1>] ? update_autosuspend+0x51/0x60
  [<ffffffff8156a7c2>] sd_probe_async+0x112/0x1c0
  [<ffffffff810c997a>] async_run_entry_fn+0x4a/0x140
  [<ffffffff810c0d54>] process_one_work+0x184/0x440
  [<ffffffff810c105e>] worker_thread+0x4e/0x480
  [<ffffffff810c1010>] ? process_one_work+0x440/0x440
  [<ffffffff810c6f48>] kthread+0xd8/0xf0
  [<ffffffff817dd602>] ret_from_fork+0x22/0x40
  [<ffffffff810c6e70>] ? kthread_worker_fn+0x180/0x180

Before spending effort trying to flush the destruction of old bdi
instances before new ones are registered, is it rather time to
complete the conversion of sd to only use dynamically allocated devt?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-12 21:29 Time to make dynamically allocated devt the default for scsi disks? Dan Williams
@ 2016-08-12 21:35   ` Bart Van Assche
  2016-08-13  0:17 ` James Bottomley
  2016-08-13 12:13 ` Tejun Heo
  2 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-08-12 21:35 UTC (permalink / raw)
  To: Dan Williams, linux-block, linux-scsi
  Cc: Jens Axboe, Martin K. Petersen, Christoph Hellwig, Tejun Heo,
	Dave Hansen, Jej B

On 08/12/2016 02:29 PM, Dan Williams wrote:
> ...or, for that matter, any block device driver on a bus that supports hotplug?
>
> In 4.8 Jens merged the following fix for a crash that was triggered by
> repeatedly reconfiguring a libnvdimm namespace causing it to destroy
> and create disks (rapid hotplug).
>
>     df08c32ce3be block: fix bdi vs gendisk lifetime mismatch
>
> At the time I realized that the fix only addressed block device
> drivers that use a dynamic devt and theorized that we still have a
> problem with static devt drivers like sd.  Lo and behold Dave was
> bitten by this exact scenario:
>
>  WARNING: CPU: 23 PID: 9672 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
>  sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:16'
>  [..]
>  CPU: 23 PID: 9672 Comm: kworker/u66:2 Tainted: G          I
> 4.6.3-300.fc24.x86_64 #1
> [ ... ]
>
> Before spending effort trying to flush the destruction of old bdi
> instances before new ones are registered, is it rather time to
> complete the conversion of sd to only use dynamically allocated devt?

Hello Dan,

Sorry but I'm not familiar with dynamically allocated devt's. But I 
would appreciate it if Dave could test the following patch: "Fix an sd 
reregistration race, v5" (https://patchwork.kernel.org/patch/9178155/).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
@ 2016-08-12 21:35   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-08-12 21:35 UTC (permalink / raw)
  To: Dan Williams, linux-block, linux-scsi
  Cc: Jens Axboe, Martin K. Petersen, Christoph Hellwig, Tejun Heo,
	Dave Hansen, Jej B

On 08/12/2016 02:29 PM, Dan Williams wrote:
> ...or, for that matter, any block device driver on a bus that supports hotplug?
>
> In 4.8 Jens merged the following fix for a crash that was triggered by
> repeatedly reconfiguring a libnvdimm namespace causing it to destroy
> and create disks (rapid hotplug).
>
>     df08c32ce3be block: fix bdi vs gendisk lifetime mismatch
>
> At the time I realized that the fix only addressed block device
> drivers that use a dynamic devt and theorized that we still have a
> problem with static devt drivers like sd.  Lo and behold Dave was
> bitten by this exact scenario:
>
>  WARNING: CPU: 23 PID: 9672 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
>  sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:16'
>  [..]
>  CPU: 23 PID: 9672 Comm: kworker/u66:2 Tainted: G          I
> 4.6.3-300.fc24.x86_64 #1
> [ ... ]
>
> Before spending effort trying to flush the destruction of old bdi
> instances before new ones are registered, is it rather time to
> complete the conversion of sd to only use dynamically allocated devt?

Hello Dan,

Sorry but I'm not familiar with dynamically allocated devt's. But I 
would appreciate it if Dave could test the following patch: "Fix an sd 
reregistration race, v5" (https://patchwork.kernel.org/patch/9178155/).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-12 21:35   ` Bart Van Assche
  (?)
@ 2016-08-12 23:32   ` Dan Williams
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Williams @ 2016-08-12 23:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen, Jej B

On Fri, Aug 12, 2016 at 2:35 PM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 08/12/2016 02:29 PM, Dan Williams wrote:
>>
>> ...or, for that matter, any block device driver on a bus that supports
>> hotplug?
>>
>> In 4.8 Jens merged the following fix for a crash that was triggered by
>> repeatedly reconfiguring a libnvdimm namespace causing it to destroy
>> and create disks (rapid hotplug).
>>
>>     df08c32ce3be block: fix bdi vs gendisk lifetime mismatch
>>
>> At the time I realized that the fix only addressed block device
>> drivers that use a dynamic devt and theorized that we still have a
>> problem with static devt drivers like sd.  Lo and behold Dave was
>> bitten by this exact scenario:
>>
>>  WARNING: CPU: 23 PID: 9672 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
>>  sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:16'
>>  [..]
>>  CPU: 23 PID: 9672 Comm: kworker/u66:2 Tainted: G          I
>> 4.6.3-300.fc24.x86_64 #1
>> [ ... ]
>>
>> Before spending effort trying to flush the destruction of old bdi
>> instances before new ones are registered, is it rather time to
>> complete the conversion of sd to only use dynamically allocated devt?
>
>
> Hello Dan,
>
> Sorry but I'm not familiar with dynamically allocated devt's. But I would
> appreciate it if Dave could test the following patch: "Fix an sd
> reregistration race, v5" (https://patchwork.kernel.org/patch/9178155/).
>

Hi Bart,

See the BLOCK_EXT_MAJOR paths in blk_{alloc|free}_devt() and the
bdi_register_owner() helper. I think bdi_register_owner() addresses
this problem in a way that easier to comprehend than attempting
heroics in scsi to manipulate reference counts as that patch is doing.
If sd can tolerate being dynamic all the time then perhaps we can
abandon static block device devts across the board?  The concern is if
we have legacy userspace that is not prepared for a scsi disk to be at
a major number other than 8.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-12 21:29 Time to make dynamically allocated devt the default for scsi disks? Dan Williams
  2016-08-12 21:35   ` Bart Van Assche
@ 2016-08-13  0:17 ` James Bottomley
  2016-08-13  0:29   ` Dan Williams
  2016-08-13 12:13 ` Tejun Heo
  2 siblings, 1 reply; 23+ messages in thread
From: James Bottomley @ 2016-08-13  0:17 UTC (permalink / raw)
  To: Dan Williams, linux-block, linux-scsi
  Cc: Jens Axboe, Martin K. Petersen, Christoph Hellwig, Tejun Heo,
	Dave Hansen

On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
> Before spending effort trying to flush the destruction of old bdi
> instances before new ones are registered, is it rather time to
> complete the conversion of sd to only use dynamically allocated devt?

Do we have to go that far?  Surely your fix is extensible: the only
reason it doesn't work for us is that the gendisk holds the parent
without a reference, so we can free the SCSI device before its child
gendisk (good job no-one actually uses gendisk->parent after we've
released it ...).  If we fix that it would mean SCSI can't release the
sdev until after the queue is dead and the bdi namespace released, so
isn't something like this the easy fix?

James

---

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4f..54ae4ae 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -514,7 +514,7 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	struct hd_struct *part;
 	int err;
 
-	ddev->parent = parent;
+	ddev->parent = get_device(parent);
 
 	dev_set_name(ddev, "%s", disk->disk_name);
 
@@ -1144,6 +1144,7 @@ static void disk_release(struct device *dev)
 	hd_free_part(&disk->part0);
 	if (disk->queue)
 		blk_put_queue(disk->queue);
+	put_device(dev->parent);
 	kfree(disk);
 }
 struct class block_class = {

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13  0:17 ` James Bottomley
@ 2016-08-13  0:29   ` Dan Williams
  2016-08-13  4:57     ` Dan Williams
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Williams @ 2016-08-13  0:29 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
>> Before spending effort trying to flush the destruction of old bdi
>> instances before new ones are registered, is it rather time to
>> complete the conversion of sd to only use dynamically allocated devt?
>
> Do we have to go that far?  Surely your fix is extensible: the only
> reason it doesn't work for us is that the gendisk holds the parent
> without a reference, so we can free the SCSI device before its child
> gendisk (good job no-one actually uses gendisk->parent after we've
> released it ...).  If we fix that it would mean SCSI can't release the
> sdev until after the queue is dead and the bdi namespace released, so
> isn't something like this the easy fix?
>
> James
>
> ---
>
> diff --git a/block/genhd.c b/block/genhd.c
> index fcd6d4f..54ae4ae 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -514,7 +514,7 @@ static void register_disk(struct device *parent, struct gendisk *disk)
>         struct hd_struct *part;
>         int err;
>
> -       ddev->parent = parent;
> +       ddev->parent = get_device(parent);
>
>         dev_set_name(ddev, "%s", disk->disk_name);
>
> @@ -1144,6 +1144,7 @@ static void disk_release(struct device *dev)
>         hd_free_part(&disk->part0);
>         if (disk->queue)
>                 blk_put_queue(disk->queue);
> +       put_device(dev->parent);
>         kfree(disk);
>  }
>  struct class block_class = {

Looks ok at first glance to me.

We do hold a reference on the parent device, but it gets dropped at
device_unregister() time and this moves it out to the final put.
However, this does leave static devt block-device-drivers that
register a disk without a parent device susceptible to the race... I
think those exist given all the drivers still using add_disk() after
commit 52c44d93c26f "block: remove ->driverfs_dev".

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13  0:29   ` Dan Williams
@ 2016-08-13  4:57     ` Dan Williams
  2016-08-13 15:23       ` James Bottomley
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Williams @ 2016-08-13  4:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 2310 bytes --]

On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
>>> Before spending effort trying to flush the destruction of old bdi
>>> instances before new ones are registered, is it rather time to
>>> complete the conversion of sd to only use dynamically allocated devt?
>>
>> Do we have to go that far?  Surely your fix is extensible: the only
>> reason it doesn't work for us is that the gendisk holds the parent
>> without a reference, so we can free the SCSI device before its child
>> gendisk (good job no-one actually uses gendisk->parent after we've
>> released it ...).  If we fix that it would mean SCSI can't release the
>> sdev until after the queue is dead and the bdi namespace released, so
>> isn't something like this the easy fix?
>>
>> James
>>
>> ---
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index fcd6d4f..54ae4ae 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -514,7 +514,7 @@ static void register_disk(struct device *parent, struct gendisk *disk)
>>         struct hd_struct *part;
>>         int err;
>>
>> -       ddev->parent = parent;
>> +       ddev->parent = get_device(parent);
>>
>>         dev_set_name(ddev, "%s", disk->disk_name);
>>
>> @@ -1144,6 +1144,7 @@ static void disk_release(struct device *dev)
>>         hd_free_part(&disk->part0);
>>         if (disk->queue)
>>                 blk_put_queue(disk->queue);
>> +       put_device(dev->parent);
>>         kfree(disk);
>>  }
>>  struct class block_class = {
>
> Looks ok at first glance to me.
>
> We do hold a reference on the parent device, but it gets dropped at
> device_unregister() time and this moves it out to the final put.
> However, this does leave static devt block-device-drivers that
> register a disk without a parent device susceptible to the race... I
> think those exist given all the drivers still using add_disk() after
> commit 52c44d93c26f "block: remove ->driverfs_dev".

So I tried the attached and it makes the libnvdimm unit tests start
crashing.  A couple crash logs attached.  Not yet sure what assumption
is getting violated, but how about that conversion of scsi to use
dynamic devt? ;-)

[-- Attachment #2: crash2.txt --]
[-- Type: text/plain, Size: 9503 bytes --]

[   30.149817] =========================
[   30.150729] [ BUG: held lock freed! ]
[   30.151644] 4.8.0-rc1+ #19 Tainted: G           O   
[   30.152717] -------------------------
[   30.153635] lt-libndctl/863 is freeing memory ffff880310b7e128-ffff880310b7e927, with a lock still held there!
[   30.155719]  (&dev->mutex){......}, at: [<ffffffff815f931d>] device_release_driver+0x1d/0x40
[   30.157927] 6 locks held by lt-libndctl/863:
[   30.158917]  #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff8128c537>] __sb_start_write+0xb7/0xf0
[   30.161267]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8131c191>] kernfs_fop_write+0x101/0x1c0
[   30.163545]  #2:  (s_active#3){++++.+}, at: [<ffffffff8131c199>] kernfs_fop_write+0x109/0x1c0
[   30.165893]  #3:  (&dev->mutex){......}, at: [<ffffffff815f7a3f>] unbind_store+0xff/0x160
[   30.168124]  #4:  (&dev->mutex){......}, at: [<ffffffff815f931d>] device_release_driver+0x1d/0x40
[   30.170435]  #5:  (&dev->mutex){......}, at: [<ffffffff815f931d>] device_release_driver+0x1d/0x40
[   30.172755] 
[   30.172755] stack backtrace:
[   30.174156] CPU: 0 PID: 863 Comm: lt-libndctl Tainted: G           O    4.8.0-rc1+ #19
[   30.175988] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   30.178260]  0000000000000086 0000000099c8ecbb ffff8803394538d0 ffffffff814b4f23
[   30.180325]  ffff8803380c4a90 ffff880310b7e128 ffff880339453908 ffffffff81107653
[   30.182389]  ffff880310b7e128 0000000000000282 ffff88033b002940 ffffea000c42de00
[   30.184456] Call Trace:
[   30.185232]  [<ffffffff814b4f23>] dump_stack+0x85/0xc2
[   30.186328]  [<ffffffff81107653>] debug_check_no_locks_freed+0x153/0x160
[   30.187616]  [<ffffffffa003fb1e>] ? namespace_pmem_release+0x2e/0x40 [libnvdimm]
[   30.189391]  [<ffffffff8125e039>] kfree+0xc9/0x270
[   30.190445]  [<ffffffffa003fb1e>] namespace_pmem_release+0x2e/0x40 [libnvdimm]
[   30.192197]  [<ffffffff815f4042>] device_release+0x32/0x90
[   30.193333]  [<ffffffff814b7a1a>] kobject_release+0x6a/0x170
[   30.194494]  [<ffffffff814b78e7>] kobject_put+0x27/0x50
[   30.195602]  [<ffffffff815f4377>] put_device+0x17/0x20
[   30.196696]  [<ffffffff81493290>] disk_release+0xc0/0x100
[   30.197828]  [<ffffffff815f4042>] device_release+0x32/0x90
[   30.198966]  [<ffffffff814b7a1a>] kobject_release+0x6a/0x170
[   30.200126]  [<ffffffff814b78e7>] kobject_put+0x27/0x50
[   30.201230]  [<ffffffff815f4377>] put_device+0x17/0x20
[   30.202328]  [<ffffffff81214e0a>] bdi_unregister+0x20a/0x290
[   30.203482]  [<ffffffff81107289>] ? trace_hardirqs_on_caller+0x129/0x1b0
[   30.204769]  [<ffffffff8147e2ac>] blk_cleanup_queue+0x18c/0x280
[   30.205957]  [<ffffffffa00a117e>] pmem_release_queue+0xe/0x10 [nd_pmem]
[   30.207228]  [<ffffffff815fd49f>] devm_action_release+0xf/0x20
[   30.208407]  [<ffffffff815fdf79>] release_nodes+0x129/0x230
[   30.209556]  [<ffffffff815fe2ec>] devres_release_all+0x3c/0x60
[   30.210736]  [<ffffffff815f9249>] __device_release_driver+0xa9/0x160
[   30.211979]  [<ffffffff815f9325>] device_release_driver+0x25/0x40
[   30.213187]  [<ffffffff815f88a4>] bus_remove_device+0x114/0x190
[   30.214377]  [<ffffffff815f4978>] device_del+0x148/0x270
[   30.215490]  [<ffffffff815f4380>] ? put_device+0x20/0x20
[   30.216609]  [<ffffffffa003f770>] ? nd_region_remove+0x80/0x80 [libnvdimm]
[   30.217914]  [<ffffffff815f4aba>] device_unregister+0x1a/0x60
[   30.219085]  [<ffffffffa003b828>] nd_device_unregister+0x48/0x50 [libnvdimm]
[   30.220412]  [<ffffffffa003f780>] child_unregister+0x10/0x20 [libnvdimm]
[   30.221696]  [<ffffffff815f4470>] device_for_each_child+0x50/0x90
[   30.222910]  [<ffffffffa003f718>] nd_region_remove+0x28/0x80 [libnvdimm]
[   30.224195]  [<ffffffffa003bbe1>] nvdimm_bus_remove+0x41/0xa0 [libnvdimm]
[   30.225488]  [<ffffffff815f9241>] __device_release_driver+0xa1/0x160
[   30.226727]  [<ffffffff815f9325>] device_release_driver+0x25/0x40
[   30.227947]  [<ffffffff815f7a4f>] unbind_store+0x10f/0x160
[   30.229081]  [<ffffffff815f6fa5>] drv_attr_store+0x25/0x30
[   30.230215]  [<ffffffff8131cee5>] sysfs_kf_write+0x45/0x60
[   30.231352]  [<ffffffff8131c1c5>] kernfs_fop_write+0x135/0x1c0
[   30.232530]  [<ffffffff81288977>] __vfs_write+0x37/0x160
[   30.233643]  [<ffffffff81102f47>] ? update_fast_ctr+0x17/0x30
[   30.234814]  [<ffffffff81102fd9>] ? percpu_down_read+0x49/0x90
[   30.235988]  [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0
[   30.237169]  [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0
[   30.238344]  [<ffffffff812890d8>] vfs_write+0xb8/0x1b0
[   30.239444]  [<ffffffff8128a568>] SyS_write+0x58/0xc0
[   30.240534]  [<ffffffff8199837c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[   30.693773] BUG: unable to handle kernel paging request at 0000000040070088

[..]


[   30.695296] IP: [<ffffffff819886ae>] klist_put+0xe/0x90
[   30.696501] PGD 3383e2067 PUD 0 
[   30.697532] Oops: 0000 [#1] SMP
[   30.698403] Dumping ftrace buffer:
[   30.699306]    (ftrace buffer empty)
[   30.700228] Modules linked in: nd_blk(O) nfit_test(O) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security ebtable_filter ebtables ip6table_filter ip6_tables dax_pmem(O) nd_pmem(O) nd_btt(O) dax(O) nd_e820(O) nfit(O) tpm_tis libnvdimm(O) tpm_tis_core tpm serio_raw nfit_test_iomap(O)
[   30.712449] CPU: 0 PID: 863 Comm: lt-libndctl Tainted: G           O    4.8.0-rc1+ #19
[   30.714325] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   30.716650] task: ffff8803380c4240 task.stack: ffff880339450000
[   30.717868] RIP: 0010:[<ffffffff819886ae>]  [<ffffffff819886ae>] klist_put+0xe/0x90
[   30.719774] RSP: 0018:ffff880339453b40  EFLAGS: 00010246
[   30.720917] RAX: 0000000000000000 RBX: ffff8803380c4240 RCX: ffff8803380c4240
[   30.722277] RDX: ffffffff819887dd RSI: 0000000000000001 RDI: 0000000040070088
[   30.723647] RBP: ffff880339453b60 R08: ffffffff81f4eba0 R09: 0000000000000000
[   30.725026] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000040070088
[   30.726395] R13: ffffffffa004d3e0 R14: 0000000000000001 R15: fffffffffffffff2
[   30.727763] FS:  00007f253e15f840(0000) GS:ffff88033fc00000(0000) knlGS:0000000000000000
[   30.729641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.730842] CR2: 0000000040070088 CR3: 00000003395d3000 CR4: 00000000000006f0
[   30.732212] Stack:
[   30.732941]  ffff8803380c4240 0000000040070088 ffffffffa004d3e0 ffff880310b79050
[   30.735047]  ffff880339453bc8 ffffffff819887ea ffff8803104ac038 ffff880324ba2848
[   30.737156]  ffffffff81f4eba0 ffffffff81f4eba0 0000000040070088 ffff8803380c4240
[   30.739249] Call Trace:
[   30.740029]  [<ffffffff819887ea>] klist_remove+0x7a/0xe0
[   30.741166]  [<ffffffff815f9291>] __device_release_driver+0xf1/0x160
[   30.742434]  [<ffffffff815f9325>] device_release_driver+0x25/0x40
[   30.743668]  [<ffffffff815f88a4>] bus_remove_device+0x114/0x190
[   30.744892]  [<ffffffff815f4978>] device_del+0x148/0x270
[   30.746038]  [<ffffffff815f4380>] ? put_device+0x20/0x20
[   30.747185]  [<ffffffffa003f770>] ? nd_region_remove+0x80/0x80 [libnvdimm]
[   30.748520]  [<ffffffff815f4aba>] device_unregister+0x1a/0x60
[   30.749715]  [<ffffffffa003b828>] nd_device_unregister+0x48/0x50 [libnvdimm]
[   30.751088]  [<ffffffffa003f780>] child_unregister+0x10/0x20 [libnvdimm]
[   30.752401]  [<ffffffff815f4470>] device_for_each_child+0x50/0x90
[   30.753650]  [<ffffffffa003f718>] nd_region_remove+0x28/0x80 [libnvdimm]
[   30.754979]  [<ffffffffa003bbe1>] nvdimm_bus_remove+0x41/0xa0 [libnvdimm]
[   30.756307]  [<ffffffff815f9241>] __device_release_driver+0xa1/0x160
[   30.757583]  [<ffffffff815f9325>] device_release_driver+0x25/0x40
[   30.758828]  [<ffffffff815f7a4f>] unbind_store+0x10f/0x160
[   30.759986]  [<ffffffff815f6fa5>] drv_attr_store+0x25/0x30
[   30.761145]  [<ffffffff8131cee5>] sysfs_kf_write+0x45/0x60
[   30.762299]  [<ffffffff8131c1c5>] kernfs_fop_write+0x135/0x1c0
[   30.763502]  [<ffffffff81288977>] __vfs_write+0x37/0x160
[   30.764641]  [<ffffffff81102f47>] ? update_fast_ctr+0x17/0x30
[   30.765845]  [<ffffffff81102fd9>] ? percpu_down_read+0x49/0x90
[   30.767055]  [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0
[   30.768265]  [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0
[   30.769477]  [<ffffffff812890d8>] vfs_write+0xb8/0x1b0
[   30.770592]  [<ffffffff8128a568>] SyS_write+0x58/0xc0
[   30.771697]  [<ffffffff8199837c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[   30.772979] Code: f5 55 5d 00 01 e8 b3 6e 72 ff e9 63 ff ff ff e8 69 f5 00 00 e9 72 ff ff ff 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 41 89 f6 <4c> 8b 27 48 89 fb 49 83 e4 fe 4c 89 e7 4d 8b 6c 24 50 e8 3b f2 
[   30.781430] RIP  [<ffffffff819886ae>] klist_put+0xe/0x90
[   30.782642]  RSP <ffff880339453b40>
[   30.783550] CR2: 0000000040070088
[   30.784495] ---[ end trace 0dcefe91828fba52 ]---
[   30.786917] Kernel panic - not syncing: Fatal exception
[   30.788400] Dumping ftrace buffer:
[   30.789305]    (ftrace buffer empty)
[   30.790225] Kernel Offset: disabled
[   30.791137] Rebooting in 5 seconds..

[-- Attachment #3: crash1.txt --]
[-- Type: text/plain, Size: 3691 bytes --]

[   34.255986] general protection fault: 0000 [#1] SMP
[   34.257026] Dumping ftrace buffer:
[   34.257859]    (ftrace buffer empty)
[   34.258713] Modules linked in: nd_blk(O) nfit_test(O) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables nd_pmem(O) dax_pmem(O) dax(O) nd_btt(O) serio_raw nd_e820(O) nfit(O) libnvdimm(O) tpm_tis tpm_tis_core tpm nfit_test_iomap(O)
[   34.269688] CPU: 1 PID: 871 Comm: lt-libndctl Tainted: G           O    4.8.0-rc1+ #19
[   34.271364] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   34.273450] task: ffff880338b64240 task.stack: ffff880338a80000
[   34.274564] RIP: 0010:[<ffffffff819888ae>]  [<ffffffff819888ae>] klist_next+0x5e/0x120
[   34.276327] RSP: 0018:ffff880338a83c90  EFLAGS: 00010217
[   34.277371] RAX: ffff880313384278 RBX: 6b6b6b6b6b6b6b63 RCX: c2482a36b45bcd4d
[   34.278614] RDX: 0000000000000001 RSI: 0000000082f7649c RDI: ffff880313384248
[   34.279864] RBP: ffff880338a83cb8 R08: 0000000000000000 R09: 0000000000000000
[   34.281114] R10: 0000000000000000 R11: 0000000000003436 R12: ffff880338a83cc8
[   34.282356] R13: 0000000000000000 R14: ffff880313384a80 R15: 0000000000000000
[   34.283593] FS:  00007fa1a7e06840(0000) GS:ffff88033fd00000(0000) knlGS:0000000000000000
[   34.285306] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.286404] CR2: 0000561d5afc9208 CR3: 000000033891c000 CR4: 00000000000006e0
[   34.287654] Stack:
[   34.288339]  0000000000000000 0000000000000000 ffffffffa002e770 ffffffffa017d100
[   34.290258]  fffffffffffffff2 ffff880338a83cf8 ffffffff815f447b ffff880313384248
[   34.301588]  0000000000000000 0000000070757190 ffff880312c49868 ffff880312c49868
[   34.303488] Call Trace:
[   34.304208]  [<ffffffffa002e770>] ? nd_region_remove+0x80/0x80 [libnvdimm]
[   34.305428]  [<ffffffff815f447b>] device_for_each_child+0x5b/0x90
[   34.306566]  [<ffffffffa002e718>] nd_region_remove+0x28/0x80 [libnvdimm]
[   34.307776]  [<ffffffffa002abe1>] nvdimm_bus_remove+0x41/0xa0 [libnvdimm]
[   34.308983]  [<ffffffff815f9241>] __device_release_driver+0xa1/0x160
[   34.310148]  [<ffffffff815f9325>] device_release_driver+0x25/0x40
[   34.311273]  [<ffffffff815f7a4f>] unbind_store+0x10f/0x160
[   34.312330]  [<ffffffff815f6fa5>] drv_attr_store+0x25/0x30
[   34.313391]  [<ffffffff8131cee5>] sysfs_kf_write+0x45/0x60
[   34.314451]  [<ffffffff8131c1c5>] kernfs_fop_write+0x135/0x1c0
[   34.315557]  [<ffffffff81288977>] __vfs_write+0x37/0x160
[   34.316610]  [<ffffffff81102f47>] ? update_fast_ctr+0x17/0x30
[   34.317704]  [<ffffffff81102fd9>] ? percpu_down_read+0x49/0x90
[   34.318805]  [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0
[   34.319906]  [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0
[   34.321010]  [<ffffffff812890d8>] vfs_write+0xb8/0x1b0
[   34.322029]  [<ffffffff8128a568>] SyS_write+0x58/0xc0
[   34.323039]  [<ffffffff8199837c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[   34.324208] Code: 8d 58 f8 f0 41 83 6e 18 01 74 60 49 8b 3c 24 45 31 ed 48 8d 47 30 45 31 ff 49 c7 44 24 08 00 00 00 00 48 39 c3 0f 84 b4 00 00 00 <f6> 03 01 75 70 b8 01 00 00 00 f0 0f c1 43 18 83 c0 01 83 f8 01 
[   34.331684] RIP  [<ffffffff819888ae>] klist_next+0x5e/0x120
[   34.332798]  RSP <ffff880338a83c90>
[   34.333659] ---[ end trace db64022aaca42534 ]---

[-- Attachment #4: patch --]
[-- Type: application/octet-stream, Size: 3906 bytes --]

block: extend the lifetime of disk parent devices

From: Dan Williams <dan.j.williams@intel.com>

Commit df08c32ce3be "block: fix bdi vs gendisk lifetime mismatch" fixed
the case where the bdi is named from a gendisk with a dynamically
allocated devt.  However this leaves the bug in place hole for drivers
using a static devt like sd.  James observes that for scsi we can
address the issue by building on the initial fix to extend the lifetime
of the disk device until the bdi is released.  This effectively makes
the lifetime of a statically allocated devt identical to that of a
dynamically allocated devt.

However, this leaves a hole for block device drivers that use a
statically allocated devt, but do not specify a parent device. Those
drivers if they exist, should move to dynamically allocated devt, or
register a parent device if they are on a hotplug capable bus.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
Suggested-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c               |    5 ++++-
 include/linux/backing-dev.h |    2 +-
 mm/backing-dev.c            |   13 +++++++++++--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4fae657..4c3ca8424535 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -614,7 +614,7 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 
 	/* Register BDI before referencing it from bdev */
 	bdi = &disk->queue->backing_dev_info;
-	bdi_register_owner(bdi, disk_to_dev(disk));
+	bdi_register_disk(bdi, disk);
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
 			    exact_match, exact_lock, disk);
@@ -1144,6 +1144,9 @@ static void disk_release(struct device *dev)
 	hd_free_part(&disk->part0);
 	if (disk->queue)
 		blk_put_queue(disk->queue);
+
+	/* see bdi_register_disk() */
+	put_device(dev->parent);
 	kfree(disk);
 }
 struct class block_class = {
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 43b93a947e61..df9e1a766157 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,7 +24,7 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner);
+int bdi_register_disk(struct backing_dev_info *bdi, struct gendisk *disk);
 void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8fde443f36d7..b621c8e8cd68 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -825,8 +825,9 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
 }
 EXPORT_SYMBOL(bdi_register_dev);
 
-int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
+int bdi_register_disk(struct backing_dev_info *bdi, struct gendisk *disk)
 {
+	struct device *owner = disk_to_dev(disk);
 	int rc;
 
 	rc = bdi_register(bdi, NULL, "%u:%u", MAJOR(owner->devt),
@@ -835,9 +836,17 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
 		return rc;
 	bdi->owner = owner;
 	get_device(owner);
+
+	/*
+	 * For statically allocated devt disks, like scsi, the disk's
+	 * parent holds the lifetime for the devt.  Prevent the parent
+	 * from releasing the devt for reuse until the disk is released.
+	 */
+	get_device(owner->parent);
+
 	return 0;
 }
-EXPORT_SYMBOL(bdi_register_owner);
+EXPORT_SYMBOL(bdi_register_disk);
 
 /*
  * Remove bdi from bdi_list, and ensure that it is no longer visible

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-12 21:29 Time to make dynamically allocated devt the default for scsi disks? Dan Williams
  2016-08-12 21:35   ` Bart Van Assche
  2016-08-13  0:17 ` James Bottomley
@ 2016-08-13 12:13 ` Tejun Heo
  2 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2016-08-13 12:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Dave Hansen, Jej B

Hello, Dan.

On Fri, Aug 12, 2016 at 02:29:30PM -0700, Dan Williams wrote:
> Before spending effort trying to flush the destruction of old bdi
> instances before new ones are registered, is it rather time to
> complete the conversion of sd to only use dynamically allocated devt?

I think that probably would be too disruptive to userland, both
programs and humans, that has been seeing the 8:N numbers forever now.
I haven't been following this particular issue but in general I think
we should switch to the regular "a depdendee object is pinned till all
the dependents are gone" pattern.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13  4:57     ` Dan Williams
@ 2016-08-13 15:23       ` James Bottomley
  2016-08-13 16:29         ` Dan Williams
  0 siblings, 1 reply; 23+ messages in thread
From: James Bottomley @ 2016-08-13 15:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote:
> On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <
> dan.j.williams@intel.com> wrote:
> > On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > > On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
> > > > Before spending effort trying to flush the destruction of old
> > > > bdi
> > > > instances before new ones are registered, is it rather time to
> > > > complete the conversion of sd to only use dynamically allocated
> > > > devt?
> > > 
> > > Do we have to go that far?  Surely your fix is extensible: the
> > > only
> > > reason it doesn't work for us is that the gendisk holds the
> > > parent
> > > without a reference, so we can free the SCSI device before its
> > > child
> > > gendisk (good job no-one actually uses gendisk->parent after
> > > we've
> > > released it ...).  If we fix that it would mean SCSI can't
> > > release the
> > > sdev until after the queue is dead and the bdi namespace
> > > released, so
> > > isn't something like this the easy fix?
> > > 
> > > James
> > > 
> > > ---
> > > 
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index fcd6d4f..54ae4ae 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -514,7 +514,7 @@ static void register_disk(struct device
> > > *parent, struct gendisk *disk)
> > >         struct hd_struct *part;
> > >         int err;
> > > 
> > > -       ddev->parent = parent;
> > > +       ddev->parent = get_device(parent);
> > > 
> > >         dev_set_name(ddev, "%s", disk->disk_name);
> > > 
> > > @@ -1144,6 +1144,7 @@ static void disk_release(struct device
> > > *dev)
> > >         hd_free_part(&disk->part0);
> > >         if (disk->queue)
> > >                 blk_put_queue(disk->queue);
> > > +       put_device(dev->parent);
> > >         kfree(disk);
> > >  }
> > >  struct class block_class = {
> > 
> > Looks ok at first glance to me.
> > 
> > We do hold a reference on the parent device, but it gets dropped at
> > device_unregister() time and this moves it out to the final put.

We do?  Where?

> > However, this does leave static devt block-device-drivers that
> > register a disk without a parent device susceptible to the race... 
> > I think those exist given all the drivers still using add_disk()
> > after commit 52c44d93c26f "block: remove ->driverfs_dev".

It does?  The race is the fact that the parent can be removed before
the child meaning if the parent name is re-registered before the child
dies we get a duplicate name in bdi space.

> So I tried the attached and it makes the libnvdimm unit tests start
> crashing.

Well, the attached is clearly buggy, isn't it?  You're trying to do a
get on the parent before the parent is actually set.  Why don't you
just try the incremental patch I sent instead of trying to rework it?

>   A couple crash logs attached.  Not yet sure what assumption
> is getting violated, but how about that conversion of scsi to use
> dynamic devt? ;-)

It's completely orthogonal.  The problem is in hierarchy lifetimes:
switching from static to dynamic allocation won't change that at all. 
 You don't see this problem in nvme because the parent control device's
lifetime belongs to the controller not the disk.  In SCSI the parent is
our representation of the SCSI device whose lifetime is governed at the
SCSI level and effectively represents the disk.

James



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13 15:23       ` James Bottomley
@ 2016-08-13 16:29         ` Dan Williams
  2016-08-13 17:43           ` James Bottomley
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Williams @ 2016-08-13 16:29 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 4341 bytes --]

On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote:
>> On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <
>> dan.j.williams@intel.com> wrote:
>> > On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
>> > <James.Bottomley@hansenpartnership.com> wrote:
>> > > On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
>> > > > Before spending effort trying to flush the destruction of old
>> > > > bdi
>> > > > instances before new ones are registered, is it rather time to
>> > > > complete the conversion of sd to only use dynamically allocated
>> > > > devt?
>> > >
>> > > Do we have to go that far?  Surely your fix is extensible: the
>> > > only
>> > > reason it doesn't work for us is that the gendisk holds the
>> > > parent
>> > > without a reference, so we can free the SCSI device before its
>> > > child
>> > > gendisk (good job no-one actually uses gendisk->parent after
>> > > we've
>> > > released it ...).  If we fix that it would mean SCSI can't
>> > > release the
>> > > sdev until after the queue is dead and the bdi namespace
>> > > released, so
>> > > isn't something like this the easy fix?
>> > >
>> > > James
>> > >
>> > > ---
>> > >
>> > > diff --git a/block/genhd.c b/block/genhd.c
>> > > index fcd6d4f..54ae4ae 100644
>> > > --- a/block/genhd.c
>> > > +++ b/block/genhd.c
>> > > @@ -514,7 +514,7 @@ static void register_disk(struct device
>> > > *parent, struct gendisk *disk)
>> > >         struct hd_struct *part;
>> > >         int err;
>> > >
>> > > -       ddev->parent = parent;
>> > > +       ddev->parent = get_device(parent);
>> > >
>> > >         dev_set_name(ddev, "%s", disk->disk_name);
>> > >
>> > > @@ -1144,6 +1144,7 @@ static void disk_release(struct device
>> > > *dev)
>> > >         hd_free_part(&disk->part0);
>> > >         if (disk->queue)
>> > >                 blk_put_queue(disk->queue);
>> > > +       put_device(dev->parent);
>> > >         kfree(disk);
>> > >  }
>> > >  struct class block_class = {
>> >
>> > Looks ok at first glance to me.
>> >
>> > We do hold a reference on the parent device, but it gets dropped at
>> > device_unregister() time and this moves it out to the final put.
>
> We do?  Where?

Yes, register_disk() does "ddev->parent = parent" and then
"device_add(ddev)".  device_add() takes the parent reference.

>
>> > However, this does leave static devt block-device-drivers that
>> > register a disk without a parent device susceptible to the race...
>> > I think those exist given all the drivers still using add_disk()
>> > after commit 52c44d93c26f "block: remove ->driverfs_dev".
>
> It does?  The race is the fact that the parent can be removed before
> the child meaning if the parent name is re-registered before the child
> dies we get a duplicate name in bdi space.

No, the race is that the *name* of the parent isn't released until the
child is both unregistered and put.  The device core is already
ensuring that the parent is not released until all descendants have
been removed.

>
>> So I tried the attached and it makes the libnvdimm unit tests start
>> crashing.
>
> Well, the attached is clearly buggy, isn't it?  You're trying to do a
> get on the parent before the parent is actually set.

Ah, yes, thank you.  Fixed up v2 attached that passes my tests.

> Why don't you
> just try the incremental patch I sent instead of trying to rework it?

I reworked it because it is the bdi that holds this extra dependency
on the disk's parent, not the disk itself.

>
>>   A couple crash logs attached.  Not yet sure what assumption
>> is getting violated, but how about that conversion of scsi to use
>> dynamic devt? ;-)
>
> It's completely orthogonal.  The problem is in hierarchy lifetimes:
> switching from static to dynamic allocation won't change that at all.
>  You don't see this problem in nvme because the parent control device's
> lifetime belongs to the controller not the disk.  In SCSI the parent is
> our representation of the SCSI device whose lifetime is governed at the
> SCSI level and effectively represents the disk.
>

No, it's only the name.  We could achieve the same by teaching the
block core to manage the "sd_index_ida" instead of the sd driver
itself, but the v2-patch attached works and does not introduce that
layering violation.

[-- Attachment #2: patch-v2 --]
[-- Type: application/octet-stream, Size: 4786 bytes --]

block: extend the lifetime of disk parent devices

From: Dan Williams <dan.j.williams@intel.com>

Commit df08c32ce3be "block: fix bdi vs gendisk lifetime mismatch" fixed
the case where the bdi is named from a gendisk with a dynamically
allocated devt.  However this leaves the bug in place hole for drivers
using a static devt like sd.  James observes that for scsi we can
address the issue by building on the initial fix to extend the lifetime
of the disk device until the bdi is released.  This effectively makes
the lifetime of a statically allocated devt identical to that of a
dynamically allocated devt.

However, this leaves a hole for block device drivers that use a
statically allocated devt, but do not specify a parent device. Those
drivers if they exist, should move to dynamically allocated devt, or
register a parent device if they are on a hotplug capable bus.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
Suggested-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c               |   12 +++++++-----
 include/linux/backing-dev.h |    2 +-
 mm/backing-dev.c            |   13 +++++++++++--
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4fae657..845c15d7357a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -506,7 +506,7 @@ static int exact_lock(dev_t devt, void *data)
 	return 0;
 }
 
-static void register_disk(struct device *parent, struct gendisk *disk)
+static void register_disk(struct gendisk *disk)
 {
 	struct device *ddev = disk_to_dev(disk);
 	struct block_device *bdev;
@@ -514,8 +514,6 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	struct hd_struct *part;
 	int err;
 
-	ddev->parent = parent;
-
 	dev_set_name(ddev, "%s", disk->disk_name);
 
 	/* delay uevents, until we scanned partition table */
@@ -602,6 +600,7 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 		WARN_ON(1);
 		return;
 	}
+	disk_to_dev(disk)->parent = parent;
 	disk_to_dev(disk)->devt = devt;
 
 	/* ->major and ->first_minor aren't supposed to be
@@ -614,11 +613,11 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 
 	/* Register BDI before referencing it from bdev */
 	bdi = &disk->queue->backing_dev_info;
-	bdi_register_owner(bdi, disk_to_dev(disk));
+	bdi_register_disk(bdi, disk);
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
 			    exact_match, exact_lock, disk);
-	register_disk(parent, disk);
+	register_disk(disk);
 	blk_register_queue(disk);
 
 	/*
@@ -1144,6 +1143,9 @@ static void disk_release(struct device *dev)
 	hd_free_part(&disk->part0);
 	if (disk->queue)
 		blk_put_queue(disk->queue);
+
+	/* see bdi_register_disk() */
+	put_device(dev->parent);
 	kfree(disk);
 }
 struct class block_class = {
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 43b93a947e61..df9e1a766157 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,7 +24,7 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner);
+int bdi_register_disk(struct backing_dev_info *bdi, struct gendisk *disk);
 void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8fde443f36d7..b621c8e8cd68 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -825,8 +825,9 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
 }
 EXPORT_SYMBOL(bdi_register_dev);
 
-int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
+int bdi_register_disk(struct backing_dev_info *bdi, struct gendisk *disk)
 {
+	struct device *owner = disk_to_dev(disk);
 	int rc;
 
 	rc = bdi_register(bdi, NULL, "%u:%u", MAJOR(owner->devt),
@@ -835,9 +836,17 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
 		return rc;
 	bdi->owner = owner;
 	get_device(owner);
+
+	/*
+	 * For statically allocated devt disks, like scsi, the disk's
+	 * parent holds the lifetime for the devt.  Prevent the parent
+	 * from releasing the devt for reuse until the disk is released.
+	 */
+	get_device(owner->parent);
+
 	return 0;
 }
-EXPORT_SYMBOL(bdi_register_owner);
+EXPORT_SYMBOL(bdi_register_disk);
 
 /*
  * Remove bdi from bdi_list, and ensure that it is no longer visible

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13 16:29         ` Dan Williams
@ 2016-08-13 17:43           ` James Bottomley
  2016-08-13 18:27             ` Dan Williams
  0 siblings, 1 reply; 23+ messages in thread
From: James Bottomley @ 2016-08-13 17:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote:
> On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > It does?  The race is the fact that the parent can be removed 
> > before the child meaning if the parent name is re-registered before 
> > the child dies we get a duplicate name in bdi space.
> 
> No, the race is that the *name* of the parent isn't released until 
> the child is both unregistered and put.  The device core is already
> ensuring that the parent is not released until all descendants have
> been removed.

We're both saying the same thing: the problem is that, with
df08c32ce3be the bdi name lifetime is tied to the lifetime of the
gendisk.  However, the parent of the gendisk currently is only tied to
the visibility lifetime of the gendisk, not the final put lifetime, so
it doesn't see this.

> > 
> > > So I tried the attached and it makes the libnvdimm unit tests 
> > > start crashing.
> > 
> > Well, the attached is clearly buggy, isn't it?  You're trying to do 
> > a get on the parent before the parent is actually set.
> 
> Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
> 
> > Why don't you just try the incremental patch I sent instead of 
> > trying to rework it?
> 
> I reworked it because it is the bdi that holds this extra dependency
> on the disk's parent, not the disk itself.

Philosophically I don't like this approach.  The dependency goes 

bdi->gendisk->parent

Making the bdi manage the parent lifetime is an unusual pattern. 
 Making the parent stay around until the last reference to gendisk is
put is the usual one.

> > >   A couple crash logs attached.  Not yet sure what assumption
> > > is getting violated, but how about that conversion of scsi to use
> > > dynamic devt? ;-)
> > 
> > It's completely orthogonal.  The problem is in hierarchy lifetimes:
> > switching from static to dynamic allocation won't change that at 
> > all.  You don't see this problem in nvme because the parent control
> > device's lifetime belongs to the controller not the disk.  In SCSI 
> > the parent is our representation of the SCSI device whose lifetime 
> > is governed at the SCSI level and effectively represents the disk.
> > 
> 
> No, it's only the name.  We could achieve the same by teaching the
> block core to manage the "sd_index_ida" instead of the sd driver
> itself, but the v2-patch attached works and does not introduce that
> layering violation.

Um, so this patch doesn't fix the problem. It merely makes the lifetime
rules correct so the problem can then be fixed at the scsi level.

James



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13 17:43           ` James Bottomley
@ 2016-08-13 18:27             ` Dan Williams
  2016-08-13 20:38               ` Dan Williams
  2016-08-14 17:20               ` James Bottomley
  0 siblings, 2 replies; 23+ messages in thread
From: Dan Williams @ 2016-08-13 18:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote:
>> On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > It does?  The race is the fact that the parent can be removed
>> > before the child meaning if the parent name is re-registered before
>> > the child dies we get a duplicate name in bdi space.
>>
>> No, the race is that the *name* of the parent isn't released until
>> the child is both unregistered and put.  The device core is already
>> ensuring that the parent is not released until all descendants have
>> been removed.
>
> We're both saying the same thing: the problem is that, with
> df08c32ce3be the bdi name lifetime is tied to the lifetime of the
> gendisk.  However, the parent of the gendisk currently is only tied to
> the visibility lifetime of the gendisk, not the final put lifetime, so
> it doesn't see this.
>
>> >
>> > > So I tried the attached and it makes the libnvdimm unit tests
>> > > start crashing.
>> >
>> > Well, the attached is clearly buggy, isn't it?  You're trying to do
>> > a get on the parent before the parent is actually set.
>>
>> Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
>>
>> > Why don't you just try the incremental patch I sent instead of
>> > trying to rework it?
>>
>> I reworked it because it is the bdi that holds this extra dependency
>> on the disk's parent, not the disk itself.
>
> Philosophically I don't like this approach.  The dependency goes
>
> bdi->gendisk->parent

I'm arguing that there is no bdi->gendisk dependency.

The dependency is:

bdi->devt

It just so happens that block-dynamic devt is released in disk_release().

> Making the bdi manage the parent lifetime is an unusual pattern.
>  Making the parent stay around until the last reference to gendisk is
> put is the usual one.

What's unusual is the bdi's dependency on the allocated name, not the
gendisk itself.

>> > >   A couple crash logs attached.  Not yet sure what assumption
>> > > is getting violated, but how about that conversion of scsi to use
>> > > dynamic devt? ;-)
>> >
>> > It's completely orthogonal.  The problem is in hierarchy lifetimes:
>> > switching from static to dynamic allocation won't change that at
>> > all.  You don't see this problem in nvme because the parent control
>> > device's lifetime belongs to the controller not the disk.  In SCSI
>> > the parent is our representation of the SCSI device whose lifetime
>> > is governed at the SCSI level and effectively represents the disk.
>> >
>>
>> No, it's only the name.  We could achieve the same by teaching the
>> block core to manage the "sd_index_ida" instead of the sd driver
>> itself, but the v2-patch attached works and does not introduce that
>> layering violation.
>
> Um, so this patch doesn't fix the problem. It merely makes the lifetime
> rules correct so the problem can then be fixed at the scsi level.

You're right that this patch does not fix the problem, I missed that
the scsi_disk is not the parent of the gendisk, so this patch does
nothing to delay scsi_disk_release.  What I think is the real fix is
to make the devt properly reference counted and prevent
ida_remove(&sd_index_ida, sdkp->index); from being called until all
objects derived from that allocation are done with it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13 18:27             ` Dan Williams
@ 2016-08-13 20:38               ` Dan Williams
  2016-08-14 17:20               ` James Bottomley
  1 sibling, 0 replies; 23+ messages in thread
From: Dan Williams @ 2016-08-13 20:38 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

On Sat, Aug 13, 2016 at 11:27 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley
[..]
>> Um, so this patch doesn't fix the problem. It merely makes the lifetime
>> rules correct so the problem can then be fixed at the scsi level.
>
> You're right that this patch does not fix the problem, I missed that
> the scsi_disk is not the parent of the gendisk, so this patch does
> nothing to delay scsi_disk_release.  What I think is the real fix is
> to make the devt properly reference counted and prevent
> ida_remove(&sd_index_ida, sdkp->index); from being called until all
> objects derived from that allocation are done with it.

So, here's an RFC along these lines (only compile tested).   This
approach provides a generic way to coordinate the lifetime of the bdi
name versus the devt allocated by the block device driver.

[-- Attachment #2: patch-v3 --]
[-- Type: application/octet-stream, Size: 8923 bytes --]

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4fae657..08f3e709a8d1 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -396,6 +396,19 @@ static int blk_mangle_minor(int minor)
 	return minor;
 }
 
+static void release_blk_ext_minor(int minor)
+{
+	spin_lock_bh(&ext_devt_lock);
+	idr_remove(&ext_devt_idr, blk_mangle_minor(minor));
+	spin_unlock_bh(&ext_devt_lock);
+}
+
+static void blk_disk_devt_release(struct disk_devt *disk_devt)
+{
+	release_blk_ext_minor(disk_devt->first_minor);
+	kfree(disk_devt);
+}
+
 /**
  * blk_alloc_devt - allocate a dev_t for a partition
  * @part: partition to allocate dev_t for
@@ -413,6 +426,7 @@ static int blk_mangle_minor(int minor)
 int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
 {
 	struct gendisk *disk = part_to_disk(part);
+	struct disk_devt *disk_devt = NULL;
 	int idx;
 
 	/* in consecutive minor range? */
@@ -421,6 +435,13 @@ int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
 		return 0;
 	}
 
+	if (part == &disk->part0) {
+		WARN_ON(disk->disk_devt);
+		disk_devt = kzalloc(sizeof(*disk_devt), GFP_KERNEL);
+		if (!disk_devt)
+			return -ENOMEM;
+	}
+
 	/* allocate ext devt */
 	idr_preload(GFP_KERNEL);
 
@@ -429,8 +450,17 @@ int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
 	spin_unlock_bh(&ext_devt_lock);
 
 	idr_preload_end();
-	if (idx < 0)
+	if (idx < 0) {
+		kfree(disk_devt);
 		return idx == -ENOSPC ? -EBUSY : idx;
+	}
+
+	if (disk_devt) {
+		disk_devt->major = BLOCK_EXT_MAJOR;
+		disk_devt->first_minor = blk_mangle_minor(idx);
+		disk_devt->release = blk_disk_devt_release;
+		kref_init(&disk_devt->kref);
+	}
 
 	*devt = MKDEV(BLOCK_EXT_MAJOR, blk_mangle_minor(idx));
 	return 0;
@@ -450,11 +480,8 @@ void blk_free_devt(dev_t devt)
 	if (devt == MKDEV(0, 0))
 		return;
 
-	if (MAJOR(devt) == BLOCK_EXT_MAJOR) {
-		spin_lock_bh(&ext_devt_lock);
-		idr_remove(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
-		spin_unlock_bh(&ext_devt_lock);
-	}
+	if (MAJOR(devt) == BLOCK_EXT_MAJOR)
+		release_blk_ext_minor(MINOR(devt));
 }
 
 static char *bdevt_str(dev_t devt, char *buf)
@@ -588,6 +615,13 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 	dev_t devt;
 	int retval;
 
+	/* temporary while we convert drivers to use struct disk_devt */
+	if (disk->disk_devt) {
+		disk->minors = disk->disk_devt->minors;
+		disk->first_minor = disk->disk_devt->first_minor;
+		disk->major = disk->disk_devt->major;
+	}
+
 	/* minors == 0 indicates to use ext devt from part0 and should
 	 * be accompanied with EXT_DEVT flag.  Make sure all
 	 * parameters make sense.
@@ -1137,7 +1171,7 @@ static void disk_release(struct device *dev)
 {
 	struct gendisk *disk = dev_to_disk(dev);
 
-	blk_free_devt(dev->devt);
+	put_disk_devt(disk->disk_devt);
 	disk_release_events(disk);
 	kfree(disk->random);
 	disk_replace_part_tbl(disk, NULL);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852ad5aa3..f7fb4f42a0b6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2961,6 +2961,8 @@ static int sd_format_disk_name(char *prefix, int index, char *buf, int buflen)
 static void sd_probe_async(void *data, async_cookie_t cookie)
 {
 	struct scsi_disk *sdkp = data;
+	struct scsi_disk_devt *sdevt;
+	struct disk_devt *disk_devt;
 	struct scsi_device *sdp;
 	struct gendisk *gd;
 	u32 index;
@@ -2968,12 +2970,14 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 
 	sdp = sdkp->device;
 	gd = sdkp->disk;
-	index = sdkp->index;
+	disk_devt = gd->disk_devt;
+	sdevt = to_sdevt(disk_devt);
+	index = sdevt->index;;
 	dev = &sdp->sdev_gendev;
 
-	gd->major = sd_major((index & 0xf0) >> 4);
-	gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
-	gd->minors = SD_MINORS;
+	disk_devt->major = sd_major((index & 0xf0) >> 4);
+	disk_devt->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
+	disk_devt->minors = SD_MINORS;
 
 	gd->fops = &sd_fops;
 	gd->private_data = &sdkp->driver;
@@ -3012,6 +3016,16 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 	put_device(&sdkp->dev);
 }
 
+static void scsi_devt_release(struct disk_devt *disk_devt)
+{
+	struct scsi_disk_devt *sdevt = to_sdevt(disk_devt);
+
+	spin_lock(&sd_index_lock);
+	ida_remove(&sd_index_ida, sdevt->index);
+	spin_unlock(&sd_index_lock);
+	kfree(sdevt);
+}
+
 /**
  *	sd_probe - called during driver initialization and whenever a
  *	new scsi device is attached to the system. It is called once
@@ -3033,6 +3047,8 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 static int sd_probe(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
+	struct scsi_disk_devt *sdevt;
+	struct disk_devt *disk_devt;
 	struct scsi_disk *sdkp;
 	struct gendisk *gd;
 	int index;
@@ -3055,9 +3071,13 @@ static int sd_probe(struct device *dev)
 	if (!gd)
 		goto out_free;
 
+	sdevt = kzalloc(sizeof(*sdevt), GFP_KERNEL);
+	if (!sdevt)
+		goto out_put;
+
 	do {
 		if (!ida_pre_get(&sd_index_ida, GFP_KERNEL))
-			goto out_put;
+			goto out_free_sdevt;
 
 		spin_lock(&sd_index_lock);
 		error = ida_get_new(&sd_index_ida, &index);
@@ -3066,7 +3086,7 @@ static int sd_probe(struct device *dev)
 
 	if (error) {
 		sdev_printk(KERN_WARNING, sdp, "sd_probe: memory exhausted.\n");
-		goto out_put;
+		goto out_free_sdevt;
 	}
 
 	error = sd_format_disk_name("sd", index, gd->disk_name, DISK_NAME_LEN);
@@ -3075,10 +3095,16 @@ static int sd_probe(struct device *dev)
 		goto out_free_index;
 	}
 
+	sdevt->index = index;
+	disk_devt = &sdevt->disk_devt;
+	disk_devt->release = scsi_devt_release;
+	kref_init(&disk_devt->kref);
+	gd->disk_devt = disk_devt;
+	get_disk_devt(disk_devt);
+	
 	sdkp->device = sdp;
 	sdkp->driver = &sd_template;
 	sdkp->disk = gd;
-	sdkp->index = index;
 	atomic_set(&sdkp->openers, 0);
 	atomic_set(&sdkp->device->ioerr_cnt, 0);
 
@@ -3111,6 +3137,8 @@ static int sd_probe(struct device *dev)
 	spin_lock(&sd_index_lock);
 	ida_remove(&sd_index_ida, index);
 	spin_unlock(&sd_index_lock);
+ out_free_sdevt:
+	kfree(sdevt);
  out_put:
 	put_disk(gd);
  out_free:
@@ -3171,10 +3199,7 @@ static void scsi_disk_release(struct device *dev)
 	struct scsi_disk *sdkp = to_scsi_disk(dev);
 	struct gendisk *disk = sdkp->disk;
 	
-	spin_lock(&sd_index_lock);
-	ida_remove(&sd_index_ida, sdkp->index);
-	spin_unlock(&sd_index_lock);
-
+	put_disk_devt(disk->disk_devt);
 	disk->private_data = NULL;
 	put_disk(disk);
 	put_device(&sdkp->device->sdev_gendev);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 765a6f1ac1b7..863ade230f3f 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -59,6 +59,16 @@ enum {
 	SD_LBP_DISABLE,		/* Discard disabled due to failed cmd */
 };
 
+struct scsi_disk_devt {
+	struct disk_devt disk_devt;
+	u32 index;
+};
+
+static inline struct scsi_disk_devt *to_sdevt(struct disk_devt *disk_devt)
+{
+	return container_of(disk_devt, struct scsi_disk_devt, disk_devt);
+}
+
 struct scsi_disk {
 	struct scsi_driver *driver;	/* always &sd_template */
 	struct scsi_device *device;
@@ -72,7 +82,6 @@ struct scsi_disk {
 	u32		max_unmap_blocks;
 	u32		unmap_granularity;
 	u32		unmap_alignment;
-	u32		index;
 	unsigned int	physical_block_size;
 	unsigned int	max_medium_access_timeouts;
 	unsigned int	medium_access_timed_out;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 1dbf52f9c24b..8ba8db005ed6 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -177,15 +177,51 @@ struct blk_integrity {
 
 #endif	/* CONFIG_BLK_DEV_INTEGRITY */
 
-struct gendisk {
-	/* major, first_minor and minors are input parameters only,
-	 * don't use directly.  Use disk_devt() and disk_max_parts().
-	 */
+struct disk_devt {
 	int major;			/* major number of driver */
 	int first_minor;
 	int minors;                     /* maximum number of minors, =1 for
                                          * disks that can't be partitioned. */
+	struct kref kref;
+	void (*release)(struct disk_devt *);
+};
+
+static inline void disk_devt_release(struct kref *kref)
+{
+	struct disk_devt *disk_devt;
+
+	disk_devt = container_of(kref, struct disk_devt, kref);
+	disk_devt->release(disk_devt);
+}
+
+static inline void put_disk_devt(struct disk_devt *disk_devt)
+{
+	if (disk_devt)
+		kref_put(&disk_devt->kref, disk_devt_release);
+}
 
+static inline struct disk_devt *get_disk_devt(struct disk_devt *disk_devt)
+{
+	if (disk_devt)
+		kref_get(&disk_devt->kref);
+	return disk_devt;
+}
+
+struct gendisk {
+	/*
+	 * major, first_minor and minors are deprecated, use ->disk_devt to
+	 * communicate the devt to the block core and use disk_devt() and
+	 * disk_max_parts() to retrieve these values.
+	 */
+	int major;
+	int first_minor;
+	int minors; 
+
+	/*
+	 * disk_devt is an optional input parameter for drivers that do not use
+	 * a devt allocated by the block core.
+	 */
+	struct disk_devt *disk_devt;
 	char disk_name[DISK_NAME_LEN];	/* name of major driver */
 	char *(*devnode)(struct gendisk *gd, umode_t *mode);
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-13 18:27             ` Dan Williams
  2016-08-13 20:38               ` Dan Williams
@ 2016-08-14 17:20               ` James Bottomley
  2016-08-14 18:08                 ` Dan Williams
  2016-08-29 18:16                   ` Bart Van Assche
  1 sibling, 2 replies; 23+ messages in thread
From: James Bottomley @ 2016-08-14 17:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Sat, 2016-08-13 at 11:27 -0700, Dan Williams wrote:
> On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote:
> > > On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
> > > <James.Bottomley@hansenpartnership.com> wrote:
> > > > It does?  The race is the fact that the parent can be removed
> > > > before the child meaning if the parent name is re-registered 
> > > > before the child dies we get a duplicate name in bdi space.
> > > 
> > > No, the race is that the *name* of the parent isn't released 
> > > until the child is both unregistered and put.  The device core is
> > > already ensuring that the parent is not released until all 
> > > descendants have been removed.
> > 
> > We're both saying the same thing: the problem is that, with
> > df08c32ce3be the bdi name lifetime is tied to the lifetime of the
> > gendisk.  However, the parent of the gendisk currently is only tied 
> > to the visibility lifetime of the gendisk, not the final put 
> > lifetime, so it doesn't see this.
> > 
> > > > 
> > > > > So I tried the attached and it makes the libnvdimm unit tests
> > > > > start crashing.
> > > > 
> > > > Well, the attached is clearly buggy, isn't it?  You're trying 
> > > > to do a get on the parent before the parent is actually set.
> > > 
> > > Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
> > > 
> > > > Why don't you just try the incremental patch I sent instead of
> > > > trying to rework it?
> > > 
> > > I reworked it because it is the bdi that holds this extra 
> > > dependency on the disk's parent, not the disk itself.
> > 
> > Philosophically I don't like this approach.  The dependency goes
> > 
> > bdi->gendisk->parent
> 
> I'm arguing that there is no bdi->gendisk dependency.

You created one with your bdi->owner field.  Just because you didn't
call it a parent doesn't mean it wasn't one.  Arguably the whole bdi
thing is strangely done because gendisk treats it like a class and
that's how it behaves, it just doesn't have a proper class structure
(which is why gendisk creates the link that would be done by the class
infrastructure)

> The dependency is:
> 
> bdi->devt

devt isn't a device (in the struct device sense).  It exists as
effectively an embedded component of the bdi.  As far as I can tell
there's no reason for it to be separately allocated, it could be
properly embedded as is the normal pattern.

> It just so happens that block-dynamic devt is released in
> disk_release().
> 
> > Making the bdi manage the parent lifetime is an unusual pattern.
> >  Making the parent stay around until the last reference to gendisk 
> > is put is the usual one.
> 
> What's unusual is the bdi's dependency on the allocated name, not the
> gendisk itself.

A name is just a resource belonging to an object.  The object it
belongs to is the bdi and the bdi is parented by the owner field (and a
hokey link) to the gendisk.

> > > > >   A couple crash logs attached.  Not yet sure what assumption
> > > > > is getting violated, but how about that conversion of scsi to 
> > > > > use dynamic devt? ;-)
> > > > 
> > > > It's completely orthogonal.  The problem is in hierarchy 
> > > > lifetimes: switching from static to dynamic allocation won't 
> > > > change that at all.  You don't see this problem in nvme because 
> > > > the parent control device's lifetime belongs to the controller 
> > > > not the disk.  In SCSI the parent is our representation of the 
> > > > SCSI device whose lifetime is governed at the SCSI level and 
> > > > effectively represents the disk.
> > > > 
> > > 
> > > No, it's only the name.  We could achieve the same by teaching 
> > > the block core to manage the "sd_index_ida" instead of the sd 
> > > driver itself, but the v2-patch attached works and does not 
> > > introduce that layering violation.
> > 
> > Um, so this patch doesn't fix the problem. It merely makes the 
> > lifetime rules correct so the problem can then be fixed at the scsi
> > level.
> 
> You're right that this patch does not fix the problem, I missed that
> the scsi_disk is not the parent of the gendisk, so this patch does
> nothing to delay scsi_disk_release.  What I think is the real fix is
> to make the devt properly reference counted and prevent
> ida_remove(&sd_index_ida, sdkp->index); from being called until all
> objects derived from that allocation are done with it.

OK, this is another philosophical difference, I suppose: since bdi is
already so complex and non-standard, I really don't think adding more
non standard stuff is a good idea.  The simplest way to fix it is

   1. The two line patch I already sent to make the bdi hold the owner
      ->parent until release
   2. Parent the gendisk to scsi_disk->dev.  The name release is already
      in the correct place, so this is a 3 line patch.

These are established patterns, so they're both understandable to
anyone who reads the code.  The answer to any other BDI lifetime
problem is to free the name in the parent release.

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..222771d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 	}
 
 	blk_pm_runtime_init(sdp->request_queue, dev);
-	device_add_disk(dev, gd);
+	/*
+	 * previously the parent of the gendisk was the scsi device.  It
+	 * was moved to fix lifetime rules, so now we install a symlink
+	 * to the new location of the block class directory
+	 */
+	device_add_disk(&sdkp->dev, gd);
+	WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block"));
 	if (sdkp->capacity)
 		sd_dif_config_host(sdkp);
 
@@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
+	sysfs_remove_link(&dev->kobj, "block");
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-14 17:20               ` James Bottomley
@ 2016-08-14 18:08                 ` Dan Williams
  2016-08-14 18:23                   ` Dan Williams
  2016-08-29 18:16                   ` Bart Van Assche
  1 sibling, 1 reply; 23+ messages in thread
From: Dan Williams @ 2016-08-14 18:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Sun, Aug 14, 2016 at 10:20 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Sat, 2016-08-13 at 11:27 -0700, Dan Williams wrote:
>> On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote:
>> > > On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
>> > > <James.Bottomley@hansenpartnership.com> wrote:
>> > > > It does?  The race is the fact that the parent can be removed
>> > > > before the child meaning if the parent name is re-registered
>> > > > before the child dies we get a duplicate name in bdi space.
>> > >
>> > > No, the race is that the *name* of the parent isn't released
>> > > until the child is both unregistered and put.  The device core is
>> > > already ensuring that the parent is not released until all
>> > > descendants have been removed.
>> >
>> > We're both saying the same thing: the problem is that, with
>> > df08c32ce3be the bdi name lifetime is tied to the lifetime of the
>> > gendisk.  However, the parent of the gendisk currently is only tied
>> > to the visibility lifetime of the gendisk, not the final put
>> > lifetime, so it doesn't see this.
>> >
>> > > >
>> > > > > So I tried the attached and it makes the libnvdimm unit tests
>> > > > > start crashing.
>> > > >
>> > > > Well, the attached is clearly buggy, isn't it?  You're trying
>> > > > to do a get on the parent before the parent is actually set.
>> > >
>> > > Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
>> > >
>> > > > Why don't you just try the incremental patch I sent instead of
>> > > > trying to rework it?
>> > >
>> > > I reworked it because it is the bdi that holds this extra
>> > > dependency on the disk's parent, not the disk itself.
>> >
>> > Philosophically I don't like this approach.  The dependency goes
>> >
>> > bdi->gendisk->parent
>>
>> I'm arguing that there is no bdi->gendisk dependency.
>
> You created one with your bdi->owner field.  Just because you didn't
> call it a parent doesn't mean it wasn't one.  Arguably the whole bdi
> thing is strangely done because gendisk treats it like a class and
> that's how it behaves, it just doesn't have a proper class structure
> (which is why gendisk creates the link that would be done by the class
> infrastructure)
>
>> The dependency is:
>>
>> bdi->devt
>
> devt isn't a device (in the struct device sense).  It exists as
> effectively an embedded component of the bdi.  As far as I can tell
> there's no reason for it to be separately allocated, it could be
> properly embedded as is the normal pattern.
>
>> It just so happens that block-dynamic devt is released in
>> disk_release().
>>
>> > Making the bdi manage the parent lifetime is an unusual pattern.
>> >  Making the parent stay around until the last reference to gendisk
>> > is put is the usual one.
>>
>> What's unusual is the bdi's dependency on the allocated name, not the
>> gendisk itself.
>
> A name is just a resource belonging to an object.  The object it
> belongs to is the bdi and the bdi is parented by the owner field (and a
> hokey link) to the gendisk.
>
>> > > > >   A couple crash logs attached.  Not yet sure what assumption
>> > > > > is getting violated, but how about that conversion of scsi to
>> > > > > use dynamic devt? ;-)
>> > > >
>> > > > It's completely orthogonal.  The problem is in hierarchy
>> > > > lifetimes: switching from static to dynamic allocation won't
>> > > > change that at all.  You don't see this problem in nvme because
>> > > > the parent control device's lifetime belongs to the controller
>> > > > not the disk.  In SCSI the parent is our representation of the
>> > > > SCSI device whose lifetime is governed at the SCSI level and
>> > > > effectively represents the disk.
>> > > >
>> > >
>> > > No, it's only the name.  We could achieve the same by teaching
>> > > the block core to manage the "sd_index_ida" instead of the sd
>> > > driver itself, but the v2-patch attached works and does not
>> > > introduce that layering violation.
>> >
>> > Um, so this patch doesn't fix the problem. It merely makes the
>> > lifetime rules correct so the problem can then be fixed at the scsi
>> > level.
>>
>> You're right that this patch does not fix the problem, I missed that
>> the scsi_disk is not the parent of the gendisk, so this patch does
>> nothing to delay scsi_disk_release.  What I think is the real fix is
>> to make the devt properly reference counted and prevent
>> ida_remove(&sd_index_ida, sdkp->index); from being called until all
>> objects derived from that allocation are done with it.
>
> OK, this is another philosophical difference, I suppose: since bdi is
> already so complex and non-standard, I really don't think adding more
> non standard stuff is a good idea.  The simplest way to fix it is
>
>    1. The two line patch I already sent to make the bdi hold the owner
>       ->parent until release
>    2. Parent the gendisk to scsi_disk->dev.  The name release is already
>       in the correct place, so this is a 3 line patch.
>
> These are established patterns, so they're both understandable to
> anyone who reads the code.  The answer to any other BDI lifetime
> problem is to free the name in the parent release.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d3e852a..222771d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
>         }
>
>         blk_pm_runtime_init(sdp->request_queue, dev);
> -       device_add_disk(dev, gd);
> +       /*
> +        * previously the parent of the gendisk was the scsi device.  It
> +        * was moved to fix lifetime rules, so now we install a symlink
> +        * to the new location of the block class directory
> +        */
> +       device_add_disk(&sdkp->dev, gd);
> +       WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block"));
>         if (sdkp->capacity)
>                 sd_dif_config_host(sdkp);
>
> @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
>
>         async_synchronize_full_domain(&scsi_sd_pm_domain);
>         async_synchronize_full_domain(&scsi_sd_probe_domain);
> +       sysfs_remove_link(&dev->kobj, "block");
>         device_del(&sdkp->dev);
>         del_gendisk(sdkp->disk);
>         sd_shutdown(dev);

I like it.  I still think the bdi registration code should be in
charge of taking the extra reference on the disk device's parent to
isolate / make clear why the extra reference is being acquired, but I
won't lose sleep if Jens takes your smaller change.

Thanks James!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-14 18:08                 ` Dan Williams
@ 2016-08-14 18:23                   ` Dan Williams
  2016-08-15 20:11                     ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Williams @ 2016-08-14 18:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen, Bart Van Assche

[ adding Bart back to the cc ]

On Sun, Aug 14, 2016 at 11:08 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sun, Aug 14, 2016 at 10:20 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
[..]
> I like it.  I still think the bdi registration code should be in
> charge of taking the extra reference on the disk device's parent to
> isolate / make clear why the extra reference is being acquired, but I
> won't lose sleep if Jens takes your smaller change.
>
> Thanks James!

Bart, do you have a test configuration already set up for this case.
Can you give the 2 patches from James a try?

https://patchwork.kernel.org/patch/9278201/
https://patchwork.kernel.org/patch/9279513/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-14 18:23                   ` Dan Williams
@ 2016-08-15 20:11                     ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-08-15 20:11 UTC (permalink / raw)
  To: Dan Williams, James Bottomley
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On 08/14/2016 11:23 AM, Dan Williams wrote:
> [ adding Bart back to the cc ]
>
> On Sun, Aug 14, 2016 at 11:08 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Sun, Aug 14, 2016 at 10:20 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
> [..]
>> I like it.  I still think the bdi registration code should be in
>> charge of taking the extra reference on the disk device's parent to
>> isolate / make clear why the extra reference is being acquired, but I
>> won't lose sleep if Jens takes your smaller change.
>>
>> Thanks James!
>
> Bart, do you have a test configuration already set up for this case.
> Can you give the 2 patches from James a try?
>
> https://patchwork.kernel.org/patch/9278201/
> https://patchwork.kernel.org/patch/9279513/

Hello Dan,

The "sysfs: cannot create duplicate filename" issue is something I ran 
into sporadically before I started using my patch that fixes this issue. 
I have not yet found a systematic way to trigger this behavior. Anyway, 
I will drop my patch and will start testing James' two patches. It will 
take a few days to test these patches thoroughly.

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-14 17:20               ` James Bottomley
@ 2016-08-29 18:16                   ` Bart Van Assche
  2016-08-29 18:16                   ` Bart Van Assche
  1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-08-29 18:16 UTC (permalink / raw)
  To: James Bottomley, Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

T24gMDgvMTQvMjAxNiAxMDoyMSBBTSwgSmFtZXMgQm90dG9tbGV5IHdyb3RlOg0KPiBkaWZm
IC0tZ2l0IGEvZHJpdmVycy9zY3NpL3NkLmMgYi9kcml2ZXJzL3Njc2kvc2QuYw0KPiBpbmRl
eCBkM2U4NTJhLi4yMjI3NzFkIDEwMDY0NA0KPiAtLS0gYS9kcml2ZXJzL3Njc2kvc2QuYw0K
PiArKysgYi9kcml2ZXJzL3Njc2kvc2QuYw0KPiBAQCAtMzAwMCw3ICszMDAwLDEzIEBAIHN0
YXRpYyB2b2lkIHNkX3Byb2JlX2FzeW5jKHZvaWQgKmRhdGEsIGFzeW5jX2Nvb2tpZV90IGNv
b2tpZSkNCj4gIAl9DQo+ICANCj4gIAlibGtfcG1fcnVudGltZV9pbml0KHNkcC0+cmVxdWVz
dF9xdWV1ZSwgZGV2KTsNCj4gLQlkZXZpY2VfYWRkX2Rpc2soZGV2LCBnZCk7DQo+ICsJLyoN
Cj4gKwkgKiBwcmV2aW91c2x5IHRoZSBwYXJlbnQgb2YgdGhlIGdlbmRpc2sgd2FzIHRoZSBz
Y3NpIGRldmljZS4gIEl0DQo+ICsJICogd2FzIG1vdmVkIHRvIGZpeCBsaWZldGltZSBydWxl
cywgc28gbm93IHdlIGluc3RhbGwgYSBzeW1saW5rDQo+ICsJICogdG8gdGhlIG5ldyBsb2Nh
dGlvbiBvZiB0aGUgYmxvY2sgY2xhc3MgZGlyZWN0b3J5DQo+ICsJICovDQo+ICsJZGV2aWNl
X2FkZF9kaXNrKCZzZGtwLT5kZXYsIGdkKTsNCj4gKwlXQVJOX09OKHN5c2ZzX2FkZF9saW5r
X3RvX2dyb3VwKCZkZXYtPmtvYmosICJibG9jayIsICZzZGtwLT5kZXYua29iaiwgImJsb2Nr
IikpOw0KPiAgCWlmIChzZGtwLT5jYXBhY2l0eSkNCj4gIAkJc2RfZGlmX2NvbmZpZ19ob3N0
KHNka3ApOw0KPiAgDQo+IEBAIC0zMTQyLDYgKzMxNDgsNyBAQCBzdGF0aWMgaW50IHNkX3Jl
bW92ZShzdHJ1Y3QgZGV2aWNlICpkZXYpDQo+ICANCj4gIAlhc3luY19zeW5jaHJvbml6ZV9m
dWxsX2RvbWFpbigmc2NzaV9zZF9wbV9kb21haW4pOw0KPiAgCWFzeW5jX3N5bmNocm9uaXpl
X2Z1bGxfZG9tYWluKCZzY3NpX3NkX3Byb2JlX2RvbWFpbik7DQo+ICsJc3lzZnNfcmVtb3Zl
X2xpbmsoJmRldi0+a29iaiwgImJsb2NrIik7DQo+ICAJZGV2aWNlX2RlbCgmc2RrcC0+ZGV2
KTsNCj4gIAlkZWxfZ2VuZGlzayhzZGtwLT5kaXNrKTsNCj4gIAlzZF9zaHV0ZG93bihkZXYp
Ow0KDQpIZWxsbyBKYW1lcywNCg0KU29ycnkgdGhhdCBpdCB0b29rIHNvIGxvbmcgYmVmb3Jl
IEkgY291bGQgdGVzdCB0aGlzIHBhdGNoIGFuZA0KdGhlIHByZXZpb3VzIHBhdGNoIHRoYXQg
d2FzIHBvc3RlZCBpbiB0aGlzIGUtbWFpbCB0aHJlYWQuIEJ1dCBJDQpkaWQgc28gZWFybGll
ciB0aGlzIG1vcm5pbmcuIFdoYXQgSSBzZWUgaXMgdGhhdCB0aGUgZm9sbG93aW5nDQp3YXJu
aW5nIG1lc3NhZ2UgaXMgcmVwb3J0ZWQgZnJlcXVlbnRseToNCg0KV0FSTklORzogQ1BVOiAx
IFBJRDogMTM2IGF0IGRyaXZlcnMvc2NzaS9zZC5jOjMwMDkgc2RfcHJvYmVfYXN5bmMrMHgx
Y2UvMHgxZTANCk1vZHVsZXMgbGlua2VkIGluOiBpYl9zcnAgbGliY3JjMzJjIHNjc2lfdHJh
bnNwb3J0X3NycCB0YXJnZXRfY29yZV9wc2NzaSB0YXJnZXRfY29yZV9maWxlIGliX3NycHQg
dGFyZ2V0X2NvcmVfaWJsb2NrIHRhcmdldF9jb3JlX21vZCBicmQgZG1fbXVsdGlwYXRoIHNj
c2lfZGhfcmRhYyBzY3NpX2RoX2VtYyBzY3NpX2RoX2FsdWEgbmV0Y29uc29sZSB4dF9DSEVD
S1NVTSBpcHRhYmxlX21hbmdsZSBpcHRfTUFTUVVFUkFERSBuZl9uYXRfbWFzcXVlcmFkZV9p
cHY0IGlwdGFibGVfbmF0IG5mX25hdF9pcHY0IG5mX25hdCBuZl9jb25udHJhY2tfaXB2NCBu
Zl9kZWZyYWdfaXB2NCB4dF9jb25udHJhY2sgbmZfY29ubnRyYWNrIGlwdF9SRUpFQ1QgeHRf
dGNwdWRwIHR1biBlYnRhYmxlX2ZpbHRlciBlYnRhYmxlcyBpcDZ0YWJsZV9maWx0ZXIgaXA2
X3RhYmxlcyBpcHRhYmxlX2ZpbHRlciBpcF90YWJsZXMgeF90YWJsZXMgYWZfcGFja2V0IGJy
aWRnZSBzdHAgbGxjIGlzY3NpX2liZnQgaXNjc2lfYm9vdF9zeXNmcyBpYl9pcG9pYiByZG1h
X3VjbSBpYl91Y20gaWJfdXZlcmJzIGliX3VtYWQgbXNyIHJkbWFfY20gY29uZmlnZnMgaWJf
Y20gaXdfY20gbWx4NF9pYiBpYl9jb3JlIHNiX2VkYWMgZWRhY19jb3JlIHg4Nl9wa2dfdGVt
cF90aGVybWFsIGludGVsX3Bvd2VyY2xhbXAgY29yZXRlbXAga3ZtX2ludGVsIGt2bSBkbV9t
b2QgaXJxYnlwYXNzIGlwbWlfc3NpZiBjcmN0MTBkaWZfcGNsbXVsIGlwbWlfZGV2aW50ZiBj
cmMzMl9wY2xtdWwgZ2hhc2hfY2xtdWxuaV9pbnRlbCBhZXNuaV9pbnRlbCBpVENPX3dkdCBh
ZXNfeDg2XzY0IGxydyBpVENPX3ZlbmRvcl9zdXBwb3J0IGRjZGJhcyBnZjEyOG11bCB0ZzMg
bWx4NF9jb3JlIGdsdWVfaGVscGVyIGFibGtfaGVscGVyIHB0cCBjcnlwdGQgZmplcyBpcG1p
X3NpIHBjc3BrciBwcHNfY29yZSBsaWJwaHkgaXBtaV9tc2doYW5kbGVyIG1laV9tZSB0cG1f
dGlzIHRwbV90aXNfY29yZSBtZWkgdHBtIGxwY19pY2ggc2hwY2hwIG1mZF9jb3JlIHdtaSBi
dXR0b24gaGlkX2dlbmVyaWMgdXNiaGlkIG1nYWcyMDAgaTJjX2FsZ29fYml0IGRybV9rbXNf
aGVscGVyIHN5c2NvcHlhcmVhIHN5c2ZpbGxyZWN0IHN5c2ltZ2JsdCBmYl9zeXNfZm9wcyB0
dG0gZHJtIHNyX21vZCBjZHJvbSBjcmMzMmNfaW50ZWwgZWhjaV9wY2kgZWhjaV9oY2QgdXNi
Y29yZSB1c2JfY29tbW9uIHNnIFtsYXN0IHVubG9hZGVkOiBzY3NpX3RyYW5zcG9ydF9zcnBd
DQpDUFU6IDEgUElEOiAxMzYgQ29tbToga3dvcmtlci91NjQ6OCBUYWludGVkOiBHICAgICAg
ICBXICAgICAgIDQuOC4wLXJjNC1kYmcrICMyDQpIYXJkd2FyZSBuYW1lOiBEZWxsIEluYy4g
UG93ZXJFZGdlIFI0MzAvMDNYS0RWLCBCSU9TIDEuMC4yIDExLzE3LzIwMTQNCldvcmtxdWV1
ZTogZXZlbnRzX3VuYm91bmQgYXN5bmNfcnVuX2VudHJ5X2ZuDQpDYWxsIFRyYWNlOg0KIFs8
ZmZmZmZmZmY4MTMyNDdjNT5dIGR1bXBfc3RhY2srMHg2OC8weDkzDQogWzxmZmZmZmZmZjgx
MDYyZjk2Pl0gX193YXJuKzB4YzYvMHhlMA0KIFs8ZmZmZmZmZmY4MTA2MzA2OD5dIHdhcm5f
c2xvd3BhdGhfbnVsbCsweDE4LzB4MjANCiBbPGZmZmZmZmZmODE0ODE0NWU+XSBzZF9wcm9i
ZV9hc3luYysweDFjZS8weDFlMA0KIFs8ZmZmZmZmZmY4MTA4OWQ2ND5dIGFzeW5jX3J1bl9l
bnRyeV9mbisweDM0LzB4MTQwDQogWzxmZmZmZmZmZjgxMDgwMzU1Pl0gcHJvY2Vzc19vbmVf
d29yaysweDFmNS8weDY5MA0KIFs8ZmZmZmZmZmY4MTA4MDgzOT5dIHdvcmtlcl90aHJlYWQr
MHg0OS8weDQ5MA0KIFs8ZmZmZmZmZmY4MTA4NzAxYT5dIGt0aHJlYWQrMHhlYS8weDEwMA0K
IFs8ZmZmZmZmZmY4MTYyZDNiZj5dIHJldF9mcm9tX2ZvcmsrMHgxZi8weDQwDQoNClRoZSB0
ZXN0IEkgcmFuIGlzIGF2YWlsYWJsZSBhdCBodHRwczovL2dpdGh1Yi5jb20vYnZhbmFzc2No
ZS9zcnAtdGVzdC4NCg0KQmFydC4NCg0K

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
@ 2016-08-29 18:16                   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-08-29 18:16 UTC (permalink / raw)
  To: James Bottomley, Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On 08/14/2016 10:21 AM, James Bottomley wrote:
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d3e852a..222771d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
>  	}
>  
>  	blk_pm_runtime_init(sdp->request_queue, dev);
> -	device_add_disk(dev, gd);
> +	/*
> +	 * previously the parent of the gendisk was the scsi device.  It
> +	 * was moved to fix lifetime rules, so now we install a symlink
> +	 * to the new location of the block class directory
> +	 */
> +	device_add_disk(&sdkp->dev, gd);
> +	WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block"));
>  	if (sdkp->capacity)
>  		sd_dif_config_host(sdkp);
>  
> @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
>  
>  	async_synchronize_full_domain(&scsi_sd_pm_domain);
>  	async_synchronize_full_domain(&scsi_sd_probe_domain);
> +	sysfs_remove_link(&dev->kobj, "block");
>  	device_del(&sdkp->dev);
>  	del_gendisk(sdkp->disk);
>  	sd_shutdown(dev);

Hello James,

Sorry that it took so long before I could test this patch and
the previous patch that was posted in this e-mail thread. But I
did so earlier this morning. What I see is that the following
warning message is reported frequently:

WARNING: CPU: 1 PID: 136 at drivers/scsi/sd.c:3009 sd_probe_async+0x1ce/0x1e0
Modules linked in: ib_srp libcrc32c scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod brd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bridge stp llc iscsi_ibft iscsi_boot_sysfs ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad msr rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dm_mod irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf crc32_pclmul ghash_clmulni_intel aesni_intel iTCO_wdt aes_x86_64 lrw iTCO_vendor_support dcdbas gf128mul tg3 mlx4_core glue_helper ablk_helper ptp cryptd fjes ipmi_si pcspkr pps_core libphy ipmi_msghandler mei_me tpm_tis tpm_tis_core mei tpm lpc_ich shpchp mfd_core wmi button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg [last unloaded: scsi_transport_srp]
CPU: 1 PID: 136 Comm: kworker/u64:8 Tainted: G        W       4.8.0-rc4-dbg+ #2
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 [<ffffffff813247c5>] dump_stack+0x68/0x93
 [<ffffffff81062f96>] __warn+0xc6/0xe0
 [<ffffffff81063068>] warn_slowpath_null+0x18/0x20
 [<ffffffff8148145e>] sd_probe_async+0x1ce/0x1e0
 [<ffffffff81089d64>] async_run_entry_fn+0x34/0x140
 [<ffffffff81080355>] process_one_work+0x1f5/0x690
 [<ffffffff81080839>] worker_thread+0x49/0x490
 [<ffffffff8108701a>] kthread+0xea/0x100
 [<ffffffff8162d3bf>] ret_from_fork+0x1f/0x40

The test I ran is available at https://github.com/bvanassche/srp-test.

Bart.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-29 18:16                   ` Bart Van Assche
@ 2016-08-30 20:43                     ` Dan Williams
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Williams @ 2016-08-30 20:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-block, linux-scsi, Jens Axboe,
	Martin K. Petersen, Christoph Hellwig, Tejun Heo, Dave Hansen

On Mon, Aug 29, 2016 at 11:16 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 08/14/2016 10:21 AM, James Bottomley wrote:
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index d3e852a..222771d 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cook=
ie_t cookie)
>>       }
>>
>>       blk_pm_runtime_init(sdp->request_queue, dev);
>> -     device_add_disk(dev, gd);
>> +     /*
>> +      * previously the parent of the gendisk was the scsi device.  It
>> +      * was moved to fix lifetime rules, so now we install a symlink
>> +      * to the new location of the block class directory
>> +      */
>> +     device_add_disk(&sdkp->dev, gd);
>> +     WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.ko=
bj, "block"));
>>       if (sdkp->capacity)
>>               sd_dif_config_host(sdkp);
>>
>> @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
>>
>>       async_synchronize_full_domain(&scsi_sd_pm_domain);
>>       async_synchronize_full_domain(&scsi_sd_probe_domain);
>> +     sysfs_remove_link(&dev->kobj, "block");
>>       device_del(&sdkp->dev);
>>       del_gendisk(sdkp->disk);
>>       sd_shutdown(dev);
>
> Hello James,
>
> Sorry that it took so long before I could test this patch and
> the previous patch that was posted in this e-mail thread. But I
> did so earlier this morning. What I see is that the following
> warning message is reported frequently:
>
> WARNING: CPU: 1 PID: 136 at drivers/scsi/sd.c:3009 sd_probe_async+0x1ce/0=
x1e0
> Modules linked in: ib_srp libcrc32c scsi_transport_srp target_core_pscsi =
target_core_file ib_srpt target_core_iblock target_core_mod brd dm_multipat=
h scsi_dh_rdac scsi_dh_emc scsi_dh_alua netconsole xt_CHECKSUM iptable_mang=
le ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_=
conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpud=
p tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_=
tables x_tables af_packet bridge stp llc iscsi_ibft iscsi_boot_sysfs ib_ipo=
ib rdma_ucm ib_ucm ib_uverbs ib_umad msr rdma_cm configfs ib_cm iw_cm mlx4_=
ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp=
 kvm_intel kvm dm_mod irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf crc=
32_pclmul ghash_clmulni_intel aesni_intel iTCO_wdt aes_x86_64 lrw iTCO_vend=
or_support dcdbas gf128mul tg3 mlx4_core glue_helper ablk_helper ptp cryptd=
 fjes ipmi_si pcspkr pps_core libphy ipmi_msghandler mei_me tpm_tis tpm_tis=
_core mei tpm lpc_ich shpchp mfd_core wmi button hid_generic usbhid mgag200=
 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops =
ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg [=
last unloaded: scsi_transport_srp]
> CPU: 1 PID: 136 Comm: kworker/u64:8 Tainted: G        W       4.8.0-rc4-d=
bg+ #2
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
>  [<ffffffff813247c5>] dump_stack+0x68/0x93
>  [<ffffffff81062f96>] __warn+0xc6/0xe0
>  [<ffffffff81063068>] warn_slowpath_null+0x18/0x20
>  [<ffffffff8148145e>] sd_probe_async+0x1ce/0x1e0
>  [<ffffffff81089d64>] async_run_entry_fn+0x34/0x140
>  [<ffffffff81080355>] process_one_work+0x1f5/0x690
>  [<ffffffff81080839>] worker_thread+0x49/0x490
>  [<ffffffff8108701a>] kthread+0xea/0x100
>  [<ffffffff8162d3bf>] ret_from_fork+0x1f/0x40
>
> The test I ran is available at https://github.com/bvanassche/srp-test.

I tried running this, but it seems I'm failing to configure my test
environment correctly [1], but I'm worried that this "re-parenting the
scsi-disk" approach, even if the above warning is addressed, may not
be backwards compatible.  We now have an ordering difference where the
link to the "block" attribute group is established after the disk's
KOBJ_ADD event which seems in the same class of problems that Fam
Zheng's patchset [2] is trying to solve.

Bart, if you can help me get the test case running I can take a look
at finishing off the disk_devt approach I proposed earlier.

[1]: https://gist.github.com/djbw/1c72526c90d1ea8fe2a05dcfbfc73dda
[2]: https://lkml.org/lkml/2016/8/17/63

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
@ 2016-08-30 20:43                     ` Dan Williams
  0 siblings, 0 replies; 23+ messages in thread
From: Dan Williams @ 2016-08-30 20:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-block, linux-scsi, Jens Axboe,
	Martin K. Petersen, Christoph Hellwig, Tejun Heo, Dave Hansen

On Mon, Aug 29, 2016 at 11:16 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 08/14/2016 10:21 AM, James Bottomley wrote:
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index d3e852a..222771d 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
>>       }
>>
>>       blk_pm_runtime_init(sdp->request_queue, dev);
>> -     device_add_disk(dev, gd);
>> +     /*
>> +      * previously the parent of the gendisk was the scsi device.  It
>> +      * was moved to fix lifetime rules, so now we install a symlink
>> +      * to the new location of the block class directory
>> +      */
>> +     device_add_disk(&sdkp->dev, gd);
>> +     WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block"));
>>       if (sdkp->capacity)
>>               sd_dif_config_host(sdkp);
>>
>> @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
>>
>>       async_synchronize_full_domain(&scsi_sd_pm_domain);
>>       async_synchronize_full_domain(&scsi_sd_probe_domain);
>> +     sysfs_remove_link(&dev->kobj, "block");
>>       device_del(&sdkp->dev);
>>       del_gendisk(sdkp->disk);
>>       sd_shutdown(dev);
>
> Hello James,
>
> Sorry that it took so long before I could test this patch and
> the previous patch that was posted in this e-mail thread. But I
> did so earlier this morning. What I see is that the following
> warning message is reported frequently:
>
> WARNING: CPU: 1 PID: 136 at drivers/scsi/sd.c:3009 sd_probe_async+0x1ce/0x1e0
> Modules linked in: ib_srp libcrc32c scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod brd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bridge stp llc iscsi_ibft iscsi_boot_sysfs ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad msr rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dm_mod irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf crc32_pclmul ghash_clmulni_intel aesni_intel iTCO_wdt aes_x86_64 lrw iTCO_vendor_support dcdbas gf128mul tg3 mlx4_core glue_helper ablk_helper ptp cryptd fjes ipmi_si pcspkr pps_core libphy ipmi_msghandler mei_me tpm_tis tpm_tis_core mei tpm lpc_ich shpchp mfd_core wmi button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg [last unloaded: scsi_transport_srp]
> CPU: 1 PID: 136 Comm: kworker/u64:8 Tainted: G        W       4.8.0-rc4-dbg+ #2
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
>  [<ffffffff813247c5>] dump_stack+0x68/0x93
>  [<ffffffff81062f96>] __warn+0xc6/0xe0
>  [<ffffffff81063068>] warn_slowpath_null+0x18/0x20
>  [<ffffffff8148145e>] sd_probe_async+0x1ce/0x1e0
>  [<ffffffff81089d64>] async_run_entry_fn+0x34/0x140
>  [<ffffffff81080355>] process_one_work+0x1f5/0x690
>  [<ffffffff81080839>] worker_thread+0x49/0x490
>  [<ffffffff8108701a>] kthread+0xea/0x100
>  [<ffffffff8162d3bf>] ret_from_fork+0x1f/0x40
>
> The test I ran is available at https://github.com/bvanassche/srp-test.

I tried running this, but it seems I'm failing to configure my test
environment correctly [1], but I'm worried that this "re-parenting the
scsi-disk" approach, even if the above warning is addressed, may not
be backwards compatible.  We now have an ordering difference where the
link to the "block" attribute group is established after the disk's
KOBJ_ADD event which seems in the same class of problems that Fam
Zheng's patchset [2] is trying to solve.

Bart, if you can help me get the test case running I can take a look
at finishing off the disk_devt approach I proposed earlier.

[1]: https://gist.github.com/djbw/1c72526c90d1ea8fe2a05dcfbfc73dda
[2]: https://lkml.org/lkml/2016/8/17/63

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-30 20:43                     ` Dan Williams
  (?)
@ 2016-08-30 20:53                     ` Bart Van Assche
  -1 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-08-30 20:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: James Bottomley, linux-block, linux-scsi, Jens Axboe,
	Martin K. Petersen, Christoph Hellwig, Tejun Heo, Dave Hansen

On 08/30/2016 01:43 PM, Dan Williams wrote:
> I tried running this, but it seems I'm failing to configure my test
> environment correctly [1], but I'm worried that this "re-parenting the
> scsi-disk" approach, even if the above warning is addressed, may not
> be backwards compatible.  We now have an ordering difference where the
> link to the "block" attribute group is established after the disk's
> KOBJ_ADD event which seems in the same class of problems that Fam
> Zheng's patchset [2] is trying to solve.
>
> Bart, if you can help me get the test case running I can take a look
> at finishing off the disk_devt approach I proposed earlier.
>
> [1]: https://gist.github.com/djbw/1c72526c90d1ea8fe2a05dcfbfc73dda
> [2]: https://lkml.org/lkml/2016/8/17/63

Hello Dan,

Today running the srp-test software is only possible on a system 
equipped with at least one InfiniBand port. Since the soft-RoCE driver 
has been accepted in kernel v4.8 (drivers/infiniband/sw/rxe) it should 
be possible to modify the srp-test software such that it runs on any 
system equipped with at least one Ethernet port. However, that will 
require adding RoCE (RDMA over Ethernet) support to the ib_srp and 
ib_srpt drivers.

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Time to make dynamically allocated devt the default for scsi disks?
  2016-08-29 18:16                   ` Bart Van Assche
  (?)
  (?)
@ 2016-09-01 15:10                   ` James Bottomley
  -1 siblings, 0 replies; 23+ messages in thread
From: James Bottomley @ 2016-09-01 15:10 UTC (permalink / raw)
  To: Bart Van Assche, Dan Williams
  Cc: linux-block, linux-scsi, Jens Axboe, Martin K. Petersen,
	Christoph Hellwig, Tejun Heo, Dave Hansen

On Mon, 2016-08-29 at 11:16 -0700, Bart Van Assche wrote:
> On 08/14/2016 10:21 AM, James Bottomley wrote:
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index d3e852a..222771d 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data,
> > async_cookie_t cookie)
> >  	}
> >  
> >  	blk_pm_runtime_init(sdp->request_queue, dev);
> > -	device_add_disk(dev, gd);
> > +	/*
> > +	 * previously the parent of the gendisk was the scsi
> > device.  It
> > +	 * was moved to fix lifetime rules, so now we install a
> > symlink
> > +	 * to the new location of the block class directory
> > +	 */
> > +	device_add_disk(&sdkp->dev, gd);
> > +	WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp
> > ->dev.kobj, "block"));
> >  	if (sdkp->capacity)
> >  		sd_dif_config_host(sdkp);
> >  
> > @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
> >  
> >  	async_synchronize_full_domain(&scsi_sd_pm_domain);
> >  	async_synchronize_full_domain(&scsi_sd_probe_domain);
> > +	sysfs_remove_link(&dev->kobj, "block");
> >  	device_del(&sdkp->dev);
> >  	del_gendisk(sdkp->disk);
> >  	sd_shutdown(dev);
> 
> Hello James,
> 
> Sorry that it took so long before I could test this patch and
> the previous patch that was posted in this e-mail thread. But I
> did so earlier this morning. What I see is that the following
> warning message is reported frequently:
> 
> WARNING: CPU: 1 PID: 136 at drivers/scsi/sd.c:3009
> sd_probe_async+0x1ce/0x1e0

That's because the link is created too early, I think.  Let me dig into
this; I managed to hose my big device machine, so I'll need time to
resurrect it.

James



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-09-01 15:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-12 21:29 Time to make dynamically allocated devt the default for scsi disks? Dan Williams
2016-08-12 21:35 ` Bart Van Assche
2016-08-12 21:35   ` Bart Van Assche
2016-08-12 23:32   ` Dan Williams
2016-08-13  0:17 ` James Bottomley
2016-08-13  0:29   ` Dan Williams
2016-08-13  4:57     ` Dan Williams
2016-08-13 15:23       ` James Bottomley
2016-08-13 16:29         ` Dan Williams
2016-08-13 17:43           ` James Bottomley
2016-08-13 18:27             ` Dan Williams
2016-08-13 20:38               ` Dan Williams
2016-08-14 17:20               ` James Bottomley
2016-08-14 18:08                 ` Dan Williams
2016-08-14 18:23                   ` Dan Williams
2016-08-15 20:11                     ` Bart Van Assche
2016-08-29 18:16                 ` Bart Van Assche
2016-08-29 18:16                   ` Bart Van Assche
2016-08-30 20:43                   ` Dan Williams
2016-08-30 20:43                     ` Dan Williams
2016-08-30 20:53                     ` Bart Van Assche
2016-09-01 15:10                   ` James Bottomley
2016-08-13 12:13 ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.