* Has anyone seen this?
@ 2015-04-14 16:52 Sagi Grimberg
  2015-04-14 17:57 ` Douglas Gilbert
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Sagi Grimberg @ 2015-04-14 16:52 UTC
  To: linux-scsi; +Cc: martin.petersen

When I set up a DIX-enabled device for the first time (say,
scsi_debug), it all works, but when I remove it
and set it up again I get the crash below:

Reproducer:
$ modprobe scsi_debug dif=1 dix=1
$ modprobe -r scsi_debug
$ modprobe scsi_debug dif=1 dix=1

It seems that somehow bdi_destroy() is not
invoked for DIX...

scsi_debug_init: dif_storep 131072 bytes @ ffffc90018507000
scsi_debug: host protection DIF1 DIX1
scsi host9: scsi_debug, version 1.85 [20141022], dev_size_mb=8, opts=0x0
scsi 9:0:0:0: Direct-Access     Linux    scsi_debug       0184 PQ: 0 ANSI: 6
sd 9:0:0:0: Attached scsi generic sg2 type 0
sd 9:0:0:0: [sdc] Enabling DIF Type 1 protection
sd 9:0:0:0: [sdc] 16384 512-byte logical blocks: (8.38 MB/8.00 MiB)
sd 9:0:0:0: [sdc] Write Protect is off
sd 9:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 9:0:0:0: [sdc] Enabling DIX T10-DIF-TYPE1-CRC protection
sd 9:0:0:0: [sdc] DIF application tag size 2
sd 9:0:0:0: [sdc] Attached SCSI disk
sd 9:0:0:0: [sdc] Synchronizing SCSI cache
scsi_debug_init: dif_storep 131072 bytes @ ffffc900185e7000
scsi_debug: host protection DIF1 DIX1
scsi host10: scsi_debug, version 1.85 [20141022], dev_size_mb=8, opts=0x0
scsi 10:0:0:0: Direct-Access     Linux    scsi_debug       0184 PQ: 0 ANSI: 6
sd 10:0:0:0: Attached scsi generic sg2 type 0
sd 10:0:0:0: [sdc] Enabling DIF Type 1 protection
sd 10:0:0:0: [sdc] 16384 512-byte logical blocks: (8.38 MB/8.00 MiB)
sd 10:0:0:0: [sdc] Write Protect is off
sd 10:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
------------[ cut here ]------------
WARNING: CPU: 2 PID: 753 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80()
sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:32'
---[ end trace 08a96f4c6fca987b ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 753 at lib/kobject.c:240 kobject_add_internal+0x194/0x1f0()
---[ end trace 08a96f4c6fca987c ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff811cbede>] sysfs_do_create_link_sd+0x3e/0xc0
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: netconsole tcm_loop vhost_scsi vhost ib_srpt ib_isert 
iscsi_target_mod rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr tcm_fc 
libfc target_core_file target_core_iblock target_core_pscsi 
target_core_mod configfs nfsd exportfs nfsv3 nfs_acl rpcsec_gss_krb5 
auth_rpcgss nfsv4 nfs fscache lockd grace autofs4 sunrpc 
cpufreq_ondemand ipv6 ext4 jbd2 dm_mirror dm_region_hash dm_log uinput 
iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr ipmi_si 
ipmi_msghandler acpi_cpufreq sg sb_edac edac_core i2c_i801 lpc_ich 
mfd_core shpchp ioatdma igb dm_mod dca i2c_algo_bit i2c_core ptp 
pps_core wmi ext3(E) jbd(E) mbcache(E) sd_mod(E) ahci(E) libahci(E) 
isci(E) libsas(E) scsi_transport_sas(E) qla2xxx(E) scsi_transport_fc(E) 
[last unloaded: netconsole]
CPU: 4 PID: 753 Comm: kworker/u49:6 Tainted: G        W   E   4.0.0-rc1+ #38
Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013
Workqueue: events_unbound async_run_entry_fn
task: ffff88046de44410 ti: ffff88046b754000 task.ti: ffff88046b754000
RIP: 0010:[<ffffffff811cbede>]  [<ffffffff811cbede>] sysfs_do_create_link_sd+0x3e/0xc0
RSP: 0018:ffff88046b757cf8  EFLAGS: 00010246
RAX: 00000000112e112e RBX: ffffffff817d2813 RCX: 0000000000000001
RDX: 000000000000112e RSI: 0000000000000010 RDI: ffffffff81e440c8
RBP: ffff88046b757d28 R08: ffff880468270f50 R09: 0000000000000040
R10: 0000000000000001 R11: 0000000000000010 R12: ffff880079b92988
R13: 0000000000000010 R14: 0000000000000001 R15: ffff88046fa0ae05
FS:  0000000000000000(0000) GS:ffff88047fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000001a0e000 CR4: 00000000000406e0
Stack:
  ffff88046b757d38 ffff88046ddea000 ffff88046b806008 ffff88046ddea080
  0000000000000000 ffff88046fa0ae05 ffff88046b757d38 ffffffff811cbfb1
  ffff88046b757d78 ffffffff81231c60 ffff88046b806008 008000206e800800
Call Trace:
  [<ffffffff811cbfb1>] sysfs_create_link+0x21/0x40
  [<ffffffff81231c60>] add_disk+0x1b0/0x310
  [<ffffffffa01306ec>] sd_probe_async+0x11c/0x1d0 [sd_mod]
  [<ffffffff8106d525>] async_run_entry_fn+0x55/0x160
  [<ffffffff81066126>] process_one_work+0x136/0x3a0
  [<ffffffff810664a7>] worker_thread+0x117/0x3b0
  [<ffffffff81066390>] ? process_one_work+0x3a0/0x3a0
  [<ffffffff81066390>] ? process_one_work+0x3a0/0x3a0
  [<ffffffff8106afbe>] kthread+0xce/0xf0
  [<ffffffff8106aef0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff81542e6c>] ret_from_fork+0x7c/0xb0
  [<ffffffff8106aef0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 48 89 d3 4c 89 6d e8 4c 89 75 f0 49 89 fc 4c 89 7d f8 49 89 f5 41 
89 ce 74 6e 48 85 ff 74 69 48 c7 c7 c8 40 e4 81 e8 52 6b 37 00 <4d> 8b 
6d 30 4d 85 ed 74 08 4c 89 ef e8 01 c7 ff ff 66 83 05 d1
RIP  [<ffffffff811cbede>] sysfs_do_create_link_sd+0x3e/0xc0
  RSP <ffff88046b757cf8>
CR2: 0000000000000040
---[ end trace 08a96f4c6fca987d ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
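
The first warning in this trace is sysfs refusing to create '/devices/virtual/bdi/8:32' a second time (8:32 is the device number of sdc). The userspace C sketch below models only that uniqueness rule, assuming, as the report suggests, that the old disk's bdi entry is never torn down before the new sdc is added; toy_dir, toy_create() and toy_remove() are hypothetical names, not sysfs APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one sysfs directory such as
 * /sys/devices/virtual/bdi: a fixed table of entry names. Real sysfs
 * is built on kernfs; this only models the "names are unique" rule. */
#define TOY_MAX_ENTRIES 8
#define TOY_NAME_LEN    16

struct toy_dir {
	char name[TOY_MAX_ENTRIES][TOY_NAME_LEN];
	bool used[TOY_MAX_ENTRIES];
};

/* Add an entry. Returns 0 on success, -EEXIST if the name is already
 * present (the case sysfs reports as "cannot create duplicate
 * filename"), or -ENOSPC if the table is full. */
static int toy_create(struct toy_dir *d, const char *name)
{
	int free_slot = -1;

	assert(strlen(name) < TOY_NAME_LEN);
	for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (d->used[i] && strcmp(d->name[i], name) == 0)
			return -EEXIST;
		if (!d->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOSPC;
	strncpy(d->name[free_slot], name, TOY_NAME_LEN - 1);
	d->name[free_slot][TOY_NAME_LEN - 1] = '\0';
	d->used[free_slot] = true;
	return 0;
}

/* Remove an entry; returns true if it was present. This stands in for
 * the teardown that, per the report, never runs for the old disk. */
static bool toy_remove(struct toy_dir *d, const char *name)
{
	for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (d->used[i] && strcmp(d->name[i], name) == 0) {
			d->used[i] = false;
			return true;
		}
	}
	return false;
}
```

Creating "8:32" twice without a toy_remove() in between returns -EEXIST; removing it first lets the second create succeed, which is what the reload sequence needs.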


* Re: Has anyone seen this?
  2015-04-14 16:52 Has anyone seen this? Sagi Grimberg
@ 2015-04-14 17:57 ` Douglas Gilbert
  2015-04-14 18:14 ` James Bottomley
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Douglas Gilbert @ 2015-04-14 17:57 UTC
  To: Sagi Grimberg, linux-scsi; +Cc: martin.petersen

On 15-04-14 06:52 PM, Sagi Grimberg wrote:
> When I set up a DIX enabled device for the first time (say
> scsi_debug) it all works, but when I remove it
> and set it up again I get the below crash:
>
> Reproducer:
> $modprobe scsi_debug dif=1 dix=1
> $modprobe -r scsi_debug
> $modprobe scsi_debug dif=1 dix=1

Using lk 4.0.0

# modprobe scsi_debug dif=1 dix=1
# modprobe -r scsi_debug
# modprobe scsi_debug dif=1 dix=1

locks up my laptop. So does:

# modprobe scsi_debug dif=1 dix=1
# modprobe -r scsi_debug
# modprobe scsi_debug

No useful information is logged in either case.
However, take out the dif/dix stuff and it's okay:

# modprobe scsi_debug
# modprobe -r scsi_debug
# modprobe scsi_debug


So it is dif/dix related. Looking a little closer, it seems
to be related to removing a dif/dix host (and its associated
lu(s)):

# modprobe scsi_debug dif=1 dix=1
# echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host
# echo 1 > /sys/bus/pseudo/drivers/scsi_debug/add_host
<laptop freezes>

The above modprobe adds one scsi_debug dif/dix host with
one lu. The "echo -1 >" removes that host, while the
"echo 1 >" attempts to re-add that dif/dix host.

My guess is that the scsi mid-level has dtor problems
cleaning up a dif/dix host and/or its associated lu(s).

Doug Gilbert

> It seems that somehow bdi_destroy() is not
> invoked for DIX...



* Re: Has anyone seen this?
  2015-04-14 16:52 Has anyone seen this? Sagi Grimberg
  2015-04-14 17:57 ` Douglas Gilbert
@ 2015-04-14 18:14 ` James Bottomley
  2015-04-14 18:18 ` Martin K. Petersen
  2015-04-14 20:56 ` [PATCH] sd: Unregister integrity profile Martin K. Petersen
  3 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2015-04-14 18:14 UTC
  To: Sagi Grimberg; +Cc: linux-scsi, martin.petersen

On Tue, 2015-04-14 at 19:52 +0300, Sagi Grimberg wrote:
> When I set up a DIX enabled device for the first time (say
> scsi_debug) it all works, but when I remove it
> and set it up again I get the below crash:
> 
> Reproducer:
> $modprobe scsi_debug dif=1 dix=1
> $modprobe -r scsi_debug
> $modprobe scsi_debug dif=1 dix=1
> 
> It seems that somehow bdi_destroy() is not
> invoked for DIX...

That implies a refcount imbalance on the block queue, coming either
from a stray get (which looks impossible, because we only do queue gets
and puts in two places), from an outstanding request, or from an
imbalance higher up.

Could you instrument it and check that we call
scsi_device_dev_release_usercontext() for the device (that should do the
final put)?  If that's not happening, then we have an imbalance on the
scsi device itself.
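
The reasoning above rests on the usual get/put rule: the release callback runs only when the last reference is dropped, so a single get without a matching put keeps it from ever running. A minimal userspace sketch of that rule, assuming a plain int counter (kref and kobject use atomic operations instead) and hypothetical names toy_obj, toy_get() and toy_put():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object, loosely modeled on the kref/kobject
 * get/put pattern; not a real kernel structure. */
struct toy_obj {
	int refs;
	bool released;          /* set once the release callback has run */
};

/* Final teardown; in the kernel this is where e.g. the queue and its
 * bdi would finally be freed. */
static void toy_release(struct toy_obj *o)
{
	o->released = true;
}

static void toy_init(struct toy_obj *o)
{
	o->refs = 1;            /* the creator holds the initial reference */
	o->released = false;
}

static void toy_get(struct toy_obj *o)
{
	assert(o->refs > 0);    /* never take a reference on a dead object */
	o->refs++;
}

static void toy_put(struct toy_obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0)
		toy_release(o);
}
```

Balanced get/put pairs end with `released` set; one extra toy_get() leaves the count at 1 and the release never runs, which is the kind of imbalance described above.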

Thanks,

James




* Re: Has anyone seen this?
  2015-04-14 16:52 Has anyone seen this? Sagi Grimberg
  2015-04-14 17:57 ` Douglas Gilbert
  2015-04-14 18:14 ` James Bottomley
@ 2015-04-14 18:18 ` Martin K. Petersen
  2015-04-14 20:56 ` [PATCH] sd: Unregister integrity profile Martin K. Petersen
  3 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2015-04-14 18:18 UTC
  To: Sagi Grimberg; +Cc: linux-scsi, martin.petersen

>>>>> "Sagi" == Sagi Grimberg <sagig@dev.mellanox.co.il> writes:

Sagi,

Sagi> scsi_debug $modprobe scsi_debug dif=1 dix=1

Sagi> scsi_debug: host protection DIF1 DIX1

Sagi> sd 9:0:0:0: [sdc] Enabling DIF Type 1 protection

This looks odd. dix specifies the host protection mask. A value of 1 is
"DIF Type 1" so DIX should never get enabled.

But I'll take a look...
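
The point above treats `dix` as a mask of SHOST_* host protection capabilities. The sketch below decodes such a mask, assuming the bit values of enum scsi_host_prot_capabilities in include/scsi/scsi_host.h (copied in so it builds in userspace); mask_has_dif() and mask_has_dix() are hypothetical helpers, and whether scsi_debug itself interprets its `dix` parameter this way is not established here.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values of enum scsi_host_prot_capabilities from
 * include/scsi/scsi_host.h, copied so this builds in userspace. */
enum {
	SHOST_DIF_TYPE1_PROTECTION = 1 << 0,
	SHOST_DIF_TYPE2_PROTECTION = 1 << 1,
	SHOST_DIF_TYPE3_PROTECTION = 1 << 2,
	SHOST_DIX_TYPE0_PROTECTION = 1 << 3,
	SHOST_DIX_TYPE1_PROTECTION = 1 << 4,
	SHOST_DIX_TYPE2_PROTECTION = 1 << 5,
	SHOST_DIX_TYPE3_PROTECTION = 1 << 6,
};

#define TOY_DIF_MASK (SHOST_DIF_TYPE1_PROTECTION | SHOST_DIF_TYPE2_PROTECTION | \
		      SHOST_DIF_TYPE3_PROTECTION)
#define TOY_DIX_MASK (SHOST_DIX_TYPE0_PROTECTION | SHOST_DIX_TYPE1_PROTECTION | \
		      SHOST_DIX_TYPE2_PROTECTION | SHOST_DIX_TYPE3_PROTECTION)

/* DIF Type 1 plus DIX Type 1, i.e. what "DIF1 DIX1" would correspond to. */
static_assert((SHOST_DIF_TYPE1_PROTECTION | SHOST_DIX_TYPE1_PROTECTION) == 0x11,
	      "DIF1 + DIX1 should be 0x11");

/* Hypothetical helper: does the mask advertise any DIF type? */
static bool mask_has_dif(unsigned int mask)
{
	return (mask & TOY_DIF_MASK) != 0;
}

/* Hypothetical helper: does the mask advertise any DIX type? */
static bool mask_has_dix(unsigned int mask)
{
	return (mask & TOY_DIX_MASK) != 0;
}
```

Under this reading a value of 1 sets only SHOST_DIF_TYPE1_PROTECTION, so no DIX bit is present, while the logged "DIF1 DIX1" would correspond to 0x11.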

-- 
Martin K. Petersen	Oracle Linux Engineering


* [PATCH] sd: Unregister integrity profile
  2015-04-14 16:52 Has anyone seen this? Sagi Grimberg
                   ` (2 preceding siblings ...)
  2015-04-14 18:18 ` Martin K. Petersen
@ 2015-04-14 20:56 ` Martin K. Petersen
  2015-04-15  9:22   ` Sagi Grimberg
  3 siblings, 1 reply; 6+ messages in thread
From: Martin K. Petersen @ 2015-04-14 20:56 UTC
  To: linux-scsi; +Cc: sagig, Martin K. Petersen, stable

The new integrity code did not correctly unregister the profile for SD
disks. Call blk_integrity_unregister() when we release a disk.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
CC: stable@vger.kernel.org # v3.17+
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 6b78476d04bb..3290a3ed5b31 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3100,6 +3100,7 @@ static void scsi_disk_release(struct device *dev)
 	ida_remove(&sd_index_ida, sdkp->index);
 	spin_unlock(&sd_index_lock);
 
+	blk_integrity_unregister(disk);
 	disk->private_data = NULL;
 	put_disk(disk);
 	put_device(&sdkp->device->sdev_gendev);
-- 
1.9.3
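
One way to picture why this one-line fix matters, assuming (as the thread implies but does not spell out) that a registered integrity profile holds a reference that keeps the disk, and with it the queue and bdi, alive: the release path must drop that reference before the final put_disk(). The model below is hypothetical userspace C; toy_disk and its helpers only mimic the ordering in scsi_disk_release(), not the real block-layer code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a refcounted "disk" plus an optional "integrity
 * profile" child that pins its parent while registered (a kobject
 * holds a reference on its parent). */
struct toy_disk {
	int refs;
	bool integrity_registered;
	bool released;          /* stands in for freeing queue and bdi */
};

static void toy_disk_init(struct toy_disk *d)
{
	d->refs = 1;            /* the driver's own reference */
	d->integrity_registered = false;
	d->released = false;
}

static void toy_disk_put(struct toy_disk *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->released = true;
}

static void toy_integrity_register(struct toy_disk *d)
{
	d->integrity_registered = true;
	d->refs++;              /* the child pins the parent */
}

static void toy_integrity_unregister(struct toy_disk *d)
{
	if (!d->integrity_registered)
		return;         /* no-op for disks without a profile */
	d->integrity_registered = false;
	toy_disk_put(d);        /* drop the child's reference */
}

/* Teardown in the spirit of scsi_disk_release(); unregister_first
 * selects between the pre-patch and patched ordering. */
static void toy_release_disk(struct toy_disk *d, bool unregister_first)
{
	if (unregister_first)
		toy_integrity_unregister(d);
	toy_disk_put(d);        /* analogous to put_disk() */
}
```

Without the unregister step the count stays at 1 and nothing is released; with it the disk is released. A disk that never registered a profile is released either way, which fits Doug's observation that the same modprobe sequence without dif/dix works.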



* Re: [PATCH] sd: Unregister integrity profile
  2015-04-14 20:56 ` [PATCH] sd: Unregister integrity profile Martin K. Petersen
@ 2015-04-15  9:22   ` Sagi Grimberg
  0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2015-04-15  9:22 UTC
  To: Martin K. Petersen, linux-scsi; +Cc: stable

On 4/14/2015 11:56 PM, Martin K. Petersen wrote:
> The new integrity code did not correctly unregister the profile for SD
> disks. Call blk_integrity_unregister() when we release a disk.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
> CC: stable@vger.kernel.org # v3.17+

Has it been there this long? I wonder how we didn't step on
this sooner...

> ---
>   drivers/scsi/sd.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 6b78476d04bb..3290a3ed5b31 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3100,6 +3100,7 @@ static void scsi_disk_release(struct device *dev)
>   	ida_remove(&sd_index_ida, sdkp->index);
>   	spin_unlock(&sd_index_lock);
>
> +	blk_integrity_unregister(disk);
>   	disk->private_data = NULL;
>   	put_disk(disk);
>   	put_device(&sdkp->device->sdev_gendev);
>

Always nice to post a trace, go home, and find the fix
the next morning...

Thanks Martin!

Tested-by: Sagi Grimberg <sagig@mellanox.com>

