* srp state in current mainline
@ 2015-11-10 17:15 Christoph Hellwig
From: Christoph Hellwig @ 2015-11-10 17:15 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche; +Cc: linux-rdma, target-devel

I've just tried forward porting some work affecting SRP from a 4.1-ish
base, and immediately started running into errors on current Linus' HEAD
and also on 4.3.  On current HEAD, memory registrations on the client seem
to fail, probably due to the MR rework, but even on 4.3 I run into crazy
corruption reports from xfstests, most of which look like slab
poisoning.  I'm not sure at this point whether they are caused by the
target or the initiator, but I'd like to share them.  This is a simple
xfstests run using XFS on a remote LIO ramdisk.

4.3 (actually -rc, but I didn't see any change since):

[   86.316719] run fstests generic/018 at 2015-11-10 08:52:56
[   86.558749] XFS (sdb): Mounting V4 Filesystem
[   86.798915] XFS (sdb): Ending clean mount
[   86.887999] XFS (sdc): Mounting V4 Filesystem
[   86.894340] XFS (sdc): Ending clean mount
[   86.980603] XFS (sdc): Unmounting Filesystem
[   86.992347] XFS (sdc): Mounting V4 Filesystem
[   86.999572] XFS (sdc): Ending clean mount
[   87.052941] XFS (sdc): Unmounting Filesystem
[   87.065080] XFS (sdc): Mounting V4 Filesystem
[   87.070402] XFS (sdc): Ending clean mount
[   87.124831] XFS (sdc): Unmounting Filesystem
[   87.136828] XFS (sdc): Mounting V4 Filesystem
[   87.144746] XFS (sdc): Ending clean mount
[   87.157052] XFS (sdc): Metadata corruption detected at xfs_agi_read_verify+0x4a/0xe0 [xfs], block 0x2
[   87.166374] XFS (sdc): Unmount and run xfs_repair
[   87.171161] XFS (sdc): First 64 bytes of corrupted metadata buffer:
[   87.177515] ffff880314c2ce00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   87.186312] ffff880314c2ce10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   87.195107] ffff880314c2ce20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   87.203902] ffff880314c2ce30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   87.212716] XFS (sdc): metadata I/O error: block 0x2 ("xfs_trans_read_buf_map") error 117 numblks 1
[   87.221816] XFS (sdc): xfs_do_force_shutdown(0x1) called from line 315 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xffffffffa067138c
[   87.221828] XFS (sdc): I/O Error Detected. Shutting down filesystem
[   87.228132] XFS (sdc): Please umount the filesystem and rectify the problem(s)
[   87.328890] XFS (sdc): xfs_log_force: error -5 returned.
[   87.328897] XFS (sdc): Unmounting Filesystem
[   87.328906] XFS (sdc): xfs_log_force: error -5 returned.
[   87.328921] XFS (sdc): xfs_log_force: error -5 returned.
[   87.432013] run fstests generic/020 at 2015-11-10 08:52:57

and then this repeats a couple more times until xfstests eventually gives up.
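
The repeated 0x6b bytes ("k" in the ASCII column) match POISON_FREE from include/linux/poison.h, the value slab debugging writes into freed objects, so an AGI block that reads back as nothing but 0x6b points at freed memory somewhere in the I/O path. A minimal userspace sketch for recognizing that pattern in a captured buffer follows; the helper name looks_slab_poisoned() is hypothetical and not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Byte that slab debugging writes into freed objects
 * (POISON_FREE in include/linux/poison.h). */
#define SLAB_POISON_FREE 0x6b

/*
 * Hypothetical helper: returns 1 if every byte of buf equals the
 * free-poison value, 0 otherwise. An empty buffer is reported as 0
 * because it carries no evidence either way.
 */
static int looks_slab_poisoned(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	if (len == 0)
		return 0;
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != SLAB_POISON_FREE)
			return 0;
	}
	return 1;
}
```

Applied to the 64 bytes dumped above it would return 1, whereas a genuine AGI block starts with the "XAGI" magic and would return 0.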

On current Linus' HEAD tree:

[  128.786534] run fstests generic/001 at 2015-11-10 09:05:06
[  132.914320] XFS (sdb): Unmounting Filesystem
[  132.914463] ------------[ cut here ]------------
[  132.914474] WARNING: CPU: 3 PID: 1795 at drivers/infiniband/ulp/srp/ib_srp.c:1262 srp_map_desc+0x64/0x80 [ib_srp]()
[  132.914476] Modules linked in: xfs libcrc32c ib_srp(O) scsi_transport_srp nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul sha256_ssse3 sha256_generic hmac drbg mgag200 ttm ansi_cprng drm_kms_helper aesni_intel drm aes_x86_64 lrw gf128mul snd_pcm glue_helper ablk_helper snd_timer evdev cryptd ipmi_devintf i7core_edac shpchp iTCO_wdt iTCO_vendor_support psmouse snd soundcore i2c_algo_bit i2c_core lpc_ich serio_raw edac_core dcdbas acpi_power_meter pcspkr mfd_core acpi_cpufreq button tpm_tis ipmi_si ipmi_msghandler tpm ib_ipoib ib_umad rdma_ucm rdma_cm iw_cm ib_uverbs ib_cm mlx4_ib ib_sa ib_mad ib_core ib_addr autofs4 ext4 crc16 mbcache jbd2 sd_mod sg sr_mod cdrom ata_generic crc32c_intel ehci_pci uhci_hcd
[  132.914541]  ehci_hcd mptsas ata_piix scsi_transport_sas mptscsih libata mptbase mlx4_core usbcore scsi_mod usb_common bnx2
[  132.914552] CPU: 3 PID: 1795 Comm: xfsaild/sdb Tainted: G          IO 4.3.0+ #28
[  132.914554] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 3.0.0 01/31/2011
[  132.914556]  ffffffffa05aa460 ffffffff812c9453 0000000000000000 ffffffff8106ef51
[  132.914559]  ffff880313acfa60 ffff880313acfa30 00000006153efe80 0000000000000001
[  132.914562]  ffff880620a310c0 ffffffffa05a3a14 0801220700000000 ffff880313acfa60
[  132.914566] Call Trace:
[  132.914572]  [<ffffffff812c9453>] ? dump_stack+0x40/0x5d
[  132.914578]  [<ffffffff8106ef51>] ? warn_slowpath_common+0x81/0xb0
[  132.914582]  [<ffffffffa05a3a14>] ? srp_map_desc+0x64/0x80 [ib_srp]
[  132.914585]  [<ffffffffa05a3b89>] ? srp_map_finish_fr+0x159/0x1f0 [ib_srp]
[  132.914589]  [<ffffffffa05a41c1>] ? srp_map_idb.isra.39+0xf1/0x150 [ib_srp]
[  132.914593]  [<ffffffffa05a6961>] ? srp_queuecommand+0xc21/0xc70 [ib_srp]
[  132.914600]  [<ffffffffa009058f>] ? scsi_init_sgtable+0x3f/0x70 [scsi_mod]
[  132.914607]  [<ffffffffa008fc08>] ? scsi_dispatch_cmd+0xd8/0x1f0 [scsi_mod]
[  132.914613]  [<ffffffffa009253a>] ? scsi_request_fn+0x46a/0x600 [scsi_mod]
[  132.914619]  [<ffffffff8129a2ff>] ? __blk_run_queue+0x2f/0x40
[  132.914624]  [<ffffffff812c5a83>] ? cfq_insert_request+0x2f3/0x530
[  132.914628]  [<ffffffff8129fcc5>] ? blk_flush_plug_list+0x1f5/0x220
[  132.914631]  [<ffffffff812a0056>] ? blk_finish_plug+0x26/0x40
[  132.914654]  [<ffffffffa05f4aa2>] ? __xfs_buf_delwri_submit+0x1b2/0x230 [xfs]
[  132.914669]  [<ffffffffa05f575c>] ? xfs_buf_delwri_submit_nowait+0x1c/0x30 [xfs]
[  132.914683]  [<ffffffffa05f575c>] ? xfs_buf_delwri_submit_nowait+0x1c/0x30 [xfs]
[  132.914696]  [<ffffffffa061c9f8>] ? xfsaild+0x258/0x570 [xfs]
[  132.914710]  [<ffffffffa061c7a0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
[  132.914714]  [<ffffffff8108b79f>] ? kthread+0xcf/0xf0
[  132.914717]  [<ffffffff8108b6d0>] ? kthread_park+0x50/0x50
[  132.914721]  [<ffffffff8154d58f>] ? ret_from_fork+0x3f/0x70
[  132.914724]  [<ffffffff8108b6d0>] ? kthread_park+0x50/0x50
[  132.914727] ---[ end trace 5ef1c59be4197e01 ]---
[  132.915191] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff880313f4ca40

and then we end up in a reconnect loop.

* Re: srp state in current mainline
@ 2015-11-11 15:35   ` Bart Van Assche
From: Bart Van Assche @ 2015-11-11 15:35 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, target-devel-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 09:15 AM, Christoph Hellwig wrote:
> This is a simple xfstests run using XFS on a remote LIO ramdisk.

Hello Christoph,

Which version of the kernel and LIO were installed on the target side?

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: srp state in current mainline
@ 2015-11-11 15:46     ` Christoph Hellwig
From: Christoph Hellwig @ 2015-11-11 15:46 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Sagi Grimberg, linux-rdma, target-devel

On Wed, Nov 11, 2015 at 07:35:47AM -0800, Bart Van Assche wrote:
> On 11/10/2015 09:15 AM, Christoph Hellwig wrote:
> >This is a simple xfstests run using XFS on a remote LIO ramdisk.
> 
> Hello Christoph,
> 
> Which version of the kernel and LIO were installed on the target side?

I've tried a couple of different ones, from 4.1-rc3 to yesterday's Linus
tree, and the target side doesn't matter.

I've also bisected things down on the initiator side, and the changes to
enable prefer_fr and register_always seem to be the culprit, at least as
far as 4.3 is concerned.  If I disable both, 4.3 works fine, but
enabling either one causes frequent map failures.  I'm just rebuilding
a current Linus tree to see if the options help there as well.

* Re: srp state in current mainline
@ 2015-11-11 16:03       ` Bart Van Assche
From: Bart Van Assche @ 2015-11-11 16:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, linux-rdma, target-devel

On 11/11/2015 07:46 AM, Christoph Hellwig wrote:
> On Wed, Nov 11, 2015 at 07:35:47AM -0800, Bart Van Assche wrote:
>> On 11/10/2015 09:15 AM, Christoph Hellwig wrote:
>>> This is a simple xfstests run using XFS on a remote LIO ramdisk.
>>
>> Hello Christoph,
>>
>> Which version of the kernel and LIO were installed on the target side?
>
> I've tried a couple of different ones, from 4.1-rc3 to yesterday's Linus
> tree, and the target side doesn't matter.
>
> I've also bisected things down on the initiator side, and the changes to
> enable prefer_fr and register_always seem to be the culprit, at least as
> far as 4.3 is concerned.  If I disable both, 4.3 works fine, but
> enabling either one causes frequent map failures.  I'm just rebuilding
> a current Linus tree to see if the options help there as well.

Hello Christoph,

The SRP initiator from kernel 4.3 is working fine on my test setup. I 
will start a test with Linus' tree and with the following SRP kernel 
module parameters:

# cat /etc/modprobe.d/ib_srp.conf
options ib_srp cmd_sg_entries=255 register_always=1 prefer_fr=1

Bart.

* Re: srp state in current mainline
@ 2015-11-11 16:18           ` Christoph Hellwig
From: Christoph Hellwig @ 2015-11-11 16:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 11, 2015 at 08:03:46AM -0800, Bart Van Assche wrote:
> Hello Christoph,
> 
> The SRP initiator from kernel 4.3 is working fine on my test setup. I will
> start a test with Linus' tree and with the following SRP kernel module
> parameters:
> 
> # cat /etc/modprobe.d/ib_srp.conf
> options ib_srp cmd_sg_entries=255 register_always=1 prefer_fr=1

I just finished two runs with the current Linus tree.

register_always=N prefer_fr=N finishes fine.

With the default options it isn't happy:

[   69.516617] run fstests generic/001 at 2015-11-11 08:07:39
[   73.649108] XFS (sdb): Unmounting Filesystem
[   73.649265] ------------[ cut here ]------------
[   73.649276] WARNING: CPU: 2 PID: 1785 at drivers/infiniband/ulp/srp/ib_srp.c:1260 srp_map_desc+0x64/0x80 [ib_srp]()
[   73.649278] Modules linked in: xfs libcrc32c ib_srp scsi_transport_srp nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul sha256_ssse3 sha256_generic hmac drbg ansi_cprng snd_pcm mgag200 ttm drm_kms_helper aesni_intel snd_timer drm aes_x86_64 snd lrw gf128mul psmouse iTCO_wdt iTCO_vendor_support glue_helper soundcore evdev acpi_power_meter ablk_helper i2c_algo_bit lpc_ich serio_raw dcdbas i7core_edac ipmi_devintf pcspkr i2c_core cryptd edac_core shpchp button mfd_core ipmi_si ipmi_msghandler acpi_cpufreq tpm_tis tpm ib_ipoib ib_umad rdma_ucm rdma_cm iw_cm ib_uverbs ib_cm mlx4_ib ib_sa ib_mad ib_core ib_addr autofs4 ext4 crc16 mbcache jbd2 sd_mod sg sr_mod cdrom ata_generic crc32c_intel mptsas ata_piix
[   73.649342]  ehci_pci uhci_hcd scsi_transport_sas ehci_hcd mptscsih libata mptbase mlx4_core usbcore scsi_mod usb_common bnx2
[   73.649353] CPU: 2 PID: 1785 Comm: xfsaild/sdb Tainted: G          I 4.3.0+ #32
[   73.649356] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 3.0.0 01/31/2011
[   73.649358]  ffffffffa04c2460 ffffffff812c9d63 0000000000000000 ffffffff8106ef61
[   73.649361]  ffff8800c94a3a60 ffff8800c94a3a30 00000006102ad1c0 0000000000000001
[   73.649364]  ffff88002dd4c300 ffffffffa04bba14 0801086e00000000 ffff8800c94a3a60 
[   73.649367] Call Trace:
[   73.649376]  [<ffffffff812c9d63>] ? dump_stack+0x40/0x5d
[   73.649383]  [<ffffffff8106ef61>] ? warn_slowpath_common+0x81/0xb0
[   73.649387]  [<ffffffffa04bba14>] ? srp_map_desc+0x64/0x80 [ib_srp]
[   73.649391]  [<ffffffffa04bbb89>] ? srp_map_finish_fr+0x159/0x1f0 [ib_srp]
[   73.649394]  [<ffffffffa04bc1b1>] ? srp_map_idb.isra.39+0xf1/0x150 [ib_srp]
[   73.649398]  [<ffffffffa04be951>] ? srp_queuecommand+0xc21/0xc70 [ib_srp]
[   73.649405]  [<ffffffffa009e58f>] ? scsi_init_sgtable+0x3f/0x70 [scsi_mod]
[   73.649412]  [<ffffffffa009dc08>] ? scsi_dispatch_cmd+0xd8/0x1f0 [scsi_mod]
[   73.649418]  [<ffffffffa00a053a>] ? scsi_request_fn+0x46a/0x600 [scsi_mod]
[   73.649425]  [<ffffffff8129a94f>] ? __blk_run_queue+0x2f/0x40
[   73.649432]  [<ffffffff812c6393>] ? cfq_insert_request+0x2f3/0x530
[   73.649436]  [<ffffffff812a0315>] ? blk_flush_plug_list+0x1f5/0x220
[   73.649440]  [<ffffffff812a06a6>] ? blk_finish_plug+0x26/0x40
[   73.649459]  [<ffffffffa06b1aa2>] ? __xfs_buf_delwri_submit+0x1b2/0x230 [xfs]
[   73.649474]  [<ffffffffa06b275c>] ? xfs_buf_delwri_submit_nowait+0x1c/0x30 [xfs]
[   73.649489]  [<ffffffffa06b275c>] ? xfs_buf_delwri_submit_nowait+0x1c/0x30 [xfs]
[   73.649503]  [<ffffffffa06d99f8>] ? xfsaild+0x258/0x570 [xfs]
[   73.649516]  [<ffffffffa06d97a0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
[   73.649520]  [<ffffffff8108b7af>] ? kthread+0xcf/0xf0
[   73.649524]  [<ffffffff8108b6e0>] ? kthread_park+0x50/0x50
[   73.649528]  [<ffffffff8154dc0f>] ? ret_from_fork+0x3f/0x70
[   73.649531]  [<ffffffff8108b6e0>] ? kthread_park+0x50/0x50
[   73.649534] ---[ end trace 2b5af006a2490f6b ]---
[   73.650000] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f84e780
[   83.772430] scsi host3: ib_srp: reconnect succeeded
[   83.780566] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0
[   93.900380] scsi host3: ib_srp: reconnect succeeded
[   93.908355] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0
[  104.027018] scsi host3: ib_srp: reconnect succeeded
[  104.036106] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0
[  114.155391] scsi host3: ib_srp: reconnect succeeded
[  114.163894] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0
[  124.285540] scsi host3: ib_srp: reconnect succeeded
[  124.296033] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0
[  134.416857] scsi host3: ib_srp: reconnect succeeded
[  134.427546] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0
[  144.548077] scsi host3: ib_srp: reconnect succeeded
[  144.555542] scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff88060f801cc0

* Re: srp state in current mainline
@ 2015-11-11 16:34               ` Sagi Grimberg
From: Sagi Grimberg @ 2015-11-11 16:34 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, target-devel-u79uwXL29TY76Z2rM5mHXA



On 11/11/2015 18:18, Christoph Hellwig wrote:
> On Wed, Nov 11, 2015 at 08:03:46AM -0800, Bart Van Assche wrote:
>> Hello Christoph,
>>
>> The SRP initiator from kernel 4.3 is working fine on my test setup. I will
>> start a test with Linus' tree and with the following SRP kernel module
>> parameters:
>>
>> # cat /etc/modprobe.d/ib_srp.conf
>> options ib_srp cmd_sg_entries=255 register_always=1 prefer_fr=1
>
> I just finished two runs with the current Linus tree.
>
> register_always=N prefer_fr=N finishes fine.
>
> With the default options it isn't happy:

I was able to see those too (with the registration changes). Do you
have the registration changes applied?

For some reason dma_len is 0 when coming from the indirect
descriptor mapping in srp_map_idb().
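
Both initiator traces warn in srp_map_desc() while srp_map_idb() is mapping the indirect descriptor table, and Sagi reports dma_len being 0 there. The sketch below is a hypothetical userspace model, not ib_srp code: struct desc, struct map_state and map_desc() are made-up names, and rejecting the zero-length entry is just one way to make the condition visible, not necessarily what the eventual fix does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a direct buffer descriptor:
 * address, memory key and length of one registered region. */
struct desc {
	uint64_t va;
	uint32_t key;
	uint32_t len;
};

/* Hypothetical mapping state: next free output slot plus counters. */
struct map_state {
	struct desc *desc;      /* next descriptor slot to fill */
	unsigned int ndesc;     /* descriptors emitted so far */
	uint64_t total_len;     /* sum of emitted descriptor lengths */
	unsigned int zero_len;  /* zero-length entries seen (a bug indicator) */
};

/*
 * Append one descriptor. A zero length never describes useful data,
 * so the model counts it and refuses to emit it (returns -1).
 */
static int map_desc(struct map_state *st, uint64_t addr, uint32_t len,
		    uint32_t key)
{
	assert(st != NULL && st->desc != NULL);

	if (len == 0) {
		st->zero_len++;
		return -1;
	}
	st->desc->va = addr;
	st->desc->key = key;
	st->desc->len = len;
	st->desc++;
	st->ndesc++;
	st->total_len += len;
	return 0;
}
```

In a model like this, a nonzero zero_len after mapping a request flags the same condition the WARN in the logs caught.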

* Re: srp state in current mainline
@ 2015-11-11 21:07   ` Bart Van Assche
From: Bart Van Assche @ 2015-11-11 21:07 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, target-devel-u79uwXL29TY76Z2rM5mHXA

On 11/10/2015 09:15 AM, Christoph Hellwig wrote:
> scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff880313f4ca40

Can you also post the logs from the target system from around the time
this message was logged on the initiator system? Usually this message
means that the target system closed a QP. I'm looking for messages
generated by the following statement in ib_srpt.c:

         pr_info("RDMA t %d for idx %u failed with status %d\n", opcode,
                 index, wc->status);

Bart.

* Re: srp state in current mainline
@ 2015-11-12 17:59     ` Christoph Hellwig
From: Christoph Hellwig @ 2015-11-12 17:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma, target-devel

On Wed, Nov 11, 2015 at 01:07:44PM -0800, Bart Van Assche wrote:
> On 11/10/2015 09:15 AM, Christoph Hellwig wrote:
> >scsi host3: ib_srp: failed receive status WR flushed (5) for iu ffff880313f4ca40
> 
> Can you also post the logs from the target system from around the time this
> message was logged on the initiator system? Usually this message means that
> the target system closed a QP. I'm looking for messages generated by the
> following statement in ib_srpt.c:

None of those as far as I can tell:

[  108.896609] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.903032] ib_srpt Session : kernel thread ib_srpt_compl (PID 1258) started
[  108.910776] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.916901] ib_srpt Session : kernel thread ib_srpt_compl (PID 1259) started
[  108.924338] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.929974] ib_srpt Session : kernel thread ib_srpt_compl (PID 1260) started
[  108.937428] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.943112] ib_srpt Session : kernel thread ib_srpt_compl (PID 1261) started
[  108.950896] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.956066] ib_srpt Session : kernel thread ib_srpt_compl (PID 1262) started
[  108.964064] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.969751] ib_srpt Session : kernel thread ib_srpt_compl (PID 1263) started
[  108.977249] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.982943] ib_srpt Session : kernel thread ib_srpt_compl (PID 1264) started
[  108.990002] ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0xf4521403007be042, t_port_id 0x2c903009f83a0:0x2c903009f83a0 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f83a2)
[  108.995576] ib_srpt Session : kernel thread ib_srpt_compl (PID 1265) started
[  108.998564] ------------[ cut here ]------------
[  108.998574] WARNING: CPU: 0 PID: 1258 at kernel/sched/core.c:7389 __might_sleep+0xa7/0xb0()
[  108.998580] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810cad4d>] prepare_to_wait_event+0x5d/0x100
[  108.998582] Modules linked in: ib_srpt target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod configfs nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mgag200 ttm sha256_ssse3 sha256_generic snd_pcm hmac drm_kms_helper drbg ansi_cprng drm snd_timer aesni_intel aes_x86_64 i2c_algo_bit lrw gf128mul snd soundcore glue_helper i2c_core ablk_helper cryptd psmouse ipmi_devintf iTCO_wdt dcdbas tpm_tis evdev iTCO_vendor_support serio_raw i7core_edac acpi_power_meter pcspkr ipmi_si tpm ib_ipoib shpchp ipmi_msghandler edac_core button acpi_cpufreq lpc_ich mfd_core ib_umad rdma_ucm rdma_cm iw_cm processor ib_uverbs ib_cm mlx4_ib ib_sa ib_mad ib_core ib_addr parport_pc ppdev lp
[  108.998652]  parport autofs4 ext4 crc16 mbcache jbd2 sd_mod sg sr_mod cdrom ata_generic ata_piix crc32c_intel libata mptsas ehci_pci uhci_hcd scsi_transport_sas mptscsih ehci_hcd mptbase usbcore mlx4_core usb_common scsi_mod nvme bnx2
[  108.998677] CPU: 0 PID: 1258 Comm: ib_srpt_compl Tainted: G          I     4.2.0 #34
[  108.998679] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 3.0.0 01/31/2011
[  108.998681]  ffffffff81a01b6f ffff88061374ba28 ffffffff81621a32 0000000000000000
[  108.998685]  ffff88061374ba78 ffff88061374ba68 ffffffff8107d72a ffff88061374bb18
[  108.998689]  ffffffff81a11775 0000000000000b30 0000000000000000 00000000000000d0
[  108.998692] Call Trace:
[  108.998699]  [<ffffffff81621a32>] dump_stack+0x4c/0x65
[  108.998704]  [<ffffffff8107d72a>] warn_slowpath_common+0x8a/0xc0
[  108.998707]  [<ffffffff8107d7a6>] warn_slowpath_fmt+0x46/0x50
[  108.998710]  [<ffffffff810cad4d>] ? prepare_to_wait_event+0x5d/0x100
[  108.998713]  [<ffffffff810cad4d>] ? prepare_to_wait_event+0x5d/0x100
[  108.998716]  [<ffffffff810aa9a7>] __might_sleep+0xa7/0xb0
[  108.998720]  [<ffffffff811f66b4>] __kmalloc+0x1f4/0x680
[  108.998737]  [<ffffffffa068e0d9>] ? target_check_reservation+0x69/0x720 [target_core_mod]
[  108.998747]  [<ffffffffa068fb59>] ? target_alua_state_check+0xa9/0x480 [target_core_mod]
[  108.998758]  [<ffffffffa06998ed>] ? target_alloc_sgl+0x4d/0x160 [target_core_mod]
[  108.998769]  [<ffffffffa06998ed>] target_alloc_sgl+0x4d/0x160 [target_core_mod]
[  108.998780]  [<ffffffffa0699aa6>] transport_generic_new_cmd+0xa6/0x230 [target_core_mod]
[  108.998791]  [<ffffffffa0699c6a>] transport_handle_cdb_direct+0x3a/0x90 [target_core_mod]
[  108.998801]  [<ffffffffa0699e1e>] target_submit_cmd_map_sgls+0x15e/0x270 [target_core_mod]
[  108.998812]  [<ffffffffa0699f89>] target_submit_cmd+0x59/0x60 [target_core_mod]
[  108.998817]  [<ffffffffa057b6f1>] srpt_handle_new_iu+0x2a1/0x6c0 [ib_srpt]
[  108.998821]  [<ffffffffa057bbd9>] srpt_process_completion+0xc9/0x4d0 [ib_srpt]
[  108.998826]  [<ffffffffa057c0f6>] srpt_compl_thread+0x116/0x140 [ib_srpt]
[  108.998828]  [<ffffffff810cac40>] ? wait_woken+0xb0/0xb0
[  108.998832]  [<ffffffffa057bfe0>] ? srpt_process_completion+0x4d0/0x4d0 [ib_srpt]
[  108.998836]  [<ffffffff810a3819>] kthread+0x119/0x130
[  108.998839]  [<ffffffff810a3700>] ? kthread_create_on_node+0x240/0x240
[  108.998843]  [<ffffffff8162aa5f>] ret_from_fork+0x3f/0x70
[  108.998845]  [<ffffffff810a3700>] ? kthread_create_on_node+0x240/0x240
[  108.998848] ---[ end trace b96d3b34157d1df6 ]---
[  134.094662] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094717] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094746] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094794] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094826] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094866] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094904] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.094941] ib_srpt Received DREQ and sent DREP for session 0x0000000000000000f4521403007be042.
[  134.158698] ib_srpt Received SRP_LOGIN_REQ with i_port_id


.. and so on

* Re: srp state in current mainline
@ 2015-11-13  1:12       ` Bart Van Assche
From: Bart Van Assche @ 2015-11-13  1:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, linux-rdma, target-devel

On 11/12/2015 09:59 AM, Christoph Hellwig wrote:
> [  108.998574] WARNING: CPU: 0 PID: 1258 at kernel/sched/core.c:7389 __might_sleep+0xa7/0xb0()
> [  108.998580] do not call blocking ops when !TASK_RUNNING; state=1 set

Although this is most likely unrelated to the issue reported at the 
start of this thread, I have started working on a patch for this warning.

Bart.

* Re: srp state in current mainline
@ 2015-11-13  6:48           ` Christoph Hellwig
From: Christoph Hellwig @ 2015-11-13  6:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Thu, Nov 12, 2015 at 05:12:14PM -0800, Bart Van Assche wrote:
> On 11/12/2015 09:59 AM, Christoph Hellwig wrote:
> >[  108.998574] WARNING: CPU: 0 PID: 1258 at kernel/sched/core.c:7389 __might_sleep+0xa7/0xb0()
> >[  108.998580] do not call blocking ops when !TASK_RUNNING; state=1 set
> 
> Although this is most likely unrelated to the issue reported at the start of
> this thread, I have started working on a patch for this warning.

It's unrelated and fixed in the SCST version.  But don't bother, I have
a series that gets rid of the srpt completion thread entirely.

* Re: srp state in current mainline
@ 2015-11-15 18:06 ` Christoph Hellwig
From: Christoph Hellwig @ 2015-11-15 18:06 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche; +Cc: linux-rdma, target-devel

FYI, I sent a patch for the zero S/G length issue.  With it, xfstests
does fine for ext4 and btrfs.  With XFS I still run into corruption
warnings showing the slab use-after-free poison pattern.  I suspect that
issue might be related to unique XFS I/O patterns.  One thing that
might be related is that XFS can submit bios backed by kmalloced memory.

Just to be sure, I've also tested XFS on various other storage devices
with the same kernel and didn't see an issue like that.

* Re: srp state in current mainline
@ 2015-11-15 19:48     ` Bart Van Assche
From: Bart Van Assche @ 2015-11-15 19:48 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, target-devel-u79uwXL29TY76Z2rM5mHXA

On 11/15/15 10:06, Christoph Hellwig wrote:
> FYI, I sent a patch for the zero S/G length issue.  With this xfstests
> does fine for ext4 and btrfs.  With XFS I still run into corruption
> warnings for the slab use after free poison pattern.  I suspect that
> issue might be related to unique XFS I/O patterns.  One thing that
> might be related is that XFS can submit bios backed by kmalloced memory.
>
> I've also tested XFS on various other storage devices on the same kernel
> and didn't see an issue like that just to be sure.

Thanks for submitting that patch.

Did I understand correctly that page-aligned I/O works fine but I/O that
is not aligned on a page boundary does not? Have you already had time to
verify whether the "IB/srp: Convert to new registration API" patch is
the one that introduced this issue?

BTW, another place where a buffer that is not aligned on a page boundary
is used is the scsi_probe_and_add_lun() function. If a crash occurs while
probing LUNs, or lsscsi shows garbled data for SRP LUNs, that might
indicate a problem in the buffer mapping or registration code.
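
The alignment question comes down to simple arithmetic: a buffer that does not start on a page boundary, as a kmalloc()ed buffer may not, can touch one more page than its length alone suggests, and a page-list based registration has to cover every page the buffer touches. A minimal sketch, assuming 4 KiB pages; page_aligned() and pages_spanned() are illustrative helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86-64 */

/* Nonzero if addr sits exactly on a page boundary. */
static int page_aligned(uint64_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Number of pages touched by the byte range [addr, addr + len). */
static uint64_t pages_spanned(uint64_t addr, uint64_t len)
{
	assert(len > 0);

	uint64_t first = addr / SKETCH_PAGE_SIZE;
	uint64_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;

	return last - first + 1;
}
```

For instance, 4096 bytes starting at 0x1200 span two pages, while the same length starting at 0x1000 spans one.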

Bart.

* Re: srp state in current mainline
@ 2015-11-16  8:38         ` Christoph Hellwig
From: Christoph Hellwig @ 2015-11-16  8:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Sun, Nov 15, 2015 at 11:48:08AM -0800, Bart Van Assche wrote:
> Did I understand correctly that page-aligned I/O works fine but I/O that is
> not aligned on a page boundary does not? Have you already had time to verify
> whether the "IB/srp: Convert to new registration API" patch is the one
> that introduced this issue?

Hi Bart,

I haven't figured out what doesn't work.  This was just a wild guess
at what could be different about XFS.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
  2015-11-10 17:15 srp state in current mainline Christoph Hellwig
       [not found] ` <20151110171509.GA27781-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-11-15 18:06 ` Christoph Hellwig
@ 2015-11-22 13:53 ` Christoph Hellwig
  2015-11-22 14:32   ` Christoph Hellwig
  2 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-11-22 13:53 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche; +Cc: linux-rdma, target-devel

So I had some time to go back and look at the 4.3 failures.  They
occur when srp_fr_pool_get, called from srp_map_idb through
srp_map_finish_fr, fails with -ENOMEM, and they go away with
register_always=N.  It looks like the additional FR for the indirect
buffer isn't accounted for anywhere.

To me this sounds like another argument to just allocate one FR
per request and not allow non-contiguous SGLs.

Also note that 4.4-rc with prefer_fr=y register_always=n (i.e.
!register_always) still blows up badly with XFS and ext4 due to
data integrity errors.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
  2015-11-22 13:53 ` Christoph Hellwig
@ 2015-11-22 14:32   ` Christoph Hellwig
  2015-11-22 14:55     ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-11-22 14:32 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche; +Cc: linux-rdma, target-devel

On Sun, Nov 22, 2015 at 05:53:43AM -0800, Christoph Hellwig wrote:
> To me this sounds like another argument to just allocate one FR
> per request and not allow non-contiguous SGLs.
> 
> Also note that 4.4-rc with prefer_fr=y register_always=n (i.e.
> !register_always) still blows up badly with XFS and ext4 due to
> data integrity errors.

A little instrumentation revealed that the corrupted buffers are
correlated with the 'gap' case in ib_sg_to_pages.  I haven't found
anything obvious yet, but adding this line to srp_slave_configure:

	blk_queue_virt_boundary(q, target->srp_host->srp_dev->mr_page_size - 1);

and thus avoiding the gap cases makes xfstests pass fine for me, with or
without the register_always parameter.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
  2015-11-22 14:32   ` Christoph Hellwig
@ 2015-11-22 14:55     ` Sagi Grimberg
       [not found]       ` <5651D775.9090500-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2015-11-22 14:55 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche; +Cc: linux-rdma, target-devel



On 22/11/2015 16:32, Christoph Hellwig wrote:
> On Sun, Nov 22, 2015 at 05:53:43AM -0800, Christoph Hellwig wrote:
>> To me this sounds like another argument to just allocate one FR
>> per request and not allow non-contiguous SGLs.
>>
>> Also note that 4.4-rc with prefer_fr=y register_always=n (i.e.
>> !register_always) still blows up badly with XFS and ext4 due to
>> data integrity errors.

So register_always=N makes bad things happen? If we register
all the frags, we're OK (i.e. register_always=Y)?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
       [not found]       ` <5651D775.9090500-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-11-22 15:10         ` Christoph Hellwig
  2015-11-22 15:26           ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-11-22 15:10 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Sun, Nov 22, 2015 at 04:55:49PM +0200, Sagi Grimberg wrote:
> >>Also note that 4.4-rc with prefer_fr=y register_always=n (i.e.
> >>!register_always) still blows up badly with XFS and ext4 due to
> >>data integrity errors.
> 
> So register_always=N makes bad things happen? If we register
> all the frags, we're OK (i.e. register_always=Y)?

No. register_always=Y is already broken in 4.3, but register_always=N is
now also broken in 4.4.

The 4.3+ issue that only affects register_always=Y is the FR mapping
of the indirect buffer that is not accounted for.

The 4.4+ issue that also affects register_always=N is something in your
MR patches when we hit a gappy SG list.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
  2015-11-22 15:10         ` Christoph Hellwig
@ 2015-11-22 15:26           ` Sagi Grimberg
  2015-11-22 15:31             ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2015-11-22 15:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bart Van Assche, linux-rdma, target-devel



On 22/11/2015 17:10, Christoph Hellwig wrote:
> On Sun, Nov 22, 2015 at 04:55:49PM +0200, Sagi Grimberg wrote:
>>>> Also note that 4.4-rc with prefer_fr=y register_always=n (i.e.
>>>> !register_always) still blows up badly with XFS and ext4 due to
>>>> data integrity errors.
>>
>> So register_always=N makes bad things happen? If we register
>> all the frags, we're OK (i.e. register_always=Y)?
>
> No. register_always=Y is already broken in 4.3, but register_always=N is
> now also broken in 4.4.

OK, I'm confused so please let me understand slowly :)

Your patch "ib_srp: initialize dma_length in srp_map_idb" solves
the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc; does it also
solve the data integrity errors?

> The 4.3+ issue that only affects register_always=Y is the FR mapping
> of the indirect buffer that is not accounted for.

So it fails because there are no free FRs to use. I see.


> The 4.4+ issue that also affects register_always=N is something in your
> MR patches when we hit a gappy SG list.

This is something I'd need to look at. I tested this code path using
Bart's discontiguous_io utility and didn't see problems.

This code path is specific to SRP because all other ULPs guarantee
no gaps...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
  2015-11-22 15:26           ` Sagi Grimberg
@ 2015-11-22 15:31             ` Christoph Hellwig
       [not found]               ` <20151122153106.GA26919-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-11-22 15:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Bart Van Assche, linux-rdma, target-devel

On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote:
> >No. register_always=Y is already broken in 4.3, but register_always=N is
> >now also broken in 4.4.
> 
> OK, I'm confused so please let me understand slowly :)
> 
> Your patch "ib_srp: initialize dma_length in srp_map_idb" solves
> the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc; does it also
> solve the data integrity errors?

No, it doesn't.

> This code path is specific to SRP because all other ULPs guarantee
> no gaps...

Yes.  Life would be simpler if we could just set the virt_boundary
on SRP, and Bart has indicated that he's willing to at least look at
this for the next merge window.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
       [not found]               ` <20151122153106.GA26919-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-11-24  1:14                 ` Bart Van Assche
  2015-11-25 19:34                   ` Bart Van Assche
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2015-11-24  1:14 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, target-devel-u79uwXL29TY76Z2rM5mHXA

On 11/22/2015 07:31 AM, Christoph Hellwig wrote:
> On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote:
>>> No. register_always=Y is already broken in 4.3, but register_always=N is
>>> now also broken in 4.4.
>>
>> OK, I'm confused so please let me understand slowly :)
>>
>> Your patch "ib_srp: initialize dma_length in srp_map_idb" solves
>> the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc; does it also
>> solve the data integrity errors?
>
> No, it doesn't.
>
>> This code path is specific to SRP because all other ULPs guarantee
>> no gaps...
>
> Yes.  Life would be simpler if we could just set the virt_boundary
> on SRP, and Bart has indicated that he's willing to at least look at
> this for the next merge window.

Hello Christoph,

Tomorrow I will try to reproduce this behavior on my test setup. I
have prepared a setup running kernel v4.4-rc2 on which the SRP initiator
and target run on the same server. Tomorrow I will install xfstests
and check whether these tests pass against an XFS filesystem.

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
  2015-11-24  1:14                 ` Bart Van Assche
@ 2015-11-25 19:34                   ` Bart Van Assche
       [not found]                     ` <56560D46.1070904-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2015-11-25 19:34 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg; +Cc: linux-rdma, target-devel

On 11/23/2015 05:14 PM, Bart Van Assche wrote:
> On 11/22/2015 07:31 AM, Christoph Hellwig wrote:
>> On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote:
>>>> No. register_always=Y is already broken in 4.3, but register_always=N is
>>>> now also broken in 4.4.
>>>
>>> OK, I'm confused so please let me understand slowly :)
>>>
>>> Your patch "ib_srp: initialize dma_length in srp_map_idb" solves
>>> the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc; does it also
>>> solve the data integrity errors?
>>
>> No, it doesn't.
>>
>>> This code path is specific to SRP because all other ULPs guarantee
>>> no gaps...
>>
>> Yes.  Life would be simpler if we could just set the virt_boundary
>> on SRP, and Bart has indicated that he's willing to at least look at
>> this for the next merge window.
>
> Hello Christoph,
>
> Tomorrow I will try to reproduce this behavior on my test setup. I
> have prepared a setup running kernel v4.4-rc2 on which the SRP initiator
> and target run on the same server. Tomorrow I will install xfstests
> and check whether these tests pass against an XFS filesystem.

(replying to my own e-mail)

I can reproduce this behavior with both LIO and SCST. I have modified
the initiator and target code so that the target appends a data CRC to
the end of the SRP_RSP IU and the initiator checks that CRC. The
initiator reported a CRC mismatch for the following SG list:

scsi host10: srp_check_crc: bufflen 1024; resid 0; sg-list len 2; dir DMA_TO_DEVICE; CRC mismatch (0x7916620b <> 0xde97b796); sg-list:
[0] ffff880407378348 len 512
[1] ffff880407378000 len 512

I will check the memory registration code next.

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: srp state in current mainline
       [not found]                     ` <56560D46.1070904-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-11-26  1:17                       ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2015-11-26  1:17 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11/25/2015 11:34 AM, Bart Van Assche wrote:
> On 11/23/2015 05:14 PM, Bart Van Assche wrote:
>> On 11/22/2015 07:31 AM, Christoph Hellwig wrote:
>>> On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote:
>>>>> No. register_always=Y is already broken in 4.3, but register_always=N is
>>>>> now also broken in 4.4.
>>>>
>>>> OK, I'm confused so please let me understand slowly :)
>>>>
>>>> Your patch "ib_srp: initialize dma_length in srp_map_idb" solves
>>>> the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc; does it also
>>>> solve the data integrity errors?
>>>
>>> No, it doesn't.
>>>
>>>> This code path is specific to SRP because all other ULPs guarantee
>>>> no gaps...
>>>
>>> Yes.  Life would be simpler if we could just set the virt_boundary
>>> on SRP, and Bart has indicated that he's willing to at least look at
>>> this for the next merge window.
>>
>> Hello Christoph,
>>
>> Tomorrow I will try to reproduce this behavior on my test setup. I
>> have prepared a setup running kernel v4.4-rc2 on which the SRP initiator
>> and target run on the same server. Tomorrow I will install xfstests
>> and check whether these tests pass against an XFS filesystem.
> 
> (replying to my own e-mail)
> 
> I can reproduce this behavior with both LIO and SCST. I have modified
> the initiator and target code so that the target appends a data CRC to
> the end of the SRP_RSP IU and the initiator checks that CRC. The
> initiator reported a CRC mismatch for the following SG list:
> 
> scsi host10: srp_check_crc: bufflen 1024; resid 0; sg-list len 2; dir DMA_TO_DEVICE; CRC mismatch (0x7916620b <> 0xde97b796); sg-list:
> [0] ffff880407378348 len 512
> [1] ffff880407378000 len 512
> 
> I will check the memory registration code next.

(again replying to my own e-mail)

The patch below helps somewhat but is not sufficient to make the XFS tests
pass. On Monday I will continue to work on this.

Bart.

[PATCH] IB core: Fix ib_sg_to_pages()

Fix the code for detecting gaps and disable the code for chunking.

Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..3f9c820 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1530,7 +1530,6 @@ int ib_sg_to_pages(struct ib_mr *mr,
 		   int (*set_page)(struct ib_mr *, u64))
 {
 	struct scatterlist *sg;
-	u64 last_end_dma_addr = 0, last_page_addr = 0;
 	unsigned int last_page_off = 0;
 	u64 page_mask = ~((u64)mr->page_size - 1);
 	int i;
@@ -1544,23 +1543,9 @@ int ib_sg_to_pages(struct ib_mr *mr,
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
-		if (i && page_addr != dma_addr) {
-			if (last_end_dma_addr != dma_addr) {
-				/* gap */
-				goto done;
-
-			} else if (last_page_off + dma_len <= mr->page_size) {
-				/* chunk this fragment with the last */
-				mr->length += dma_len;
-				last_end_dma_addr += dma_len;
-				last_page_off += dma_len;
-				continue;
-			} else {
-				/* map starting from the next page */
-				page_addr = last_page_addr + mr->page_size;
-				dma_len -= mr->page_size - last_page_off;
-			}
-		}
+		/* gap */
+		if (i && (last_page_off || page_addr != dma_addr))
+			goto done;
 
 		do {
 			if (unlikely(set_page(mr, page_addr)))
@@ -1569,8 +1554,6 @@ int ib_sg_to_pages(struct ib_mr *mr,
 		} while (page_addr < end_dma_addr);
 
 		mr->length += dma_len;
-		last_end_dma_addr = end_dma_addr;
-		last_page_addr = end_dma_addr & page_mask;
 		last_page_off = end_dma_addr & ~page_mask;
 	}
 
-- 
2.1.4




^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-11-26  1:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-10 17:15 srp state in current mainline Christoph Hellwig
     [not found] ` <20151110171509.GA27781-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-11 15:35   ` Bart Van Assche
2015-11-11 15:46     ` Christoph Hellwig
2015-11-11 16:03       ` Bart Van Assche
     [not found]         ` <564366E2.8000209-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-11-11 16:18           ` Christoph Hellwig
     [not found]             ` <20151111161845.GA31542-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-11 16:34               ` Sagi Grimberg
2015-11-11 21:07   ` Bart Van Assche
2015-11-12 17:59     ` Christoph Hellwig
2015-11-13  1:12       ` Bart Van Assche
     [not found]         ` <564538EE.3050805-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-11-13  6:48           ` Christoph Hellwig
2015-11-15 18:06 ` Christoph Hellwig
     [not found]   ` <20151115180621.GA24488-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-15 19:48     ` Bart Van Assche
     [not found]       ` <5648E178.9010106-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-11-16  8:38         ` Christoph Hellwig
2015-11-22 13:53 ` Christoph Hellwig
2015-11-22 14:32   ` Christoph Hellwig
2015-11-22 14:55     ` Sagi Grimberg
     [not found]       ` <5651D775.9090500-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-11-22 15:10         ` Christoph Hellwig
2015-11-22 15:26           ` Sagi Grimberg
2015-11-22 15:31             ` Christoph Hellwig
     [not found]               ` <20151122153106.GA26919-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-11-24  1:14                 ` Bart Van Assche
2015-11-25 19:34                   ` Bart Van Assche
     [not found]                     ` <56560D46.1070904-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-11-26  1:17                       ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.