All of lore.kernel.org
 help / color / mirror / Atom feed
* T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
@ 2016-04-08 22:15 Christoph Hellwig
  2016-04-08 23:48 ` Nicholas A. Bellinger
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-08 22:15 UTC (permalink / raw)
  To: sagig; +Cc: linux-rdma, target-devel

Hi all,

I've recently started looking into adding support for signature MRs
in the RDMA R/W API, but it seems like it's already broken upstream.

I suspect it's the target that's at fault, but I'm not sure.  I've
tested 4.5 and todays 4.6-rc tree on both the target and the client
side.

The setup is pretty simple: on the target I've loaded scsi_debug
with dif=1 and dix=1, and set I've set fabric_prot_type=1, t10_pi=1
and ImmediateData to No on the tpg, and then did a simple mkfs.ext4.

Just doing the login I get the following error on the target:

[  369.890182] Bad block size given: 0
[  369.894131] mlx5_0:mlx5_ib_post_send:3296:(pid 331): 
[  369.894228] isert: isert_reg_sig_mr: fast registration failed, ret:-22
[  369.901580] isert: isert_handle_prot_cmd: conn ffff880904d54000 failed to fast reg mr
[  369.906348] iSCSI/iqn.1993-08.org.debian:01:197dad5d1f17: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[  369.910395] isert: isert_put_datain: Cmd: ffff880902b26b00 failed to prepare RDMA res

If I then do an actual I/O think go downhill quickly.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
  2016-04-08 22:15 T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc Christoph Hellwig
@ 2016-04-08 23:48 ` Nicholas A. Bellinger
  2016-04-09  0:14   ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2016-04-08 23:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sagig, linux-rdma, target-devel

On Fri, 2016-04-08 at 15:15 -0700, Christoph Hellwig wrote:
> Hi all,
> 
> I've recently started looking into adding support for signature MRs
> in the RDMA R/W API, but it seems like it's already broken upstream.
> 
> I suspect it's the target that's at fault, but I'm not sure.  I've
> tested 4.5 and todays 4.6-rc tree on both the target and the client
> side.
> 
> The setup is pretty simple: on the target I've loaded scsi_debug
> with dif=1 and dix=1, and set I've set fabric_prot_type=1, t10_pi=1
> and ImmediateData to No on the tpg, and then did a simple mkfs.ext4.
> 
> Just doing the login I get the following error on the target:
> 
> [  369.890182] Bad block size given: 0
> [  369.894131] mlx5_0:mlx5_ib_post_send:3296:(pid 331): 
> [  369.894228] isert: isert_reg_sig_mr: fast registration failed, ret:-22
> [  369.901580] isert: isert_handle_prot_cmd: conn ffff880904d54000 failed to fast reg mr
> [  369.906348] iSCSI/iqn.1993-08.org.debian:01:197dad5d1f17: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
> [  369.910395] isert: isert_put_datain: Cmd: ffff880902b26b00 failed to prepare RDMA res
> 
> If I then do an actual I/O think go downhill quickly.

It looks like the isert_set_dif_domain() assignments of
ib_sig_domain->sig.dif.* from dev->dev_attrib.block_size + friends is
being zeroed before making it into mlx5_ib_post_send() ->
IB_WR_REG_SIG_MR -> set_sig_umr_wr() -> set_sig_data_segment().

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
  2016-04-08 23:48 ` Nicholas A. Bellinger
@ 2016-04-09  0:14   ` Christoph Hellwig
  2016-04-09  2:36     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-09  0:14 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: Christoph Hellwig, sagig, linux-rdma, target-devel

On Fri, Apr 08, 2016 at 04:48:10PM -0700, Nicholas A. Bellinger wrote:
> It looks like the isert_set_dif_domain() assignments of
> ib_sig_domain->sig.dif.* from dev->dev_attrib.block_size + friends is
> being zeroed before making it into mlx5_ib_post_send() ->
> IB_WR_REG_SIG_MR -> set_sig_umr_wr() -> set_sig_data_segment().

>From my limited understanding that's not the case, even if the weird
printk about the bad block size might suggest it.

I think what we have is the TARGET_PROT_DIN_INSERT /
TARGET_PROT_DOUT_STRIP case (probably the first), and thus
sig_attrs->mem.sig_type is set to IB_SIG_TYPE_NONE, but for some
reason the prot pointer is set, which will lead to this misleading
printk.

Does anyone know a recent working version of T10 PI in isert?  There
aren't any recent-ish changes that look related.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
  2016-04-09  0:14   ` Christoph Hellwig
@ 2016-04-09  2:36     ` Nicholas A. Bellinger
  2016-04-09  4:58       ` Christoph Hellwig
       [not found]       ` <1460169374.5010.4.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  0 siblings, 2 replies; 9+ messages in thread
From: Nicholas A. Bellinger @ 2016-04-09  2:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sagig, linux-rdma, target-devel

On Fri, 2016-04-08 at 17:14 -0700, Christoph Hellwig wrote:
> On Fri, Apr 08, 2016 at 04:48:10PM -0700, Nicholas A. Bellinger wrote:
> > It looks like the isert_set_dif_domain() assignments of
> > ib_sig_domain->sig.dif.* from dev->dev_attrib.block_size + friends is
> > being zeroed before making it into mlx5_ib_post_send() ->
> > IB_WR_REG_SIG_MR -> set_sig_umr_wr() -> set_sig_data_segment().
> 
> From my limited understanding that's not the case, even if the weird
> printk about the bad block size might suggest it.
> 
> I think what we have is the TARGET_PROT_DIN_INSERT /
> TARGET_PROT_DOUT_STRIP case (probably the first), and thus
> sig_attrs->mem.sig_type is set to IB_SIG_TYPE_NONE, but for some
> reason the prot pointer is set, which will lead to this misleading
> printk.

Ah, yes.  However, TARGET_PROT_DIN_INSERT / TARGET_PROT_DOUT_STRIP
will only be happening in sbc_set_prot_op_checks() if fabric_prot = true
for a backend that does not support PI.

Eg: fabric_prot is for the special case where the fabric supports PI,
but the backend does not, and the normal feature bits a device would
expose for PI are emulated based upon the fabric protection features.

Since your backend does support PI, cmd->prot_op should always be
TARGET_PROT_*_PASS regardless.

The other thing to check is that isert_get_sup_prot_ops() is returning
TARGET_PROT_ALL when tpg->tpg_attrib.t10_pi = true.

> 
> Does anyone know a recent working version of T10 PI in isert?  There
> aren't any recent-ish changes that look related.

Sagi..?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
  2016-04-09  2:36     ` Nicholas A. Bellinger
@ 2016-04-09  4:58       ` Christoph Hellwig
       [not found]         ` <20160409045846.GA9269-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
       [not found]       ` <1460169374.5010.4.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-09  4:58 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: sagig, linux-rdma, target-devel

On Fri, Apr 08, 2016 at 07:36:14PM -0700, Nicholas A. Bellinger wrote:
> Ah, yes.  However, TARGET_PROT_DIN_INSERT / TARGET_PROT_DOUT_STRIP
> will only be happening in sbc_set_prot_op_checks() if fabric_prot = true
> for a backend that does not support PI.
> 
> Eg: fabric_prot is for the special case where the fabric supports PI,
> but the backend does not, and the normal feature bits a device would
> expose for PI are emulated based upon the fabric protection features.
> 
> Since your backend does support PI, cmd->prot_op should always be
> TARGET_PROT_*_PASS regardless.
> 
> The other thing to check is that isert_get_sup_prot_ops() is returning
> TARGET_PROT_ALL when tpg->tpg_attrib.t10_pi = true.

Ok, it turns out this problem was a second LUN that doesn't support
PI.  So the reported bug is for a this fabrics_prot case, the scsi_debug
LUN was actually doing fine.

The root cause seems to be that transport_generic_new_cmd allocates
a prot_sg for all protection cases, which causes isert_reg_sig_mr
to assign a value to sig_wr.prot even for the strip/insert case,
which will then cause mlx5 to blow up.  But even when fixing that
I run into data compare errors, so there's defintively something deeper
hiding here.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
       [not found]       ` <1460169374.5010.4.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2016-04-10  8:42         ` sagig
  0 siblings, 0 replies; 9+ messages in thread
From: sagig @ 2016-04-10  8:42 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy,
	jennyf-VPRAkNaXOzVWk0Htik3J/w


>> Does anyone know a recent working version of T10 PI in isert?  There
>> aren't any recent-ish changes that look related.
> Sagi..?

The T10-PI isert support should be working fine AFAICT.

I know Mellanox folks are testing it nightly.

CCing Max & Jenny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
       [not found]         ` <20160409045846.GA9269-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-04-10  8:53           ` Sagi Grimberg
  2016-04-10 14:37             ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2016-04-10  8:53 UTC (permalink / raw)
  To: Christoph Hellwig, Nicholas A. Bellinger
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, target-devel-u79uwXL29TY76Z2rM5mHXA



On 09/04/16 07:58, Christoph Hellwig wrote:
> On Fri, Apr 08, 2016 at 07:36:14PM -0700, Nicholas A. Bellinger wrote:
>> Ah, yes.  However, TARGET_PROT_DIN_INSERT / TARGET_PROT_DOUT_STRIP
>> will only be happening in sbc_set_prot_op_checks() if fabric_prot = true
>> for a backend that does not support PI.
>>
>> Eg: fabric_prot is for the special case where the fabric supports PI,
>> but the backend does not, and the normal feature bits a device would
>> expose for PI are emulated based upon the fabric protection features.
>>
>> Since your backend does support PI, cmd->prot_op should always be
>> TARGET_PROT_*_PASS regardless.
>>
>> The other thing to check is that isert_get_sup_prot_ops() is returning
>> TARGET_PROT_ALL when tpg->tpg_attrib.t10_pi = true.
> Ok, it turns out this problem was a second LUN that doesn't support
> PI.  So the reported bug is for a this fabrics_prot case, the scsi_debug
> LUN was actually doing fine.

OK, good to know.

Since the fabric_prot was added later, I don't think it was properly
tested (and from what I've seen, not a very interesting use-case).

>
> The root cause seems to be that transport_generic_new_cmd allocates
> a prot_sg for all protection cases, which causes isert_reg_sig_mr
> to assign a value to sig_wr.prot even for the strip/insert case,
> which will then cause mlx5 to blow up.  But even when fixing that
> I run into data compare errors, so there's defintively something deeper
> hiding here.

I don't have access to mlx5 devices at the moment so I can't help a lot :(

But data integrity errors sound strange...

Is it possible that the the sess_prot_type is not set correctly?


P.S.
I'm no longer available in @mellanox.com

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
  2016-04-10  8:53           ` Sagi Grimberg
@ 2016-04-10 14:37             ` Christoph Hellwig
  2016-04-10 15:30               ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-10 14:37 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Nicholas A. Bellinger, linux-rdma, target-devel

On Sun, Apr 10, 2016 at 11:53:14AM +0300, Sagi Grimberg wrote:
> OK, good to know.
> 
> Since the fabric_prot was added later, I don't think it was properly
> tested (and from what I've seen, not a very interesting use-case).

It's been there for more a year, and once your have non-T10 capable
LUNs in a TPG it gets turned on automatically it seems..

> P.S.
> I'm no longer available in @mellanox.com

I keep messing that all the time by grabbing an address from the list
archives, sorry.  That's why people should only use personal addresses
on mailing lists :)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc
  2016-04-10 14:37             ` Christoph Hellwig
@ 2016-04-10 15:30               ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2016-04-10 15:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Nicholas A. Bellinger, linux-rdma, target-devel


>> Since the fabric_prot was added later, I don't think it was properly
>> tested (and from what I've seen, not a very interesting use-case).
>
> It's been there for more a year, and once your have non-T10 capable
> LUNs in a TPG it gets turned on automatically it seems..

Yea, but I'm not aware of anyone running non-PI together with PI.
Usually people use either/or (although it can make sense to mix
if you have different classes of data sets).

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-04-10 15:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-08 22:15 T10 PI offloading seems to be broken in iser/isert in 4.5/4.6-rc Christoph Hellwig
2016-04-08 23:48 ` Nicholas A. Bellinger
2016-04-09  0:14   ` Christoph Hellwig
2016-04-09  2:36     ` Nicholas A. Bellinger
2016-04-09  4:58       ` Christoph Hellwig
     [not found]         ` <20160409045846.GA9269-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-04-10  8:53           ` Sagi Grimberg
2016-04-10 14:37             ` Christoph Hellwig
2016-04-10 15:30               ` Sagi Grimberg
     [not found]       ` <1460169374.5010.4.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-04-10  8:42         ` sagig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.