All of lore.kernel.org
 help / color / mirror / Atom feed
* Can someone help me understand the reason for this code in ib_isert.c?
@ 2014-10-20 15:29 Chris Moore
       [not found] ` <462EF229174FDB4D92ACE4656EA5610051E2EEF9-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Moore @ 2014-10-20 15:29 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

The following code is in isert_conn_setup_qp() in ib_isert.c:

         /*
           * FIXME: Use devattr.max_sge - 2 for max_send_sge as
           * work-around for RDMA_READ..
           */
         attr.cap.max_send_sge = device->dev_attr.max_sge - 2;

It's not clear from the comment what this is a work-around for, and I wasn't able
to figure it out from looking at logs.

In trying to get isert working with ocrdma I ran into a problem where the code
thought there was only 1 send SGE available when in fact the device has 3.
Then I found the above work-around, which explained the problem.

Removing this (setting attr.cap.max_send_sge to device->dev_attr.max_sge)
fixes the problem with isert and ocrdma, but I don't know what other side effects
that might have.

Thanks,

Chris

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found] ` <462EF229174FDB4D92ACE4656EA5610051E2EEF9-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-10-20 21:13   ` Or Gerlitz
       [not found]     ` <CAJ3xEMjmmt1guJO8rF6ChnTq-ZQbt9dpb_hwsNQCR65C79waRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2014-10-20 21:13 UTC (permalink / raw)
  To: Chris Moore; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Nicholas A. Bellinger

On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
> The following code is in isert_conn_setup_qp() in ib_isert.c:
>
>          /*
>            * FIXME: Use devattr.max_sge - 2 for max_send_sge as
>            * work-around for RDMA_READ..
>            */
>          attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
>
> It's not clear from the comment what this is a work-around for, and I wasn't able
> to figure it out from looking at logs.

I believe this refers to some IBTA spec corner case which comes into
play with the max_sges advertized by mlx4, Eli, can you shed some
light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
isn't always achievable) with mlx5 too?

> In trying to get isert working with ocrdma I ran into a problem where the code
> thought there was only 1 send SGE available when in fact the device has 3.
> Then I found the above work-around, which explained the problem.

Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?



> Removing this (setting attr.cap.max_send_sge to device->dev_attr.max_sge)
> fixes the problem with isert and ocrdma, but I don't know what other side effects
> that might have.
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]     ` <CAJ3xEMjmmt1guJO8rF6ChnTq-ZQbt9dpb_hwsNQCR65C79waRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-10-22  5:06       ` Nicholas A. Bellinger
       [not found]         ` <1413954397.30983.33.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  2014-10-23  7:43       ` Eli Cohen
  1 sibling, 1 reply; 14+ messages in thread
From: Nicholas A. Bellinger @ 2014-10-22  5:06 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Chris Moore, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sagi Grimberg

On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
> > The following code is in isert_conn_setup_qp() in ib_isert.c:
> >
> >          /*
> >            * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> >            * work-around for RDMA_READ..
> >            */
> >          attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> >
> > It's not clear from the comment what this is a work-around for, and I wasn't able
> > to figure it out from looking at logs.
>
> I believe this refers to some IBTA spec corner case which comes into
> play with the max_sges advertized by mlx4, Eli, can you shed some
> light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
> isn't always achievable) with mlx5 too?
> 

It's for ConnectX-2 reporting max_sge=31 during development, while the
largest max_send_sge for stable large block RDMA reads was (is..?) 29
SGEs.

> > In trying to get isert working with ocrdma I ran into a problem where the code
> > thought there was only 1 send SGE available when in fact the device has 3.
> > Then I found the above work-around, which explained the problem.
> 
> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
> 

IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
size per RDMA read, and would issue subsequent RDMA read + offset from
completion up to the total se_cmd->data_length.

Not sure how this works with fastreg today.  Sagi..?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]         ` <1413954397.30983.33.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2014-10-22  9:02           ` Or Gerlitz
  2014-10-22 21:50             ` Nicholas A. Bellinger
  2014-10-22 11:39           ` Sagi Grimberg
  1 sibling, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2014-10-22  9:02 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Or Gerlitz
  Cc: Chris Moore, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sagi Grimberg

On 10/22/2014 8:06 AM, Nicholas A. Bellinger wrote:
> On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
>> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
>>> The following code is in isert_conn_setup_qp() in ib_isert.c:
>>>
>>>           /*
>>>             * FIXME: Use devattr.max_sge - 2 for max_send_sge as
>>>             * work-around for RDMA_READ..
>>>             */
>>>           attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
>>>
>>> It's not clear from the comment what this is a work-around for, and I wasn't able
>>> to figure it out from looking at logs.
>> I believe this refers to some IBTA spec corner case which comes into
>> play with the max_sges advertized by mlx4, Eli, can you shed some
>> light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
>> isn't always achievable) with mlx5 too?
>>
> It's for ConnectX-2 reporting max_sge=31 during development, while the
> largest max_send_sge for stable large block RDMA reads was (is..?) 29
> SGEs.
>
>>> In trying to get isert working with ocrdma I ran into a problem where the code
>>> thought there was only 1 send SGE available when in fact the device has 3.
>>> Then I found the above work-around, which explained the problem.
>> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
>>
> IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
> size per RDMA read, and would issue subsequent RDMA read + offset from
> completion up to the total se_cmd->data_length.
>
> Not sure how this works with fastreg today.  Sagi..?
>

While on fastreg mode, for RDMA reads  we use only one SGE, talking to 
Sagi he explained me that the problem with creating the QP with 
max_send_sge = 1 has to do with other flows where two or even three SGEs 
are needed, e.g when the iscsi/iser header (doesn't exist in RDMA ops) 
is taken from one spot of the memory and the data (sense) taken from 
another one?

Nic, we need to see what is the minimum needed by the code (two?) and 
tweak that line to make sure it gets there as long as the device 
supports it, SB, makes sense?


diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 0bea577..24abbb3 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn *isert_conn, 
struct rdma_cm_id *cma_id,
         attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
         /*
          * FIXME: Use devattr.max_sge - 2 for max_send_sge as
-        * work-around for RDMA_READ..
+        * work-around for RDMA_READ.. still need to make sure to
+        * have at least two SGEs for SCSI responses.
          */
-       attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
+       attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
         isert_conn->max_sge = attr.cap.max_send_sge;

         attr.cap.max_recv_sge = 1;


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]         ` <1413954397.30983.33.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  2014-10-22  9:02           ` Or Gerlitz
@ 2014-10-22 11:39           ` Sagi Grimberg
  1 sibling, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2014-10-22 11:39 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Or Gerlitz
  Cc: Chris Moore, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 10/22/2014 8:06 AM, Nicholas A. Bellinger wrote:
> On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
>> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
>>> The following code is in isert_conn_setup_qp() in ib_isert.c:
>>>
>>>           /*
>>>             * FIXME: Use devattr.max_sge - 2 for max_send_sge as
>>>             * work-around for RDMA_READ..
>>>             */
>>>           attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
>>>
>>> It's not clear from the comment what this is a work-around for, and I wasn't able
>>> to figure it out from looking at logs.
>>
>> I believe this refers to some IBTA spec corner case which comes into
>> play with the max_sges advertized by mlx4, Eli, can you shed some
>> light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
>> isn't always achievable) with mlx5 too?
>>
>
> It's for ConnectX-2 reporting max_sge=31 during development, while the
> largest max_send_sge for stable large block RDMA reads was (is..?) 29
> SGEs.

Hmm, long time since I've worked with CX2...
I'll check that.

>
>>> In trying to get isert working with ocrdma I ran into a problem where the code
>>> thought there was only 1 send SGE available when in fact the device has 3.
>>> Then I found the above work-around, which explained the problem.
>>
>> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
>>
>
> IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
> size per RDMA read, and would issue subsequent RDMA read + offset from
> completion up to the total se_cmd->data_length.

The non-fastreg code logic should have stayed the same unless I
got a bug in there...

Chris, can you get us logs with debug enabled?

>
> Not sure how this works with fastreg today.  Sagi..?

First, fastreg is used only if signature is enabled.
Second, since isert registers the memory (single {key, address, length}
tuple describes the page list) it will use *only 1* sge in the RDMA (no
need for more...), so no problem here.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
  2014-10-22  9:02           ` Or Gerlitz
@ 2014-10-22 21:50             ` Nicholas A. Bellinger
  2014-10-29 18:05               ` Chris Moore
  0 siblings, 1 reply; 14+ messages in thread
From: Nicholas A. Bellinger @ 2014-10-22 21:50 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Chris Moore, linux-rdma, Sagi Grimberg, target-devel

On Wed, 2014-10-22 at 12:02 +0300, Or Gerlitz wrote:
> On 10/22/2014 8:06 AM, Nicholas A. Bellinger wrote:
> > On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
> >> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore@emulex.com> wrote:
> >>> The following code is in isert_conn_setup_qp() in ib_isert.c:
> >>>
> >>>           /*
> >>>             * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> >>>             * work-around for RDMA_READ..
> >>>             */
> >>>           attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> >>>
> >>> It's not clear from the comment what this is a work-around for, and I wasn't able
> >>> to figure it out from looking at logs.
> >> I believe this refers to some IBTA spec corner case which comes into
> >> play with the max_sges advertized by mlx4, Eli, can you shed some
> >> light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
> >> isn't always achievable) with mlx5 too?
> >>
> > It's for ConnectX-2 reporting max_sge=31 during development, while the
> > largest max_send_sge for stable large block RDMA reads was (is..?) 29
> > SGEs.
> >
> >>> In trying to get isert working with ocrdma I ran into a problem where the code
> >>> thought there was only 1 send SGE available when in fact the device has 3.
> >>> Then I found the above work-around, which explained the problem.
> >> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
> >>
> > IIRC, pre fastreg code supported max_send_sge=1 by limiting the transfer
> > size per RDMA read, and would issue subsequent RDMA read + offset from
> > completion up to the total se_cmd->data_length.
> >
> > Not sure how this works with fastreg today.  Sagi..?
> >
> 
> While on fastreg mode, for RDMA reads  we use only one SGE, talking to 
> Sagi he explained me that the problem with creating the QP with 
> max_send_sge = 1 has to do with other flows where two or even three SGEs 
> are needed, e.g when the iscsi/iser header (doesn't exist in RDMA ops) 
> is taken from one spot of the memory and the data (sense) taken from 
> another one?
> 
> Nic, we need to see what is the minimum needed by the code (two?) and 
> tweak that line to make sure it gets there as long as the device 
> supports it, SB, makes sense?
> 
>
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
> b/drivers/infiniband/ulp/isert/ib_isert.c
> index 0bea577..24abbb3 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn *isert_conn, 
> struct rdma_cm_id *cma_id,
>          attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
>          /*
>           * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> -        * work-around for RDMA_READ..
> +        * work-around for RDMA_READ.. still need to make sure to
> +        * have at least two SGEs for SCSI responses.
>           */
> -       attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> +       attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
>          isert_conn->max_sge = attr.cap.max_send_sge;
> 
>          attr.cap.max_recv_sge = 1;
> 
> 

After a quick audit, the minimum max_send_sge=2 patch looks correct for
the hardcoded tx_desc->num_sge cases in isert_put_login_tx(),
isert_put_response(), isert_put_reject() and isert_put_text_rsp().

For the RDMA read path with non fast-reg, isert_build_rdma_wr() is still
correctly limiting the SGEs per operation based upon the negotiated
isert_conn->max_sge set above in isert_conn_setup_qp().

Chris, please confirm on ocrdma with Or's patch, and I'll include his
patch into target-pending/queue for now, and move into master once you
give a proper Tested-By.

Thanks!

--nab

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]     ` <CAJ3xEMjmmt1guJO8rF6ChnTq-ZQbt9dpb_hwsNQCR65C79waRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-10-22  5:06       ` Nicholas A. Bellinger
@ 2014-10-23  7:43       ` Eli Cohen
  2014-10-26 12:57         ` Bart Van Assche
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Cohen @ 2014-10-23  7:43 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Chris Moore, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Nicholas A. Bellinger

On Tue, Oct 21, 2014 at 12:13:22AM +0300, Or Gerlitz wrote:
> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
> > The following code is in isert_conn_setup_qp() in ib_isert.c:
> >
> >          /*
> >            * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> >            * work-around for RDMA_READ..
> >            */
> >          attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> >
> > It's not clear from the comment what this is a work-around for, and I wasn't able
> > to figure it out from looking at logs.
> 
> I believe this refers to some IBTA spec corner case which comes into
> play with the max_sges advertized by mlx4, Eli, can you shed some
> light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
> isn't always achievable) with mlx5 too?
> 

I don't think it has to do anything with some corner case. If you look
at the spec you will find that is not guarnteed that whenever you
create a QP with device->dev_attr.max_sge it will succeed. The
consumer should try smaller values if it cannot create a QP with max
sge. This is from IB spec 1.2.1.

11.2.1.2 QUERY HCA
Description:
Returns the attributes for the specified HCA.  The maximum values
defined in this section are guaranteed not-to-exceed values. It is
possible for an implementation to allocate some HCA resources from the
same space. In that case, the maximum values returned are not
guaranteed for all of those resources simultaneously.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
  2014-10-23  7:43       ` Eli Cohen
@ 2014-10-26 12:57         ` Bart Van Assche
       [not found]           ` <544CEFBD.2090405-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2014-10-26 12:57 UTC (permalink / raw)
  To: Eli Cohen, Or Gerlitz
  Cc: Chris Moore, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Nicholas A. Bellinger

On 10/23/14 09:43, Eli Cohen wrote:
> On Tue, Oct 21, 2014 at 12:13:22AM +0300, Or Gerlitz wrote:
>> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
>>> The following code is in isert_conn_setup_qp() in ib_isert.c:
>>>
>>>           /*
>>>             * FIXME: Use devattr.max_sge - 2 for max_send_sge as
>>>             * work-around for RDMA_READ..
>>>             */
>>>           attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
>>>
>>> It's not clear from the comment what this is a work-around for, and I wasn't able
>>> to figure it out from looking at logs.
>>
>> I believe this refers to some IBTA spec corner case which comes into
>> play with the max_sges advertized by mlx4, Eli, can you shed some
>> light (IBTA pointer) on that? is this the case (i.e dev_attr.max_sge
>> isn't always achievable) with mlx5 too?
>
> I don't think it has to do anything with some corner case. If you look
> at the spec you will find that is not guarnteed that whenever you
> create a QP with device->dev_attr.max_sge it will succeed. The
> consumer should try smaller values if it cannot create a QP with max
> sge. This is from IB spec 1.2.1.
>
> 11.2.1.2 QUERY HCA
> Description:
> Returns the attributes for the specified HCA.  The maximum values
> defined in this section are guaranteed not-to-exceed values. It is
> possible for an implementation to allocate some HCA resources from the
> same space. In that case, the maximum values returned are not
> guaranteed for all of those resources simultaneously.

Hello Eli,

Applications really need a way to query what the maximum supported 
number of scatter/gather entries per work request is. The current 
approach, namely using "dev_attr.max_sge - <magic value>" is cumbersome 
since this approach doesn't work if the value of max_sge reported by the 
HCA is small. I see two possible solutions: either modify the max_sge 
value reported by the mlx4 and mlx5 drivers or introduce a new parameter 
in struct ib_device_attr.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]           ` <544CEFBD.2090405-HInyCGIudOg@public.gmane.org>
@ 2014-10-26 14:08             ` Eli Cohen
  2014-10-28 10:06               ` Or Gerlitz
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Cohen @ 2014-10-26 14:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Eli Cohen, Or Gerlitz, Chris Moore,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Nicholas A. Bellinger

On Sun, Oct 26, 2014 at 01:57:33PM +0100, Bart Van Assche wrote:
> 
> Applications really need a way to query what the maximum supported
> number of scatter/gather entries per work request is. The current
> approach, namely using "dev_attr.max_sge - <magic value>" is
> cumbersome since this approach doesn't work if the value of max_sge
> reported by the HCA is small. I see two possible solutions: either
> modify the max_sge value reported by the mlx4 and mlx5 drivers or
> introduce a new parameter in struct ib_device_attr.
> 
Hi Bart,
we already have reported max_sge value. Maybe what we need is another
field, "guarnateed_num_sge" so applicatios providing this value can be
sure it would succeed (or there is some other problem). Then they can
either use this value or iterate from max_sge down to
guarnateed_num_sge to find the highest possible value.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
  2014-10-26 14:08             ` Eli Cohen
@ 2014-10-28 10:06               ` Or Gerlitz
       [not found]                 ` <544F6A8E.9000400-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Or Gerlitz @ 2014-10-28 10:06 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Bart Van Assche, Chris Moore, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Nicholas A. Bellinger

On 10/26/2014 4:08 PM, Eli Cohen wrote:
> On Sun, Oct 26, 2014 at 01:57:33PM +0100, Bart Van Assche wrote:
>> Applications really need a way to query what the maximum supported
>> number of scatter/gather entries per work request is. The current
>> approach, namely using "dev_attr.max_sge - <magic value>" is
>> cumbersome since this approach doesn't work if the value of max_sge
>> reported by the HCA is small. I see two possible solutions: either
>> modify the max_sge value reported by the mlx4 and mlx5 drivers or
>> introduce a new parameter in struct ib_device_attr.
>>
> we already have reported max_sge value. Maybe what we need is another
> field, "guarnateed_num_sge" so applicatios providing this value can be
> sure it would succeed (or there is some other problem). Then they can
> either use this value or iterate from max_sge down to
> guarnateed_num_sge to find the highest possible value.
>
Eli, can we be a bit more restrictive and report something which would 
work?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]                 ` <544F6A8E.9000400-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-10-28 10:55                   ` Eli Cohen
  0 siblings, 0 replies; 14+ messages in thread
From: Eli Cohen @ 2014-10-28 10:55 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Eli Cohen, Bart Van Assche, Chris Moore,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Nicholas A. Bellinger

On Tue, Oct 28, 2014 at 12:06:06PM +0200, Or Gerlitz wrote:
> >we already have reported max_sge value. Maybe what we need is another
> >field, "guarnateed_num_sge" so applicatios providing this value can be
> >sure it would succeed (or there is some other problem). Then they can
> >either use this value or iterate from max_sge down to
> >guarnateed_num_sge to find the highest possible value.
> >
> Eli, can we be a bit more restrictive and report something which
> would work?

I don't think this is the right approach because it would cause
applications to ask for values lower than they could succeed and the
spec does not require it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Can someone help me understand the reason for this code in ib_isert.c?
  2014-10-22 21:50             ` Nicholas A. Bellinger
@ 2014-10-29 18:05               ` Chris Moore
       [not found]                 ` <462EF229174FDB4D92ACE4656EA5610051E396BE-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Moore @ 2014-10-29 18:05 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Or Gerlitz
  Cc: Or Gerlitz, linux-rdma, Sagi Grimberg, target-devel

> On Wed, 2014-10-22 at 12:02 +0300, Or Gerlitz wrote:
> > On 10/22/2014 8:06 AM, Nicholas A. Bellinger wrote:
> > > On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
> > >> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore
> <Chris.Moore@emulex.com> wrote:
> > >>> The following code is in isert_conn_setup_qp() in ib_isert.c:
> > >>>
> > >>>           /*
> > >>>             * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> > >>>             * work-around for RDMA_READ..
> > >>>             */
> > >>>           attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> > >>>
> > >>> It's not clear from the comment what this is a work-around for,
> > >>> and I wasn't able to figure it out from looking at logs.
> > >> I believe this refers to some IBTA spec corner case which comes
> > >> into play with the max_sges advertized by mlx4, Eli, can you shed
> > >> some light (IBTA pointer) on that? is this the case (i.e
> > >> dev_attr.max_sge isn't always achievable) with mlx5 too?
> > >>
> > > It's for ConnectX-2 reporting max_sge=31 during development, while
> > > the largest max_send_sge for stable large block RDMA reads was
> > > (is..?) 29 SGEs.
> > >
> > >>> In trying to get isert working with ocrdma I ran into a problem
> > >>> where the code thought there was only 1 send SGE available when in
> fact the device has 3.
> > >>> Then I found the above work-around, which explained the problem.
> > >> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
> > >>
> > > IIRC, pre fastreg code supported max_send_sge=1 by limiting the
> > > transfer size per RDMA read, and would issue subsequent RDMA read +
> > > offset from completion up to the total se_cmd->data_length.
> > >
> > > Not sure how this works with fastreg today.  Sagi..?
> > >
> >
> > While on fastreg mode, for RDMA reads  we use only one SGE, talking to
> > Sagi he explained me that the problem with creating the QP with
> > max_send_sge = 1 has to do with other flows where two or even three
> > SGEs are needed, e.g when the iscsi/iser header (doesn't exist in RDMA
> > ops) is taken from one spot of the memory and the data (sense) taken
> > from another one?
> >
> > Nic, we need to see what is the minimum needed by the code (two?) and
> > tweak that line to make sure it gets there as long as the device
> > supports it, SB, makes sense?
> >
> >
> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> > b/drivers/infiniband/ulp/isert/ib_isert.c
> > index 0bea577..24abbb3 100644
> > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > @@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn
> > *isert_conn, struct rdma_cm_id *cma_id,
> >          attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
> >          /*
> >           * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> > -        * work-around for RDMA_READ..
> > +        * work-around for RDMA_READ.. still need to make sure to
> > +        * have at least two SGEs for SCSI responses.
> >           */
> > -       attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> > +       attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
> >          isert_conn->max_sge = attr.cap.max_send_sge;
> >
> >          attr.cap.max_recv_sge = 1;
> >
> >
> 
> After a quick audit, the minimum max_send_sge=2 patch looks correct for
> the hardcoded tx_desc->num_sge cases in isert_put_login_tx(),
> isert_put_response(), isert_put_reject() and isert_put_text_rsp().
> 
> For the RDMA read path with non fast-reg, isert_build_rdma_wr() is still
> correctly limiting the SGEs per operation based upon the negotiated
> isert_conn->max_sge set above in isert_conn_setup_qp().
> 
> Chris, please confirm on ocrdma with Or's patch, and I'll include his patch into
> target-pending/queue for now, and move into master once you give a
> proper Tested-By.
> 
> Thanks!
> 
> --nab

Sorry for the long delay, I was dealing with some hardware issues.

I've tested the proposed patch and it fixes the ocrdma issue.

Tested-by: Chris Moore <chris.moore@emulex.com>

Thanks,

Chris


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Can someone help me understand the reason for this code in ib_isert.c?
       [not found]                 ` <462EF229174FDB4D92ACE4656EA5610051E396BE-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-10-30  8:24                   ` Sagi Grimberg
  2014-10-30 15:06                     ` Chris Moore
  0 siblings, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2014-10-30  8:24 UTC (permalink / raw)
  To: Chris Moore, Nicholas A. Bellinger, Or Gerlitz
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sagi Grimberg,
	target-devel

On 10/29/2014 8:05 PM, Chris Moore wrote:
>> On Wed, 2014-10-22 at 12:02 +0300, Or Gerlitz wrote:
>>> On 10/22/2014 8:06 AM, Nicholas A. Bellinger wrote:
>>>> On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
>>>>> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore
>> <Chris.Moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
>>>>>> The following code is in isert_conn_setup_qp() in ib_isert.c:
>>>>>>
>>>>>>            /*
>>>>>>              * FIXME: Use devattr.max_sge - 2 for max_send_sge as
>>>>>>              * work-around for RDMA_READ..
>>>>>>              */
>>>>>>            attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
>>>>>>
>>>>>> It's not clear from the comment what this is a work-around for,
>>>>>> and I wasn't able to figure it out from looking at logs.
>>>>> I believe this refers to some IBTA spec corner case which comes
>>>>> into play with the max_sges advertized by mlx4, Eli, can you shed
>>>>> some light (IBTA pointer) on that? is this the case (i.e
>>>>> dev_attr.max_sge isn't always achievable) with mlx5 too?
>>>>>
>>>> It's for ConnectX-2 reporting max_sge=31 during development, while
>>>> the largest max_send_sge for stable large block RDMA reads was
>>>> (is..?) 29 SGEs.
>>>>
>>>>>> In trying to get isert working with ocrdma I ran into a problem
>>>>>> where the code thought there was only 1 send SGE available when in
>> fact the device has 3.
>>>>>> Then I found the above work-around, which explained the problem.
>>>>> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
>>>>>
>>>> IIRC, pre fastreg code supported max_send_sge=1 by limiting the
>>>> transfer size per RDMA read, and would issue subsequent RDMA read +
>>>> offset from completion up to the total se_cmd->data_length.
>>>>
>>>> Not sure how this works with fastreg today.  Sagi..?
>>>>
>>>
>>> While on fastreg mode, for RDMA reads  we use only one SGE, talking to
>>> Sagi he explained me that the problem with creating the QP with
>>> max_send_sge = 1 has to do with other flows where two or even three
>>> SGEs are needed, e.g when the iscsi/iser header (doesn't exist in RDMA
>>> ops) is taken from one spot of the memory and the data (sense) taken
>>> from another one?
>>>
>>> Nic, we need to see what is the minimum needed by the code (two?) and
>>> tweak that line to make sure it gets there as long as the device
>>> supports it, SB, makes sense?
>>>
>>>
>>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
>>> b/drivers/infiniband/ulp/isert/ib_isert.c
>>> index 0bea577..24abbb3 100644
>>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
>>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
>>> @@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn
>>> *isert_conn, struct rdma_cm_id *cma_id,
>>>           attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
>>>           /*
>>>            * FIXME: Use devattr.max_sge - 2 for max_send_sge as
>>> -        * work-around for RDMA_READ..
>>> +        * work-around for RDMA_READ.. still need to make sure to
>>> +        * have at least two SGEs for SCSI responses.
>>>            */
>>> -       attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
>>> +       attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);

I think we need to fail if (dev_attr.max_sge - 2 <= 0)

>>>           isert_conn->max_sge = attr.cap.max_send_sge;
>>>
>>>           attr.cap.max_recv_sge = 1;
>>>
>>>
>>
>> After a quick audit, the minimum max_send_sge=2 patch looks correct for
>> the hardcoded tx_desc->num_sge cases in isert_put_login_tx(),
>> isert_put_response(), isert_put_reject() and isert_put_text_rsp().
>>
>> For the RDMA read path with non fast-reg, isert_build_rdma_wr() is still
>> correctly limiting the SGEs per operation based upon the negotiated
>> isert_conn->max_sge set above in isert_conn_setup_qp().
>>
>> Chris, please confirm on ocrdma with Or's patch, and I'll include his patch into
>> target-pending/queue for now, and move into master once you give a
>> proper Tested-By.
>>
>> Thanks!
>>
>> --nab
>
> Sorry for the long delay, I was dealing with some hardware issues.
>
> I've tested the proposed patch and it fixes the ocrdma issue.

This is a good patch for a work-around. But we should fix
this one from the root.

Sagi.

>
> Tested-by: Chris Moore <chris.moore-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
>
> Thanks,
>
> Chris

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Can someone help me understand the reason for this code in ib_isert.c?
  2014-10-30  8:24                   ` Sagi Grimberg
@ 2014-10-30 15:06                     ` Chris Moore
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Moore @ 2014-10-30 15:06 UTC (permalink / raw)
  To: Sagi Grimberg, Nicholas A. Bellinger, Or Gerlitz
  Cc: Or Gerlitz, linux-rdma, Sagi Grimberg, target-devel

> On 10/29/2014 8:05 PM, Chris Moore wrote:
> >> On Wed, 2014-10-22 at 12:02 +0300, Or Gerlitz wrote:
> >>> On 10/22/2014 8:06 AM, Nicholas A. Bellinger wrote:
> >>>> On Tue, 2014-10-21 at 00:13 +0300, Or Gerlitz wrote:
> >>>>> On Mon, Oct 20, 2014 at 6:29 PM, Chris Moore
> >> <Chris.Moore@emulex.com> wrote:
> >>>>>> The following code is in isert_conn_setup_qp() in ib_isert.c:
> >>>>>>
> >>>>>>            /*
> >>>>>>              * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> >>>>>>              * work-around for RDMA_READ..
> >>>>>>              */
> >>>>>>            attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> >>>>>>
> >>>>>> It's not clear from the comment what this is a work-around for,
> >>>>>> and I wasn't able to figure it out from looking at logs.
> >>>>> I believe this refers to some IBTA spec corner case which comes
> >>>>> into play with the max_sges advertized by mlx4, Eli, can you shed
> >>>>> some light (IBTA pointer) on that? is this the case (i.e
> >>>>> dev_attr.max_sge isn't always achievable) with mlx5 too?
> >>>>>
> >>>> It's for ConnectX-2 reporting max_sge=31 during development, while
> >>>> the largest max_send_sge for stable large block RDMA reads was
> >>>> (is..?) 29 SGEs.
> >>>>
> >>>>>> In trying to get isert working with ocrdma I ran into a problem
> >>>>>> where the code thought there was only 1 send SGE available when
> >>>>>> in
> >> fact the device has 3.
> >>>>>> Then I found the above work-around, which explained the problem.
> >>>>> Nic, any reason for attr.cap.max_send_sge = 1 not to work OK?
> >>>>>
> >>>> IIRC, pre fastreg code supported max_send_sge=1 by limiting the
> >>>> transfer size per RDMA read, and would issue subsequent RDMA read
> +
> >>>> offset from completion up to the total se_cmd->data_length.
> >>>>
> >>>> Not sure how this works with fastreg today.  Sagi..?
> >>>>
> >>>
> >>> While on fastreg mode, for RDMA reads  we use only one SGE, talking
> >>> to Sagi he explained me that the problem with creating the QP with
> >>> max_send_sge = 1 has to do with other flows where two or even three
> >>> SGEs are needed, e.g when the iscsi/iser header (doesn't exist in
> >>> RDMA
> >>> ops) is taken from one spot of the memory and the data (sense) taken
> >>> from another one?
> >>>
> >>> Nic, we need to see what is the minimum needed by the code (two?)
> >>> and tweak that line to make sure it gets there as long as the device
> >>> supports it, SB, makes sense?
> >>>
> >>>
> >>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> >>> b/drivers/infiniband/ulp/isert/ib_isert.c
> >>> index 0bea577..24abbb3 100644
> >>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> >>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> >>> @@ -115,9 +115,10 @@ isert_conn_setup_qp(struct isert_conn
> >>> *isert_conn, struct rdma_cm_id *cma_id,
> >>>           attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
> >>>           /*
> >>>            * FIXME: Use devattr.max_sge - 2 for max_send_sge as
> >>> -        * work-around for RDMA_READ..
> >>> +        * work-around for RDMA_READ.. still need to make sure to
> >>> +        * have at least two SGEs for SCSI responses.
> >>>            */
> >>> -       attr.cap.max_send_sge = device->dev_attr.max_sge - 2;
> >>> +       attr.cap.max_send_sge = max(2, device->dev_attr.max_sge -
> >>> + 2);
> 
> I think we need to fail if (dev_attr.max_sge - 2 <= 0)
> 

We may need to fail if (dev_attr.max_sge - 2 <= 1).   I think LOGIN RESPONSE requires
2 SGEs.

> >>>           isert_conn->max_sge = attr.cap.max_send_sge;
> >>>
> >>>           attr.cap.max_recv_sge = 1;
> >>>
> >>>
> >>
> >> After a quick audit, the minimum max_send_sge=2 patch looks correct
> >> for the hardcoded tx_desc->num_sge cases in isert_put_login_tx(),
> >> isert_put_response(), isert_put_reject() and isert_put_text_rsp().
> >>
> >> For the RDMA read path with non fast-reg, isert_build_rdma_wr() is
> >> still correctly limiting the SGEs per operation based upon the
> >> negotiated isert_conn->max_sge set above in isert_conn_setup_qp().
> >>
> >> Chris, please confirm on ocrdma with Or's patch, and I'll include his
> >> patch into target-pending/queue for now, and move into master once
> >> you give a proper Tested-By.
> >>
> >> Thanks!
> >>
> >> --nab
> >
> > Sorry for the long delay, I was dealing with some hardware issues.
> >
> > I've tested the proposed patch and it fixes the ocrdma issue.
> 
> This is a good patch for a work-around. But we should fix this one from the
> root.
> 
> Sagi.
> 

I agree, it's a work-around, but I think it's better than the work-around that
it's replacing.   If we add the check to fail if we don't have enough SGEs 
could we go with that for now?

In the long term maybe we can consider re-writing the LOGIN RESPONSE code to
only use a single SGE (Assuming I'm correct about the LOGIN RESPONSE taking 2 SGEs)

Another thing to consider is rather than subtracting 2 from the reported
number of SGEs maybe we just need to put a max on it.

Chris
> >
> > Tested-by: Chris Moore <chris.moore@emulex.com>
> >
> > Thanks,
> >
> > Chris


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-10-30 15:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-20 15:29 Can someone help me understand the reason for this code in ib_isert.c? Chris Moore
     [not found] ` <462EF229174FDB4D92ACE4656EA5610051E2EEF9-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-10-20 21:13   ` Or Gerlitz
     [not found]     ` <CAJ3xEMjmmt1guJO8rF6ChnTq-ZQbt9dpb_hwsNQCR65C79waRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-22  5:06       ` Nicholas A. Bellinger
     [not found]         ` <1413954397.30983.33.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2014-10-22  9:02           ` Or Gerlitz
2014-10-22 21:50             ` Nicholas A. Bellinger
2014-10-29 18:05               ` Chris Moore
     [not found]                 ` <462EF229174FDB4D92ACE4656EA5610051E396BE-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-10-30  8:24                   ` Sagi Grimberg
2014-10-30 15:06                     ` Chris Moore
2014-10-22 11:39           ` Sagi Grimberg
2014-10-23  7:43       ` Eli Cohen
2014-10-26 12:57         ` Bart Van Assche
     [not found]           ` <544CEFBD.2090405-HInyCGIudOg@public.gmane.org>
2014-10-26 14:08             ` Eli Cohen
2014-10-28 10:06               ` Or Gerlitz
     [not found]                 ` <544F6A8E.9000400-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-10-28 10:55                   ` Eli Cohen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.