linux-rdma.vger.kernel.org archive mirror
* SQ overflow seen running isert traffic with high block sizes
@ 2017-06-28  9:25 Amrani, Ram
       [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2017-06-28 10:39 ` Sagi Grimberg
  0 siblings, 2 replies; 26+ messages in thread
From: Amrani, Ram @ 2017-06-28  9:25 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Elior, Ariel

We are hitting SQ overflow on the iSER target side with high block sizes over RoCE
(see dmesg output below).

We are using a Q-Logic/Cavium NIC with a capability of 4 SGEs.

Following the thread "SQ overflow seen running isert traffic" [1], I was wondering
if someone is working on SQ accounting, or more graceful handling of overflow,
as the messages are printed over and over.
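
For reference, a minimal standalone model of the kind of SQ accounting being
asked about: a free-slot counter taken at post time and returned from the send
completion handler. Everything below (example_sq, example_post_send, the depth
of 2) is illustrative only, not actual isert or verbs code.

#include <stdatomic.h>
#include <stdio.h>

/* Illustrative only: track free SQ slots so the ULP never posts more
 * work requests than the send queue can hold. */
struct example_sq {
        atomic_int free_slots;          /* starts at the SQ depth */
};

/* Returns 0 on success, or -12 (like -ENOMEM) when the SQ is full. */
static int example_post_send(struct example_sq *sq)
{
        int old = atomic_load(&sq->free_slots);

        do {
                if (old == 0)
                        return -12;     /* SQ full: back off and retry later */
        } while (!atomic_compare_exchange_weak(&sq->free_slots, &old, old - 1));

        /* ... the real ib_post_send() would happen here ... */
        return 0;
}

/* Called from the send completion handler: a finished WR frees a slot. */
static void example_send_done(struct example_sq *sq)
{
        atomic_fetch_add(&sq->free_slots, 1);
}

int main(void)
{
        struct example_sq sq = { .free_slots = 2 };

        printf("%d\n", example_post_send(&sq));  /* 0: slot taken   */
        printf("%d\n", example_post_send(&sq));  /* 0: slot taken   */
        printf("%d\n", example_post_send(&sq));  /* -12: SQ is full */
        example_send_done(&sq);                  /* a completion frees a slot */
        printf("%d\n", example_post_send(&sq));  /* 0 again         */
        return 0;
}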

Dmesg output:
2017-06-06T09:23:28.824234+05:30 SLES12SP3Beta3 kernel: [65057.799615] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83cb2cb0 failed to post RDMA res
2017-06-06T09:23:29.500095+05:30 SLES12SP3Beta3 kernel: [65058.475858] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec53ec020 failed to post RDMA res
2017-06-06T09:23:29.560085+05:30 SLES12SP3Beta3 kernel: [65058.533787] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec622ae08 failed to post RDMA res
2017-06-06T09:23:29.984209+05:30 SLES12SP3Beta3 kernel: [65058.958509] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ff08e6bb0 failed to post RDMA res
2017-06-06T09:23:30.056098+05:30 SLES12SP3Beta3 kernel: [65059.032182] isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa6761138 failed to post RDMA res
2017-06-06T09:23:30.288152+05:30 SLES12SP3Beta3 kernel: [65059.262748] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec3caf668 failed to post RDMA res
2017-06-06T09:23:30.444068+05:30 SLES12SP3Beta3 kernel: [65059.421071] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f2186cc30 failed to post RDMA res
2017-06-06T09:23:30.532135+05:30 SLES12SP3Beta3 kernel: [65059.505380] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec6429bf0 failed to post RDMA res
2017-06-06T09:23:30.672098+05:30 SLES12SP3Beta3 kernel: [65059.645585] isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa5526bb0 failed to post RDMA res
2017-06-06T09:23:30.852121+05:30 SLES12SP3Beta3 kernel: [65059.828072] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f5c8a89d8 failed to post RDMA res
2017-06-06T09:23:31.464125+05:30 SLES12SP3Beta3 kernel: [65060.440092] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ffefdf918 failed to post RDMA res
2017-06-06T09:23:31.576074+05:30 SLES12SP3Beta3 kernel: [65060.550314] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83222350 failed to post RDMA res
2017-06-06T09:24:30.532064+05:30 SLES12SP3Beta3 kernel: [65119.503466] ABORT_TASK: Found referenced iSCSI task_tag: 103
2017-06-06T09:24:30.532079+05:30 SLES12SP3Beta3 kernel: [65119.503468] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 103
2017-06-06T09:24:31.428084+05:30 SLES12SP3Beta3 kernel: [65120.399433] ABORT_TASK: Found referenced iSCSI task_tag: 101
2017-06-06T09:24:31.428101+05:30 SLES12SP3Beta3 kernel: [65120.399436] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 101
2017-06-06T09:24:31.556053+05:30 SLES12SP3Beta3 kernel: [65120.527461] ABORT_TASK: Found referenced iSCSI task_tag: 119
2017-06-06T09:24:31.556060+05:30 SLES12SP3Beta3 kernel: [65120.527465] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 119
2017-06-06T09:24:31.556061+05:30 SLES12SP3Beta3 kernel: [65120.527468] ABORT_TASK: Found referenced iSCSI task_tag: 43
2017-06-06T09:24:31.556062+05:30 SLES12SP3Beta3 kernel: [65120.527469] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 43
2017-06-06T09:24:31.556063+05:30 SLES12SP3Beta3 kernel: [65120.527470] ABORT_TASK: Found referenced iSCSI task_tag: 79
2017-06-06T09:24:31.556064+05:30 SLES12SP3Beta3 kernel: [65120.527471] ABORT_TASK: Found referenced iSCSI task_tag: 71
2017-06-06T09:24:31.556066+05:30 SLES12SP3Beta3 kernel: [65120.527472] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 79
2017-06-06T09:24:31.556067+05:30 SLES12SP3Beta3 kernel: [65120.527472] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 71
2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527506] ABORT_TASK: Found referenced iSCSI task_tag: 122
2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527508] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 122
2017-06-06T09:24:32.452073+05:30 SLES12SP3Beta3 kernel: [65121.423425] ABORT_TASK: Found referenced iSCSI task_tag: 58
2017-06-06T09:24:32.452080+05:30 SLES12SP3Beta3 kernel: [65121.423427] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 58
2017-06-06T09:24:32.516054+05:30 SLES12SP3Beta3 kernel: [65121.487380] ABORT_TASK: Found referenced iSCSI task_tag: 100
2017-06-06T09:24:32.516061+05:30 SLES12SP3Beta3 kernel: [65121.487382] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 100
2017-06-06T09:24:32.584031+05:30 SLES12SP3Beta3 kernel: [65121.555374] ABORT_TASK: Found referenced iSCSI task_tag: 52
2017-06-06T09:24:32.584041+05:30 SLES12SP3Beta3 kernel: [65121.555376] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 52
2017-06-06T09:24:33.412057+05:30 SLES12SP3Beta3 kernel: [65122.383341] ABORT_TASK: Found referenced iSCSI task_tag: 43
2017-06-06T09:24:33.412065+05:30 SLES12SP3Beta3 kernel: [65122.383376] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 43
2017-06-06T09:24:33.476061+05:30 SLES12SP3Beta3 kernel: [65122.447354] ABORT_TASK: Found referenced iSCSI task_tag: 63
2017-06-06T09:24:33.476070+05:30 SLES12SP3Beta3 kernel: [65122.447360] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 63

[1] https://patchwork.kernel.org/patch/9633675/

Thanks,
Ram

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-06-28 10:35   ` Potnuri Bharat Teja
       [not found]     ` <20170628103505.GA27517-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Potnuri Bharat Teja @ 2017-06-28 10:35 UTC (permalink / raw)
  To: Amrani, Ram; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel

Here is some more discussion regarding SQ overflow.
https://www.spinics.net/lists/linux-rdma/msg47635.html

Current SQ overflow handling is post-error handling to keep the I/O running, but
we still see SQ post failures filling the dmesg.
We are still working on fixing SQ overflow.

Thanks,
Bharat.

On Wednesday, June 06/28/17, 2017 at 14:55:45 +0530, Amrani, Ram wrote:
>    We are hitting SQ overflow on iSER target side with high block sizes over
>    RoCE
>    (see dmesg output below).
> 
>    We are using Q-Logic/Cavium NIC with a capability of 4 sges.
> 
>    Following the thread "SQ overflow seen running isert traffic" [1], I was
>    wondering
>    if someone is working on SQ accounting, or more graceful handling of
>    overflow,
>    as the messages are printed over and over.
> 
>    Dmesg output:
>    2017-06-06T09:23:28.824234+05:30 SLES12SP3Beta3 kernel: [65057.799615]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83cb2cb0 failed to post RDMA
>    res
>    2017-06-06T09:23:29.500095+05:30 SLES12SP3Beta3 kernel: [65058.475858]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec53ec020 failed to post RDMA
>    res
>    2017-06-06T09:23:29.560085+05:30 SLES12SP3Beta3 kernel: [65058.533787]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec622ae08 failed to post RDMA
>    res
>    2017-06-06T09:23:29.984209+05:30 SLES12SP3Beta3 kernel: [65058.958509]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880ff08e6bb0 failed to post RDMA
>    res
>    2017-06-06T09:23:30.056098+05:30 SLES12SP3Beta3 kernel: [65059.032182]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa6761138 failed to post RDMA
>    res
>    2017-06-06T09:23:30.288152+05:30 SLES12SP3Beta3 kernel: [65059.262748]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec3caf668 failed to post RDMA
>    res
>    2017-06-06T09:23:30.444068+05:30 SLES12SP3Beta3 kernel: [65059.421071]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880f2186cc30 failed to post RDMA
>    res
>    2017-06-06T09:23:30.532135+05:30 SLES12SP3Beta3 kernel: [65059.505380]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec6429bf0 failed to post RDMA
>    res
>    2017-06-06T09:23:30.672098+05:30 SLES12SP3Beta3 kernel: [65059.645585]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa5526bb0 failed to post RDMA
>    res
>    2017-06-06T09:23:30.852121+05:30 SLES12SP3Beta3 kernel: [65059.828072]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880f5c8a89d8 failed to post RDMA
>    res
>    2017-06-06T09:23:31.464125+05:30 SLES12SP3Beta3 kernel: [65060.440092]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880ffefdf918 failed to post RDMA
>    res
>    2017-06-06T09:23:31.576074+05:30 SLES12SP3Beta3 kernel: [65060.550314]
>    isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83222350 failed to post RDMA
>    res
>    2017-06-06T09:24:30.532064+05:30 SLES12SP3Beta3 kernel: [65119.503466]
>    ABORT_TASK: Found referenced iSCSI task_tag: 103
>    2017-06-06T09:24:30.532079+05:30 SLES12SP3Beta3 kernel: [65119.503468]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 103
>    2017-06-06T09:24:31.428084+05:30 SLES12SP3Beta3 kernel: [65120.399433]
>    ABORT_TASK: Found referenced iSCSI task_tag: 101
>    2017-06-06T09:24:31.428101+05:30 SLES12SP3Beta3 kernel: [65120.399436]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 101
>    2017-06-06T09:24:31.556053+05:30 SLES12SP3Beta3 kernel: [65120.527461]
>    ABORT_TASK: Found referenced iSCSI task_tag: 119
>    2017-06-06T09:24:31.556060+05:30 SLES12SP3Beta3 kernel: [65120.527465]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 119
>    2017-06-06T09:24:31.556061+05:30 SLES12SP3Beta3 kernel: [65120.527468]
>    ABORT_TASK: Found referenced iSCSI task_tag: 43
>    2017-06-06T09:24:31.556062+05:30 SLES12SP3Beta3 kernel: [65120.527469]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 43
>    2017-06-06T09:24:31.556063+05:30 SLES12SP3Beta3 kernel: [65120.527470]
>    ABORT_TASK: Found referenced iSCSI task_tag: 79
>    2017-06-06T09:24:31.556064+05:30 SLES12SP3Beta3 kernel: [65120.527471]
>    ABORT_TASK: Found referenced iSCSI task_tag: 71
>    2017-06-06T09:24:31.556066+05:30 SLES12SP3Beta3 kernel: [65120.527472]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 79
>    2017-06-06T09:24:31.556067+05:30 SLES12SP3Beta3 kernel: [65120.527472]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 71
>    2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527506]
>    ABORT_TASK: Found referenced iSCSI task_tag: 122
>    2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527508]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 122
>    2017-06-06T09:24:32.452073+05:30 SLES12SP3Beta3 kernel: [65121.423425]
>    ABORT_TASK: Found referenced iSCSI task_tag: 58
>    2017-06-06T09:24:32.452080+05:30 SLES12SP3Beta3 kernel: [65121.423427]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 58
>    2017-06-06T09:24:32.516054+05:30 SLES12SP3Beta3 kernel: [65121.487380]
>    ABORT_TASK: Found referenced iSCSI task_tag: 100
>    2017-06-06T09:24:32.516061+05:30 SLES12SP3Beta3 kernel: [65121.487382]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 100
>    2017-06-06T09:24:32.584031+05:30 SLES12SP3Beta3 kernel: [65121.555374]
>    ABORT_TASK: Found referenced iSCSI task_tag: 52
>    2017-06-06T09:24:32.584041+05:30 SLES12SP3Beta3 kernel: [65121.555376]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 52
>    2017-06-06T09:24:33.412057+05:30 SLES12SP3Beta3 kernel: [65122.383341]
>    ABORT_TASK: Found referenced iSCSI task_tag: 43
>    2017-06-06T09:24:33.412065+05:30 SLES12SP3Beta3 kernel: [65122.383376]
>    ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 43
>    2017-06-06T09:24:33.476061+05:30 SLES12SP3Beta3 kernel: [65122.447354]
>    ABORT_TASK: Found referenced iSCSI task_tag: 63
>    2017-06-06T09:24:33.476070+05:30 SLES12SP3Beta3 kernel: [65122.447360]
>    ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 63
> 
>    [1] https://patchwork.kernel.org/patch/9633675/
> 
>    Thanks,
>    Ram

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2017-06-28  9:25 SQ overflow seen running isert traffic with high block sizes Amrani, Ram
       [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-06-28 10:39 ` Sagi Grimberg
  2017-06-28 11:32   ` Amrani, Ram
  1 sibling, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-28 10:39 UTC (permalink / raw)
  To: Amrani, Ram, linux-rdma; +Cc: Elior, Ariel, target-devel

Hey Ram, CC'ing target-devel for iser-target related posts.

> We are hitting SQ overflow on iSER target side with high block sizes over RoCE
> (see dmesg output below).
> 
> We are using Q-Logic/Cavium NIC with a capability of 4 sges.

That's somewhat expected if the device has a low max_sge. It was decided
that the queue_full mechanism is not something that iser-target should
handle, but rather the iscsi-target core on top.

You probably should not get into aborts though... Does the I/O complete,
or does it fail?

Is this upstream? Is [1] applied?

I could come up with some queue-full handling in isert that will be more
lightweight, but I'd let Nic make a judgment call before I do anything.

[1]:
commit a4467018c2a7228f4ef58051f0511bd037bff264
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Sun Oct 30 17:30:08 2016 -0700

     iscsi-target: Propigate queue_data_in + queue_status errors

     This patch changes iscsi-target to propagate iscsit_transport
     ->iscsit_queue_data_in() and ->iscsit_queue_status() callback
     errors, back up into target-core.

     This allows target-core to retry failed iscsit_transport
     callbacks using internal queue-full logic.

     Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
     Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
     Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
     Cc: Potnuri Bharat Teja <bharat@chelsio.com>
     Reported-by: Steve Wise <swise@opengridcomputing.com>
     Cc: Steve Wise <swise@opengridcomputing.com>
     Cc: Sagi Grimberg <sagi@grimberg.me>
     Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
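
As a rough illustration of the mechanism that commit describes (a standalone
model only; cmd, queue_data_in, core_retry_queue_full and friends are made-up
names, not the actual target-core symbols): the fabric callback returns the
post error instead of swallowing it, and the core parks the command on a
queue-full list and retries it once completions have freed send-queue slots.

#include <stdio.h>

#define QF_ERR (-12)                     /* stands in for -ENOMEM from a full SQ */

struct cmd {
        int id;
        struct cmd *qf_next;             /* queue-full retry list linkage */
};

static int sq_free_slots = 1;            /* pretend send-queue depth */
static struct cmd *qf_list;              /* commands waiting for a retry */

/* Models the fabric ->queue_data_in() callback: after the fix it returns
 * the post error instead of dropping it on the floor. */
static int queue_data_in(struct cmd *c)
{
        if (sq_free_slots == 0)
                return QF_ERR;
        sq_free_slots--;
        printf("cmd %d posted\n", c->id);
        return 0;
}

/* Models the core: on error, park the command for a later retry. */
static void core_queue_data_in(struct cmd *c)
{
        if (queue_data_in(c) == QF_ERR) {
                c->qf_next = qf_list;
                qf_list = c;
                printf("cmd %d hit queue-full, will retry\n", c->id);
        }
}

/* Models the queue-full retry work, run after completions free SQ slots. */
static void core_retry_queue_full(void)
{
        while (qf_list && sq_free_slots > 0) {
                struct cmd *c = qf_list;

                qf_list = c->qf_next;
                queue_data_in(c);
        }
}

int main(void)
{
        struct cmd a = { .id = 1 }, b = { .id = 2 };

        core_queue_data_in(&a);          /* posts */
        core_queue_data_in(&b);          /* queue-full, parked */
        sq_free_slots++;                 /* a completion frees a slot */
        core_retry_queue_full();         /* cmd 2 finally posted */
        return 0;
}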

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
       [not found]     ` <20170628103505.GA27517-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
@ 2017-06-28 11:29       ` Amrani, Ram
  0 siblings, 0 replies; 26+ messages in thread
From: Amrani, Ram @ 2017-06-28 11:29 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel

> Here is some more discussion regarding SQ overflow.
> https://www.spinics.net/lists/linux-rdma/msg47635.html
> 
> Current SQ overflow handling is post error handling to keep the IO running and
> We still see SQ post failures filling the dmesg.
> We are still working on fixing SQ overflow.
> 
> Thanks,
> Bharat.

That's good to know.
Let us know when you have something working, we can help test it too.

Thanks,
Ram


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
  2017-06-28 10:39 ` Sagi Grimberg
@ 2017-06-28 11:32   ` Amrani, Ram
       [not found]     ` <BN3PR07MB25786338EADC77A369A6D493F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Amrani, Ram @ 2017-06-28 11:32 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma; +Cc: Elior, Ariel, target-devel

> > We are hitting SQ overflow on iSER target side with high block sizes over RoCE
> > (see dmesg output below).
> >
> > We are using Q-Logic/Cavium NIC with a capability of 4 sges.
> 
> That's somewhat expected if the device has low max_sge. It was decided
> that queue_full mechanism is not something that iser-target should
> handle but rather the iscsi-target core on top.
> 
> You probably should not get into aborts though... Does the I/O complete?
> or does it fail?

The I/Os complete.

> 
> Is this upstream? is [1] applied?
> 
> I could come up with some queue-full handling in isert that will be more
> lightweight, but I'd let Nic make a judgment call before I do anything.
> 
> [1]:
> commit a4467018c2a7228f4ef58051f0511bd037bff264
> Author: Nicholas Bellinger <nab@linux-iscsi.org>
> Date:   Sun Oct 30 17:30:08 2016 -0700
> 
>      iscsi-target: Propigate queue_data_in + queue_status errors
> 

Yes, the patch is applied.

Thanks,
Ram


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]     ` <BN3PR07MB25786338EADC77A369A6D493F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-07-13 18:29       ` Nicholas A. Bellinger
  2017-07-17  9:26         ` Amrani, Ram
  0 siblings, 1 reply; 26+ messages in thread
From: Nicholas A. Bellinger @ 2017-07-13 18:29 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel,
	target-devel, Potnuri Bharat Teja

Hi Ram & Co,

(Adding Potnuri CC')

On Wed, 2017-06-28 at 11:32 +0000, Amrani, Ram wrote:
> > > We are hitting SQ overflow on iSER target side with high block sizes over RoCE
> > > (see dmesg output below).
> > >
> > > We are using Q-Logic/Cavium NIC with a capability of 4 sges.
> > 
> > That's somewhat expected if the device has low max_sge. It was decided
> > that queue_full mechanism is not something that iser-target should
> > handle but rather the iscsi-target core on top.
> > 
> > You probably should not get into aborts though... Does the I/O complete?
> > or does it fail?
> 
> The IOs complete
> 
> > 
> > Is this upstream? is [1] applied?
> > 
> > I could come up with some queue-full handling in isert that will be more
> > lightweight, but I'd let Nic make a judgment call before I do anything.
> > 
> > [1]:
> > commit a4467018c2a7228f4ef58051f0511bd037bff264
> > Author: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> > Date:   Sun Oct 30 17:30:08 2016 -0700
> > 
> >      iscsi-target: Propigate queue_data_in + queue_status errors
> > 
> 
> Yes, the patch is applied.
> 

Just to confirm, the following four patches were required to get
Potnuri up and running on iser-target + iw_cxgb4 with a similarly small
number of hw SGEs:

7a56dc8 iser-target: avoid posting a recv buffer twice
555a65f iser-target: Fix queue-full response handling
a446701 iscsi-target: Propigate queue_data_in + queue_status errors
fa7e25c target: Fix unknown fabric callback queue-full errors

So did you test with Q-Logic/Cavium with RoCE using these four patches,
or just with commit a4467018..?

Note these have not been CC'ed to stable yet, as I was reluctant since
they didn't have much mileage on them at the time..

Now however, they should be OK to consider for stable, especially if
they get you unblocked as well.










^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
  2017-07-13 18:29       ` Nicholas A. Bellinger
@ 2017-07-17  9:26         ` Amrani, Ram
       [not found]           ` <BN3PR07MB2578E6561CC669922A322245F8A00-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Amrani, Ram @ 2017-07-17  9:26 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel,
	Potnuri Bharat Teja

Hi Nicholas,

> Just to confirm, the following four patches where required to get
> Potnuri up and running on iser-target + iw_cxgb4 with a similarly small
> number of hw SGEs:
> 
> 7a56dc8 iser-target: avoid posting a recv buffer twice
> 555a65f iser-target: Fix queue-full response handling
> a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> fa7e25c target: Fix unknown fabric callback queue-full errors
> 
> So Did you test with Q-Logic/Cavium with RoCE using these four patches,
> or just with commit a4467018..?
> 
> Note these have not been CC'ed to stable yet, as I was reluctant since
> they didn't have much mileage on them at the time..
> 
> Now however, they should be OK to consider for stable, especially if
> they get you unblocked as well.

The issue is still seen with these four patches.

Thanks,
Ram


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]           ` <BN3PR07MB2578E6561CC669922A322245F8A00-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-06 22:40             ` Shiraz Saleem
       [not found]               ` <20171006224025.GA23364-GOXS9JX10wfOxmVO0tvppfooFf0ArEBIu+b9c/7xato@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Shiraz Saleem @ 2017-10-06 22:40 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: Nicholas A. Bellinger, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel,
	Potnuri Bharat Teja

On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> Hi Nicholas,
> 
> > Just to confirm, the following four patches where required to get
> > Potnuri up and running on iser-target + iw_cxgb4 with a similarly small
> > number of hw SGEs:
> > 
> > 7a56dc8 iser-target: avoid posting a recv buffer twice
> > 555a65f iser-target: Fix queue-full response handling
> > a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> > fa7e25c target: Fix unknown fabric callback queue-full errors
> > 
> > So Did you test with Q-Logic/Cavium with RoCE using these four patches,
> > or just with commit a4467018..?
> > 
> > Note these have not been CC'ed to stable yet, as I was reluctant since
> > they didn't have much mileage on them at the time..
> > 
> > Now however, they should be OK to consider for stable, especially if
> > they get you unblocked as well.
> 
> The issue is still seen with these four patches.
> 
> Thanks,
> Ram

Hi,

On X722 iWARP NICs (i40iw) too, we are seeing a similar issue of SQ overflow being hit on
isert for larger block sizes, on a 4.14-rc2 kernel.

Eventually there is a timeout/conn-error on the iSER initiator and the connection is torn down.

The aforementioned patches don't seem to be alleviating the SQ overflow issue?

Initiator
------------

[17007.465524] scsi host11: iSCSI Initiator over iSER
[17007.466295] iscsi: invalid can_queue of 55. can_queue must be a power of 2.
[17007.466924] iscsi: Rounding can_queue to 32.
[17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0  PQ: 0 ANSI: 5
[17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit TPGS
[17007.471656] scsi 11:0:0:0: alua: device naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
[17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
[17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[17007.472405] sd 11:0:0:0: [sdb] Write Protect is off
[17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
[17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[17007.473412] sd 11:0:0:0: [sdb] Attached SCSI disk
[17007.478184] sd 11:0:0:0: alua: transition timeout set to 60 seconds
[17007.478186] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[17031.269821]  sdb:
[17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[17049.056155]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311705998, last ping 4311711232, now 4311716352
[17049.057499]  connection2:0: detected conn error (1022)
[17049.057558] modifyQP to CLOSING qp 3 next_iw_state 3
[..]


Target
----------
[....]
[17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397183] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397184] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397184] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397192] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397195] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397196] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
[17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
[17066.397200] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397204] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397204] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
[17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397211] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397215] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397215] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
[17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
[17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3
[17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397251] QP 3 flush_issued
[17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
[17066.397253] Got unknown fabric queue status: -22
[17066.397254] QP 3 flush_issued
[17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
[17066.397255] Got unknown fabric queue status: -22
[17066.397258] QP 3 flush_issued
[17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397259] Got unknown fabric queue status: -22
[17066.397267] QP 3 flush_issued
[17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397268] Got unknown fabric queue status: -22
[17066.397287] QP 3 flush_issued
[17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397288] Got unknown fabric queue status: -22
[17066.397291] QP 3 flush_issued
[17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397292] Got unknown fabric queue status: -22
[17066.397295] QP 3 flush_issued
[17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397297] Got unknown fabric queue status: -22
[17066.397307] QP 3 flush_issued
[17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22
[17066.397308] isert: isert_post_response: ib_post_send failed with -22
[17066.397309] i40iw i40iw_qp_disconnect Call close API
[....]

Shiraz 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]               ` <20171006224025.GA23364-GOXS9JX10wfOxmVO0tvppfooFf0ArEBIu+b9c/7xato@public.gmane.org>
@ 2018-01-15  4:56                 ` Nicholas A. Bellinger
       [not found]                   ` <1515992195.24576.156.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Nicholas A. Bellinger @ 2018-01-15  4:56 UTC (permalink / raw)
  To: Shiraz Saleem
  Cc: Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Elior, Ariel, target-devel, Potnuri Bharat Teja

Hi Shiraz, Ram, Ariel, & Potnuri,

Following up on this old thread, as it relates to Potnuri's recent fix
for an iser-target queue-full memory leak:

https://www.spinics.net/lists/target-devel/msg16282.html

Just curious how frequently this happens in practice with sustained large
block workloads, as it appears to affect at least three different
iWARP RNICs (i40iw, qedr and iw_cxgb4)..?

Is there anything else from an iser-target consumer level that should be
changed for iWARP to avoid repeated ib_post_send() failures..?
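
One way to make the consumer-level arithmetic concrete (a back-of-the-envelope
model only; the SQ depth of 512 and the helper name wrs_per_io are illustrative,
not isert's actual sizing logic): with only a handful of SGEs per WR, a single
large I/O fans out into a long chain of RDMA WRs, so only a few such I/Os fit
in the SQ at once unless the queue depth exposed to the initiator shrinks to
match.

#include <stdio.h>

/* Rough model: how many send WRs one I/O of a given size needs when each
 * RDMA WR can carry at most max_sge pages, plus one WR for the response. */
static unsigned int wrs_per_io(unsigned int io_bytes, unsigned int max_sge,
                               unsigned int page_size)
{
        unsigned int pages = (io_bytes + page_size - 1) / page_size;
        unsigned int rdma_wrs = (pages + max_sge - 1) / max_sge;

        return rdma_wrs + 1;
}

int main(void)
{
        unsigned int sq_depth = 512;           /* illustrative SQ size */
        unsigned int max_sge = 4;              /* as reported for qedr above */
        unsigned int io_bytes = 2 * 1024 * 1024;
        unsigned int per_io = wrs_per_io(io_bytes, max_sge, 4096);

        printf("2M I/O with %u SGEs -> %u WRs; an SQ of %u holds only %u such I/Os\n",
               max_sge, per_io, sq_depth, sq_depth / per_io);
        return 0;
}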

On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote:
> On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> > Hi Nicholas,
> > 
> > > Just to confirm, the following four patches where required to get
> > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly small
> > > number of hw SGEs:
> > > 
> > > 7a56dc8 iser-target: avoid posting a recv buffer twice
> > > 555a65f iser-target: Fix queue-full response handling
> > > a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> > > fa7e25c target: Fix unknown fabric callback queue-full errors
> > > 
> > > So Did you test with Q-Logic/Cavium with RoCE using these four patches,
> > > or just with commit a4467018..?
> > > 
> > > Note these have not been CC'ed to stable yet, as I was reluctant since
> > > they didn't have much mileage on them at the time..
> > > 
> > > Now however, they should be OK to consider for stable, especially if
> > > they get you unblocked as well.
> > 
> > The issue is still seen with these four patches.
> > 
> > Thanks,
> > Ram
> 
> Hi,
> 
> On X722 Iwarp NICs (i40iw) too, we are seeing a similar issue of SQ overflow being hit on
> isert for larger block sizes. 4.14-rc2 kernel.
> 
> Eventually there is a timeout/conn-error on iser initiator and the connection is torn down.
> 
> The aforementioned patches dont seem to be alleviating the SQ overflow issue?
> 
> Initiator
> ------------
> 
> [17007.465524] scsi host11: iSCSI Initiator over iSER
> [17007.466295] iscsi: invalid can_queue of 55. can_queue must be a power of 2.
> [17007.466924] iscsi: Rounding can_queue to 32.
> [17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0  PQ: 0 ANSI: 5
> [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit TPGS
> [17007.471656] scsi 11:0:0:0: alua: device naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
> [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
> [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
> [17007.472405] sd 11:0:0:0: [sdb] Write Protect is off
> [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
> [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [17007.473412] sd 11:0:0:0: [sdb] Attached SCSI disk
> [17007.478184] sd 11:0:0:0: alua: transition timeout set to 60 seconds
> [17007.478186] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
> [17031.269821]  sdb:
> [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
> [17049.056155]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311705998, last ping 4311711232, now 4311716352
> [17049.057499]  connection2:0: detected conn error (1022)
> [17049.057558] modifyQP to CLOSING qp 3 next_iw_state 3
> [..]
> 
> 
> Target
> ----------
> [....]
> [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> [17066.397183] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> [17066.397184] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397184] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> [17066.397192] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> [17066.397195] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397196] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> [17066.397200] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> [17066.397204] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397204] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> [17066.397211] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> [17066.397215] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397215] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> [17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> [17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> [17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> [17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3
> [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> [17066.397251] QP 3 flush_issued
> [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> [17066.397253] Got unknown fabric queue status: -22
> [17066.397254] QP 3 flush_issued
> [17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> [17066.397255] Got unknown fabric queue status: -22
> [17066.397258] QP 3 flush_issued
> [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> [17066.397259] Got unknown fabric queue status: -22
> [17066.397267] QP 3 flush_issued
> [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> [17066.397268] Got unknown fabric queue status: -22
> [17066.397287] QP 3 flush_issued
> [17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> [17066.397288] Got unknown fabric queue status: -22
> [17066.397291] QP 3 flush_issued
> [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> [17066.397292] Got unknown fabric queue status: -22
> [17066.397295] QP 3 flush_issued
> [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> [17066.397297] Got unknown fabric queue status: -22
> [17066.397307] QP 3 flush_issued
> [17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22
> [17066.397308] isert: isert_post_response: ib_post_send failed with -22
> [17066.397309] i40iw i40iw_qp_disconnect Call close API
> [....]
> 
> Shiraz 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
       [not found]                   ` <1515992195.24576.156.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
@ 2018-01-15 10:12                     ` Kalderon, Michal
       [not found]                       ` <CY1PR0701MB2012E53C69D1CE3E16BA320B88EB0-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Kalderon, Michal @ 2018-01-15 10:12 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Shiraz Saleem
  Cc: Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Elior, Ariel, target-devel, Potnuri Bharat Teja


> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> Sent: Monday, January 15, 2018 6:57 AM
> To: Shiraz Saleem <shiraz.saleem@intel.com>
> Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> Potnuri Bharat Teja <bharat@chelsio.com>
> Subject: Re: SQ overflow seen running isert traffic with high block sizes
> 
> Hi Shiraz, Ram, Ariel, & Potnuri,
> 
> Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> target queue-full memory leak:
> 
> https://www.spinics.net/lists/target-devel/msg16282.html
> 
> Just curious how frequent this happens in practice with sustained large block
> workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> qedr and iw_cxgb4)..?
> 
> Is there anything else from an iser-target consumer level that should be
> changed for iwarp to avoid repeated ib_post_send() failures..?
> 
Would like to mention that although we are an iWARP RNIC as well, we've hit this
issue when running RoCE. It's not iWARP related.
This is easily reproduced within seconds with an I/O size of 512K,
using 5 targets with 2 RAM disks each and 5 targets with FileIO disks each.

IO Command used:
maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1

thanks,
Michal

> On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote:
> > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> > > Hi Nicholas,
> > >
> > > > Just to confirm, the following four patches where required to get
> > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly
> > > > small number of hw SGEs:
> > > >
> > > > 7a56dc8 iser-target: avoid posting a recv buffer twice 555a65f
> > > > iser-target: Fix queue-full response handling
> > > > a446701 iscsi-target: Propigate queue_data_in + queue_status
> > > > errors fa7e25c target: Fix unknown fabric callback queue-full
> > > > errors
> > > >
> > > > So Did you test with Q-Logic/Cavium with RoCE using these four
> > > > patches, or just with commit a4467018..?
> > > >
> > > > Note these have not been CC'ed to stable yet, as I was reluctant
> > > > since they didn't have much mileage on them at the time..
> > > >
> > > > Now however, they should be OK to consider for stable, especially
> > > > if they get you unblocked as well.
> > >
> > > The issue is still seen with these four patches.
> > >
> > > Thanks,
> > > Ram
> >
> > Hi,
> >
> > On X722 Iwarp NICs (i40iw) too, we are seeing a similar issue of SQ
> > overflow being hit on isert for larger block sizes. 4.14-rc2 kernel.
> >
> > Eventually there is a timeout/conn-error on iser initiator and the
> connection is torn down.
> >
> > The aforementioned patches dont seem to be alleviating the SQ overflow
> issue?
> >
> > Initiator
> > ------------
> >
> > [17007.465524] scsi host11: iSCSI Initiator over iSER [17007.466295]
> > iscsi: invalid can_queue of 55. can_queue must be a power of 2.
> > [17007.466924] iscsi: Rounding can_queue to 32.
> > [17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0
> PQ: 0 ANSI: 5
> > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit
> > TPGS [17007.471656] scsi 11:0:0:0: alua: device
> > naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
> > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
> > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks:
> > (42.9 GB/40.0 GiB) [17007.472405] sd 11:0:0:0: [sdb] Write Protect is
> > off [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
> > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache:
> > enabled, doesn't support DPO or FUA [17007.473412] sd 11:0:0:0: [sdb]
> > Attached SCSI disk [17007.478184] sd 11:0:0:0: alua: transition
> > timeout set to 60 seconds [17007.478186] sd 11:0:0:0: alua: port group 00
> state A non-preferred supports TOlUSNA [17031.269821]  sdb:
> > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data
> > mode. Opts: (null) [17049.056155]  connection2:0: ping timeout of 5
> > secs expired, recv timeout 5, last rx 4311705998, last ping
> > 4311711232, now 4311716352 [17049.057499]  connection2:0: detected
> > conn error (1022) [17049.057558] modifyQP to CLOSING qp 3
> > next_iw_state 3 [..]
> >
> >
> > Target
> > ----------
> > [....]
> > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020
> > failed to post RDMA res [17066.397183] i40iw_post_send: qp 3 wr_opcode
> > 0 ret_err -12 [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8ea1f8 failed to post RDMA res [17066.397184]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397184] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30
> > failed to post RDMA res [17066.397192] i40iw_post_send: qp 3 wr_opcode
> > 0 ret_err -12 [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8f20a0 failed to post RDMA res [17066.397195]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397196] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48
> > failed to post RDMA res [17066.397200] i40iw_post_send: qp 3 wr_opcode
> > 0 ret_err -12 [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8ec020 failed to post RDMA res [17066.397204]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397204] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id =
> > 3 [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0
> > failed to post RDMA res [17066.397211] i40iw_post_send: qp 3 wr_opcode
> > 0 ret_err -12 [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8ecc30 failed to post RDMA res [17066.397215]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397215] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800
> > failed to post RDMA res [17066.397219] i40iw_post_send: qp 3 wr_opcode
> > 0 ret_err -12 [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8ede48 failed to post RDMA res [17066.397232]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397233] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8
> > failed to post RDMA res [17066.397238] i40iw_post_send: qp 3 wr_opcode
> > 0 ret_err -12 [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8e9bf0 failed to post RDMA res [17066.397242]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397242] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id =
> > 3 [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0
> > failed to post RDMA res [17066.397251] QP 3 flush_issued
> > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800
> > failed to post RDMA res [17066.397253] Got unknown fabric queue
> > status: -22 [17066.397254] QP 3 flush_issued [17066.397254]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397254] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > [17066.397255] Got unknown fabric queue status: -22 [17066.397258] QP
> > 3 flush_issued [17066.397258] i40iw_post_send: qp 3 wr_opcode 0
> > ret_err -22 [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8ec020 failed to post RDMA res [17066.397259] Got unknown
> > fabric queue status: -22 [17066.397267] QP 3 flush_issued
> > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8
> > failed to post RDMA res [17066.397268] Got unknown fabric queue
> > status: -22 [17066.397287] QP 3 flush_issued [17066.397287]
> > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397288] isert:
> > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > [17066.397288] Got unknown fabric queue status: -22 [17066.397291] QP
> > 3 flush_issued [17066.397292] i40iw_post_send: qp 3 wr_opcode 0
> > ret_err -22 [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd:
> > ffff8817fb8ecc30 failed to post RDMA res [17066.397292] Got unknown
> > fabric queue status: -22 [17066.397295] QP 3 flush_issued
> > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0
> > failed to post RDMA res [17066.397297] Got unknown fabric queue
> > status: -22 [17066.397307] QP 3 flush_issued [17066.397307]
> > i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 [17066.397308] isert:
> > isert_post_response: ib_post_send failed with -22 [17066.397309] i40iw
> > i40iw_qp_disconnect Call close API [....]
> >
> > Shiraz

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]                       ` <CY1PR0701MB2012E53C69D1CE3E16BA320B88EB0-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-01-15 15:22                         ` Shiraz Saleem
  2018-01-18  9:58                           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 26+ messages in thread
From: Shiraz Saleem @ 2018-01-15 15:22 UTC (permalink / raw)
  To: Kalderon, Michal, Nicholas A. Bellinger
  Cc: Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Elior, Ariel, target-devel, Potnuri Bharat Teja

On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Nicholas A. Bellinger
> > Sent: Monday, January 15, 2018 6:57 AM
> > To: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > Cc: Amrani, Ram <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; Sagi Grimberg
> > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Elior, Ariel
> > <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; target-devel <target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>;
> > Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > 
> > Hi Shiraz, Ram, Ariel, & Potnuri,
> > 
> > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > target queue-full memory leak:
> > 
> > https://www.spinics.net/lists/target-devel/msg16282.html
> > 
> > Just curious how frequent this happens in practice with sustained large block
> > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > qedr and iw_cxgb4)..?
> > 
> > Is there anything else from an iser-target consumer level that should be
> > changed for iwarp to avoid repeated ib_post_send() failures..?
> > 
> Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> Issue when running RoCE. It's not iWARP related. 
> This is easily reproduced within seconds with IO size of 5121K
> Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> 
> IO Command used:
> maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> 
> thanks,
> Michal

It's seen with block sizes >= 2M on a single-target, 1 RAM disk config, and, similar to Michal's report,
rather quickly, in a matter of seconds.

fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
--direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 

Shiraz

> 
> > On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote:
> > > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> > > > Hi Nicholas,
> > > >
> > > > > Just to confirm, the following four patches where required to get
> > > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly
> > > > > small number of hw SGEs:
> > > > >
> > > > > 7a56dc8 iser-target: avoid posting a recv buffer twice 555a65f
> > > > > iser-target: Fix queue-full response handling
> > > > > a446701 iscsi-target: Propigate queue_data_in + queue_status
> > > > > errors fa7e25c target: Fix unknown fabric callback queue-full
> > > > > errors
> > > > >
> > > > > So Did you test with Q-Logic/Cavium with RoCE using these four
> > > > > patches, or just with commit a4467018..?
> > > > >
> > > > > Note these have not been CC'ed to stable yet, as I was reluctant
> > > > > since they didn't have much mileage on them at the time..
> > > > >
> > > > > Now however, they should be OK to consider for stable, especially
> > > > > if they get you unblocked as well.
> > > >
> > > > The issue is still seen with these four patches.
> > > >
> > > > Thanks,
> > > > Ram
> > >
> > > Hi,
> > >
> > > On X722 Iwarp NICs (i40iw) too, we are seeing a similar issue of SQ
> > > overflow being hit on isert for larger block sizes. 4.14-rc2 kernel.
> > >
> > > Eventually there is a timeout/conn-error on iser initiator and the
> > connection is torn down.
> > >
> > > The aforementioned patches dont seem to be alleviating the SQ overflow
> > issue?
> > >
> > > Initiator
> > > ------------
> > >
> > > [17007.465524] scsi host11: iSCSI Initiator over iSER [17007.466295]
> > > iscsi: invalid can_queue of 55. can_queue must be a power of 2.
> > > [17007.466924] iscsi: Rounding can_queue to 32.
> > > [17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0
> > PQ: 0 ANSI: 5
> > > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit
> > > TPGS [17007.471656] scsi 11:0:0:0: alua: device
> > > naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
> > > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
> > > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks:
> > > (42.9 GB/40.0 GiB) [17007.472405] sd 11:0:0:0: [sdb] Write Protect is
> > > off [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
> > > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache:
> > > enabled, doesn't support DPO or FUA [17007.473412] sd 11:0:0:0: [sdb]
> > > Attached SCSI disk [17007.478184] sd 11:0:0:0: alua: transition
> > > timeout set to 60 seconds [17007.478186] sd 11:0:0:0: alua: port group 00
> > state A non-preferred supports TOlUSNA [17031.269821]  sdb:
> > > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data
> > > mode. Opts: (null) [17049.056155]  connection2:0: ping timeout of 5
> > > secs expired, recv timeout 5, last rx 4311705998, last ping
> > > 4311711232, now 4311716352 [17049.057499]  connection2:0: detected
> > > conn error (1022) [17049.057558] modifyQP to CLOSING qp 3
> > > next_iw_state 3 [..]
> > >
> > >
> > > Target
> > > ----------
> > > [....]
> > > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020
> > > failed to post RDMA res [17066.397183] i40iw_post_send: qp 3 wr_opcode
> > > 0 ret_err -12 [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8ea1f8 failed to post RDMA res [17066.397184]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397184] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30
> > > failed to post RDMA res [17066.397192] i40iw_post_send: qp 3 wr_opcode
> > > 0 ret_err -12 [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8f20a0 failed to post RDMA res [17066.397195]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397196] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48
> > > failed to post RDMA res [17066.397200] i40iw_post_send: qp 3 wr_opcode
> > > 0 ret_err -12 [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8ec020 failed to post RDMA res [17066.397204]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397204] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id =
> > > 3 [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0
> > > failed to post RDMA res [17066.397211] i40iw_post_send: qp 3 wr_opcode
> > > 0 ret_err -12 [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8ecc30 failed to post RDMA res [17066.397215]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397215] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800
> > > failed to post RDMA res [17066.397219] i40iw_post_send: qp 3 wr_opcode
> > > 0 ret_err -12 [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8ede48 failed to post RDMA res [17066.397232]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397233] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8
> > > failed to post RDMA res [17066.397238] i40iw_post_send: qp 3 wr_opcode
> > > 0 ret_err -12 [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8e9bf0 failed to post RDMA res [17066.397242]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397242] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id =
> > > 3 [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0
> > > failed to post RDMA res [17066.397251] QP 3 flush_issued
> > > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800
> > > failed to post RDMA res [17066.397253] Got unknown fabric queue
> > > status: -22 [17066.397254] QP 3 flush_issued [17066.397254]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397254] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > > [17066.397255] Got unknown fabric queue status: -22 [17066.397258] QP
> > > 3 flush_issued [17066.397258] i40iw_post_send: qp 3 wr_opcode 0
> > > ret_err -22 [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8ec020 failed to post RDMA res [17066.397259] Got unknown
> > > fabric queue status: -22 [17066.397267] QP 3 flush_issued
> > > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8
> > > failed to post RDMA res [17066.397268] Got unknown fabric queue
> > > status: -22 [17066.397287] QP 3 flush_issued [17066.397287]
> > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397288] isert:
> > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > > [17066.397288] Got unknown fabric queue status: -22 [17066.397291] QP
> > > 3 flush_issued [17066.397292] i40iw_post_send: qp 3 wr_opcode 0
> > > ret_err -22 [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd:
> > > ffff8817fb8ecc30 failed to post RDMA res [17066.397292] Got unknown
> > > fabric queue status: -22 [17066.397295] QP 3 flush_issued
> > > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0
> > > failed to post RDMA res [17066.397297] Got unknown fabric queue
> > > status: -22 [17066.397307] QP 3 flush_issued [17066.397307]
> > > i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 [17066.397308] isert:
> > > isert_post_response: ib_post_send failed with -22 [17066.397309] i40iw
> > > i40iw_qp_disconnect Call close API [....]
> > >
> > > Shiraz

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-15 15:22                         ` Shiraz Saleem
@ 2018-01-18  9:58                           ` Nicholas A. Bellinger
  2018-01-18 17:53                             ` Potnuri Bharat Teja
                                               ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Nicholas A. Bellinger @ 2018-01-18  9:58 UTC (permalink / raw)
  To: Shiraz Saleem
  Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior,
	Ariel, target-devel, Potnuri Bharat Teja

Hi Shiraz, Michal & Co,

Thanks for the feedback.  Comments below.

On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > Sent: Monday, January 15, 2018 6:57 AM
> > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > 
> > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > 
> > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > target queue-full memory leak:
> > > 
> > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > 
> > > Just curious how frequent this happens in practice with sustained large block
> > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > qedr and iw_cxgb4)..?
> > > 
> > > Is there anything else from an iser-target consumer level that should be
> > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > 
> > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > Issue when running RoCE. It's not iWARP related. 
> > This is easily reproduced within seconds with IO size of 5121K
> > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > 
> > IO Command used:
> > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > 
> > thanks,
> > Michal
> 
> Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> rather quickly, in a matter of seconds.
> 
> fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> 

A couple of thoughts.

First, would it be helpful to limit maximum payload size per I/O for
consumers based on number of iser-target sq hw sges..?

That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
the maximum payload size per I/O being too large, there is an existing
target_core_fabric_ops mechanism for limiting it using SCSI residuals,
originally utilized by qla2xxx here:

target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e

Note this patch will also return a smaller Block Limits VPD (0x86)
MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
means modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
automatically limit the maximum outgoing payload transfer length and
avoid the SCSI residual logic.
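
For example, with 4k pages, max_data_sg_nents=32 maps to a reported MAXIMUM
TRANSFER LENGTH of 32 * 4096 = 128K, so a compliant initiator should never
send a single I/O larger than that.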

As-is, iser-target doesn't propagate a max_data_sg_nents limit into
iscsi-target, but you can try testing with a smaller value to see if
it's useful.  E.g.:

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
index 0ebc481..d8a4cc5 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
        .module                         = THIS_MODULE,
        .name                           = "iscsi",
        .node_acl_size                  = sizeof(struct iscsi_node_acl),
+       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
        .get_fabric_name                = iscsi_get_fabric_name,
        .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
        .tpg_get_tag                    = lio_tpg_get_tag,

Second, if the failures are not SCSI transfer length specific, another
option would be to limit the total command sequence number depth (CmdSN)
per session.

This is controlled at runtime by default_cmdsn_depth TPG attribute:

/sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth

and on per initiator context with cmdsn_depth NodeACL attribute:

/sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth

Note these default to 64, and can be changed at build time via
include/target/iscsi/iscsi_target_core.h:TA_DEFAULT_CMDSN_DEPTH.
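
For example (using the placeholder paths above, and 32 only as an
illustrative value), the depth could be lowered with:

  echo 32 > /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth
  echo 32 > /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth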

That said, Sagi, any further comments as to what else iser-target should be
doing to avoid repeated queue-fulls with limited hw sges..?

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-18  9:58                           ` Nicholas A. Bellinger
@ 2018-01-18 17:53                             ` Potnuri Bharat Teja
       [not found]                               ` <20180118175316.GA11338-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
       [not found]                               ` <1516778717.24576.319.camel@haakon3.daterainc.com>
  2018-01-19 19:33                             ` Kalderon, Michal
       [not found]                             ` <1516269522.24576.274.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
  2 siblings, 2 replies; 26+ messages in thread
From: Potnuri Bharat Teja @ 2018-01-18 17:53 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Shiraz Saleem, Kalderon, Michal, Amrani, Ram, Sagi Grimberg,
	linux-rdma, Elior, Ariel, target-devel

Hi Nicholas, 
thanks for the suggestions. Comments below.

On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote:
> Hi Shiraz, Michal & Co,
> 
> Thanks for the feedback.  Comments below.
> 
> On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > 
> > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > 
> > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > > target queue-full memory leak:
> > > > 
> > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > 
> > > > Just curious how frequent this happens in practice with sustained large block
> > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > > qedr and iw_cxgb4)..?
> > > > 
> > > > Is there anything else from an iser-target consumer level that should be
> > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > 
> > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > > Issue when running RoCE. It's not iWARP related. 
> > > This is easily reproduced within seconds with IO size of 5121K
> > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > > 
> > > IO Command used:
> > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > 
> > > thanks,
> > > Michal
> > 
> > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> > rather quickly, in a matter of seconds.
> > 
> > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> > 
> 
> A couple of thoughts.
> 
> First, would it be helpful to limit maximum payload size per I/O for
> consumers based on number of iser-target sq hw sges..?
Yes, I think the HW SGE limit needs to be propagated to the iSCSI target.
> 
> That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> maximum payload size per I/O being too large there is an existing

Yes, they are IO-size specific; I observed SQ overflow with fio for IO sizes
above 256k, and only for READ tests, with Chelsio (iw_cxgb4) adapters.
> target_core_fabric_ops mechanism for limiting using SCSI residuals,
> originally utilized by qla2xxx here:
> 
> target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e
> 
> Note this patch also will return a smaller Block Limits VPD (0x86)
> MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
> means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
> automatically limit maximum outgoing payload transfer length, and avoid
> SCSI residual logic.
> 
> As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> iscsi-target, but you can try testing with a smaller value to see if
> it's useful.  Eg:
> 
> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> index 0ebc481..d8a4cc5 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>         .module                         = THIS_MODULE,
>         .name                           = "iscsi",
>         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
>         .get_fabric_name                = iscsi_get_fabric_name,
>         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>         .tpg_get_tag                    = lio_tpg_get_tag,
> 
With the above change, SQ overflow isn't observed. I started off with max_data_sg_nents = 16.
> Second, if the failures are not SCSI transfer length specific, another
> option would be to limit the total command sequence number depth (CmdSN)
> per session.
> 
> This is controlled at runtime by default_cmdsn_depth TPG attribute:
> 
> /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth
> 
> and on per initiator context with cmdsn_depth NodeACL attribute:
> 
> /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth
> 
> Note these default to 64, and can be changed at build time via
> include/target/iscsi/iscsi_target_core.h:TA_DEFAULT_CMDSN_DEPTH.
> 
> That said, Sagi, any further comments as what else iser-target should be
> doing to avoid repeated queue-fulls with limited hw sges..?
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-18  9:58                           ` Nicholas A. Bellinger
  2018-01-18 17:53                             ` Potnuri Bharat Teja
@ 2018-01-19 19:33                             ` Kalderon, Michal
  2018-01-24  7:55                               ` Nicholas A. Bellinger
       [not found]                             ` <1516269522.24576.274.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
  2 siblings, 1 reply; 26+ messages in thread
From: Kalderon, Michal @ 2018-01-19 19:33 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Shiraz Saleem
  Cc: Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel,
	target-devel, Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

________________________________________
From: Nicholas A. Bellinger <nab@linux-iscsi.org>
Sent: Thursday, January 18, 2018 11:58 AM

> Hi Shiraz, Michal & Co,
Hi Nicholas, 

> Thanks for the feedback.  Comments below.

> A couple of thoughts.

> First, would it be helpful to limit maximum payload size per I/O for
> consumers based on number of iser-target sq hw sges..?

I don't think you need to limit the maximum payload, but instead
initialize max_wr based on the number of supported SGEs,
instead of what is there today:
#define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
                                ISERT_MAX_TX_MISC_PDUS  + \
                                ISERT_MAX_RX_MISC_PDUS)
Add the maximum number of WQEs per command.
The number of WQEs per command needs to be something like
"MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".

For some devices like ours, breaking the IO into multiple WRs according to
the supported number of SGEs doesn't necessarily mean a performance penalty.
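
To make that max_wr sizing concrete, a minimal sketch only (MAX_TRANSFER_SIZE,
isert_wrs_per_cmd() and isert_max_send_wr() are made-up names, not existing
isert code, and the CQ sizing derived from ISERT_QP_MAX_REQ_DTOS would need
the same treatment; assumes the usual ib_isert definitions):

/* Hypothetical sketch -- sizes the SQ from the device SGE limit. */
#define MAX_TRANSFER_SIZE	SZ_1M	/* example per-I/O payload cap */

static u32 isert_wrs_per_cmd(struct isert_conn *isert_conn)
{
	u32 max_sge = isert_conn->device->ib_device->attrs.max_sge;

	/* one RDMA WR carries at most max_sge pages of payload */
	return DIV_ROUND_UP(MAX_TRANSFER_SIZE, max_sge * PAGE_SIZE);
}

static u32 isert_max_send_wr(struct isert_conn *isert_conn)
{
	/* would replace ISERT_QP_MAX_REQ_DTOS when setting attr.cap.max_send_wr */
	return ISCSI_DEF_XMIT_CMDS_MAX * isert_wrs_per_cmd(isert_conn) +
	       ISERT_MAX_TX_MISC_PDUS + ISERT_MAX_RX_MISC_PDUS;
}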

thanks,
Michal

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
       [not found]                             ` <1516269522.24576.274.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
@ 2018-01-22 17:49                               ` Saleem, Shiraz
  2018-01-24  8:01                                 ` Nicholas A. Bellinger
  0 siblings, 1 reply; 26+ messages in thread
From: Saleem, Shiraz @ 2018-01-22 17:49 UTC (permalink / raw)
  To: 'Nicholas A. Bellinger'
  Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel,
	Potnuri Bharat Teja

> Subject: Re: SQ overflow seen running isert traffic with high block sizes
> 
> 
> First, would it be helpful to limit maximum payload size per I/O for consumers
> based on number of iser-target sq hw sges..?
> 
Assuming the data cannot be fast-registered as if it were virtually
contiguous, artificially limiting the data size might not be the best
solution.

But the max SGE count does need to be exposed higher. Somewhere in the
stack, multiple WRs might need to be submitted or data copied.

> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> b/drivers/target/iscsi/iscsi_target_configf
> index 0ebc481..d8a4cc5 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>         .module                         = THIS_MODULE,
>         .name                           = "iscsi",
>         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
> TRANSFER LENGTH */
>         .get_fabric_name                = iscsi_get_fabric_name,
>         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>         .tpg_get_tag                    = lio_tpg_get_tag,
> 

BTW, this is helping the SQ overflow issue.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]                               ` <20180118175316.GA11338-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
@ 2018-01-24  7:25                                 ` Nicholas A. Bellinger
  2018-01-24 12:21                                   ` Potnuri Bharat Teja
  0 siblings, 1 reply; 26+ messages in thread
From: Nicholas A. Bellinger @ 2018-01-24  7:25 UTC (permalink / raw)
  To: Potnuri Bharat Teja
  Cc: Shiraz Saleem, Kalderon, Michal, Amrani, Ram, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel

Hi Potnuri & Co,

On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote:
> Hi Nicholas, 
> thanks for the suggestions. Comments below.
> 
> On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote:
> > Hi Shiraz, Michal & Co,
> > 
> > Thanks for the feedback.  Comments below.
> > 
> > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > > > > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Nicholas A. Bellinger
> > > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > > To: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > > > Cc: Amrani, Ram <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; Sagi Grimberg
> > > > > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Elior, Ariel
> > > > > <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; target-devel <target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>;
> > > > > Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > > 
> > > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > > 
> > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > > > target queue-full memory leak:
> > > > > 
> > > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > > 
> > > > > Just curious how frequent this happens in practice with sustained large block
> > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > > > qedr and iw_cxgb4)..?
> > > > > 
> > > > > Is there anything else from an iser-target consumer level that should be
> > > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > > 
> > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > > > Issue when running RoCE. It's not iWARP related. 
> > > > This is easily reproduced within seconds with IO size of 5121K
> > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > > > 
> > > > IO Command used:
> > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > > 
> > > > thanks,
> > > > Michal
> > > 
> > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> > > rather quickly, in a matter of seconds.
> > > 
> > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> > > 
> > 
> > A couple of thoughts.
> > 
> > First, would it be helpful to limit maximum payload size per I/O for
> > consumers based on number of iser-target sq hw sges..?
> yes, I think HW num sge needs to be propagated to iscsi target.
> > 
> > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> > maximum payload size per I/O being too large there is an existing
> 
> Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above
> 256k and for READ tests only with chelsio(iw_cxgb4) adapters.

Thanks for confirming.

> > target_core_fabric_ops mechanism for limiting using SCSI residuals,
> > originally utilized by qla2xxx here:
> > 
> > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e
> > 
> > Note this patch also will return a smaller Block Limits VPD (0x86)
> > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
> > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
> > automatically limit maximum outgoing payload transfer length, and avoid
> > SCSI residual logic.
> > 
> > As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> > iscsi-target, but you can try testing with a smaller value to see if
> > it's useful.  Eg:
> > 
> > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> > index 0ebc481..d8a4cc5 100644
> > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
> >         .module                         = THIS_MODULE,
> >         .name                           = "iscsi",
> >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
> >         .get_fabric_name                = iscsi_get_fabric_name,
> >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> >         .tpg_get_tag                    = lio_tpg_get_tag,
> > 
> With above change, SQ overflow isn't observed. I started of with max_data_sg_nents = 16.

OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages)
avoids SQ overflow with iw_cxgb4.

What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-19 19:33                             ` Kalderon, Michal
@ 2018-01-24  7:55                               ` Nicholas A. Bellinger
  2018-01-24  8:09                                 ` Kalderon, Michal
       [not found]                                 ` <1516780534.24576.335.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
  0 siblings, 2 replies; 26+ messages in thread
From: Nicholas A. Bellinger @ 2018-01-24  7:55 UTC (permalink / raw)
  To: Kalderon, Michal
  Cc: Shiraz Saleem, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior,
	Ariel, target-devel, Potnuri Bharat Teja, Radzi, Amit, Galon,
	Yoav

Hi Michal & Co,

On Fri, 2018-01-19 at 19:33 +0000, Kalderon, Michal wrote:
> ________________________________________
> From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> Sent: Thursday, January 18, 2018 11:58 AM
> 
> > Hi Shiraz, Michal & Co,
> Hi Nicholas, 
> 
> > Thanks for the feedback.  Comments below.
> 
> > A couple of thoughts.
> 
> > First, would it be helpful to limit maximum payload size per I/O for
> > consumers based on number of iser-target sq hw sges..?
> 
> I don't think you need to limit the maximum payload, but instead 
> initialize the max_wr to be based on the number of supported SGEs
> Instead of what is there today:
> #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
>                                 ISERT_MAX_TX_MISC_PDUS  + \
>                                 ISERT_MAX_RX_MISC_PDUS)
> Add the maximum number of WQEs per command, 
> The calculation of number of WQEs per command needs to be something like
> "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> 

Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
right..?

Btw, I'm not sure how this affects usage of ISER_MAX_TX_CQ_LEN +
ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..

Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
limited attr.cap.max_send_sge..?

> For some devices like ours, breaking the IO into multiple WRs according to supported
> number of SGEs doesn't necessarily means performance penalty.
> 

AFAICT adding max_data_sg_nents for iser-target is a safe enough
work-around to include for stable, assuming we agree on what the
max_send_sge cut-off is for setting max_data_sg_nents=32 usage from a
larger default.  I don't have a strong preference either way, as long as
it can be picked up for 4.x stable.

Sagi, WDYT..?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-22 17:49                               ` Saleem, Shiraz
@ 2018-01-24  8:01                                 ` Nicholas A. Bellinger
  2018-01-26 18:52                                   ` Shiraz Saleem
  2018-01-29 19:36                                   ` Sagi Grimberg
  0 siblings, 2 replies; 26+ messages in thread
From: Nicholas A. Bellinger @ 2018-01-24  8:01 UTC (permalink / raw)
  To: Saleem, Shiraz
  Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior,
	Ariel, target-devel, Potnuri Bharat Teja

Hi Shiraz & Co,

Thanks for the feedback.

On Mon, 2018-01-22 at 17:49 +0000, Saleem, Shiraz wrote:
> > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > 
> > 
> > First, would it be helpful to limit maximum payload size per I/O for consumers
> > based on number of iser-target sq hw sges..?
> > 
> Assuming data is not able to be fast registered as if virtually contiguous;
> artificially limiting the data size might not be the best solution.
> 
> But max SGEs does need to be exposed higher. Somewhere in the stack,
> there might need to be multiple WRs submitted or data copied.
> 

Sagi..?

> > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> > b/drivers/target/iscsi/iscsi_target_configf
> > index 0ebc481..d8a4cc5 100644
> > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
> >         .module                         = THIS_MODULE,
> >         .name                           = "iscsi",
> >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
> > TRANSFER LENGTH */
> >         .get_fabric_name                = iscsi_get_fabric_name,
> >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> >         .tpg_get_tag                    = lio_tpg_get_tag,
> > 
> 
> BTW, this is helping the SQ overflow issue.

Thanks for confirming as a possible work-around.

For reference, what is i40iw's max_send_sge reporting..?

Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER
LENGTH to avoid consistent SQ overflow as-is with i40iw..?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  7:55                               ` Nicholas A. Bellinger
@ 2018-01-24  8:09                                 ` Kalderon, Michal
  2018-01-29 19:20                                   ` Sagi Grimberg
       [not found]                                 ` <1516780534.24576.335.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Kalderon, Michal @ 2018-01-24  8:09 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Shiraz Saleem, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior,
	Ariel, target-devel, Potnuri Bharat Teja, Radzi, Amit, Galon,
	Yoav

> From: Nicholas A. Bellinger [mailto:nab@linux-iscsi.org]
> Sent: Wednesday, January 24, 2018 9:56 AM
> 
> Hi Michal & Co,
> 
> On Fri, 2018-01-19 at 19:33 +0000, Kalderon, Michal wrote:
> > ________________________________________
> > From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> > Sent: Thursday, January 18, 2018 11:58 AM
> >
> > > Hi Shiraz, Michal & Co,
> > Hi Nicholas,
> >
> > > Thanks for the feedback.  Comments below.
> >
> > > A couple of thoughts.
> >
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> >
> > I don't think you need to limit the maximum payload, but instead
> > initialize the max_wr to be based on the number of supported SGEs
> > Instead of what is there today:
> > #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
> >                                 ISERT_MAX_TX_MISC_PDUS  + \
> >                                 ISERT_MAX_RX_MISC_PDUS) Add the
> > maximum number of WQEs per command, The calculation of number of
> WQEs
> > per command needs to be something like
> > "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> >
> 
> Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> right..?
Globally, or perhaps configurable via sysfs?
> 
> Btw, I'm not sure how this effects usage of ISER_MAX_TX_CQ_LEN +
> ISER_MAX_CQ_LEN, which currently depend on
> ISERT_QP_MAX_REQ_DTOS..
I think it can remain dependent on MAX_REQ_DTOS. 
> 
> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at runtime
> vs. exposing a smaller max_data_sg_nents=32 for ib_devices with limited
> attr.cap.max_send_sge..?
For our device, defining max_data_sg_nents didn't help in some scenarios.
It seems that the frequency of the issue increases with the number of LUNs
we try to run over.

> 
> > For some devices like ours, breaking the IO into multiple WRs
> > according to supported number of SGEs doesn't necessarily means
> performance penalty.
> >
> 
> AFAICT ading max_data_sg_nents for iser-target is safe enough work-around
> to include for stable, assuming we agree on what the max_send_sg cut-off is
> for setting max_data_sg_nents=32 usage from a larger default.  I don't have
> a string preference either way, as long as it can be picked up for 4.x stable.
> 
> Sagi, WDYT..?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  7:25                                 ` Nicholas A. Bellinger
@ 2018-01-24 12:21                                   ` Potnuri Bharat Teja
  0 siblings, 0 replies; 26+ messages in thread
From: Potnuri Bharat Teja @ 2018-01-24 12:21 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Shiraz Saleem, Kalderon, Michal, Amrani, Ram, Sagi Grimberg,
	linux-rdma, Elior, Ariel, target-devel

On Wednesday, January 01/24/18, 2018 at 12:55:17 +0530, Nicholas A. Bellinger wrote:
> Hi Potnuri & Co,
> 
> On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote:
> > Hi Nicholas, 
> > thanks for the suggestions. Comments below.
> > 
> > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote:
> > > Hi Shiraz, Michal & Co,
> > > 
> > > Thanks for the feedback.  Comments below.
> > > 
> > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > > > To: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg
> > > > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel
> > > > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>;
> > > > > > Potnuri Bharat Teja <bharat@chelsio.com>
> > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > > > 
> > > > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > > > 
> > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser-
> > > > > > target queue-full memory leak:
> > > > > > 
> > > > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > > > 
> > > > > > Just curious how frequent this happens in practice with sustained large block
> > > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw,
> > > > > > qedr and iw_cxgb4)..?
> > > > > > 
> > > > > > Is there anything else from an iser-target consumer level that should be
> > > > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > > > 
> > > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this
> > > > > Issue when running RoCE. It's not iWARP related. 
> > > > > This is easily reproduced within seconds with IO size of 5121K
> > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each.
> > > > > 
> > > > > IO Command used:
> > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > > > 
> > > > > thanks,
> > > > > Michal
> > > > 
> > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report;
> > > > rather quickly, in a matter of seconds.
> > > > 
> > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio 
> > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb 
> > > > 
> > > 
> > > A couple of thoughts.
> > > 
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> > yes, I think HW num sge needs to be propagated to iscsi target.
> > > 
> > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> > > maximum payload size per I/O being too large there is an existing
> > 
> > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above
> > 256k and for READ tests only with chelsio(iw_cxgb4) adapters.
> 
> Thanks for confirming.
> 
> > > target_core_fabric_ops mechanism for limiting using SCSI residuals,
> > > originally utilized by qla2xxx here:
> > > 
> > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e
> > > 
> > > Note this patch also will return a smaller Block Limits VPD (0x86)
> > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which
> > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will
> > > automatically limit maximum outgoing payload transfer length, and avoid
> > > SCSI residual logic.
> > > 
> > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> > > iscsi-target, but you can try testing with a smaller value to see if
> > > it's useful.  Eg:
> > > 
> > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf
> > > index 0ebc481..d8a4cc5 100644
> > > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
> > >         .module                         = THIS_MODULE,
> > >         .name                           = "iscsi",
> > >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
> > >         .get_fabric_name                = iscsi_get_fabric_name,
> > >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> > >         .tpg_get_tag                    = lio_tpg_get_tag,
> > > 
> > With above change, SQ overflow isn't observed. I started of with max_data_sg_nents = 16.
> 
> OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages)
> avoids SQ overflow with iw_cxgb4.
> 
> What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..?
max_send_sge is 4 for iw_cxgb4
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: SQ overflow seen running isert traffic with high block sizes
       [not found]                                 ` <1516778717.24576.319.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
@ 2018-01-24 16:03                                   ` Steve Wise
  0 siblings, 0 replies; 26+ messages in thread
From: Steve Wise @ 2018-01-24 16:03 UTC (permalink / raw)
  To: 'Nicholas A. Bellinger', 'Potnuri Bharat Teja'
  Cc: 'Shiraz Saleem', 'Kalderon, Michal',
	'Amrani, Ram', 'Sagi Grimberg',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Elior, Ariel',
	'target-devel'

Hey all,

> 
> Hi Potnuri & Co,
> 
> On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote:
> > Hi Nicholas,
> > thanks for the suggestions. Comments below.
> >
> > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger
> wrote:
> > > Hi Shiraz, Michal & Co,
> > >
> > > Thanks for the feedback.  Comments below.
> > >
> > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote:
> > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > > > > > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > > > > > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Nicholas A. Bellinger
> > > > > > Sent: Monday, January 15, 2018 6:57 AM
> > > > > > To: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > > > > Cc: Amrani, Ram <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; Sagi Grimberg
> > > > > > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Elior, Ariel
> > > > > > <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; target-devel <target-
> devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>;
> > > > > > Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > > > >
> > > > > > Hi Shiraz, Ram, Ariel, & Potnuri,
> > > > > >
> > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a
> iser-
> > > > > > target queue-full memory leak:
> > > > > >
> > > > > > https://www.spinics.net/lists/target-devel/msg16282.html
> > > > > >
> > > > > > Just curious how frequent this happens in practice with sustained large
> block
> > > > > > workloads, as it appears to effect at least three different iwarp RNICS
> (i40iw,
> > > > > > qedr and iw_cxgb4)..?
> > > > > >
> > > > > > Is there anything else from an iser-target consumer level that should be
> > > > > > changed for iwarp to avoid repeated ib_post_send() failures..?
> > > > > >
> > > > > Would like to mention, that although we are an iWARP RNIC as well,
> we've hit this
> > > > > Issue when running RoCE. It's not iWARP related.
> > > > > This is easily reproduced within seconds with IO size of 5121K
> > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks
> each.
> > > > >
> > > > > IO Command used:
> > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> > > > >
> > > > > thanks,
> > > > > Michal
> > > >
> > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And
> similar to Michals report;
> > > > rather quickly, in a matter of seconds.
> > > >
> > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --
> size=20g --loops=1 --ioengine=libaio
> > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --
> filename=/dev/sdb --name=sdb
> > > >
> > >
> > > A couple of thoughts.
> > >
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> > yes, I think HW num sge needs to be propagated to iscsi target.
> > >
> > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to
> > > maximum payload size per I/O being too large there is an existing
> >
> > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above
> > 256k and for READ tests only with chelsio(iw_cxgb4) adapters.
> 
> Thanks for confirming.
> 
> > > target_core_fabric_ops mechanism for limiting using SCSI residuals,
> > > originally utilized by qla2xxx here:
> > >
> > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8
> f9b565482c537821588444e09ff732c7d65ed6e
> > >
> > > Note this patch also will return a smaller Block Limits VPD (0x86)
> > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE,
> which
> > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH
> will
> > > automatically limit maximum outgoing payload transfer length, and avoid
> > > SCSI residual logic.
> > >
> > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into
> > > iscsi-target, but you can try testing with a smaller value to see if
> > > it's useful.  Eg:
> > >
> > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> b/drivers/target/iscsi/iscsi_target_configf
> > > index 0ebc481..d8a4cc5 100644
> > > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd
> *se_cmd)
> > >         .module                         = THIS_MODULE,
> > >         .name                           = "iscsi",
> > >         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> > > +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
> TRANSFER LENGTH */
> > >         .get_fabric_name                = iscsi_get_fabric_name,
> > >         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
> > >         .tpg_get_tag                    = lio_tpg_get_tag,
> > >
> > With above change, SQ overflow isn't observed. I started of with
> max_data_sg_nents = 16.
> 
> OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages)
> avoids SQ overflow with iw_cxgb4.
> 
> What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..?

The ib_device attributes only advertise max_sge, which applies to both send and recv queues.  Because the iw_cxgb4 RQ max sge is 4, iw_cxgb4 currently advertises 4 for max_sge, even though the SQ can handle more.  So attr.cap.max_send_sge ends up being 4.

I have a todo on my list to extend the rdma/core to have max_send_sge and max_recv_sge attributes.  If it helps, I can bump up the priority of this.  Perhaps Bharat could do the work.

Steve.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  8:01                                 ` Nicholas A. Bellinger
@ 2018-01-26 18:52                                   ` Shiraz Saleem
  2018-01-29 19:36                                   ` Sagi Grimberg
  1 sibling, 0 replies; 26+ messages in thread
From: Shiraz Saleem @ 2018-01-26 18:52 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior,
	Ariel, target-devel, Potnuri Bharat Teja

On Wed, Jan 24, 2018 at 01:01:58AM -0700, Nicholas A. Bellinger wrote:
> Hi Shiraz & Co,
> 
> Thanks for the feedback.
> 
> On Mon, 2018-01-22 at 17:49 +0000, Saleem, Shiraz wrote:
> > > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > > 
> > > 
> > > First, would it be helpful to limit maximum payload size per I/O for consumers
> > > based on number of iser-target sq hw sges..?
> > > 
> > Assuming data is not able to be fast registered as if virtually contiguous;
> > artificially limiting the data size might not be the best solution.
> > 
> > But max SGEs does need to be exposed higher. Somewhere in the stack,
> > there might need to be multiple WRs submitted or data copied.
> > 
> 
> Sagi..?
> 

> 
> For reference, what is i40iw's max_send_sg reporting..?

3

> 
> Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER
> LENGTH to avoid consistent SQ overflow as-is with i40iw..?

For the configuration I am testing, max_data_sg_nents=32 & 64 worked,
and the SQ overflow issue reproduced at 128.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]                                 ` <1516780534.24576.335.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
@ 2018-01-29 19:17                                   ` Sagi Grimberg
       [not found]                                     ` <55569d98-7f8c-7414-ab03-e52e2bfc518b-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2018-01-29 19:17 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Kalderon, Michal
  Cc: Shiraz Saleem, Amrani, Ram, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Elior, Ariel, target-devel, Potnuri Bharat Teja, Radzi, Amit,
	Galon, Yoav


>>> First, would it be helpful to limit maximum payload size per I/O for
>>> consumers based on number of iser-target sq hw sges..?
>>
>> I don't think you need to limit the maximum payload, but instead
>> initialize the max_wr to be based on the number of supported SGEs
>> Instead of what is there today:
>> #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
>>                                  ISERT_MAX_TX_MISC_PDUS  + \
>>                                  ISERT_MAX_RX_MISC_PDUS)
>> Add the maximum number of WQEs per command,
>> The calculation of number of WQEs per command needs to be something like
>> "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
>>
> 
> Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> right..?
> 
> Btw, I'm not sure how this effects usage of ISER_MAX_TX_CQ_LEN +
> ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
> 
> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> limited attr.cap.max_send_sge..?

Sorry for the late reply,

Can we go back and understand why we need to limit the isert transfer
size? I would suggest that we handle queue-full scenarios instead
of limiting the transferred payload size.

From the trace Shiraz sent, it looks like:
a) we are too chatty when failing to post a WR on a queue pair
(something that can happen by design), and
b) isert escalates to terminating the connection, which means we
screwed up handling it.

Shiraz, can you explain these messages:
[17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
[17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

Who is initiating the connection teardown, the initiator or the target?
(It looks like the initiator gave up on iscsi ping timeout expiration.)

Nic,
Currently, what I see is that the queue-full handler simply schedules qf
work to re-issue the I/O. The issue is that the only way a new send queue
entry becomes available again is for isert to process one or more send
completions. If anything, this work is interfering with the
isert_send_done handler.

Would it be possible for some transports to schedule qf_work_queue
themselves? I guess it would also hold if iscsit were to use
non-blocking sockets and continue at .write_space()?

Something like transport_process_wait_list() that would be triggered
from the transport completion handler (or from a centralized place like
target_sess_put_cmd or something...)?
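
Roughly, as a sketch only (transport_process_wait_list() does not exist
today; it would just wrap the existing qf_cmd_list/qf_work_queue machinery
so a fabric driver can kick retries once it knows resources were freed):

/* Hypothetical target-core helper -- name and placement are assumptions. */
void transport_process_wait_list(struct se_device *dev)
{
	/* reschedule the queue-full worker only if commands are waiting */
	if (!list_empty(&dev->qf_cmd_list))
		schedule_work(&dev->qf_work_queue);
}

isert could then call this from its send completion path, where it knows an
SQ slot has just been freed, instead of target core blindly rescheduling.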

Also, I see that this wait list is singular across the se_device. Maybe
it would be a better idea to have it per se_session, as that maps to an
iscsi connection (or srp channel for that matter)? For large I/O sizes
this should happen quite a lot, so it's a bit of a shame that we will
need to compete over the list_empty check...

If we prefer to make this go away by limiting the transfer size then it's
fine I guess, but maybe we can do better? (although it can take some
extra work...)

>> For some devices like ours, breaking the IO into multiple WRs according to supported
>> number of SGEs doesn't necessarily means performance penalty.
>>
> 
> AFAICT ading max_data_sg_nents for iser-target is safe enough
> work-around to include for stable, assuming we agree on what the
> max_send_sg cut-off is for setting max_data_sg_nents=32 usage from a
> larger default.  I don't have a string preference either way, as long as
> it can be picked up for 4.x stable. >
> Sagi, WDYT..?

I think it's an easier fix for sure. What I don't know is whether this
introduces a regression for devices that can handle more SGEs on large
I/O sizes. I very much doubt it will though...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  8:09                                 ` Kalderon, Michal
@ 2018-01-29 19:20                                   ` Sagi Grimberg
  0 siblings, 0 replies; 26+ messages in thread
From: Sagi Grimberg @ 2018-01-29 19:20 UTC (permalink / raw)
  To: Kalderon, Michal, Nicholas A. Bellinger
  Cc: Shiraz Saleem, Amrani, Ram, linux-rdma, Elior, Ariel,
	target-devel, Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav


>> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at runtime
>> vs. exposing a smaller max_data_sg_nents=32 for ib_devices with limited
>> attr.cap.max_send_sge..?
> For our device defining max_data_sg_nents didn't help on some scenarios,
> It seems that Frequency of the issue occurring increases with number of luns we
> Try to run over.

Maybe this is related to the queue-full strategy the target core takes
by simply scheduling another attempt unconditionally without any
hard guarantees that the next attempt will succeed?

This flow is per se_device, which might be a hint as to why it's happening
more with a larger number of LUNs? Maybe the isert completion handler
context (which is also a workqueue) struggles to find CPU quota?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  8:01                                 ` Nicholas A. Bellinger
  2018-01-26 18:52                                   ` Shiraz Saleem
@ 2018-01-29 19:36                                   ` Sagi Grimberg
  1 sibling, 0 replies; 26+ messages in thread
From: Sagi Grimberg @ 2018-01-29 19:36 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Saleem, Shiraz
  Cc: Kalderon, Michal, Amrani, Ram, linux-rdma, Elior, Ariel,
	target-devel, Potnuri Bharat Teja

Hi,

>>> First, would it be helpful to limit maximum payload size per I/O for consumers
>>> based on number of iser-target sq hw sges..?
>>>
>> Assuming the data cannot be fast-registered as if it were virtually
>> contiguous, artificially limiting the data size might not be the best
>> solution.
>>
>> But the max SGE count does need to be exposed higher up. Somewhere in the
>> stack, multiple WRs might need to be submitted, or data copied.
>>
> 
> Sagi..?

I tend to agree that if the adapter supports just a handful of sges it's
counter-productive to expose an unlimited data transfer size. On the other
hand, I think we should be able to chunk more with memory registrations
(although the rdma rw code never even allocates them for non-iwarp devices).

We have an API for checking this in the RDMA core (thanks to Chuck),
introduced in:
commit 0062818298662d0d05061949d12880146b5ebd65
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Aug 28 15:06:14 2017 -0400

     rdma core: Add rdma_rw_mr_payload()

     The amount of payload per MR depends on device capabilities and
     the memory registration mode in use. The new rdma_rw API hides both,
     making it difficult for ULPs to determine how large their transport
     send queues need to be.

     Expose the MR payload information via a new API.

     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
     Acked-by: Doug Ledford <dledford@redhat.com>
     Signed-off-by: J. Bruce Fields <bfields@redhat.com>


So the easy way out would be to use that and plug it into
max_data_sg_nents. Regardless, the queue-full logic today amounts to a TX
attack on the transport.
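
Something along these lines, perhaps (sketch only: I'm assuming the helper
quoted above, which IIRC landed in mainline as rdma_rw_mr_factor(), returns
the worst-case number of rdma-rw contexts/WRs needed to move a given number
of pages; the DEMO_* names and numbers are placeholders, not isert code):

#include <rdma/ib_verbs.h>
#include <rdma/rw.h>

#define DEMO_MAX_CMDS           128     /* assumed queue depth */
#define DEMO_MAX_IO_PAGES       256     /* 1M I/Os with 4k pages */

static u32 demo_max_send_wr(struct ib_device *dev, u8 port_num)
{
        /* worst-case rdma-rw WRs needed per command for the chosen I/O size */
        u32 ctxs_per_cmd = rdma_rw_mr_factor(dev, port_num, DEMO_MAX_IO_PAGES);

        /* plus one send WR per command for the response */
        return DEMO_MAX_CMDS * (ctxs_per_cmd + 1);
}

If sizing the SQ up front like this turns out to be impractical, the same
number could instead be turned into a max_data_sg_nents-style cap.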

>>> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
>>> b/drivers/target/iscsi/iscsi_target_configf
>>> index 0ebc481..d8a4cc5 100644
>>> --- a/drivers/target/iscsi/iscsi_target_configfs.c
>>> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
>>> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>>>          .module                         = THIS_MODULE,
>>>          .name                           = "iscsi",
>>>          .node_acl_size                  = sizeof(struct iscsi_node_acl),
>>> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM
>>> TRANSFER LENGTH */
>>>          .get_fabric_name                = iscsi_get_fabric_name,
>>>          .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>>>          .tpg_get_tag                    = lio_tpg_get_tag,
>>>
>>
>> BTW, this is helping the SQ overflow issue.
> 
> Thanks for confirming as a possible work-around.
> 
> For reference, what is i40iw's max_send_sg reporting..?
> 
> Is max_data_sg_nents=32 * 4k pages = 128K the largest MAX TRANSFER
> LENGTH that avoids consistent SQ overflow as-is with i40iw..?

I vaguely recall that this is the maximum mr length for i40e (and cxgb4
if I'm not mistaken).


* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]                                     ` <55569d98-7f8c-7414-ab03-e52e2bfc518b-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2018-01-30 16:30                                       ` Shiraz Saleem
  0 siblings, 0 replies; 26+ messages in thread
From: Shiraz Saleem @ 2018-01-30 16:30 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Nicholas A. Bellinger, Kalderon, Michal, Amrani, Ram,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel,
	Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

On Mon, Jan 29, 2018 at 09:17:02PM +0200, Sagi Grimberg wrote:
> 
> > > > First, would it be helpful to limit maximum payload size per I/O for
> > > > consumers based on number of iser-target sq hw sges..?
> > > 
> > > I don't think you need to limit the maximum payload, but instead
> > > initialize the max_wr to be based on the number of supported SGEs
> > > Instead of what is there today:
> > > #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
> > >                                  ISERT_MAX_TX_MISC_PDUS  + \
> > >                                  ISERT_MAX_RX_MISC_PDUS)
> > > Add the maximum number of WQEs per command,
> > > The calculation of number of WQEs per command needs to be something like
> > > "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> > > 
> > 
> > Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> > right..?
> > 
> > Btw, I'm not sure how this affects the usage of ISER_MAX_TX_CQ_LEN +
> > ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
> > 
> > Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> > runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> > limited attr.cap.max_send_sge..?
> 
> Sorry for the late reply,
> 
> Can we go back and understand why we need to limit the isert transfer
> size? I would suggest that we handle queue-full scenarios instead
> of limiting the transferred payload size.
> 
> From the trace Shiraz sent, it looks like:
> a) we are too chatty when failing to post a wr on a queue-pair
> (something that can happen by design), and
> b) isert escalates to terminating the connection, which means we
> screwed up handling it.
> 
> Shiraz, can you explain these messages:
> [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

This is some device-specific Asynchronous Event logging I turned on.
It indicates the QP received a FIN while in RTS and was eventually moved
to the CLOSED state.

 
> Who is initiating the connection teardown? the initiator or the target?
> (looks like the initiator gave up on iscsi ping timeout expiration)
> 

Initiator


