* SQ overflow seen running isert traffic with high block sizes
@ 2017-06-28  9:25 Amrani, Ram
       [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2017-06-28 10:39 ` Sagi Grimberg
  0 siblings, 2 replies; 44+ messages in thread
From: Amrani, Ram @ 2017-06-28  9:25 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Elior, Ariel

We are hitting SQ overflow on iSER target side with high block sizes over RoCE
(see dmesg output below).

We are using Q-Logic/Cavium NIC with a capability of 4 sges.

Following the thread "SQ overflow seen running isert traffic" [1], I was wondering
if someone is working on SQ accounting, or more graceful handling of overflow,
as the messages are printed over and over.

Dmesg output:
2017-06-06T09:23:28.824234+05:30 SLES12SP3Beta3 kernel: [65057.799615] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83cb2cb0 failed to post RDMA res
2017-06-06T09:23:29.500095+05:30 SLES12SP3Beta3 kernel: [65058.475858] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec53ec020 failed to post RDMA res
2017-06-06T09:23:29.560085+05:30 SLES12SP3Beta3 kernel: [65058.533787] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec622ae08 failed to post RDMA res
2017-06-06T09:23:29.984209+05:30 SLES12SP3Beta3 kernel: [65058.958509] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ff08e6bb0 failed to post RDMA res
2017-06-06T09:23:30.056098+05:30 SLES12SP3Beta3 kernel: [65059.032182] isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa6761138 failed to post RDMA res
2017-06-06T09:23:30.288152+05:30 SLES12SP3Beta3 kernel: [65059.262748] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec3caf668 failed to post RDMA res
2017-06-06T09:23:30.444068+05:30 SLES12SP3Beta3 kernel: [65059.421071] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f2186cc30 failed to post RDMA res
2017-06-06T09:23:30.532135+05:30 SLES12SP3Beta3 kernel: [65059.505380] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec6429bf0 failed to post RDMA res
2017-06-06T09:23:30.672098+05:30 SLES12SP3Beta3 kernel: [65059.645585] isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa5526bb0 failed to post RDMA res
2017-06-06T09:23:30.852121+05:30 SLES12SP3Beta3 kernel: [65059.828072] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f5c8a89d8 failed to post RDMA res
2017-06-06T09:23:31.464125+05:30 SLES12SP3Beta3 kernel: [65060.440092] isert: isert_rdma_rw_ctx_post: Cmd: ffff880ffefdf918 failed to post RDMA res
2017-06-06T09:23:31.576074+05:30 SLES12SP3Beta3 kernel: [65060.550314] isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83222350 failed to post RDMA res
2017-06-06T09:24:30.532064+05:30 SLES12SP3Beta3 kernel: [65119.503466] ABORT_TASK: Found referenced iSCSI task_tag: 103
2017-06-06T09:24:30.532079+05:30 SLES12SP3Beta3 kernel: [65119.503468] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 103
2017-06-06T09:24:31.428084+05:30 SLES12SP3Beta3 kernel: [65120.399433] ABORT_TASK: Found referenced iSCSI task_tag: 101
2017-06-06T09:24:31.428101+05:30 SLES12SP3Beta3 kernel: [65120.399436] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 101
2017-06-06T09:24:31.556053+05:30 SLES12SP3Beta3 kernel: [65120.527461] ABORT_TASK: Found referenced iSCSI task_tag: 119
2017-06-06T09:24:31.556060+05:30 SLES12SP3Beta3 kernel: [65120.527465] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 119
2017-06-06T09:24:31.556061+05:30 SLES12SP3Beta3 kernel: [65120.527468] ABORT_TASK: Found referenced iSCSI task_tag: 43
2017-06-06T09:24:31.556062+05:30 SLES12SP3Beta3 kernel: [65120.527469] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 43
2017-06-06T09:24:31.556063+05:30 SLES12SP3Beta3 kernel: [65120.527470] ABORT_TASK: Found referenced iSCSI task_tag: 79
2017-06-06T09:24:31.556064+05:30 SLES12SP3Beta3 kernel: [65120.527471] ABORT_TASK: Found referenced iSCSI task_tag: 71
2017-06-06T09:24:31.556066+05:30 SLES12SP3Beta3 kernel: [65120.527472] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 79
2017-06-06T09:24:31.556067+05:30 SLES12SP3Beta3 kernel: [65120.527472] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 71
2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527506] ABORT_TASK: Found referenced iSCSI task_tag: 122
2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527508] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 122
2017-06-06T09:24:32.452073+05:30 SLES12SP3Beta3 kernel: [65121.423425] ABORT_TASK: Found referenced iSCSI task_tag: 58
2017-06-06T09:24:32.452080+05:30 SLES12SP3Beta3 kernel: [65121.423427] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 58
2017-06-06T09:24:32.516054+05:30 SLES12SP3Beta3 kernel: [65121.487380] ABORT_TASK: Found referenced iSCSI task_tag: 100
2017-06-06T09:24:32.516061+05:30 SLES12SP3Beta3 kernel: [65121.487382] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 100
2017-06-06T09:24:32.584031+05:30 SLES12SP3Beta3 kernel: [65121.555374] ABORT_TASK: Found referenced iSCSI task_tag: 52
2017-06-06T09:24:32.584041+05:30 SLES12SP3Beta3 kernel: [65121.555376] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 52
2017-06-06T09:24:33.412057+05:30 SLES12SP3Beta3 kernel: [65122.383341] ABORT_TASK: Found referenced iSCSI task_tag: 43
2017-06-06T09:24:33.412065+05:30 SLES12SP3Beta3 kernel: [65122.383376] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 43
2017-06-06T09:24:33.476061+05:30 SLES12SP3Beta3 kernel: [65122.447354] ABORT_TASK: Found referenced iSCSI task_tag: 63
2017-06-06T09:24:33.476070+05:30 SLES12SP3Beta3 kernel: [65122.447360] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 63

[1] https://patchwork.kernel.org/patch/9633675/

Thanks,
Ram
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread
[parent not found: <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>]
* Re: SQ overflow seen running isert traffic with high block sizes
       [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-06-28 10:35   ` Potnuri Bharat Teja
       [not found]   ` <20170628103505.GA27517-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Potnuri Bharat Teja @ 2017-06-28 10:35 UTC (permalink / raw)
  To: Amrani, Ram; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel

Here is some more discussion regarding SQ overflow:
https://www.spinics.net/lists/linux-rdma/msg47635.html

The current SQ overflow handling is post-error handling that keeps the I/O
running, but we still see SQ post failures filling dmesg. We are still
working on fixing SQ overflow.

Thanks,
Bharat.

On Wednesday, June 06/28/17, 2017 at 14:55:45 +0530, Amrani, Ram wrote:
> We are hitting SQ overflow on iSER target side with high block sizes over
> RoCE
> (see dmesg output below).
>
> We are using Q-Logic/Cavium NIC with a capability of 4 sges.
>
> Following the thread "SQ overflow seen running isert traffic" [1], I was
> wondering
> if someone is working on SQ accounting, or more graceful handling of
> overflow,
> as the messages are printed over and over.
>
> Dmesg output:
> 2017-06-06T09:23:28.824234+05:30 SLES12SP3Beta3 kernel: [65057.799615]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83cb2cb0 failed to post RDMA
> res
> 2017-06-06T09:23:29.500095+05:30 SLES12SP3Beta3 kernel: [65058.475858]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec53ec020 failed to post RDMA
> res
> 2017-06-06T09:23:29.560085+05:30 SLES12SP3Beta3 kernel: [65058.533787]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec622ae08 failed to post RDMA
> res
> 2017-06-06T09:23:29.984209+05:30 SLES12SP3Beta3 kernel: [65058.958509]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880ff08e6bb0 failed to post RDMA
> res
> 2017-06-06T09:23:30.056098+05:30 SLES12SP3Beta3 kernel: [65059.032182]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa6761138 failed to post RDMA
> res
> 2017-06-06T09:23:30.288152+05:30 SLES12SP3Beta3 kernel: [65059.262748]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec3caf668 failed to post RDMA
> res
> 2017-06-06T09:23:30.444068+05:30 SLES12SP3Beta3 kernel: [65059.421071]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880f2186cc30 failed to post RDMA
> res
> 2017-06-06T09:23:30.532135+05:30 SLES12SP3Beta3 kernel: [65059.505380]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880ec6429bf0 failed to post RDMA
> res
> 2017-06-06T09:23:30.672098+05:30 SLES12SP3Beta3 kernel: [65059.645585]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880fa5526bb0 failed to post RDMA
> res
> 2017-06-06T09:23:30.852121+05:30 SLES12SP3Beta3 kernel: [65059.828072]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880f5c8a89d8 failed to post RDMA
> res
> 2017-06-06T09:23:31.464125+05:30 SLES12SP3Beta3 kernel: [65060.440092]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880ffefdf918 failed to post RDMA
> res
> 2017-06-06T09:23:31.576074+05:30 SLES12SP3Beta3 kernel: [65060.550314]
> isert: isert_rdma_rw_ctx_post: Cmd: ffff880f83222350 failed to post RDMA
> res
> 2017-06-06T09:24:30.532064+05:30 SLES12SP3Beta3 kernel: [65119.503466]
> ABORT_TASK: Found referenced iSCSI task_tag: 103
> 2017-06-06T09:24:30.532079+05:30 SLES12SP3Beta3 kernel: [65119.503468]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 103
> 2017-06-06T09:24:31.428084+05:30 SLES12SP3Beta3 kernel: [65120.399433]
> ABORT_TASK: Found referenced iSCSI task_tag: 101
> 2017-06-06T09:24:31.428101+05:30 SLES12SP3Beta3 kernel: [65120.399436]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 101
> 2017-06-06T09:24:31.556053+05:30 SLES12SP3Beta3 kernel: [65120.527461]
> ABORT_TASK: Found referenced iSCSI task_tag: 119
> 2017-06-06T09:24:31.556060+05:30 SLES12SP3Beta3 kernel: [65120.527465]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 119
> 2017-06-06T09:24:31.556061+05:30 SLES12SP3Beta3 kernel: [65120.527468]
> ABORT_TASK: Found referenced iSCSI task_tag: 43
> 2017-06-06T09:24:31.556062+05:30 SLES12SP3Beta3 kernel: [65120.527469]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 43
> 2017-06-06T09:24:31.556063+05:30 SLES12SP3Beta3 kernel: [65120.527470]
> ABORT_TASK: Found referenced iSCSI task_tag: 79
> 2017-06-06T09:24:31.556064+05:30 SLES12SP3Beta3 kernel: [65120.527471]
> ABORT_TASK: Found referenced iSCSI task_tag: 71
> 2017-06-06T09:24:31.556066+05:30 SLES12SP3Beta3 kernel: [65120.527472]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 79
> 2017-06-06T09:24:31.556067+05:30 SLES12SP3Beta3 kernel: [65120.527472]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 71
> 2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527506]
> ABORT_TASK: Found referenced iSCSI task_tag: 122
> 2017-06-06T09:24:31.556068+05:30 SLES12SP3Beta3 kernel: [65120.527508]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 122
> 2017-06-06T09:24:32.452073+05:30 SLES12SP3Beta3 kernel: [65121.423425]
> ABORT_TASK: Found referenced iSCSI task_tag: 58
> 2017-06-06T09:24:32.452080+05:30 SLES12SP3Beta3 kernel: [65121.423427]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 58
> 2017-06-06T09:24:32.516054+05:30 SLES12SP3Beta3 kernel: [65121.487380]
> ABORT_TASK: Found referenced iSCSI task_tag: 100
> 2017-06-06T09:24:32.516061+05:30 SLES12SP3Beta3 kernel: [65121.487382]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 100
> 2017-06-06T09:24:32.584031+05:30 SLES12SP3Beta3 kernel: [65121.555374]
> ABORT_TASK: Found referenced iSCSI task_tag: 52
> 2017-06-06T09:24:32.584041+05:30 SLES12SP3Beta3 kernel: [65121.555376]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 52
> 2017-06-06T09:24:33.412057+05:30 SLES12SP3Beta3 kernel: [65122.383341]
> ABORT_TASK: Found referenced iSCSI task_tag: 43
> 2017-06-06T09:24:33.412065+05:30 SLES12SP3Beta3 kernel: [65122.383376]
> ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 43
> 2017-06-06T09:24:33.476061+05:30 SLES12SP3Beta3 kernel: [65122.447354]
> ABORT_TASK: Found referenced iSCSI task_tag: 63
> 2017-06-06T09:24:33.476070+05:30 SLES12SP3Beta3 kernel: [65122.447360]
> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 63
>
> [1] https://patchwork.kernel.org/patch/9633675/
>
> Thanks,
> Ram
[parent not found: <20170628103505.GA27517-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>]
* RE: SQ overflow seen running isert traffic with high block sizes
       [not found]     ` <20170628103505.GA27517-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
@ 2017-06-28 11:29       ` Amrani, Ram
  0 siblings, 0 replies; 44+ messages in thread
From: Amrani, Ram @ 2017-06-28 11:29 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel

> Here is some more discussion regarding SQ overflow.
> https://www.spinics.net/lists/linux-rdma/msg47635.html
>
> Current SQ overflow handling is post error handling to keep the IO running and
> We still see SQ post failures filling the dmesg.
> We are still working on fixing SQ overflow.
>
> Thanks,
> Bharat.

That's good to know. Let us know when you have something working; we can
help test it too.

Thanks,
Ram
* Re: SQ overflow seen running isert traffic with high block sizes
  2017-06-28  9:25 SQ overflow seen running isert traffic with high block sizes Amrani, Ram
       [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-06-28 10:39 ` Sagi Grimberg
  2017-06-28 11:32   ` Amrani, Ram
  1 sibling, 1 reply; 44+ messages in thread
From: Sagi Grimberg @ 2017-06-28 10:39 UTC (permalink / raw)
  To: Amrani, Ram, linux-rdma; +Cc: Elior, Ariel, target-devel

Hey Ram,

CC'ing target-devel for iser-target related posts.

> We are hitting SQ overflow on iSER target side with high block sizes over RoCE
> (see dmesg output below).
>
> We are using Q-Logic/Cavium NIC with a capability of 4 sges.

That's somewhat expected if the device has a low max_sge. It was decided
that the queue-full mechanism is not something that iser-target should
handle, but rather the iscsi-target core on top of it.

You probably should not get into aborts, though... Does the I/O complete,
or does it fail?

Is this upstream? Is [1] applied?

I could come up with some queue-full handling in isert that will be more
lightweight, but I'd let Nic make a judgment call before I do anything.

[1]:
commit a4467018c2a7228f4ef58051f0511bd037bff264
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Sun Oct 30 17:30:08 2016 -0700

    iscsi-target: Propigate queue_data_in + queue_status errors

    This patch changes iscsi-target to propagate iscsit_transport
    ->iscsit_queue_data_in() and ->iscsit_queue_status() callback errors
    back up into target-core. This allows target-core to retry failed
    iscsit_transport callbacks using internal queue-full logic.

    Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
    Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
    Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
    Cc: Potnuri Bharat Teja <bharat@chelsio.com>
    Reported-by: Steve Wise <swise@opengridcomputing.com>
    Cc: Steve Wise <swise@opengridcomputing.com>
    Cc: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* RE: SQ overflow seen running isert traffic with high block sizes
  2017-06-28 10:39 ` Sagi Grimberg
@ 2017-06-28 11:32   ` Amrani, Ram
       [not found]     ` <BN3PR07MB25786338EADC77A369A6D493F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Amrani, Ram @ 2017-06-28 11:32 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma; +Cc: Elior, Ariel, target-devel

> > We are hitting SQ overflow on iSER target side with high block sizes over RoCE
> > (see dmesg output below).
> >
> > We are using Q-Logic/Cavium NIC with a capability of 4 sges.
>
> That's somewhat expected if the device has low max_sge. It was decided
> that queue_full mechanism is not something that iser-target should
> handle but rather the iscsi-target core on top.
>
> You probably should not get into aborts though... Does the I/O complete?
> or does it fail?

The I/Os complete.

> Is this upstream? is [1] applied?
>
> I could come up with some queue-full handling in isert that will be more
> lightweight, but I'd let Nic make a judgment call before I do anything.
>
> [1]:
> commit a4467018c2a7228f4ef58051f0511bd037bff264
> Author: Nicholas Bellinger <nab@linux-iscsi.org>
> Date:   Sun Oct 30 17:30:08 2016 -0700
>
>     iscsi-target: Propigate queue_data_in + queue_status errors

Yes, the patch is applied.

Thanks,
Ram
[parent not found: <BN3PR07MB25786338EADC77A369A6D493F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>]
* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]       ` <BN3PR07MB25786338EADC77A369A6D493F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-07-13 18:29         ` Nicholas A. Bellinger
  2017-07-17  9:26           ` Amrani, Ram
  0 siblings, 1 reply; 44+ messages in thread
From: Nicholas A. Bellinger @ 2017-07-13 18:29 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel,
	target-devel, Potnuri Bharat Teja

Hi Ram & Co,

(Adding Potnuri CC')

On Wed, 2017-06-28 at 11:32 +0000, Amrani, Ram wrote:
> > > We are hitting SQ overflow on iSER target side with high block sizes over RoCE
> > > (see dmesg output below).
> > >
> > > We are using Q-Logic/Cavium NIC with a capability of 4 sges.
> >
> > That's somewhat expected if the device has low max_sge. It was decided
> > that queue_full mechanism is not something that iser-target should
> > handle but rather the iscsi-target core on top.
> >
> > You probably should not get into aborts though... Does the I/O complete?
> > or does it fail?
>
> The IOs complete
>
> > Is this upstream? is [1] applied?
> >
> > I could come up with some queue-full handling in isert that will be more
> > lightweight, but I'd let Nic make a judgment call before I do anything.
> >
> > [1]:
> > commit a4467018c2a7228f4ef58051f0511bd037bff264
> > Author: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> > Date:   Sun Oct 30 17:30:08 2016 -0700
> >
> >     iscsi-target: Propigate queue_data_in + queue_status errors
>
> Yes, the patch is applied.

Just to confirm, the following four patches were required to get Potnuri
up and running on iser-target + iw_cxgb4 with a similarly small number
of hw SGEs:

7a56dc8 iser-target: avoid posting a recv buffer twice
555a65f iser-target: Fix queue-full response handling
a446701 iscsi-target: Propigate queue_data_in + queue_status errors
fa7e25c target: Fix unknown fabric callback queue-full errors

So, did you test Q-Logic/Cavium with RoCE using these four patches, or
just with commit a4467018..?

Note these have not been CC'ed to stable yet, as I was reluctant since
they didn't have much mileage on them at the time.

Now, however, they should be OK to consider for stable, especially if
they get you unblocked as well.
* RE: SQ overflow seen running isert traffic with high block sizes
  2017-07-13 18:29         ` Nicholas A. Bellinger
@ 2017-07-17  9:26           ` Amrani, Ram
       [not found]             ` <BN3PR07MB2578E6561CC669922A322245F8A00-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Amrani, Ram @ 2017-07-17  9:26 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel,
	Potnuri Bharat Teja

Hi Nicholas,

> Just to confirm, the following four patches where required to get
> Potnuri up and running on iser-target + iw_cxgb4 with a similarly small
> number of hw SGEs:
>
> 7a56dc8 iser-target: avoid posting a recv buffer twice
> 555a65f iser-target: Fix queue-full response handling
> a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> fa7e25c target: Fix unknown fabric callback queue-full errors
>
> So Did you test with Q-Logic/Cavium with RoCE using these four patches,
> or just with commit a4467018..?
>
> Note these have not been CC'ed to stable yet, as I was reluctant since
> they didn't have much mileage on them at the time..
>
> Now however, they should be OK to consider for stable, especially if
> they get you unblocked as well.

The issue is still seen with these four patches.

Thanks,
Ram
[parent not found: <BN3PR07MB2578E6561CC669922A322245F8A00-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>]
* Re: SQ overflow seen running isert traffic with high block sizes
       [not found]             ` <BN3PR07MB2578E6561CC669922A322245F8A00-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-10-06 22:40               ` Shiraz Saleem
  0 siblings, 0 replies; 44+ messages in thread
From: Shiraz Saleem @ 2017-10-06 22:40 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: Nicholas A. Bellinger, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel,
	Potnuri Bharat Teja

On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> Hi Nicholas,
>
> > Just to confirm, the following four patches where required to get
> > Potnuri up and running on iser-target + iw_cxgb4 with a similarly small
> > number of hw SGEs:
> >
> > 7a56dc8 iser-target: avoid posting a recv buffer twice
> > 555a65f iser-target: Fix queue-full response handling
> > a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> > fa7e25c target: Fix unknown fabric callback queue-full errors
> >
> > So Did you test with Q-Logic/Cavium with RoCE using these four patches,
> > or just with commit a4467018..?
> >
> > Note these have not been CC'ed to stable yet, as I was reluctant since
> > they didn't have much mileage on them at the time..
> >
> > Now however, they should be OK to consider for stable, especially if
> > they get you unblocked as well.
>
> The issue is still seen with these four patches.
>
> Thanks,
> Ram

Hi,

On X722 iWARP NICs (i40iw) too, we are seeing a similar issue of SQ
overflow being hit in isert with larger block sizes, on a 4.14-rc2
kernel. Eventually there is a timeout/connection error on the iSER
initiator and the connection is torn down. The aforementioned patches
don't seem to alleviate the SQ overflow issue.

Initiator
------------
[17007.465524] scsi host11: iSCSI Initiator over iSER
[17007.466295] iscsi: invalid can_queue of 55. can_queue must be a power of 2.
[17007.466924] iscsi: Rounding can_queue to 32.
[17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0  PQ: 0 ANSI: 5
[17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit TPGS
[17007.471656] scsi 11:0:0:0: alua: device naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
[17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
[17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[17007.472405] sd 11:0:0:0: [sdb] Write Protect is off
[17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
[17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[17007.473412] sd 11:0:0:0: [sdb] Attached SCSI disk
[17007.478184] sd 11:0:0:0: alua: transition timeout set to 60 seconds
[17007.478186] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[17031.269821]  sdb:
[17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[17049.056155]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311705998, last ping 4311711232, now 4311716352
[17049.057499]  connection2:0: detected conn error (1022)
[17049.057558] modifyQP to CLOSING qp 3 next_iw_state 3
[..]

Target
----------
[....]
[17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397183] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397184] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397184] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397192] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397195] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397196] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
[17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
[17066.397200] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397204] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397204] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
[17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397211] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397215] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397215] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
[17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
[17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3
[17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397251] QP 3 flush_issued
[17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
[17066.397253] Got unknown fabric queue status: -22
[17066.397254] QP 3 flush_issued
[17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
[17066.397255] Got unknown fabric queue status: -22
[17066.397258] QP 3 flush_issued
[17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
[17066.397259] Got unknown fabric queue status: -22
[17066.397267] QP 3 flush_issued
[17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
[17066.397268] Got unknown fabric queue status: -22
[17066.397287] QP 3 flush_issued
[17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
[17066.397288] Got unknown fabric queue status: -22
[17066.397291] QP 3 flush_issued
[17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
[17066.397292] Got unknown fabric queue status: -22
[17066.397295] QP 3 flush_issued
[17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
[17066.397297] Got unknown fabric queue status: -22
[17066.397307] QP 3 flush_issued
[17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22
[17066.397308] isert: isert_post_response: ib_post_send failed with -22
[17066.397309] i40iw i40iw_qp_disconnect Call close API
[....]

Shiraz
isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res [17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res [17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res [17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res [17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3 [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res [17066.397251] QP 3 flush_issued [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res [17066.397253] Got unknown fabric queue status: -22 [17066.397254] QP 3 flush_issued [17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res [17066.397255] Got unknown fabric queue status: -22 [17066.397258] QP 3 flush_issued [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res [17066.397259] Got unknown fabric queue status: -22 [17066.397267] QP 3 flush_issued [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res [17066.397268] Got unknown fabric 
queue status: -22 [17066.397287] QP 3 flush_issued [17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res [17066.397288] Got unknown fabric queue status: -22 [17066.397291] QP 3 flush_issued [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res [17066.397292] Got unknown fabric queue status: -22 [17066.397295] QP 3 flush_issued [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res [17066.397297] Got unknown fabric queue status: -22 [17066.397307] QP 3 flush_issued [17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 [17066.397308] isert: isert_post_response: ib_post_send failed with -22 [17066.397309] i40iw i40iw_qp_disconnect Call close API [....] Shiraz ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes @ 2018-01-15 4:56 ` Nicholas A. Bellinger 0 siblings, 0 replies; 44+ messages in thread From: Nicholas A. Bellinger @ 2018-01-15 4:56 UTC (permalink / raw) To: Shiraz Saleem Cc: Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel, Potnuri Bharat Teja

Hi Shiraz, Ram, Ariel, & Potnuri, Following up on this old thread, as it relates to Potnuri's recent fix for an iser-target queue-full memory leak: https://www.spinics.net/lists/target-devel/msg16282.html

Just curious how frequently this happens in practice with sustained large block workloads, as it appears to affect at least three different iWARP RNICs (i40iw, qedr and iw_cxgb4)..? Is there anything else from an iser-target consumer level that should be changed for iWARP to avoid repeated ib_post_send() failures..?

On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote: > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote: > > Hi Nicholas, > > > > > Just to confirm, the following four patches where required to get > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly small > > > number of hw SGEs: > > > > > > 7a56dc8 iser-target: avoid posting a recv buffer twice > > > 555a65f iser-target: Fix queue-full response handling > > > a446701 iscsi-target: Propigate queue_data_in + queue_status errors > > > fa7e25c target: Fix unknown fabric callback queue-full errors > > > > > > So Did you test with Q-Logic/Cavium with RoCE using these four patches, > > > or just with commit a4467018..? > > > > > > Note these have not been CC'ed to stable yet, as I was reluctant since > > > they didn't have much mileage on them at the time.. > > > > > > Now however, they should be OK to consider for stable, especially if > > > they get you unblocked as well. > > > > The issue is still seen with these four patches. 
> > > > Thanks, > > Ram > > Hi, > > On X722 Iwarp NICs (i40iw) too, we are seeing a similar issue of SQ overflow being hit on > isert for larger block sizes. 4.14-rc2 kernel. > > Eventually there is a timeout/conn-error on iser initiator and the connection is torn down. > > The aforementioned patches dont seem to be alleviating the SQ overflow issue? > > Initiator > ------------ > > [17007.465524] scsi host11: iSCSI Initiator over iSER > [17007.466295] iscsi: invalid can_queue of 55. can_queue must be a power of 2. > [17007.466924] iscsi: Rounding can_queue to 32. > [17007.471535] scsi 11:0:0:0: Direct-Access LIO-ORG ramdisk1_40G 4.0 PQ: 0 ANSI: 5 > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit TPGS > [17007.471656] scsi 11:0:0:0: alua: device naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1 > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0 > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB) > [17007.472405] sd 11:0:0:0: [sdb] Write Protect is off > [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08 > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA > [17007.473412] sd 11:0:0:0: [sdb] Attached SCSI disk > [17007.478184] sd 11:0:0:0: alua: transition timeout set to 60 seconds > [17007.478186] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA > [17031.269821] sdb: > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null) > [17049.056155] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311705998, last ping 4311711232, now 4311716352 > [17049.057499] connection2:0: detected conn error (1022) > [17049.057558] modifyQP to CLOSING qp 3 next_iw_state 3 > [..] > > > Target > ---------- > [....] 
> [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > [17066.397183] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > [17066.397184] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397184] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > [17066.397192] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res > [17066.397195] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397196] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res > [17066.397200] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > [17066.397204] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397204] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3 > [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > [17066.397211] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > [17066.397215] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397215] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res > [17066.397218] i40iw_post_send: qp 3 
wr_opcode 0 ret_err -12 > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res > [17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res > [17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > [17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > [17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3 > [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res > [17066.397251] QP 3 flush_issued > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res > [17066.397253] Got unknown fabric queue status: -22 > [17066.397254] QP 3 flush_issued > [17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res > [17066.397255] Got unknown fabric queue status: -22 > [17066.397258] QP 3 flush_issued > [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > [17066.397259] Got unknown fabric queue status: -22 > [17066.397267] QP 3 flush_issued > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397268] isert: 
isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > [17066.397268] Got unknown fabric queue status: -22 > [17066.397287] QP 3 flush_issued > [17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > [17066.397288] Got unknown fabric queue status: -22 > [17066.397291] QP 3 flush_issued > [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > [17066.397292] Got unknown fabric queue status: -22 > [17066.397295] QP 3 flush_issued > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res > [17066.397297] Got unknown fabric queue status: -22 > [17066.397307] QP 3 flush_issued > [17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 > [17066.397308] isert: isert_post_response: ib_post_send failed with -22 > [17066.397309] i40iw i40iw_qp_disconnect Call close API > [....] > > Shiraz ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: SQ overflow seen running isert traffic with high block sizes @ 2018-01-15 10:12 ` Kalderon, Michal 0 siblings, 0 replies; 44+ messages in thread From: Kalderon, Michal @ 2018-01-15 10:12 UTC (permalink / raw) To: Nicholas A. Bellinger, Shiraz Saleem Cc: Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel, Potnuri Bharat Teja

> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger > Sent: Monday, January 15, 2018 6:57 AM > To: Shiraz Saleem <shiraz.saleem@intel.com> > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; > Potnuri Bharat Teja <bharat@chelsio.com> > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > Hi Shiraz, Ram, Ariel, & Potnuri, > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser- > target queue-full memory leak: > > https://www.spinics.net/lists/target-devel/msg16282.html > > Just curious how frequent this happens in practice with sustained large block > workloads, as it appears to effect at least three different iwarp RNICS (i40iw, > qedr and iw_cxgb4)..? > > Is there anything else from an iser-target consumer level that should be > changed for iwarp to avoid repeated ib_post_send() failures..? >

Would like to mention that although we are an iWARP RNIC as well, we've hit this issue when running RoCE. It's not iWARP related. This is easily reproduced within seconds with an I/O size of 512K, using 5 targets with 2 RAM disks each and 5 targets with FileIO disks each. 
IO Command used: maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 thanks, Michal > On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote: > > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote: > > > Hi Nicholas, > > > > > > > Just to confirm, the following four patches where required to get > > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly > > > > small number of hw SGEs: > > > > > > > > 7a56dc8 iser-target: avoid posting a recv buffer twice 555a65f > > > > iser-target: Fix queue-full response handling > > > > a446701 iscsi-target: Propigate queue_data_in + queue_status > > > > errors fa7e25c target: Fix unknown fabric callback queue-full > > > > errors > > > > > > > > So Did you test with Q-Logic/Cavium with RoCE using these four > > > > patches, or just with commit a4467018..? > > > > > > > > Note these have not been CC'ed to stable yet, as I was reluctant > > > > since they didn't have much mileage on them at the time.. > > > > > > > > Now however, they should be OK to consider for stable, especially > > > > if they get you unblocked as well. > > > > > > The issue is still seen with these four patches. > > > > > > Thanks, > > > Ram > > > > Hi, > > > > On X722 Iwarp NICs (i40iw) too, we are seeing a similar issue of SQ > > overflow being hit on isert for larger block sizes. 4.14-rc2 kernel. > > > > Eventually there is a timeout/conn-error on iser initiator and the > connection is torn down. > > > > The aforementioned patches dont seem to be alleviating the SQ overflow > issue? > > > > Initiator > > ------------ > > > > [17007.465524] scsi host11: iSCSI Initiator over iSER [17007.466295] > > iscsi: invalid can_queue of 55. can_queue must be a power of 2. > > [17007.466924] iscsi: Rounding can_queue to 32. 
> > [17007.471535] scsi 11:0:0:0: Direct-Access LIO-ORG ramdisk1_40G 4.0 > PQ: 0 ANSI: 5 > > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit > > TPGS [17007.471656] scsi 11:0:0:0: alua: device > > naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1 > > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0 > > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: > > (42.9 GB/40.0 GiB) [17007.472405] sd 11:0:0:0: [sdb] Write Protect is > > off [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08 > > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: > > enabled, doesn't support DPO or FUA [17007.473412] sd 11:0:0:0: [sdb] > > Attached SCSI disk [17007.478184] sd 11:0:0:0: alua: transition > > timeout set to 60 seconds [17007.478186] sd 11:0:0:0: alua: port group 00 > state A non-preferred supports TOlUSNA [17031.269821] sdb: > > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data > > mode. Opts: (null) [17049.056155] connection2:0: ping timeout of 5 > > secs expired, recv timeout 5, last rx 4311705998, last ping > > 4311711232, now 4311716352 [17049.057499] connection2:0: detected > > conn error (1022) [17049.057558] modifyQP to CLOSING qp 3 > > next_iw_state 3 [..] > > > > > > Target > > ---------- > > [....] 
> > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 > > failed to post RDMA res [17066.397183] i40iw_post_send: qp 3 wr_opcode > > 0 ret_err -12 [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8ea1f8 failed to post RDMA res [17066.397184] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397184] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 > > failed to post RDMA res [17066.397192] i40iw_post_send: qp 3 wr_opcode > > 0 ret_err -12 [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8f20a0 failed to post RDMA res [17066.397195] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397196] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res > > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 > > failed to post RDMA res [17066.397200] i40iw_post_send: qp 3 wr_opcode > > 0 ret_err -12 [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8ec020 failed to post RDMA res [17066.397204] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397204] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = > > 3 [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 > > failed to post RDMA res [17066.397211] i40iw_post_send: qp 3 wr_opcode > > 0 ret_err -12 [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8ecc30 failed to post RDMA res [17066.397215] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397215] isert: > > isert_rdma_rw_ctx_post: Cmd: 
ffff8817fb8f20a0 failed to post RDMA res > > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 > > failed to post RDMA res [17066.397219] i40iw_post_send: qp 3 wr_opcode > > 0 ret_err -12 [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8ede48 failed to post RDMA res [17066.397232] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397233] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 > > failed to post RDMA res [17066.397238] i40iw_post_send: qp 3 wr_opcode > > 0 ret_err -12 [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8e9bf0 failed to post RDMA res [17066.397242] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397242] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = > > 3 [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 > > failed to post RDMA res [17066.397251] QP 3 flush_issued > > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 > > failed to post RDMA res [17066.397253] Got unknown fabric queue > > status: -22 [17066.397254] QP 3 flush_issued [17066.397254] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397254] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res > > [17066.397255] Got unknown fabric queue status: -22 [17066.397258] QP > > 3 flush_issued [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 > > ret_err -22 [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8ec020 failed to post RDMA res [17066.397259] Got unknown > > fabric 
queue status: -22 [17066.397267] QP 3 flush_issued > > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 > > failed to post RDMA res [17066.397268] Got unknown fabric queue > > status: -22 [17066.397287] QP 3 flush_issued [17066.397287] > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397288] isert: > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > > [17066.397288] Got unknown fabric queue status: -22 [17066.397291] QP > > 3 flush_issued [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 > > ret_err -22 [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: > > ffff8817fb8ecc30 failed to post RDMA res [17066.397292] Got unknown > > fabric queue status: -22 [17066.397295] QP 3 flush_issued > > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 > > failed to post RDMA res [17066.397297] Got unknown fabric queue > > status: -22 [17066.397307] QP 3 flush_issued [17066.397307] > > i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 [17066.397308] isert: > > isert_post_response: ib_post_send failed with -22 [17066.397309] i40iw > > i40iw_qp_disconnect Call close API [....] > > > > Shiraz > > -- > > To unsubscribe from this list: send the line "unsubscribe > > target-devel" in the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body of a message to majordomo@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
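[Editor's note: the dependence on block size reported above is simple arithmetic. Each SCSI command becomes an RDMA READ/WRITE context that must cover every page of the IO, and a device advertising only a handful of send SGEs (the Q-Logic/Cavium NIC reports 4) needs proportionally more work requests per command, so a send queue sized for a small fixed number of WRs per command overflows as soon as a few large commands are in flight. A minimal sketch of the page/SGE arithmetic; the function name and the page-aligned assumption are illustrative, this is not the kernel's rdma-rw code:]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a kernel API): how many send-queue work
 * requests one RDMA READ/WRITE context needs to move io_bytes, when the
 * device caps each WR at max_sge page-sized scatter-gather entries. */
size_t wrs_per_rdma_ctx(size_t io_bytes, size_t max_sge, size_t page_size)
{
    size_t pages = (io_bytes + page_size - 1) / page_size; /* round up to whole pages */
    return (pages + max_sge - 1) / max_sge;                /* round up to whole WRs */
}
```

[With 4 SGEs and 4 KiB pages, a 512 KiB command needs 32 WRs and a 2 MiB command needs 128, which matches the reports in this thread that larger block sizes trip the overflow almost immediately.]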
* RE: SQ overflow seen running isert traffic with high block sizes @ 2018-01-15 10:12 ` Kalderon, Michal 0 siblings, 0 replies; 44+ messages in thread From: Kalderon, Michal @ 2018-01-15 10:12 UTC (permalink / raw) To: Nicholas A. Bellinger, Shiraz Saleem Cc: Amrani, Ram, Sagi Grimberg, linux-rdma@vger.kernel.org, Elior, Ariel, target-devel, Potnuri Bharat Teja

> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> Sent: Monday, January 15, 2018 6:57 AM
> To: Shiraz Saleem <shiraz.saleem@intel.com>
> Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; Potnuri Bharat Teja <bharat@chelsio.com>
> Subject: Re: SQ overflow seen running isert traffic with high block sizes
>
> Hi Shiraz, Ram, Ariel, & Potnuri,
>
> Following up on this old thread, as it relates to Potnuri's recent fix for an iser-target queue-full memory leak:
>
> https://www.spinics.net/lists/target-devel/msg16282.html
>
> Just curious how frequent this happens in practice with sustained large block workloads, as it appears to affect at least three different iWARP RNICs (i40iw, qedr and iw_cxgb4)..?
>
> Is there anything else from an iser-target consumer level that should be changed for iwarp to avoid repeated ib_post_send() failures..?

Would like to mention that although we are an iWARP RNIC as well, we've hit this issue when running RoCE. It's not iWARP related.
This is easily reproduced within seconds with an IO size of 512K,
using 5 targets with 2 RAM disks each and 5 targets with FileIO disks each.

IO command used:
maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1

thanks,
Michal

> On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote:
[...]

^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <CY1PR0701MB2012E53C69D1CE3E16BA320B88EB0-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>]
* Re: SQ overflow seen running isert traffic with high block sizes [not found] ` <CY1PR0701MB2012E53C69D1CE3E16BA320B88EB0-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2018-01-15 15:22 ` Shiraz Saleem 0 siblings, 0 replies; 44+ messages in thread From: Shiraz Saleem @ 2018-01-15 15:22 UTC (permalink / raw) To: Kalderon, Michal, Nicholas A. Bellinger Cc: Amrani, Ram, Sagi Grimberg, linux-rdma@vger.kernel.org, Elior, Ariel, target-devel, Potnuri Bharat Teja On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger > > Sent: Monday, January 15, 2018 6:57 AM > > To: Shiraz Saleem <shiraz.saleem@intel.com> > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; > > Potnuri Bharat Teja <bharat@chelsio.com> > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > Following up on this old thread, as it relates to Potnuri's recent fix for an iser- > > target queue-full memory leak: > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > Just curious how frequent this happens in practice with sustained large block > > workloads, as it appears to affect at least three different iWARP RNICs (i40iw, > > qedr and iw_cxgb4)..? > > > > Is there anything else from an iser-target consumer level that should be > > changed for iwarp to avoid repeated ib_post_send() failures..? 
> > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this > issue when running RoCE. It's not iWARP related. > This is easily reproduced within seconds with IO size of 512K > Using 5 targets with 2 RAM disks each and 5 targets with FileIO disks each. > > IO command used: > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > thanks, > Michal It's seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michal's report; rather quickly, in a matter of seconds. fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb Shiraz > > > On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote: > > > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote: > > > > Hi Nicholas, > > > > > > > > > Just to confirm, the following four patches were required to get > > > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly > > > > > small number of hw SGEs: > > > > > > > > > > 7a56dc8 iser-target: avoid posting a recv buffer twice 555a65f > > > > > iser-target: Fix queue-full response handling > > > > > a446701 iscsi-target: Propigate queue_data_in + queue_status > > > > > errors fa7e25c target: Fix unknown fabric callback queue-full > > > > > errors > > > > > > > > > > So did you test with Q-Logic/Cavium with RoCE using these four > > > > > patches, or just with commit a4467018..? > > > > > > > > > > Note these have not been CC'ed to stable yet, as I was reluctant > > > > > since they didn't have much mileage on them at the time.. > > > > > > > > > > Now however, they should be OK to consider for stable, especially > > > > > if they get you unblocked as well. > > > > The issue is still seen with these four patches. 
> > > > > > > > Thanks, > > > > Ram > > > > > > Hi, > > > > > > On X722 Iwarp NICs (i40iw) too, we are seeing a similar issue of SQ > > > overflow being hit on isert for larger block sizes. 4.14-rc2 kernel. > > > > > > Eventually there is a timeout/conn-error on iser initiator and the > > connection is torn down. > > > > > > The aforementioned patches dont seem to be alleviating the SQ overflow > > issue? > > > > > > Initiator > > > ------------ > > > > > > [17007.465524] scsi host11: iSCSI Initiator over iSER [17007.466295] > > > iscsi: invalid can_queue of 55. can_queue must be a power of 2. > > > [17007.466924] iscsi: Rounding can_queue to 32. > > > [17007.471535] scsi 11:0:0:0: Direct-Access LIO-ORG ramdisk1_40G 4.0 > > PQ: 0 ANSI: 5 > > > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit > > > TPGS [17007.471656] scsi 11:0:0:0: alua: device > > > naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1 > > > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0 > > > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: > > > (42.9 GB/40.0 GiB) [17007.472405] sd 11:0:0:0: [sdb] Write Protect is > > > off [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08 > > > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: > > > enabled, doesn't support DPO or FUA [17007.473412] sd 11:0:0:0: [sdb] > > > Attached SCSI disk [17007.478184] sd 11:0:0:0: alua: transition > > > timeout set to 60 seconds [17007.478186] sd 11:0:0:0: alua: port group 00 > > state A non-preferred supports TOlUSNA [17031.269821] sdb: > > > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data > > > mode. Opts: (null) [17049.056155] connection2:0: ping timeout of 5 > > > secs expired, recv timeout 5, last rx 4311705998, last ping > > > 4311711232, now 4311716352 [17049.057499] connection2:0: detected > > > conn error (1022) [17049.057558] modifyQP to CLOSING qp 3 > > > next_iw_state 3 [..] 
> > > > > > > > > Target > > > ---------- > > > [....] > > > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 > > > failed to post RDMA res [17066.397183] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ea1f8 failed to post RDMA res [17066.397184] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397184] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > > > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 > > > failed to post RDMA res [17066.397192] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8f20a0 failed to post RDMA res [17066.397195] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397196] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res > > > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 > > > failed to post RDMA res [17066.397200] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ec020 failed to post RDMA res [17066.397204] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397204] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > > > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = > > > 3 [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 > > > failed to post RDMA res [17066.397211] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ecc30 failed to post RDMA res [17066.397215] > > > 
i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397215] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res > > > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 > > > failed to post RDMA res [17066.397219] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ede48 failed to post RDMA res [17066.397232] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397233] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > > > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 > > > failed to post RDMA res [17066.397238] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8e9bf0 failed to post RDMA res [17066.397242] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397242] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > > > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = > > > 3 [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 > > > failed to post RDMA res [17066.397251] QP 3 flush_issued > > > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 > > > failed to post RDMA res [17066.397253] Got unknown fabric queue > > > status: -22 [17066.397254] QP 3 flush_issued [17066.397254] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397254] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res > > > [17066.397255] Got unknown fabric queue status: -22 [17066.397258] QP > > > 3 flush_issued [17066.397258] i40iw_post_send: qp 3 wr_opcode 
0 > > > ret_err -22 [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ec020 failed to post RDMA res [17066.397259] Got unknown > > > fabric queue status: -22 [17066.397267] QP 3 flush_issued > > > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 > > > failed to post RDMA res [17066.397268] Got unknown fabric queue > > > status: -22 [17066.397287] QP 3 flush_issued [17066.397287] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397288] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > > > [17066.397288] Got unknown fabric queue status: -22 [17066.397291] QP > > > 3 flush_issued [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 > > > ret_err -22 [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ecc30 failed to post RDMA res [17066.397292] Got unknown > > > fabric queue status: -22 [17066.397295] QP 3 flush_issued > > > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 > > > failed to post RDMA res [17066.397297] Got unknown fabric queue > > > status: -22 [17066.397307] QP 3 flush_issued [17066.397307] > > > i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 [17066.397308] isert: > > > isert_post_response: ib_post_send failed with -22 [17066.397309] i40iw > > > i40iw_qp_disconnect Call close API [....] 
> > > > > > Shiraz > > -- > > To unsubscribe from this list: send the line "unsubscribe > > target-devel" in the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body of a message to majordomo@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
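[Editor's note: the mitigation direction discussed throughout this thread is consumer-side SQ accounting: only hand a WR to ib_post_send() when a completion has freed a slot, and park overflow work on a retry list that completions drain, instead of re-posting in a loop and logging -ENOMEM until the connection is torn down. A toy userspace model of that idea follows; all names are hypothetical, and this is not the isert or target-core code:]

```c
#include <assert.h>

#define TOY_SQ_DEPTH 16  /* pretend the QP was created with 16 send WRs */

/* Toy model of consumer-side send-queue accounting. */
struct toy_sq {
    int credits;   /* free SQ slots */
    int posted;    /* WRs currently outstanding on the "hardware" */
    int deferred;  /* WRs parked because the SQ was full */
};

void toy_sq_init(struct toy_sq *sq)
{
    sq->credits = TOY_SQ_DEPTH;
    sq->posted = 0;
    sq->deferred = 0;
}

/* Post one WR if a credit is available; otherwise defer it instead of
 * handing the device a post that can only fail. */
void toy_post(struct toy_sq *sq)
{
    if (sq->credits > 0) {
        sq->credits--;
        sq->posted++;
    } else {
        sq->deferred++;
    }
}

/* A send completion returns one credit; use it to drain the backlog. */
void toy_complete(struct toy_sq *sq)
{
    sq->posted--;
    sq->credits++;
    if (sq->deferred > 0) {
        sq->deferred--;
        toy_post(sq);
    }
}
```

[The point of the model: under this scheme the overflow shows up as a growing deferred count that drains as completions arrive, rather than as a stream of ib_post_send() failures followed by ABORT_TASK and connection teardown.]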
* Re: SQ overflow seen running isert traffic with high block sizes @ 2018-01-15 15:22 ` Shiraz Saleem 0 siblings, 0 replies; 44+ messages in thread From: Shiraz Saleem @ 2018-01-15 15:22 UTC (permalink / raw) To: Kalderon, Michal, Nicholas A. Bellinger Cc: Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel, Potnuri Bharat Teja On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger > > Sent: Monday, January 15, 2018 6:57 AM > > To: Shiraz Saleem <shiraz.saleem@intel.com> > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; > > Potnuri Bharat Teja <bharat@chelsio.com> > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser- > > target queue-full memory leak: > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > Just curious how frequent this happens in practice with sustained large block > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw, > > qedr and iw_cxgb4)..? > > > > Is there anything else from an iser-target consumer level that should be > > changed for iwarp to avoid repeated ib_post_send() failures..? > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this > Issue when running RoCE. It's not iWARP related. > This is easily reproduced within seconds with IO size of 5121K > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each. 
> > IO Command used: > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > thanks, > Michal It's seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michal's report; rather quickly, in a matter of seconds. fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb Shiraz > > > On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote: > > > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote: > > > > Hi Nicholas, > > > > > > > > > Just to confirm, the following four patches were required to get > > > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly > > > > > small number of hw SGEs: > > > > > > > > > > 7a56dc8 iser-target: avoid posting a recv buffer twice 555a65f > > > > > iser-target: Fix queue-full response handling > > > > > a446701 iscsi-target: Propigate queue_data_in + queue_status > > > > > errors fa7e25c target: Fix unknown fabric callback queue-full > > > > > errors > > > > > > > > > > So did you test with Q-Logic/Cavium with RoCE using these four > > > > > patches, or just with commit a4467018..? > > > > > > > > > > Note these have not been CC'ed to stable yet, as I was reluctant > > > > > since they didn't have much mileage on them at the time.. > > > > > > > > > > Now however, they should be OK to consider for stable, especially > > > > > if they get you unblocked as well. > > > > The issue is still seen with these four patches. > > > > > > > > Thanks, > > > > Ram > > > Hi, > > > > > > On X722 iWARP NICs (i40iw) too, we are seeing a similar issue of SQ > > > overflow being hit on isert for larger block sizes. 4.14-rc2 kernel. > > > > > > Eventually there is a timeout/conn-error on iser initiator and the > > connection is torn down. > > > > > > The aforementioned patches don't seem to be alleviating the SQ overflow > > issue? 
> > > > > > Initiator > > > ------------ > > > > > > [17007.465524] scsi host11: iSCSI Initiator over iSER [17007.466295] > > > iscsi: invalid can_queue of 55. can_queue must be a power of 2. > > > [17007.466924] iscsi: Rounding can_queue to 32. > > > [17007.471535] scsi 11:0:0:0: Direct-Access LIO-ORG ramdisk1_40G 4.0 > > PQ: 0 ANSI: 5 > > > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit > > > TPGS [17007.471656] scsi 11:0:0:0: alua: device > > > naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1 > > > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0 > > > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: > > > (42.9 GB/40.0 GiB) [17007.472405] sd 11:0:0:0: [sdb] Write Protect is > > > off [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08 > > > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: > > > enabled, doesn't support DPO or FUA [17007.473412] sd 11:0:0:0: [sdb] > > > Attached SCSI disk [17007.478184] sd 11:0:0:0: alua: transition > > > timeout set to 60 seconds [17007.478186] sd 11:0:0:0: alua: port group 00 > > state A non-preferred supports TOlUSNA [17031.269821] sdb: > > > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data > > > mode. Opts: (null) [17049.056155] connection2:0: ping timeout of 5 > > > secs expired, recv timeout 5, last rx 4311705998, last ping > > > 4311711232, now 4311716352 [17049.057499] connection2:0: detected > > > conn error (1022) [17049.057558] modifyQP to CLOSING qp 3 > > > next_iw_state 3 [..] > > > > > > > > > Target > > > ---------- > > > [....] 
> > > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 > > > failed to post RDMA res [17066.397183] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ea1f8 failed to post RDMA res [17066.397184] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397184] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > > > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 > > > failed to post RDMA res [17066.397192] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8f20a0 failed to post RDMA res [17066.397195] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397196] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res > > > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 > > > failed to post RDMA res [17066.397200] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ec020 failed to post RDMA res [17066.397204] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397204] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res > > > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id > > > 3 [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 > > > failed to post RDMA res [17066.397211] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ecc30 failed to post RDMA res [17066.397215] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397215] 
isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res > > > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 > > > failed to post RDMA res [17066.397219] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ede48 failed to post RDMA res [17066.397232] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397233] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res > > > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 > > > failed to post RDMA res [17066.397238] i40iw_post_send: qp 3 wr_opcode > > > 0 ret_err -12 [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8e9bf0 failed to post RDMA res [17066.397242] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 [17066.397242] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res > > > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12 > > > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id > > > 3 [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 > > > failed to post RDMA res [17066.397251] QP 3 flush_issued > > > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 > > > failed to post RDMA res [17066.397253] Got unknown fabric queue > > > status: -22 [17066.397254] QP 3 flush_issued [17066.397254] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397254] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res > > > [17066.397255] Got unknown fabric queue status: -22 [17066.397258] QP > > > 3 flush_issued [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 > > > ret_err -22 [17066.397259] isert: 
isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ec020 failed to post RDMA res [17066.397259] Got unknown > > > fabric queue status: -22 [17066.397267] QP 3 flush_issued > > > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 > > > failed to post RDMA res [17066.397268] Got unknown fabric queue > > > status: -22 [17066.397287] QP 3 flush_issued [17066.397287] > > > i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 [17066.397288] isert: > > > isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res > > > [17066.397288] Got unknown fabric queue status: -22 [17066.397291] QP > > > 3 flush_issued [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 > > > ret_err -22 [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: > > > ffff8817fb8ecc30 failed to post RDMA res [17066.397292] Got unknown > > > fabric queue status: -22 [17066.397295] QP 3 flush_issued > > > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22 > > > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 > > > failed to post RDMA res [17066.397297] Got unknown fabric queue > > > status: -22 [17066.397307] QP 3 flush_issued [17066.397307] > > > i40iw_post_send: qp 3 wr_opcode 8 ret_err -22 [17066.397308] isert: > > > isert_post_response: ib_post_send failed with -22 [17066.397309] i40iw > > > i40iw_qp_disconnect Call close API [....] > > > > > > Shiraz
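[Editorial aside] The overflow arithmetic behind the failures reported above is easy to model: a large read maps to many page-sized SGEs, and an RNIC that supports only a few send SGEs per work request needs proportionally many WRs per command. The sketch below uses illustrative numbers (4 KiB pages, the 4-SGE limit mentioned for the Q-Logic/Cavium NIC); it is not code read from any driver:

```python
import math

PAGE_SIZE = 4096  # illustrative; actual page size is platform-dependent

def wrs_per_command(io_bytes, max_send_sge):
    """Work requests needed for one I/O when each WR can carry at
    most max_send_sge page-sized SGEs (a simplified model of how
    rdma_rw splits an SG list across RDMA WRITE/READ WRs)."""
    pages = math.ceil(io_bytes / PAGE_SIZE)
    return math.ceil(pages / max_send_sge)

# A 2 MiB READ on a 4-SGE device: 512 pages -> 128 WRs per command.
wrs = wrs_per_command(2 * 1024 * 1024, max_send_sge=4)
print(wrs)  # 128

# With a 64-deep command window, peak demand is thousands of SQ
# slots -- far more than a typical send queue is sized for.
print(64 * wrs)  # 8192
```

Under this model it is unsurprising that the overflow reproduces "in a matter of seconds" once block size crosses a threshold: every in-flight command multiplies the per-command WR cost.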
* Re: SQ overflow seen running isert traffic with high block sizes 2018-01-15 15:22 ` Shiraz Saleem @ 2018-01-18 9:58 ` Nicholas A. Bellinger -1 siblings, 0 replies; 44+ messages in thread From: Nicholas A. Bellinger @ 2018-01-18 9:58 UTC (permalink / raw) To: Shiraz Saleem Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel, Potnuri Bharat Teja Hi Shiraz, Michal & Co, Thanks for the feedback. Comments below. On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote: > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger > > > Sent: Monday, January 15, 2018 6:57 AM > > > To: Shiraz Saleem <shiraz.saleem@intel.com> > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; > > > Potnuri Bharat Teja <bharat@chelsio.com> > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser- > > > target queue-full memory leak: > > > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > > > Just curious how frequent this happens in practice with sustained large block > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw, > > > qedr and iw_cxgb4)..? > > > > > > Is there anything else from an iser-target consumer level that should be > > > changed for iwarp to avoid repeated ib_post_send() failures..? > > > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this > > Issue when running RoCE. It's not iWARP related. 
> > This is easily reproduced within seconds with IO size of 5121K > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each. > > > > IO Command used: > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > > > thanks, > > Michal > > It's seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michal's report; > rather quickly, in a matter of seconds. > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb > A couple of thoughts. First, would it be helpful to limit maximum payload size per I/O for consumers based on the number of iser-target sq hw sges..? That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to maximum payload size per I/O being too large, there is an existing target_core_fabric_ops mechanism for limiting it using SCSI residuals, originally utilized by qla2xxx here: target/qla2xxx: Honor max_data_sg_nents I/O transfer limit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e Note this patch also will return a smaller Block Limits VPD (0x86) MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which means modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will automatically limit maximum outgoing payload transfer length, and avoid SCSI residual logic. As-is, iser-target doesn't propagate a max_data_sg_nents limit into iscsi-target, but you can try testing with a smaller value to see if it's useful. 
Eg:

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
index 0ebc481..d8a4cc5 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
 	.module				= THIS_MODULE,
 	.name				= "iscsi",
 	.node_acl_size			= sizeof(struct iscsi_node_acl),
+	.max_data_sg_nents		= 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
 	.get_fabric_name		= iscsi_get_fabric_name,
 	.tpg_get_wwn			= lio_tpg_get_endpoint_wwn,
 	.tpg_get_tag			= lio_tpg_get_tag,

Second, if the failures are not SCSI transfer length specific, another option would be to limit the total command sequence number depth (CmdSN) per session.

This is controlled at runtime by the default_cmdsn_depth TPG attribute:

/sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth

and on a per-initiator context with the cmdsn_depth NodeACL attribute:

/sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth

Note these default to 64, and can be changed at build time via include/target/iscsi/iscsi_target_core.h:TA_DEFAULT_CMDSN_DEPTH.

That said, Sagi, any further comments as to what else iser-target should be doing to avoid repeated queue-fulls with limited hw sges..?
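[Editorial aside] The max_data_sg_nents suggestion above can be sanity-checked numerically. This is a back-of-the-envelope model only (4 KiB pages assumed, and the real WR count depends on how rdma_rw builds the SG list for the device):

```python
import math

PAGE_SIZE = 4096  # assumption for illustration

def max_transfer_length(max_data_sg_nents, page_size=PAGE_SIZE):
    # Block Limits VPD (0x86) MAXIMUM TRANSFER LENGTH derived as
    # described above: max_data_sg_nents * PAGE_SIZE.
    return max_data_sg_nents * page_size

def wrs_per_command(io_bytes, max_send_sge, page_size=PAGE_SIZE):
    # Simplified model: pages per I/O split across WRs of max_send_sge SGEs.
    pages = math.ceil(io_bytes / page_size)
    return math.ceil(pages / max_send_sge)

cap = max_transfer_length(32)
print(cap)  # 131072 bytes = 128 KiB

# Clamped to 128 KiB, a 4-SGE RNIC needs only 8 WRs per command
# instead of 128 for a 2 MiB read, so the default 64-deep CmdSN
# window no longer swamps the send queue.
print(wrs_per_command(cap, max_send_sge=4))  # 8
```

This is consistent with Potnuri's report further down that max_data_sg_nents = 32 (and even 16) makes the overflow disappear.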
* Re: SQ overflow seen running isert traffic with high block sizes 2018-01-18 9:58 ` Nicholas A. Bellinger @ 2018-01-18 17:53 ` Potnuri Bharat Teja -1 siblings, 0 replies; 44+ messages in thread From: Potnuri Bharat Teja @ 2018-01-18 17:53 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: Shiraz Saleem, Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel Hi Nicholas, thanks for the suggestions. Comments below. On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote: > Hi Shiraz, Michal & Co, > > Thanks for the feedback. Comments below. > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote: > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger > > > > Sent: Monday, January 15, 2018 6:57 AM > > > > To: Shiraz Saleem <shiraz.saleem@intel.com> > > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg > > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel > > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; > > > > Potnuri Bharat Teja <bharat@chelsio.com> > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser- > > > > target queue-full memory leak: > > > > > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > > > > > Just curious how frequent this happens in practice with sustained large block > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw, > > > > qedr and iw_cxgb4)..? > > > > > > > > Is there anything else from an iser-target consumer level that should be > > > > changed for iwarp to avoid repeated ib_post_send() failures..? 
> > > > > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this > > > Issue when running RoCE. It's not iWARP related. > > > This is easily reproduced within seconds with IO size of 5121K > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each. > > > > > > IO Command used: > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > > > > > thanks, > > > Michal > > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report; > > rather quickly, in a matter of seconds. > > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb > > > > A couple of thoughts. > > First, would it be helpful to limit maximum payload size per I/O for > consumers based on number of iser-target sq hw sges..? yes, I think HW num sge needs to be propagated to iscsi target. > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to > maximum payload size per I/O being too large there is an existing Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above 256k and for READ tests only with chelsio(iw_cxgb4) adapters. > target_core_fabric_ops mechanism for limiting using SCSI residuals, > originally utilized by qla2xxx here: > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e > > Note this patch also will return a smaller Block Limits VPD (0x86) > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will > automatically limit maximum outgoing payload transfer length, and avoid > SCSI residual logic. 
> As-is, iser-target doesn't propagate a max_data_sg_nents limit into
> iscsi-target, but you can try testing with a smaller value to see if
> it's useful. Eg:
>
> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
> index 0ebc481..d8a4cc5 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>  	.module				= THIS_MODULE,
>  	.name				= "iscsi",
>  	.node_acl_size			= sizeof(struct iscsi_node_acl),
> +	.max_data_sg_nents		= 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
>  	.get_fabric_name		= iscsi_get_fabric_name,
>  	.tpg_get_wwn			= lio_tpg_get_endpoint_wwn,
>  	.tpg_get_tag			= lio_tpg_get_tag,
>
With the above change, SQ overflow isn't observed. I started off with max_data_sg_nents = 16.

> Second, if the failures are not SCSI transfer length specific, another
> option would be to limit the total command sequence number depth (CmdSN)
> per session.
>
> This is controlled at runtime by the default_cmdsn_depth TPG attribute:
>
> /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/attrib/default_cmdsn_depth
>
> and on a per-initiator context with the cmdsn_depth NodeACL attribute:
>
> /sys/kernel/config/target/iscsi/$TARGET_IQN/$TPG/acls/$ACL_IQN/cmdsn_depth
>
> Note these default to 64, and can be changed at build time via
> include/target/iscsi/iscsi_target_core.h:TA_DEFAULT_CMDSN_DEPTH.
>
> That said, Sagi, any further comments as to what else iser-target should be
> doing to avoid repeated queue-fulls with limited hw sges..?
* Re: SQ overflow seen running isert traffic with high block sizes [not found] ` <20180118175316.GA11338-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org> @ 2018-01-24 7:25 ` Nicholas A. Bellinger 0 siblings, 0 replies; 44+ messages in thread From: Nicholas A. Bellinger @ 2018-01-24 7:25 UTC (permalink / raw) To: Potnuri Bharat Teja Cc: Shiraz Saleem, Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel Hi Potnuri & Co, On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote: > Hi Nicholas, > thanks for the suggestions. Comments below. > > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote: > > Hi Shiraz, Michal & Co, > > > > Thanks for the feedback. Comments below. > > > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote: > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > > > > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- > > > > > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Nicholas A. 
Bellinger > > > > > Sent: Monday, January 15, 2018 6:57 AM > > > > > To: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > > > > > Cc: Amrani, Ram <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; Sagi Grimberg > > > > > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Elior, Ariel > > > > > <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; target-devel <target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>; > > > > > Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org> > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > > > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser- > > > > > target queue-full memory leak: > > > > > > > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > > > > > > > Just curious how frequent this happens in practice with sustained large block > > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw, > > > > > qedr and iw_cxgb4)..? > > > > > > > > > > Is there anything else from an iser-target consumer level that should be > > > > > changed for iwarp to avoid repeated ib_post_send() failures..? > > > > > > > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this > > > > Issue when running RoCE. It's not iWARP related. > > > > This is easily reproduced within seconds with IO size of 5121K > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each. > > > > > > > > IO Command used: > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > > > > > > > thanks, > > > > Michal > > > > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report; > > > rather quickly, in a matter of seconds. 
> > > > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb > > > > > > > A couple of thoughts. > > > > First, would it be helpful to limit maximum payload size per I/O for > > consumers based on number of iser-target sq hw sges..? > yes, I think HW num sge needs to be propagated to iscsi target. > > > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to > > maximum payload size per I/O being too large there is an existing > > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above > 256k and for READ tests only with chelsio(iw_cxgb4) adapters. Thanks for confirming. > > target_core_fabric_ops mechanism for limiting using SCSI residuals, > > originally utilized by qla2xxx here: > > > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e > > > > Note this patch also will return a smaller Block Limits VPD (0x86) > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will > > automatically limit maximum outgoing payload transfer length, and avoid > > SCSI residual logic. > > > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into > > iscsi-target, but you can try testing with a smaller value to see if > > it's useful. 
Eg: > > > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf > > index 0ebc481..d8a4cc5 100644 > > --- a/drivers/target/iscsi/iscsi_target_configfs.c > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd) > > .module = THIS_MODULE, > > .name = "iscsi", > > .node_acl_size = sizeof(struct iscsi_node_acl), > > + .max_data_sg_nents = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */ > > .get_fabric_name = iscsi_get_fabric_name, > > .tpg_get_wwn = lio_tpg_get_endpoint_wwn, > > .tpg_get_tag = lio_tpg_get_tag, > > > With above change, SQ overflow isn't observed. I started of with max_data_sg_nents = 16. OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages) avoids SQ overflow with iw_cxgb4. What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
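[Editor's note: the arithmetic behind the max_data_sg_nents cap discussed above can be sketched as a small standalone C helper. This is an illustration only, not kernel code; the 4 KiB PAGE_SIZE and the helper name are assumptions.]

```c
#include <assert.h>

#define PAGE_SIZE 4096u  /* assumed 4 KiB pages, as in the discussion */

/*
 * Sketch of the Block Limits VPD (0x86) MAXIMUM TRANSFER LENGTH that
 * the max_data_sg_nents mechanism advertises: nents * PAGE_SIZE bytes.
 * Initiators honoring this limit never issue I/Os larger than the cap,
 * so the target side never has to post oversized RDMA payloads.
 */
static unsigned int max_transfer_len(unsigned int max_data_sg_nents)
{
	return max_data_sg_nents * PAGE_SIZE;
}
```

With 4 KiB pages, max_data_sg_nents = 32 caps each I/O at 128K, and Potnuri's first try of 16 caps it at 64K.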
* Re: SQ overflow seen running isert traffic with high block sizes 2018-01-24 7:25 ` Nicholas A. Bellinger @ 2018-01-24 12:33 ` Potnuri Bharat Teja -1 siblings, 0 replies; 44+ messages in thread From: Potnuri Bharat Teja @ 2018-01-24 12:21 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: Shiraz Saleem, Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel On Wednesday, January 01/24/18, 2018 at 12:55:17 +0530, Nicholas A. Bellinger wrote: > Hi Potnuri & Co, > > On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote: > > Hi Nicholas, > > thanks for the suggestions. Comments below. > > > > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger wrote: > > > Hi Shiraz, Michal & Co, > > > > > > Thanks for the feedback. Comments below. > > > > > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote: > > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > > > > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > > > > > > owner@vger.kernel.org] On Behalf Of Nicholas A. 
Bellinger > > > > > > Sent: Monday, January 15, 2018 6:57 AM > > > > > > To: Shiraz Saleem <shiraz.saleem@intel.com> > > > > > > Cc: Amrani, Ram <Ram.Amrani@cavium.com>; Sagi Grimberg > > > > > > <sagi@grimberg.me>; linux-rdma@vger.kernel.org; Elior, Ariel > > > > > > <Ariel.Elior@cavium.com>; target-devel <target-devel@vger.kernel.org>; > > > > > > Potnuri Bharat Teja <bharat@chelsio.com> > > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > > > > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > > > > > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a iser- > > > > > > target queue-full memory leak: > > > > > > > > > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > > > > > > > > > Just curious how frequent this happens in practice with sustained large block > > > > > > workloads, as it appears to effect at least three different iwarp RNICS (i40iw, > > > > > > qedr and iw_cxgb4)..? > > > > > > > > > > > > Is there anything else from an iser-target consumer level that should be > > > > > > changed for iwarp to avoid repeated ib_post_send() failures..? > > > > > > > > > > > Would like to mention, that although we are an iWARP RNIC as well, we've hit this > > > > > Issue when running RoCE. It's not iWARP related. > > > > > This is easily reproduced within seconds with IO size of 5121K > > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks each. > > > > > > > > > > IO Command used: > > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > > > > > > > > > thanks, > > > > > Michal > > > > > > > > Its seen with block size >= 2M on a single target 1 RAM disk config. And similar to Michals report; > > > > rather quickly, in a matter of seconds. 
> > > > > > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio > > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb > > > > > > > > > > A couple of thoughts. > > > > > > First, would it be helpful to limit maximum payload size per I/O for > > > consumers based on number of iser-target sq hw sges..? > > yes, I think HW num sge needs to be propagated to iscsi target. > > > > > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to > > > maximum payload size per I/O being too large there is an existing > > > > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above > > 256k and for READ tests only with chelsio(iw_cxgb4) adapters. > > Thanks for confirming. > > > > target_core_fabric_ops mechanism for limiting using SCSI residuals, > > > originally utilized by qla2xxx here: > > > > > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f9b565482c537821588444e09ff732c7d65ed6e > > > > > > Note this patch also will return a smaller Block Limits VPD (0x86) > > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, which > > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH will > > > automatically limit maximum outgoing payload transfer length, and avoid > > > SCSI residual logic. > > > > > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into > > > iscsi-target, but you can try testing with a smaller value to see if > > > it's useful. 
Eg: > > > > > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configf > > > index 0ebc481..d8a4cc5 100644 > > > --- a/drivers/target/iscsi/iscsi_target_configfs.c > > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c > > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd) > > > .module = THIS_MODULE, > > > .name = "iscsi", > > > .node_acl_size = sizeof(struct iscsi_node_acl), > > > + .max_data_sg_nents = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */ > > > .get_fabric_name = iscsi_get_fabric_name, > > > .tpg_get_wwn = lio_tpg_get_endpoint_wwn, > > > .tpg_get_tag = lio_tpg_get_tag, > > > > > With above change, SQ overflow isn't observed. I started of with max_data_sg_nents = 16. > > OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages) > avoids SQ overflow with iw_cxgb4. > > What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..? max_send_sge is 4 for iw_cxgb4 > -- > To unsubscribe from this list: send the line "unsubscribe target-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
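[Editor's note: a back-of-the-envelope model of why max_send_sge = 4 hurts at large block sizes. If each RDMA WR carries at most max_send_sge page-sized SGEs, a single I/O fans out into ceil(pages / max_send_sge) work requests on the SQ. This only loosely mirrors the non-MR path of the rdma_rw API; the function name and 4 KiB page size are illustrative assumptions.]

```c
#include <assert.h>

#define PAGE_SIZE 4096u  /* assumed 4 KiB pages */

/*
 * Rough model: the number of RDMA READ/WRITE WRs a single I/O consumes
 * when each WR is limited to max_send_sge page-sized SGEs.  This shows
 * the SQ fan-out, not the actual rdma_rw_ctx accounting.
 */
static unsigned int wrs_per_io(unsigned int io_bytes, unsigned int max_send_sge)
{
	unsigned int pages = (io_bytes + PAGE_SIZE - 1) / PAGE_SIZE;

	return (pages + max_send_sge - 1) / max_send_sge;
}
```

With max_send_sge = 4, a 2M READ needs 128 WRs, while a 128K I/O (the max_data_sg_nents = 32 cap) needs only 8 — which is consistent with the smaller cap keeping the SQ from overflowing.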
[parent not found: <1516778717.24576.319.camel@haakon3.daterainc.com>]
* RE: SQ overflow seen running isert traffic with high block sizes [not found] ` <1516778717.24576.319.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org> @ 2018-01-24 16:03 ` Steve Wise 0 siblings, 0 replies; 44+ messages in thread From: Steve Wise @ 2018-01-24 16:03 UTC (permalink / raw) To: 'Nicholas A. Bellinger', 'Potnuri Bharat Teja' Cc: 'Shiraz Saleem', 'Kalderon, Michal', 'Amrani, Ram', 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Elior, Ariel', 'target-devel' Hey all, > > Hi Potnuri & Co, > > On Thu, 2018-01-18 at 23:23 +0530, Potnuri Bharat Teja wrote: > > Hi Nicholas, > > thanks for the suggestions. Comments below. > > > > On Thursday, January 01/18/18, 2018 at 15:28:42 +0530, Nicholas A. Bellinger > wrote: > > > Hi Shiraz, Michal & Co, > > > > > > Thanks for the feedback. Comments below. > > > > > > On Mon, 2018-01-15 at 09:22 -0600, Shiraz Saleem wrote: > > > > On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote: > > > > > > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- > > > > > > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Nicholas A. 
Bellinger > > > > > > Sent: Monday, January 15, 2018 6:57 AM > > > > > > To: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > > > > > > Cc: Amrani, Ram <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; Sagi Grimberg > > > > > > <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Elior, Ariel > > > > > > <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>; target-devel <target- > devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>; > > > > > > Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org> > > > > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > > > > > > > Hi Shiraz, Ram, Ariel, & Potnuri, > > > > > > > > > > > > Following up on this old thread, as it relates to Potnuri's recent fix for a > iser- > > > > > > target queue-full memory leak: > > > > > > > > > > > > https://www.spinics.net/lists/target-devel/msg16282.html > > > > > > > > > > > > Just curious how frequent this happens in practice with sustained large > block > > > > > > workloads, as it appears to effect at least three different iwarp RNICS > (i40iw, > > > > > > qedr and iw_cxgb4)..? > > > > > > > > > > > > Is there anything else from an iser-target consumer level that should be > > > > > > changed for iwarp to avoid repeated ib_post_send() failures..? > > > > > > > > > > > Would like to mention, that although we are an iWARP RNIC as well, > we've hit this > > > > > Issue when running RoCE. It's not iWARP related. > > > > > This is easily reproduced within seconds with IO size of 5121K > > > > > Using 5 Targets with 2 Ram Disk each and 5 targets with FileIO Disks > each. > > > > > > > > > > IO Command used: > > > > > maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1 > > > > > > > > > > thanks, > > > > > Michal > > > > > > > > Its seen with block size >= 2M on a single target 1 RAM disk config. 
And > similar to Michals report; > > > > rather quickly, in a matter of seconds. > > > > > > > > fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 -- > size=20g --loops=1 --ioengine=libaio > > > > --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall -- > filename=/dev/sdb --name=sdb > > > > > > > > > > A couple of thoughts. > > > > > > First, would it be helpful to limit maximum payload size per I/O for > > > consumers based on number of iser-target sq hw sges..? > > yes, I think HW num sge needs to be propagated to iscsi target. > > > > > > That is, if rdma_rw_ctx_post() -> ib_post_send() failures are related to > > > maximum payload size per I/O being too large there is an existing > > > > Yes they are IO size specific, I observed SQ overflow with fio for IO sizes above > > 256k and for READ tests only with chelsio(iw_cxgb4) adapters. > > Thanks for confirming. > > > > target_core_fabric_ops mechanism for limiting using SCSI residuals, > > > originally utilized by qla2xxx here: > > > > > > target/qla2xxx: Honor max_data_sg_nents I/O transfer limit > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8 > f9b565482c537821588444e09ff732c7d65ed6e > > > > > > Note this patch also will return a smaller Block Limits VPD (0x86) > > > MAXIMUM TRANSFER LENGTH based on max_data_sg_nents * PAGE_SIZE, > which > > > means for modern SCSI initiators honoring MAXIMUM TRANSFER LENGTH > will > > > automatically limit maximum outgoing payload transfer length, and avoid > > > SCSI residual logic. > > > > > > As-is, iser-target doesn't a propagate max_data_sg_ents limit into > > > iscsi-target, but you can try testing with a smaller value to see if > > > it's useful. 
Eg: > > > > > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c > b/drivers/target/iscsi/iscsi_target_configf > > > index 0ebc481..d8a4cc5 100644 > > > --- a/drivers/target/iscsi/iscsi_target_configfs.c > > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c > > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd > *se_cmd) > > > .module = THIS_MODULE, > > > .name = "iscsi", > > > .node_acl_size = sizeof(struct iscsi_node_acl), > > > + .max_data_sg_nents = 32, /* 32 * PAGE_SIZE = MAXIMUM > TRANSFER LENGTH */ > > > .get_fabric_name = iscsi_get_fabric_name, > > > .tpg_get_wwn = lio_tpg_get_endpoint_wwn, > > > .tpg_get_tag = lio_tpg_get_tag, > > > > > With above change, SQ overflow isn't observed. I started of with > max_data_sg_nents = 16. > > OK, so max_data_sg_nents=32 (MAXIMUM TRANSFER SIZE=128K with 4k pages) > avoids SQ overflow with iw_cxgb4. > > What is iw_cxgb4 reporting to isert_create_cq():attr.cap.max_send_sge..? The ib_device attributes only advertise max_sge, which applies to both send and recv queues. Because the iw_cxgb4 RQ max sge is 4, iw_cxgb4 currently advertises 4 for max_sge, even though the SQ can handle more. So attr.cap.max_send_sge ends up being 4. I have a todo on my list to extend the rdma/core to have max_send_sge and max_recv_sge attributes. If it helps, I can bump up the priority of this. Perhaps Bharat could do the work. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
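[Editor's note: Steve's point about the single max_sge attribute can be shown with a toy model. The names and values below are hypothetical; a split into separate max_send_sge/max_recv_sge device attributes along the lines he describes was later added upstream.]

```c
#include <assert.h>

/*
 * Toy model of the problem: the SQ and RQ have different SGE limits,
 * but with only one max_sge attribute the driver must advertise the
 * minimum of the two, so consumers undersize their SQ SGE usage.
 */
static unsigned int advertised_max_sge(unsigned int sq_max_sge,
				       unsigned int rq_max_sge)
{
	return sq_max_sge < rq_max_sge ? sq_max_sge : rq_max_sge;
}
```

For an iw_cxgb4-like device whose SQ supports more SGEs (say 16, a made-up number) but whose RQ supports only 4, the advertised value collapses to 4 — which is what isert_create_cq() then sees in attr.cap.max_send_sge.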
* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-18  9:58	` Nicholas A. Bellinger
@ 2018-01-19 19:33	` Kalderon, Michal
  2018-01-24  7:55	` Nicholas A. Bellinger
  -1 siblings, 1 reply; 44+ messages in thread
From: Kalderon, Michal @ 2018-01-19 19:33 UTC (permalink / raw)
To: Nicholas A. Bellinger, Shiraz Saleem
Cc: Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel,
	Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

________________________________________
From: Nicholas A. Bellinger <nab@linux-iscsi.org>
Sent: Thursday, January 18, 2018 11:58 AM

> Hi Shiraz, Michal & Co,

Hi Nicholas,

> Thanks for the feedback. Comments below.

> A couple of thoughts.

> First, would it be helpful to limit maximum payload size per I/O for
> consumers based on number of iser-target sq hw sges..?

I don't think you need to limit the maximum payload, but instead
initialize the max_wr to be based on the number of supported SGEs
Instead of what is there today:
#define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +	\
				 ISERT_MAX_TX_MISC_PDUS	 +	\
				 ISERT_MAX_RX_MISC_PDUS)
Add the maximum number of WQEs per command,
The calculation of number of WQEs per command needs to be something like
"MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".

For some devices like ours, breaking the IO into multiple WRs according
to the supported number of SGEs doesn't necessarily mean a performance
penalty.

thanks,
Michal

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-19 19:33	` Kalderon, Michal
@ 2018-01-24  7:55	` Nicholas A. Bellinger
  0 siblings, 0 replies; 44+ messages in thread
From: Nicholas A. Bellinger @ 2018-01-24  7:55 UTC (permalink / raw)
To: Kalderon, Michal
Cc: Shiraz Saleem, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel,
	target-devel, Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

Hi Michal & Co,

On Fri, 2018-01-19 at 19:33 +0000, Kalderon, Michal wrote:
> ________________________________________
> From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> Sent: Thursday, January 18, 2018 11:58 AM
>
> > Hi Shiraz, Michal & Co,
> Hi Nicholas,
>
> > Thanks for the feedback. Comments below.
>
> > A couple of thoughts.
>
> > First, would it be helpful to limit maximum payload size per I/O for
> > consumers based on number of iser-target sq hw sges..?
>
> I don't think you need to limit the maximum payload, but instead
> initialize the max_wr to be based on the number of supported SGEs
> Instead of what is there today:
> #define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +	\
>				 ISERT_MAX_TX_MISC_PDUS	 +	\
>				 ISERT_MAX_RX_MISC_PDUS)
> Add the maximum number of WQEs per command,
> The calculation of number of WQEs per command needs to be something like
> "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
>

Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
right..?

Btw, I'm not sure how this affects usage of ISER_MAX_TX_CQ_LEN +
ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..

Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
limited attr.cap.max_send_sge..?

> For some devices like ours, breaking the IO into multiple WRs according to supported
> number of SGEs doesn't necessarily means performance penalty.

AFAICT adding max_data_sg_nents for iser-target is a safe enough
work-around to include for stable, assuming we agree on what the
max_send_sg cut-off is for setting max_data_sg_nents=32 usage from a
larger default.  I don't have a strong preference either way, as long as
it can be picked up for 4.x stable.

Sagi, WDYT..?

^ permalink raw reply	[flat|nested] 44+ messages in thread
* RE: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  7:55	` Nicholas A. Bellinger
@ 2018-01-24  8:09	` Kalderon, Michal
  -1 siblings, 0 replies; 44+ messages in thread
From: Kalderon, Michal @ 2018-01-24  8:09 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Shiraz Saleem, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel,
	target-devel, Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

> From: Nicholas A. Bellinger [mailto:nab@linux-iscsi.org]
> Sent: Wednesday, January 24, 2018 9:56 AM
>
> Hi Michal & Co,
>
> On Fri, 2018-01-19 at 19:33 +0000, Kalderon, Michal wrote:
> > ________________________________________
> > From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> > Sent: Thursday, January 18, 2018 11:58 AM
> >
> > > Hi Shiraz, Michal & Co,
> > Hi Nicholas,
> >
> > > Thanks for the feedback. Comments below.
> >
> > > A couple of thoughts.
> >
> > > First, would it be helpful to limit maximum payload size per I/O for
> > > consumers based on number of iser-target sq hw sges..?
> >
> > I don't think you need to limit the maximum payload, but instead
> > initialize the max_wr to be based on the number of supported SGEs
> > Instead of what is there today:
> > #define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +	\
> >				 ISERT_MAX_TX_MISC_PDUS	 +	\
> >				 ISERT_MAX_RX_MISC_PDUS) Add the
> > maximum number of WQEs per command, The calculation of number of WQEs
> > per command needs to be something like
> > "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> >
>
> Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> right..?

Globally or perhaps configurable by sysfs configuration?

> Btw, I'm not sure how this effects usage of ISER_MAX_TX_CQ_LEN +
> ISER_MAX_CQ_LEN, which currently depend on
> ISERT_QP_MAX_REQ_DTOS..

I think it can remain dependent on MAX_REQ_DTOS.

> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at runtime
> vs. exposing a smaller max_data_sg_nents=32 for ib_devices with limited
> attr.cap.max_send_sge..?

For our device, defining max_data_sg_nents didn't help in some scenarios.
It seems that the frequency of the issue increases with the number of LUNs
we try to run over.

> > For some devices like ours, breaking the IO into multiple WRs
> > according to supported number of SGEs doesn't necessarily means
> > performance penalty.
> >
>
> AFAICT adding max_data_sg_nents for iser-target is a safe enough work-around
> to include for stable, assuming we agree on what the max_send_sg cut-off is
> for setting max_data_sg_nents=32 usage from a larger default.  I don't have
> a strong preference either way, as long as it can be picked up for 4.x stable.
>
> Sagi, WDYT..?

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes
  2018-01-24  8:09	` Kalderon, Michal
@ 2018-01-29 19:20	` Sagi Grimberg
  -1 siblings, 0 replies; 44+ messages in thread
From: Sagi Grimberg @ 2018-01-29 19:20 UTC (permalink / raw)
To: Kalderon, Michal, Nicholas A. Bellinger
Cc: Shiraz Saleem, Amrani, Ram, linux-rdma, Elior, Ariel, target-devel,
	Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

>> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at runtime
>> vs. exposing a smaller max_data_sg_nents=32 for ib_devices with limited
>> attr.cap.max_send_sge..?
>
> For our device defining max_data_sg_nents didn't help on some scenarios,
> It seems that Frequency of the issue occurring increases with number of luns we
> Try to run over.

Maybe this is related to the queue-full strategy the target core takes
by simply scheduling another attempt unconditionally, without any hard
guarantees that the next attempt will succeed? This flow is per
se_device, which might be a hint why it's happening more with a larger
number of luns? Maybe the isert completion handler context (which is
also a workqueue) struggles with finding cpu quota?

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes
  [not found]	` <1516780534.24576.335.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
@ 2018-01-29 19:17	` Sagi Grimberg
  0 siblings, 0 replies; 44+ messages in thread
From: Sagi Grimberg @ 2018-01-29 19:17 UTC (permalink / raw)
To: Nicholas A. Bellinger, Kalderon, Michal
Cc: Shiraz Saleem, Amrani, Ram, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Elior, Ariel, target-devel, Potnuri Bharat Teja, Radzi, Amit,
	Galon, Yoav

>>> First, would it be helpful to limit maximum payload size per I/O for
>>> consumers based on number of iser-target sq hw sges..?
>>
>> I don't think you need to limit the maximum payload, but instead
>> initialize the max_wr to be based on the number of supported SGEs
>> Instead of what is there today:
>> #define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +	\
>>				 ISERT_MAX_TX_MISC_PDUS	 +	\
>>				 ISERT_MAX_RX_MISC_PDUS)
>> Add the maximum number of WQEs per command,
>> The calculation of number of WQEs per command needs to be something like
>> "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
>>
>
> Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> right..?
>
> Btw, I'm not sure how this effects usage of ISER_MAX_TX_CQ_LEN +
> ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
>
> Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> limited attr.cap.max_send_sge..?

Sorry for the late reply,

Can we go back and understand why we need to limit the isert transfer
size? I would suggest that we handle queue-full scenarios instead of
limiting the transferred payload size.

From the trace Shiraz sent, it looks that:
a) we are too chatty when failing to post a wr on a queue-pair
(something that can happen by design), and
b) isert escalates to terminating the connection, which means we
screwed up handling it.

Shiraz, can you explain these messages:
[17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
[17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

Who is initiating the connection teardown? the initiator or the target?
(looks like the initiator gave up on iscsi ping timeout expiration)

Nic,

Currently, what I see is that the queue-full handler simply schedules qf
work to re-issue the I/O again. The issue is that the only way a new
send queue entry becomes available again is that isert processes one or
more send completions. If at all, this work is interfering with the
isert_send_done handler.

Will it be possible that some transports will be able to schedule
qf_work_queue themselves? I guess it would also hold if iscsit were to
use non-blocking sockets and continue at .write_space()?
Something like transport_process_wait_list() that would be triggered
from the transport completion handler (or from a centralized place like
target_sess_put_cmd or something...)?

Also, I see that this wait list is singular across the se_device. Maybe
it would be a better idea to have it per se_session, as it maps to an
iscsi connection (or srp channel for that matter)? For large I/O sizes
this should happen quite a lot, so it's a bit of a shame that we will
need to compete over the list_empty check...

If we prefer to make this go away by limiting the transfer size then
it's fine I guess, but maybe we can do better? (although it can take
some extra work...)

>> For some devices like ours, breaking the IO into multiple WRs according to supported
>> number of SGEs doesn't necessarily means performance penalty.
>>
>
> AFAICT adding max_data_sg_nents for iser-target is a safe enough
> work-around to include for stable, assuming we agree on what the
> max_send_sg cut-off is for setting max_data_sg_nents=32 usage from a
> larger default.  I don't have a strong preference either way, as long as
> it can be picked up for 4.x stable.
>
> Sagi, WDYT..?

I think it's an easier fix for sure. What I don't know is whether this
introduces a regression for devices that can handle more sges on large
I/O sizes. I very much doubt it will though...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes
  [not found]	` <55569d98-7f8c-7414-ab03-e52e2bfc518b-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2018-01-30 16:30	` Shiraz Saleem
  0 siblings, 0 replies; 44+ messages in thread
From: Shiraz Saleem @ 2018-01-30 16:30 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Nicholas A. Bellinger, Kalderon, Michal, Amrani, Ram,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel,
	Potnuri Bharat Teja, Radzi, Amit, Galon, Yoav

On Mon, Jan 29, 2018 at 09:17:02PM +0200, Sagi Grimberg wrote:
>
> > > > First, would it be helpful to limit maximum payload size per I/O for
> > > > consumers based on number of iser-target sq hw sges..?
> > >
> > > I don't think you need to limit the maximum payload, but instead
> > > initialize the max_wr to be based on the number of supported SGEs
> > > Instead of what is there today:
> > > #define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +	\
> > >				 ISERT_MAX_TX_MISC_PDUS	 +	\
> > >				 ISERT_MAX_RX_MISC_PDUS)
> > > Add the maximum number of WQEs per command,
> > > The calculation of number of WQEs per command needs to be something like
> > > "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> > >
> >
> > Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> > right..?
> >
> > Btw, I'm not sure how this effects usage of ISER_MAX_TX_CQ_LEN +
> > ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
> >
> > Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> > runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> > limited attr.cap.max_send_sge..?
>
> Sorry for the late reply,
>
> Can we go back and understand why do we need to limit isert transfer
> size? I would suggest that we handle queue-full scenarios instead
> of limiting the transfered payload size.
>
> From the trace Shiraz sent, it looks that:
> a) we are too chatty when failing to post a wr on a queue-pair
> (something that can happen by design), and
> b) isert escalates to terminating the connection which means we
> screwed up handling it.
>
> Shiraz, can you explain these messages:
> [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

This is some device-specific Asynchronous Event logging I turned on.
It indicates the QP received a FIN while in RTS and eventually was
moved to the CLOSED state.

> Who is initiating the connection teardown? the initiator or the target?
> (looks like the initiator gave up on iscsi ping timeout expiration)
>

Initiator.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* RE: SQ overflow seen running isert traffic with high block sizes
  [not found]	` <1516269522.24576.274.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org>
@ 2018-01-22 17:49	` Saleem, Shiraz
  0 siblings, 0 replies; 44+ messages in thread
From: Saleem, Shiraz @ 2018-01-22 17:49 UTC (permalink / raw)
To: 'Nicholas A. Bellinger'
Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Elior, Ariel, target-devel,
	Potnuri Bharat Teja

> Subject: Re: SQ overflow seen running isert traffic with high block sizes
>
> First, would it be helpful to limit maximum payload size per I/O for consumers
> based on number of iser-target sq hw sges..?
>
Assuming data is not able to be fast registered as if virtually contiguous,
artificially limiting the data size might not be the best solution.

But max SGEs does need to be exposed higher. Somewhere in the stack,
there might need to be multiple WRs submitted or data copied.

> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
> index 0ebc481..d8a4cc5 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
>         .module                         = THIS_MODULE,
>         .name                           = "iscsi",
>         .node_acl_size                  = sizeof(struct iscsi_node_acl),
> +       .max_data_sg_nents              = 32, /* 32 * PAGE_SIZE = MAXIMUM TRANSFER LENGTH */
>         .get_fabric_name                = iscsi_get_fabric_name,
>         .tpg_get_wwn                    = lio_tpg_get_endpoint_wwn,
>         .tpg_get_tag                    = lio_tpg_get_tag,
>

BTW, this is helping the SQ overflow issue.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes 2018-01-22 17:49 ` Saleem, Shiraz @ 2018-01-24 8:01 ` Nicholas A. Bellinger -1 siblings, 0 replies; 44+ messages in thread From: Nicholas A. Bellinger @ 2018-01-24 8:01 UTC (permalink / raw) To: Saleem, Shiraz Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel, Potnuri Bharat Teja Hi Shiraz & Co, Thanks for the feedback. On Mon, 2018-01-22 at 17:49 +0000, Saleem, Shiraz wrote: > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > First, would it be helpful to limit maximum payload size per I/O for consumers > > based on number of iser-target sq hw sges..? > > > Assuming data is not able to be fast registered as if virtually contiguous; > artificially limiting the data size might not be the best solution. > > But max SGEs does need to be exposed higher. Somewhere in the stack, > there might need to be multiple WRs submitted or data copied. > Sagi..? > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c > > b/drivers/target/iscsi/iscsi_target_configf > > index 0ebc481..d8a4cc5 100644 > > --- a/drivers/target/iscsi/iscsi_target_configfs.c > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c > > @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd) > > .module = THIS_MODULE, > > .name = "iscsi", > > .node_acl_size = sizeof(struct iscsi_node_acl), > > + .max_data_sg_nents = 32, /* 32 * PAGE_SIZE = MAXIMUM > > TRANSFER LENGTH */ > > .get_fabric_name = iscsi_get_fabric_name, > > .tpg_get_wwn = lio_tpg_get_endpoint_wwn, > > .tpg_get_tag = lio_tpg_get_tag, > > > > BTW, this is helping the SQ overflow issue. Thanks for confirming as a possible work-around. For reference, what is i40iw's max_send_sg reporting..? Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER LENGTH to avoid consistent SQ overflow as-is with i40iw..? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes 2018-01-24 8:01 ` Nicholas A. Bellinger @ 2018-01-26 18:52 ` Shiraz Saleem -1 siblings, 0 replies; 44+ messages in thread From: Shiraz Saleem @ 2018-01-26 18:52 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: Kalderon, Michal, Amrani, Ram, Sagi Grimberg, linux-rdma, Elior, Ariel, target-devel, Potnuri Bharat Teja On Wed, Jan 24, 2018 at 01:01:58AM -0700, Nicholas A. Bellinger wrote: > Hi Shiraz & Co, > > Thanks for the feedback. > > On Mon, 2018-01-22 at 17:49 +0000, Saleem, Shiraz wrote: > > > Subject: Re: SQ overflow seen running isert traffic with high block sizes > > > > > > > > > First, would it be helpful to limit maximum payload size per I/O for consumers > > > based on number of iser-target sq hw sges..? > > > > > Assuming data is not able to be fast registered as if virtually contiguous; > > artificially limiting the data size might not be the best solution. > > > > But max SGEs does need to be exposed higher. Somewhere in the stack, > > there might need to be multiple WRs submitted or data copied. > > > > Sagi..? > > > For reference, what is i40iw's max_send_sg reporting..? 3 > > Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER > LENGTH to avoid consistent SQ overflow as-is with i40iw..? For the configuration I am testing, max_data_sg_nents=32 & 64 worked, and the SQ overflow issue reproduced at 128. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: SQ overflow seen running isert traffic with high block sizes 2018-01-24 8:01 ` Nicholas A. Bellinger @ 2018-01-29 19:36 ` Sagi Grimberg -1 siblings, 0 replies; 44+ messages in thread From: Sagi Grimberg @ 2018-01-29 19:36 UTC (permalink / raw) To: Nicholas A. Bellinger, Saleem, Shiraz Cc: Kalderon, Michal, Amrani, Ram, linux-rdma, Elior, Ariel, target-devel, Potnuri Bharat Teja Hi, >>> First, would it be helpful to limit maximum payload size per I/O for consumers >>> based on number of iser-target sq hw sges..? >>> >> Assuming data is not able to be fast registered as if virtually contiguous; >> artificially limiting the data size might not be the best solution. >> >> But max SGEs does need to be exposed higher. Somewhere in the stack, >> there might need to be multiple WRs submitted or data copied. >> > > Sagi..? I tend to agree that if the adapter supports just a handful of sges it's counter-productive to expose infinite data transfer size. On the other hand, I think we should be able to chunk more with memory registrations (although rdma rw code never even allocates them for non-iwarp devices). We have an API to check this in the RDMA core (thanks to Chuck) introduced in: commit 0062818298662d0d05061949d12880146b5ebd65 Author: Chuck Lever <chuck.lever@oracle.com> Date: Mon Aug 28 15:06:14 2017 -0400 rdma core: Add rdma_rw_mr_payload() The amount of payload per MR depends on device capabilities and the memory registration mode in use. The new rdma_rw API hides both, making it difficult for ULPs to determine how large their transport send queues need to be. Expose the MR payload information via a new API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> So the easy way out would be to use that and plug it into max_data_sg_nents. Regardless, queue-full logic today yields a TX attack on the transport. 
>>> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c >>> b/drivers/target/iscsi/iscsi_target_configf >>> index 0ebc481..d8a4cc5 100644 >>> --- a/drivers/target/iscsi/iscsi_target_configfs.c >>> +++ b/drivers/target/iscsi/iscsi_target_configfs.c >>> @@ -1553,6 +1553,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd) >>> .module = THIS_MODULE, >>> .name = "iscsi", >>> .node_acl_size = sizeof(struct iscsi_node_acl), >>> + .max_data_sg_nents = 32, /* 32 * PAGE_SIZE = MAXIMUM >>> TRANSFER LENGTH */ >>> .get_fabric_name = iscsi_get_fabric_name, >>> .tpg_get_wwn = lio_tpg_get_endpoint_wwn, >>> .tpg_get_tag = lio_tpg_get_tag, >>> >> >> BTW, this is helping the SQ overflow issue. > > Thanks for confirming as a possible work-around. > > For reference, what is i40iw's max_send_sg reporting..? > > Is max_data_sg_nents=32 + 4k pages = 128K the largest MAX TRANSFER > LENGTH to avoid consistent SQ overflow as-is with i40iw..? I vaguely recall that this is the maximum mr length for i40e (and cxgb4 if I'm not mistaken). ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2018-01-30 16:30 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-28 9:25 SQ overflow seen running isert traffic with high block sizes Amrani, Ram [not found] ` <BN3PR07MB25784033E7FCD062FA0A7855F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 2017-06-28 10:35 ` Potnuri Bharat Teja [not found] ` <20170628103505.GA27517-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org> 2017-06-28 11:29 ` Amrani, Ram 2017-06-28 10:39 ` Sagi Grimberg 2017-06-28 11:32 ` Amrani, Ram [not found] ` <BN3PR07MB25786338EADC77A369A6D493F8DD0-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 2017-07-13 18:29 ` Nicholas A. Bellinger 2017-07-17 9:26 ` Amrani, Ram [not found] ` <BN3PR07MB2578E6561CC669922A322245F8A00-EldUQEzkDQfpW3VS/XPqkOFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 2017-10-06 22:40 ` Shiraz Saleem 2017-10-06 22:40 ` Shiraz Saleem [not found] ` <20171006224025.GA23364-GOXS9JX10wfOxmVO0tvppfooFf0ArEBIu+b9c/7xato@public.gmane.org> 2018-01-15 4:56 ` Nicholas A. Bellinger 2018-01-15 4:56 ` Nicholas A. Bellinger [not found] ` <1515992195.24576.156.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org> 2018-01-15 10:12 ` Kalderon, Michal 2018-01-15 10:12 ` Kalderon, Michal [not found] ` <CY1PR0701MB2012E53C69D1CE3E16BA320B88EB0-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2018-01-15 15:22 ` Shiraz Saleem 2018-01-15 15:22 ` Shiraz Saleem 2018-01-18 9:58 ` Nicholas A. Bellinger 2018-01-18 9:58 ` Nicholas A. Bellinger 2018-01-18 17:53 ` Potnuri Bharat Teja 2018-01-18 17:53 ` Potnuri Bharat Teja [not found] ` <20180118175316.GA11338-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org> 2018-01-24 7:25 ` Nicholas A. Bellinger 2018-01-24 7:25 ` Nicholas A. 
Bellinger 2018-01-24 12:21 ` Potnuri Bharat Teja 2018-01-24 12:33 ` Potnuri Bharat Teja [not found] ` <1516778717.24576.319.camel@haakon3.daterainc.com> [not found] ` <1516778717.24576.319.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org> 2018-01-24 16:03 ` Steve Wise 2018-01-24 16:03 ` Steve Wise 2018-01-19 19:33 ` Kalderon, Michal 2018-01-24 7:55 ` Nicholas A. Bellinger 2018-01-24 7:55 ` Nicholas A. Bellinger 2018-01-24 8:09 ` Kalderon, Michal 2018-01-24 8:09 ` Kalderon, Michal 2018-01-29 19:20 ` Sagi Grimberg 2018-01-29 19:20 ` Sagi Grimberg [not found] ` <1516780534.24576.335.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org> 2018-01-29 19:17 ` Sagi Grimberg 2018-01-29 19:17 ` Sagi Grimberg [not found] ` <55569d98-7f8c-7414-ab03-e52e2bfc518b-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> 2018-01-30 16:30 ` Shiraz Saleem 2018-01-30 16:30 ` Shiraz Saleem [not found] ` <1516269522.24576.274.camel-XoQW25Eq2zs8TOCF0fvnoXxStJ4P+DSV@public.gmane.org> 2018-01-22 17:49 ` Saleem, Shiraz 2018-01-22 17:49 ` Saleem, Shiraz 2018-01-24 8:01 ` Nicholas A. Bellinger 2018-01-24 8:01 ` Nicholas A. Bellinger 2018-01-26 18:52 ` Shiraz Saleem 2018-01-26 18:52 ` Shiraz Saleem 2018-01-29 19:36 ` Sagi Grimberg 2018-01-29 19:36 ` Sagi Grimberg