From: Potnuri Bharat Teja <bharat@chelsio.com>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: SWise OGC <swise@opengridcomputing.com>,
'Sagi Grimberg' <sagi@grimberg.me>,
"target-devel@vger.kernel.org" <target-devel@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: SQ overflow seen running isert traffic
Date: Mon, 20 Mar 2017 15:45:28 +0530 [thread overview]
Message-ID: <20170320101526.GA11699@chelsio.com> (raw)
In-Reply-To: <20161108100617.GA2812@chelsio.com>
On Tuesday, November 11/08/16, 2016 at 15:36:18 +0530, Potnuri Bharat Teja wrote:
> On Monday, October 10/31/16, 2016 at 09:10:08 +0530, Nicholas A. Bellinger
> wrote:
Hi Nicholas,
Tested the following 3 patches for SQ overflow.
https://www.spinics.net/lists/target-devel/msg13571.html
They along with an additional change added on top of those seems to hold good.( Aborts seen
intermittently as shown at the end of mail)
I have been testing the changes using fio with IO sizes ranging from 1KB to 32MB and test is
running fine (despite few Intermittent Aborts).
Here is what I see with the 3 patches alone applied:
-> In isert_put_datain() and isert_post_response() a corresponding recv WR is posted before
posting a send and hence for every post failure a recv is already posted into a tightly packed
RQ, causing it to overflow.
Below are the logs suggesting it:
[ 189.944714] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944718] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944721] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944724] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944729] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944731] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944731] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944732] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944733] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944734] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944735] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944736] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944736] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944737] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944738] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944739] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944740] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944740] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944741] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944746] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944747] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944748] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944749] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 251.543847] ABORT_TASK: Found referenced iSCSI task_tag: 68
[ 251.543858] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 68
[ 251.544071] ABORT_TASK: Found referenced iSCSI task_tag: 46
[ 251.544076] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 46
[ 251.544300] ABORT_TASK: Found referenced iSCSI task_tag: 51
[ 251.544307] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 51
[ 251.544528] ABORT_TASK: Found referenced iSCSI task_tag: 117
[ 251.544533] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 117
[ 282.778402] Unable to locate ITT: 0x00000062 on CID: 0
[ 282.778404] Unable to locate RefTaskTag: 0x00000062 on CID: 0.
[ 282.778407] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 0
[ 282.778436] ABORT_TASK: Found referenced iSCSI task_tag: 37
[ 282.778439] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 37
[ 282.778519] ABORT_TASK: Found referenced iSCSI task_tag: 65
[ 282.778522] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 65
[ 282.778559] Unable to locate ITT: 0x00000019 on CID: 0
[ 282.778560] Unable to locate RefTaskTag: 0x00000019 on CID: 0.
[ 282.778563] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 0
[ 293.004632] ABORT_TASK: Found referenced iSCSI task_tag: 22
[ 293.004643] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 22
[ 293.004808] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.1994-05.com.redhat:37fc2b6a5331
[ 310.265993] iSCSI Login timeout on Network Portal 10.0.2.2:3260
[ 493.567143] INFO: task iscsi_np:3998 blocked for more than 120 seconds.
[ 493.567149] Tainted: G OE 4.9.0-For-DSGL-Debug+ #1
[ 493.567151] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 493.567153] iscsi_np D 0 3998 2 0x00000084
[ 493.567159] 0000000000000000 ffff883028a88b40 ffff88302693f7c0 ffff88302de3e300
[ 493.567163] ffff88303e057a40 ffffc900008abc00 ffffffff815e415b ffffc900008abca0
[ 493.567166] ffffffff81af4180 ffff883028a88b40 7fffffffffffffff ffff882f7426db68
[ 493.567170] Call Trace:
[ 493.567180] [<ffffffff815e415b>] ? __schedule+0x23b/0x6c0
[ 493.567184] [<ffffffff815e4616>] schedule+0x36/0x80
[ 493.567187] [<ffffffff815e71c3>] schedule_timeout+0x1f3/0x390
[ 493.567194] [<ffffffff812c1f2e>] ? number+0x2ce/0x300
[ 493.567197] [<ffffffff812c42be>] ? vsnprintf+0x32e/0x4b0
[ 493.567199] [<ffffffff815e56a4>] wait_for_completion+0xb4/0xf0
[ 493.567201] [<ffffffff8108a6b0>] ? wake_up_q+0x80/0x80
[ 493.567216] [<ffffffffa05284d1>] iscsit_stop_session+0x1b1/0x1c0 [iscsi_target_mod]
[ 493.567220] [<ffffffffa0519db2>] iscsi_check_for_session_reinstatement+0x1e2/0x270 [iscsi_target_mod]
[ 493.567225] [<ffffffffa051c8c0>] iscsi_target_check_for_existing_instances+0x30/0x40 [iscsi_target_mod]
[ 493.567230] [<ffffffffa051c9ee>] iscsi_target_do_login+0x11e/0x610 [iscsi_target_mod]
[ 493.567235] [<ffffffffa051dbbe>] iscsi_target_start_negotiation+0x4e/0xa0 [iscsi_target_mod]
[ 493.567239] [<ffffffffa051b933>] iscsi_target_login_thread+0xa73/0xfc0 [iscsi_target_mod]
[ 493.567244] [<ffffffffa051aec0>] ? iscsi_target_login_sess_out+0x250/0x250 [iscsi_target_mod]
[ 493.567246] [<ffffffff810660c4>] ? SyS_exit_group+0x14/0x20
[ 493.567249] [<ffffffff8107ee5a>] kthread+0xca/0xe0
[ 493.567250] [<ffffffff8107ed90>] ? kthread_park+0x60/0x60
[ 493.567252] [<ffffffff815e8752>] ret_from_fork+0x22/0x30
[ 493.567254] INFO: task iscsi_trx:4149 blocked for more than 120 seconds.
[ 493.567255] Tainted: G OE 4.9.0-For-DSGL-Debug+ #1
[ 493.567256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 493.567256] iscsi_trx D 0 4149 2 0x00000084
[ 493.567258] 0000000000000000 ffff882f5ef70340 ffff882f76253040 ffff88302de40340
[ 493.567259] ffff88303e097a40 ffffc90000f93c90 ffffffff815e415b ffffffff815e56c2
[ 493.567261] 0000000000000001 ffff882f5ef70340 7fffffffffffffff ffff882d233cc9b0
[ 493.567262] Call Trace:
[ 493.567264] [<ffffffff815e415b>] ? __schedule+0x23b/0x6c0
[ 493.567265] [<ffffffff815e56c2>] ? wait_for_completion+0xd2/0xf0
[ 493.567267] [<ffffffff815e4616>] schedule+0x36/0x80
[ 493.567268] [<ffffffff815e71c3>] schedule_timeout+0x1f3/0x390
[ 493.567281] [<ffffffffa118e910>] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core]
[ 493.567285] [<ffffffffa118e910>] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core]
[ 493.567286] [<ffffffff815e56a4>] wait_for_completion+0xb4/0xf0
[ 493.567287] [<ffffffff8108a6b0>] ? wake_up_q+0x80/0x80
[ 493.567302] [<ffffffffa04d5399>] target_wait_for_sess_cmds+0x49/0x1a0 [target_core_mod]
[ 493.567306] [<ffffffffa0471644>] isert_wait_conn+0x1a4/0x2d0 [ib_isert]
[ 493.567311] [<ffffffffa05289ab>] iscsit_close_connection+0x14b/0x820 [iscsi_target_mod]
[ 493.567315] [<ffffffffa05172ef>] iscsit_take_action_for_connection_exit+0x7f/0x100 [iscsi_target_mod]
[ 493.567320] [<ffffffffa0528265>] iscsi_target_rx_thread+0x95/0xa0 [iscsi_target_mod]
[ 493.567324] [<ffffffffa05281d0>] ? iscsi_target_tx_thread+0x1b0/0x1b0 [iscsi_target_mod]
[ 493.567325] [<ffffffff8107ee5a>] kthread+0xca/0xe0
[ 493.567327] [<ffffffff8107ed90>] ? kthread_park+0x60/0x60
[ 493.567328] [<ffffffff815e8752>] ret_from_fork+0x22/0x30
-> In order to address this I tried the following (experimental) change on top of your 3 patches:
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 501a0ad124a7..0aead5faa2a2 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1817,16 +1817,16 @@ isert_post_response(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd)
struct ib_send_wr *wr_failed;
int ret;
- ret = isert_post_recv(isert_conn, isert_cmd->rx_desc);
+ ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
+ &wr_failed);
if (ret) {
- isert_err("ib_post_recv failed with %d\n", ret);
+ isert_err("ib_post_send failed with %d\n", ret);
return ret;
}
- ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
- &wr_failed);
+ ret = isert_post_recv(isert_conn, isert_cmd->rx_desc);
if (ret) {
- isert_err("ib_post_send failed with %d\n", ret);
+ isert_err("ib_post_recv failed with %d\n", ret);
return ret;
}
return ret;
@@ -2166,6 +2166,13 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
if (isert_prot_cmd(isert_conn, se_cmd)) {
isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
cqe = &isert_cmd->tx_desc.tx_cqe;
+
+ rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+ if (rc) {
+ isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+ isert_cmd, rc);
+ return rc;
+ }
} else {
/*
* Build isert_conn->tx_desc for iSCSI response PDU and attach
@@ -2178,18 +2185,21 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
isert_init_send_wr(isert_conn, isert_cmd,
&isert_cmd->tx_desc.send_wr);
+ chain_wr = &isert_cmd->tx_desc.send_wr;
+
+ rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+ if (rc) {
+ isert_err("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+ isert_cmd, rc);
+ return rc;
+ }
+
rc = isert_post_recv(isert_conn, isert_cmd->rx_desc);
if (rc) {
isert_err("ib_post_recv failed with %d\n", rc);
return rc;
}
-
- chain_wr = &isert_cmd->tx_desc.send_wr;
}
- rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
- isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
- isert_cmd, rc);
-
return rc;
}
-> With these changes it works fine, But I still see few Aborts intermittently as follows:
[10674.434643] isert: isert_rdma_rw_ctx_post: Cmd: ffff882cbb961138 failed to post RDMA res
[10674.434690] isert: isert_rdma_rw_ctx_post: Cmd: ffff882cbb9434c0 failed to post RDMA res
[10715.186372] ABORT_TASK: Found referenced iSCSI task_tag: 58
[10715.186417] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 58
-> Apart from the Aborts and post failure logs filling dmesg, luns are in state "running" and
fio countinues to run.
[root@bharat ~]# iscsiadm -m session -P3 | grep "Attached scsi disk"
Attached scsi disk sdj State: running
Attached scsi disk sdc State: running
Attached scsi disk sdd State: running
Attached scsi disk sde State: running
Attached scsi disk sdf State: running
Attached scsi disk sdg State: running
Attached scsi disk sdh State: running
Attached scsi disk sdi State: running
Thanks,
Bharat.
> > Hi Steve, Potnuri, & Co,
> >
> > On Tue, 2016-10-18 at 09:34 -0500, Steve Wise wrote:
> > > >
> > > > > I tried out this change and it works fine with iwarp. I dont see
> SQ
> > > > > overflow. Apparently we have increased the sq too big to overflow.
> I am
> > > going
> > > > > to let it run with higher workloads for longer time, to see if it
> holds
> > > good.
> > > >
> > > > Actually on second thought, this patch is an overkill. Effectively
> we
> > > > now set:
> > > >
> > > > MAX_CMD=266
> > > > and max_rdma_ctx=128 so together we take 394 which seems to too
> much.
> > > >
> > > > If we go by the scheme of 1 rdma + 1 send for each IO we need:
> > > > - 128 sends
> > > > - 128 rdmas
> > > > - 10 miscs
> > > >
> > > > so this gives 266.
> > > >
> > > > Perhaps this is due to the fact that iWARP needs to register memory
> for
> > > > rdma reads as well? (and also rdma writes > 128k for chelsio HW
> right?)
> > > >
> > >
> > > iWARP definitely needs to register memory for the target of reads, due
> to
> > > REMOTE_WRITE requirement for the protocol. The source of a write
> doesn't need
> > > to register memory, but the SGE depth can cause multiple WRITE WRs to
> be
> > > required to service the IO. And in theory there should be some
> threshold where
> > > it might be better performance-wise to do a memory register + 1 WRITE
> vs X
> > > WRITEs.
> > >
> > > As you mentioned, the RW API should account for this, but perhaps it
> is still
> > > off some. Bharat, have a look into the RDMA-RW API and let us see if
> we can
> > > figure out if the additional SQ depth it adds is sufficient.
> > >
> > > > What is the workload you are running? with immediatedata enabled you
> > > > should issue reg+rdma_read+send only for writes > 8k.
> > > >
> > > > Does this happen when you run only reads for example?
> > > >
> > > > I guess its time to get the sq accounting into shape...
> > >
> > > So to sum up - 2 issues:
> > >
> > > 1) we believe the iSER + RW API correctly sizes the SQ, yet we're
> seeing SQ
> > > overflows. So the SQ sizing needs more investigation.
> > >
> > > 2) if the SQ is full, then the iSER/target code is supposed to
> resubmit. And
> > > apparently that isn't working.
> > >
> >
> > For #2, target-core expects -ENOMEM or -EAGAIN return from fabric driver
> > callbacks to signal internal queue-full retry logic. Otherwise, the
> > extra se_cmd->cmd_kref response SCF_ACK_KREF is leaked until session
> > shutdown and/or reinstatement occurs.
> >
> > AFAICT, Potunri's earlier hung task with v4.8.y + ABORT_TASK is likely
> > the earlier v4.1+ regression:
> >
> >
> [1]https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e
> >
> > That said, there is room for improvement in target-core queue-full error
> > signaling, and iscsi-target/iser-target callback error propagation.
> >
> > Sending out a series shortly to address these particular items.
> > Please have a look.
> >
> Thanks for the changes Nicholas.
> Testing them now.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at [2]http://vger.kernel.org/majordomo-info.html
>
> References
>
> Visible links
> 1. https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e
> 2. http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-03-20 10:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-27 7:01 RQ overflow seen running isert traffic Potnuri Bharat Teja
[not found] ` <20160927070157.GA13140-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2016-09-29 14:12 ` Steve Wise
2016-10-05 6:14 ` Sagi Grimberg
2016-10-17 11:16 ` Potnuri Bharat Teja
2016-10-17 18:29 ` Steve Wise
2016-10-18 8:04 ` Sagi Grimberg
2016-10-18 11:28 ` SQ " Potnuri Bharat Teja
2016-10-18 13:17 ` Sagi Grimberg
[not found] ` <ed7ebb39-be81-00b3-ef23-3f4c0e3afbb1-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-10-18 14:34 ` Steve Wise
2016-10-18 16:13 ` Jason Gunthorpe
2016-10-18 19:03 ` Steve Wise
2016-10-20 8:34 ` Sagi Grimberg
[not found] ` <f7a4b395-1786-3c7a-7639-195e830db5ad-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-20 13:05 ` Potnuri Bharat Teja
2017-03-20 15:04 ` Steve Wise
2016-10-31 3:40 ` Nicholas A. Bellinger
2016-11-02 17:03 ` Steve Wise
[not found] ` <1477885208.27946.8.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-11-08 10:06 ` Potnuri Bharat Teja
2017-03-20 10:15 ` Potnuri Bharat Teja [this message]
2017-03-21 6:32 ` Nicholas A. Bellinger
2017-03-21 7:51 ` Potnuri Bharat Teja
[not found] ` <20170321075131.GA11565-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 13:52 ` Sagi Grimberg
[not found] ` <945e2947-f67a-4202-cd27-d4631fe10f68-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 15:25 ` [SPAMMY (7.002)]Re: " Potnuri Bharat Teja
[not found] ` <20170321152506.GA32655-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 16:38 ` Sagi Grimberg
[not found] ` <4dab6b43-20d3-86f0-765a-be0851e9f4a0-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 17:50 ` Potnuri Bharat Teja
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170320101526.GA11699@chelsio.com \
--to=bharat@chelsio.com \
--cc=linux-rdma@vger.kernel.org \
--cc=nab@linux-iscsi.org \
--cc=sagi@grimberg.me \
--cc=swise@opengridcomputing.com \
--cc=target-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.