All of lore.kernel.org
 help / color / mirror / Atom feed
From: Potnuri Bharat Teja <bharat@chelsio.com>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: SWise OGC <swise@opengridcomputing.com>,
	'Sagi Grimberg' <sagi@grimberg.me>,
	"target-devel@vger.kernel.org" <target-devel@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: SQ overflow seen running isert traffic
Date: Mon, 20 Mar 2017 15:45:28 +0530	[thread overview]
Message-ID: <20170320101526.GA11699@chelsio.com> (raw)
In-Reply-To: <20161108100617.GA2812@chelsio.com>

On Tuesday, November 11/08/16, 2016 at 15:36:18 +0530, Potnuri Bharat Teja wrote:
>    On Monday, October 10/31/16, 2016 at 09:10:08 +0530, Nicholas A. Bellinger
>    wrote:
Hi Nicholas,
Tested the following 3 patches for SQ overflow.
https://www.spinics.net/lists/target-devel/msg13571.html
They along with an additional change added on top of those seems to hold good.( Aborts seen 
intermittently as shown at the end of mail)
I have been testing the changes using fio with IO sizes ranging from 1KB to 32MB and test is 
running fine (despite few Intermittent Aborts).

Here is what I see with the 3 patches alone applied:

-> In isert_put_datain() and isert_post_response() a corresponding recv WR is posted before
posting a send and hence for every post failure a recv is already posted into a tightly packed
RQ, causing it to overflow.

Below are the logs suggesting it:

[  189.944714] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[  189.944718] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[  189.944721] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[  189.944724] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[  189.944729] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[  189.944731] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944731] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944732] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944733] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944734] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944735] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944736] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944736] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944737] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944738] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944739] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944740] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944740] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944741] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944746] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[  189.944747] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  189.944748] isert: isert_put_datain: ib_post_recv failed with -12
[  189.944749] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[  251.543847] ABORT_TASK: Found referenced iSCSI task_tag: 68
[  251.543858] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 68
[  251.544071] ABORT_TASK: Found referenced iSCSI task_tag: 46
[  251.544076] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 46
[  251.544300] ABORT_TASK: Found referenced iSCSI task_tag: 51
[  251.544307] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 51
[  251.544528] ABORT_TASK: Found referenced iSCSI task_tag: 117
[  251.544533] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 117
[  282.778402] Unable to locate ITT: 0x00000062 on CID: 0
[  282.778404] Unable to locate RefTaskTag: 0x00000062 on CID: 0.
[  282.778407] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 0
[  282.778436] ABORT_TASK: Found referenced iSCSI task_tag: 37
[  282.778439] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 37
[  282.778519] ABORT_TASK: Found referenced iSCSI task_tag: 65
[  282.778522] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 65
[  282.778559] Unable to locate ITT: 0x00000019 on CID: 0
[  282.778560] Unable to locate RefTaskTag: 0x00000019 on CID: 0.
[  282.778563] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 0
[  293.004632] ABORT_TASK: Found referenced iSCSI task_tag: 22
[  293.004643] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 22
[  293.004808] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.1994-05.com.redhat:37fc2b6a5331
[  310.265993] iSCSI Login timeout on Network Portal 10.0.2.2:3260
[  493.567143] INFO: task iscsi_np:3998 blocked for more than 120 seconds.
[  493.567149]       Tainted: G           OE   4.9.0-For-DSGL-Debug+ #1
[  493.567151] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  493.567153] iscsi_np        D    0  3998      2 0x00000084
[  493.567159]  0000000000000000 ffff883028a88b40 ffff88302693f7c0 ffff88302de3e300
[  493.567163]  ffff88303e057a40 ffffc900008abc00 ffffffff815e415b ffffc900008abca0
[  493.567166]  ffffffff81af4180 ffff883028a88b40 7fffffffffffffff ffff882f7426db68
[  493.567170] Call Trace:
[  493.567180]  [<ffffffff815e415b>] ? __schedule+0x23b/0x6c0
[  493.567184]  [<ffffffff815e4616>] schedule+0x36/0x80
[  493.567187]  [<ffffffff815e71c3>] schedule_timeout+0x1f3/0x390
[  493.567194]  [<ffffffff812c1f2e>] ? number+0x2ce/0x300
[  493.567197]  [<ffffffff812c42be>] ? vsnprintf+0x32e/0x4b0
[  493.567199]  [<ffffffff815e56a4>] wait_for_completion+0xb4/0xf0
[  493.567201]  [<ffffffff8108a6b0>] ? wake_up_q+0x80/0x80
[  493.567216]  [<ffffffffa05284d1>] iscsit_stop_session+0x1b1/0x1c0 [iscsi_target_mod]
[  493.567220]  [<ffffffffa0519db2>] iscsi_check_for_session_reinstatement+0x1e2/0x270 [iscsi_target_mod]
[  493.567225]  [<ffffffffa051c8c0>] iscsi_target_check_for_existing_instances+0x30/0x40 [iscsi_target_mod]
[  493.567230]  [<ffffffffa051c9ee>] iscsi_target_do_login+0x11e/0x610 [iscsi_target_mod]
[  493.567235]  [<ffffffffa051dbbe>] iscsi_target_start_negotiation+0x4e/0xa0 [iscsi_target_mod]
[  493.567239]  [<ffffffffa051b933>] iscsi_target_login_thread+0xa73/0xfc0 [iscsi_target_mod]
[  493.567244]  [<ffffffffa051aec0>] ? iscsi_target_login_sess_out+0x250/0x250 [iscsi_target_mod]
[  493.567246]  [<ffffffff810660c4>] ? SyS_exit_group+0x14/0x20
[  493.567249]  [<ffffffff8107ee5a>] kthread+0xca/0xe0
[  493.567250]  [<ffffffff8107ed90>] ? kthread_park+0x60/0x60
[  493.567252]  [<ffffffff815e8752>] ret_from_fork+0x22/0x30
[  493.567254] INFO: task iscsi_trx:4149 blocked for more than 120 seconds.
[  493.567255]       Tainted: G           OE   4.9.0-For-DSGL-Debug+ #1
[  493.567256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  493.567256] iscsi_trx       D    0  4149      2 0x00000084
[  493.567258]  0000000000000000 ffff882f5ef70340 ffff882f76253040 ffff88302de40340
[  493.567259]  ffff88303e097a40 ffffc90000f93c90 ffffffff815e415b ffffffff815e56c2
[  493.567261]  0000000000000001 ffff882f5ef70340 7fffffffffffffff ffff882d233cc9b0
[  493.567262] Call Trace:
[  493.567264]  [<ffffffff815e415b>] ? __schedule+0x23b/0x6c0
[  493.567265]  [<ffffffff815e56c2>] ? wait_for_completion+0xd2/0xf0
[  493.567267]  [<ffffffff815e4616>] schedule+0x36/0x80
[  493.567268]  [<ffffffff815e71c3>] schedule_timeout+0x1f3/0x390
[  493.567281]  [<ffffffffa118e910>] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core]
[  493.567285]  [<ffffffffa118e910>] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core]
[  493.567286]  [<ffffffff815e56a4>] wait_for_completion+0xb4/0xf0
[  493.567287]  [<ffffffff8108a6b0>] ? wake_up_q+0x80/0x80
[  493.567302]  [<ffffffffa04d5399>] target_wait_for_sess_cmds+0x49/0x1a0 [target_core_mod]
[  493.567306]  [<ffffffffa0471644>] isert_wait_conn+0x1a4/0x2d0 [ib_isert]
[  493.567311]  [<ffffffffa05289ab>] iscsit_close_connection+0x14b/0x820 [iscsi_target_mod]
[  493.567315]  [<ffffffffa05172ef>] iscsit_take_action_for_connection_exit+0x7f/0x100 [iscsi_target_mod]
[  493.567320]  [<ffffffffa0528265>] iscsi_target_rx_thread+0x95/0xa0 [iscsi_target_mod]
[  493.567324]  [<ffffffffa05281d0>] ? iscsi_target_tx_thread+0x1b0/0x1b0 [iscsi_target_mod]
[  493.567325]  [<ffffffff8107ee5a>] kthread+0xca/0xe0
[  493.567327]  [<ffffffff8107ed90>] ? kthread_park+0x60/0x60
[  493.567328]  [<ffffffff815e8752>] ret_from_fork+0x22/0x30

-> In order to address this I tried the following (experimental) change on top of your 3 patches:

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 501a0ad124a7..0aead5faa2a2 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1817,16 +1817,16 @@ isert_post_response(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd)
        struct ib_send_wr *wr_failed;
        int ret;

-       ret = isert_post_recv(isert_conn, isert_cmd->rx_desc);
+       ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
+                          &wr_failed);
        if (ret) {
-               isert_err("ib_post_recv failed with %d\n", ret);
+               isert_err("ib_post_send failed with %d\n", ret);
                return ret;
        }

-       ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
-                          &wr_failed);
+       ret = isert_post_recv(isert_conn, isert_cmd->rx_desc);
        if (ret) {
-               isert_err("ib_post_send failed with %d\n", ret);
+               isert_err("ib_post_recv failed with %d\n", ret);
                return ret;
        }
        return ret;
@@ -2166,6 +2166,13 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
        if (isert_prot_cmd(isert_conn, se_cmd)) {
                isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
                cqe = &isert_cmd->tx_desc.tx_cqe;
+
+               rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+               if (rc) {
+                       isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+                                  isert_cmd, rc);
+                       return rc;
+               }
        } else {
                /*
                 * Build isert_conn->tx_desc for iSCSI response PDU and attach
@@ -2178,18 +2185,21 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
                isert_init_send_wr(isert_conn, isert_cmd,
                                   &isert_cmd->tx_desc.send_wr);

+               chain_wr = &isert_cmd->tx_desc.send_wr;
+
+               rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+               if (rc) {
+                       isert_err("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+                                  isert_cmd, rc);
+                       return rc;
+               }
+
                rc = isert_post_recv(isert_conn, isert_cmd->rx_desc);
                if (rc) {
                        isert_err("ib_post_recv failed with %d\n", rc);
                        return rc;
                }
-
-               chain_wr = &isert_cmd->tx_desc.send_wr;
        }
-       rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
-       isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
-                  isert_cmd, rc);
-
        return rc;
 }

-> With these changes it works fine, But I still see few Aborts intermittently as follows:

[10674.434643] isert: isert_rdma_rw_ctx_post: Cmd: ffff882cbb961138 failed to post RDMA res
[10674.434690] isert: isert_rdma_rw_ctx_post: Cmd: ffff882cbb9434c0 failed to post RDMA res
[10715.186372] ABORT_TASK: Found referenced iSCSI task_tag: 58
[10715.186417] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 58

-> Apart from the Aborts and post failure logs filling dmesg, luns are in state "running" and 
fio countinues to run.
[root@bharat ~]# iscsiadm -m session -P3 | grep "Attached scsi disk"
                        Attached scsi disk sdj          State: running
                        Attached scsi disk sdc          State: running
                        Attached scsi disk sdd          State: running
                        Attached scsi disk sde          State: running
                        Attached scsi disk sdf          State: running
                        Attached scsi disk sdg          State: running
                        Attached scsi disk sdh          State: running
                        Attached scsi disk sdi          State: running

Thanks,
Bharat.
>    > Hi Steve, Potnuri, & Co,
>    >
>    > On Tue, 2016-10-18 at 09:34 -0500, Steve Wise wrote:
>    > > >
>    > > > > I tried out this change and it works fine with iwarp. I dont see
>    SQ
>    > > > > overflow. Apparently we have increased the sq too big to overflow.
>    I am
>    > > going
>    > > > > to let it run with higher workloads for longer time, to see if it
>    holds
>    > > good.
>    > > >
>    > > > Actually on second thought, this patch is an overkill. Effectively
>    we
>    > > > now set:
>    > > >
>    > > > MAX_CMD=266
>    > > > and max_rdma_ctx=128 so together we take 394 which seems to too
>    much.
>    > > >
>    > > > If we go by the scheme of 1 rdma + 1 send for each IO we need:
>    > > > - 128 sends
>    > > > - 128 rdmas
>    > > > - 10 miscs
>    > > >
>    > > > so this gives 266.
>    > > >
>    > > > Perhaps this is due to the fact that iWARP needs to register memory
>    for
>    > > > rdma reads as well? (and also rdma writes > 128k for chelsio HW
>    right?)
>    > > >
>    > >
>    > > iWARP definitely needs to register memory for the target of reads, due
>    to
>    > > REMOTE_WRITE requirement for the protocol.  The source of a write
>    doesn't need
>    > > to register memory, but the SGE depth can cause multiple WRITE WRs to
>    be
>    > > required to service the IO.  And in theory there should be some
>    threshold where
>    > > it might be better performance-wise to do a memory register + 1 WRITE
>    vs X
>    > > WRITEs.   
>    > >
>    > > As you mentioned, the RW API should account for this, but perhaps it
>    is still
>    > > off some.  Bharat, have a look into the RDMA-RW API and let us see if
>    we can
>    > > figure out if the additional SQ depth it adds is sufficient.
>    > > 
>    > > > What is the workload you are running? with immediatedata enabled you
>    > > > should issue reg+rdma_read+send only for writes > 8k.
>    > > >
>    > > > Does this happen when you run only reads for example?
>    > > >
>    > > > I guess its time to get the sq accounting into shape...
>    > >
>    > > So to sum up - 2 issues:
>    > >
>    > > 1) we believe the iSER + RW API correctly sizes the SQ, yet we're
>    seeing SQ
>    > > overflows.  So the SQ sizing needs more investigation.
>    > >
>    > > 2) if the SQ is full, then the iSER/target code is supposed to
>    resubmit.  And
>    > > apparently that isn't working.
>    > >
>    >
>    > For #2, target-core expects -ENOMEM or -EAGAIN return from fabric driver
>    > callbacks to signal internal queue-full retry logic.  Otherwise, the
>    > extra se_cmd->cmd_kref response SCF_ACK_KREF is leaked until session
>    > shutdown and/or reinstatement occurs.
>    >
>    > AFAICT, Potunri's earlier hung task with v4.8.y + ABORT_TASK is likely
>    > the earlier v4.1+ regression:
>    >
>    >
>    [1]https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e
>    >
>    > That said, there is room for improvement in target-core queue-full error
>    > signaling, and iscsi-target/iser-target callback error propagation. 
>    >
>    > Sending out a series shortly to address these particular items.
>    > Please have a look.
>    >
>    Thanks for the changes Nicholas.
>    Testing them now.
>    --
>    To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>    the body of a message to majordomo@vger.kernel.org
>    More majordomo info at  [2]http://vger.kernel.org/majordomo-info.html
> 
> References
> 
>    Visible links
>    1. https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e
>    2. http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-03-20 10:15 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-27  7:01 RQ overflow seen running isert traffic Potnuri Bharat Teja
     [not found] ` <20160927070157.GA13140-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2016-09-29 14:12   ` Steve Wise
2016-10-05  6:14 ` Sagi Grimberg
2016-10-17 11:16   ` Potnuri Bharat Teja
2016-10-17 18:29     ` Steve Wise
2016-10-18  8:04       ` Sagi Grimberg
2016-10-18 11:28         ` SQ " Potnuri Bharat Teja
2016-10-18 13:17           ` Sagi Grimberg
     [not found]             ` <ed7ebb39-be81-00b3-ef23-3f4c0e3afbb1-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-10-18 14:34               ` Steve Wise
2016-10-18 16:13                 ` Jason Gunthorpe
2016-10-18 19:03                   ` Steve Wise
2016-10-20  8:34                   ` Sagi Grimberg
     [not found]                     ` <f7a4b395-1786-3c7a-7639-195e830db5ad-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-20 13:05                       ` Potnuri Bharat Teja
2017-03-20 15:04                         ` Steve Wise
2016-10-31  3:40                 ` Nicholas A. Bellinger
2016-11-02 17:03                   ` Steve Wise
     [not found]                   ` <1477885208.27946.8.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-11-08 10:06                     ` Potnuri Bharat Teja
2017-03-20 10:15                       ` Potnuri Bharat Teja [this message]
2017-03-21  6:32                         ` Nicholas A. Bellinger
2017-03-21  7:51                           ` Potnuri Bharat Teja
     [not found]                             ` <20170321075131.GA11565-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 13:52                               ` Sagi Grimberg
     [not found]                                 ` <945e2947-f67a-4202-cd27-d4631fe10f68-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 15:25                                   ` [SPAMMY (7.002)]Re: " Potnuri Bharat Teja
     [not found]                                     ` <20170321152506.GA32655-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 16:38                                       ` Sagi Grimberg
     [not found]                                         ` <4dab6b43-20d3-86f0-765a-be0851e9f4a0-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 17:50                                           ` Potnuri Bharat Teja

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170320101526.GA11699@chelsio.com \
    --to=bharat@chelsio.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=nab@linux-iscsi.org \
    --cc=sagi@grimberg.me \
    --cc=swise@opengridcomputing.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.