All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
To: Potnuri Bharat Teja
	<bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>,
	"Nicholas A. Bellinger"
	<nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
Cc: SWise OGC
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	"target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: SQ overflow seen running isert traffic
Date: Tue, 21 Mar 2017 15:52:30 +0200	[thread overview]
Message-ID: <945e2947-f67a-4202-cd27-d4631fe10f68@grimberg.me> (raw)
In-Reply-To: <20170321075131.GA11565-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Hi Baharat and Nic,

Apologies for the late reply,

> Hi Nicholas,
> I see them from 2MB onwards.
>>
>>    > Here is what I see with the 3 patches alone applied:
>>    >
>>    > -> In isert_put_datain() and isert_post_response() a corresponding recv
>>    WR is posted before
>>    > posting a send and hence for every post failure a recv is already posted
>>    into a tightly packed
>>    > RQ, causing it to overflow.
>>
>>    Just for me to understand, the intermittent TMR ABORT_TASKs are caused
>>    by the repeated failure to post RDMA_WRITE WRs for a large ISER Data-In
>>    payload, due to mis-sizing of needed WRs from RDMA R/W API vs.
>>    underlying hardware capabilities.
> Yes.
>>
>>    Moving the recv posts after the send post for RDMA_WRITEs helps to
>>    reduce the severity of the issue with iw_cxgb4, but doesn't completely
>>    eliminate the issue under load.
> Moving recv posts only comes in to effect along with your changes.

...

>>    So the reason why your patch to swap post_recv -> post_send to post_send
>>    -> post_recv makes a difference is because it allows enough trickle of
>>    RDMA_WRITEs to make it through, where iser-initiator doesn't attempt to
>>    escalate recovery and doesn't attempt session reinstatement.
> I dont exactly know if above thing comes into play but the actual reason I did
> swap posting RQ and SQ is, unlike SQ, RQ is posted with WRs to the brim during
> the intialisation itself. From thereon we post a RQ WR for every RQ completion
> That makes it almost full at any point of time.
>
> Now in our scenario, SQ is miscalulated and too small for few adapters and so
> filled gradually as the IO starts. Once SQ is full, according to your patches
> isert queues it and tries to repost the command again. Here in iser functions
> like isert_post_response(), isert_put_datain() post send is done after post recv.
> For the first post send failure in say isert_put_datain(), the corresponding
> post recv is already posted, then on queuing the command and trying reposting
> an extra recv is again posted which fills up the RQ also.
>
>  By swapping post recv and send as in my incermental patch, we dont post that
> extra recv, and post recv only on successful post send.
> Therfore I think this incremental patch is necessary.

Reversing the order to recv and send posting will cause problems
in stress IO workloads (especially for iWARP). The problem of sending
a reply before reposting the recv buffer is that the initiator can send
immediately a new request and we don't have a recv buffer waiting for
it, which will cause RNR-NAK. This *will* cause performance drops and
jitters for sure.

How about we just track the rx_desc to know if we already posted it as
a start (untested as I don't have access to RDMA HW this week):
--
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 9b33c0c97468..fcbed35e95a8 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -817,6 +817,7 @@ isert_post_recvm(struct isert_conn *isert_conn, u32 
count)
                 rx_wr->sg_list = &rx_desc->rx_sg;
                 rx_wr->num_sge = 1;
                 rx_wr->next = rx_wr + 1;
+               rx_desc->in_use = false;
         }
         rx_wr--;
         rx_wr->next = NULL; /* mark end of work requests list */
@@ -835,6 +836,15 @@ isert_post_recv(struct isert_conn *isert_conn, 
struct iser_rx_desc *rx_desc)
         struct ib_recv_wr *rx_wr_failed, rx_wr;
         int ret;

+       if (!rx_desc->in_use) {
+               /*
+                * if the descriptor is not in-use we already reposted it
+                * for recv, so just silently return
+                */
+               return 0;
+       }
+
+       rx_desc->in_use = false;
         rx_wr.wr_cqe = &rx_desc->rx_cqe;
         rx_wr.sg_list = &rx_desc->rx_sg;
         rx_wr.num_sge = 1;
@@ -1397,6 +1407,8 @@ isert_recv_done(struct ib_cq *cq, struct ib_wc *wc)
                 return;
         }

+       rx_desc->in_use = true;
+
         ib_dma_sync_single_for_cpu(ib_dev, rx_desc->dma_addr,
                         ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);

diff --git a/drivers/infiniband/ulp/isert/ib_isert.h 
b/drivers/infiniband/ulp/isert/ib_isert.h
index c02ada57d7f5..87d994de8c91 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -60,7 +60,7 @@

  #define ISER_RX_PAD_SIZE       (ISCSI_DEF_MAX_RECV_SEG_LEN + 4096 - \
                 (ISER_RX_PAYLOAD_SIZE + sizeof(u64) + sizeof(struct 
ib_sge) + \
-                sizeof(struct ib_cqe)))
+                sizeof(struct ib_cqe) + sizeof(bool)))

  #define ISCSI_ISER_SG_TABLESIZE                256

@@ -85,6 +85,7 @@ struct iser_rx_desc {
         u64             dma_addr;
         struct ib_sge   rx_sg;
         struct ib_cqe   rx_cqe;
+       bool            in_use;
         char            pad[ISER_RX_PAD_SIZE];
  } __packed;
--

We have a lot of room for cleanups in isert... I'll need to
make some time to get it going...

I'll be waiting to hear from you if it makes your issue go away.

Cheers,
Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2017-03-21 13:52 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-27  7:01 RQ " Potnuri Bharat Teja
     [not found] ` <20160927070157.GA13140-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2016-09-29 14:12   ` Steve Wise
2016-10-05  6:14 ` Sagi Grimberg
2016-10-17 11:16   ` Potnuri Bharat Teja
2016-10-17 18:29     ` Steve Wise
2016-10-18  8:04       ` Sagi Grimberg
2016-10-18 11:28         ` SQ " Potnuri Bharat Teja
2016-10-18 13:17           ` Sagi Grimberg
     [not found]             ` <ed7ebb39-be81-00b3-ef23-3f4c0e3afbb1-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-10-18 14:34               ` Steve Wise
2016-10-18 16:13                 ` Jason Gunthorpe
2016-10-18 19:03                   ` Steve Wise
2016-10-20  8:34                   ` Sagi Grimberg
     [not found]                     ` <f7a4b395-1786-3c7a-7639-195e830db5ad-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-20 13:05                       ` Potnuri Bharat Teja
2017-03-20 15:04                         ` Steve Wise
2016-10-31  3:40                 ` Nicholas A. Bellinger
2016-11-02 17:03                   ` Steve Wise
     [not found]                   ` <1477885208.27946.8.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-11-08 10:06                     ` Potnuri Bharat Teja
2017-03-20 10:15                       ` Potnuri Bharat Teja
2017-03-21  6:32                         ` Nicholas A. Bellinger
2017-03-21  7:51                           ` Potnuri Bharat Teja
     [not found]                             ` <20170321075131.GA11565-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 13:52                               ` Sagi Grimberg [this message]
     [not found]                                 ` <945e2947-f67a-4202-cd27-d4631fe10f68-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 15:25                                   ` [SPAMMY (7.002)]Re: " Potnuri Bharat Teja
     [not found]                                     ` <20170321152506.GA32655-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 16:38                                       ` Sagi Grimberg
     [not found]                                         ` <4dab6b43-20d3-86f0-765a-be0851e9f4a0-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 17:50                                           ` Potnuri Bharat Teja

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=945e2947-f67a-4202-cd27-d4631fe10f68@grimberg.me \
    --to=sagi-nqwnxtmzq1alnmji0ikvqw@public.gmane.org \
    --cc=bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org \
    --cc=swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org \
    --cc=target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --subject='Re: SQ overflow seen running isert traffic' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.