All of lore.kernel.org
 help / color / mirror / Atom feed
From: Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
To: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: "Nicholas A. Bellinger"
	<nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>,
	SWise OGC
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	"target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [SPAMMY (7.002)]Re: SQ overflow seen running isert traffic
Date: Tue, 21 Mar 2017 20:55:07 +0530	[thread overview]
Message-ID: <20170321152506.GA32655@chelsio.com> (raw)
In-Reply-To: <945e2947-f67a-4202-cd27-d4631fe10f68-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

On Tuesday, March 03/21/17, 2017 at 19:22:30 +0530, Sagi Grimberg wrote:
Hi Sagi,
The patch works fine!
>    Hi Baharat and Nic,
> 
>    Apologies for the late reply,
> 
>    > Hi Nicholas,
>    > I see them from 2MB onwards.
>    >>
>    >>    > Here is what I see with the 3 patches alone applied:
>    >>    >
>    >>    > -> In isert_put_datain() and isert_post_response() a corresponding
>    recv
>    >>    WR is posted before
>    >>    > posting a send and hence for every post failure a recv is already
>    posted
>    >>    into a tightly packed
>    >>    > RQ, causing it to overflow.
>    >>
>    >>    Just for me to understand, the intermittent TMR ABORT_TASKs are
>    caused
>    >>    by the repeated failure to post RDMA_WRITE WRs for a large ISER
>    Data-In
>    >>    payload, due to mis-sizing of needed WRs from RDMA R/W API vs.
>    >>    underlying hardware capabilities.
>    > Yes.
>    >>
>    >>    Moving the recv posts after the send post for RDMA_WRITEs helps to
>    >>    reduce the severity of the issue with iw_cxgb4, but doesn't
>    completely
>    >>    eliminate the issue under load.
>    > Moving recv posts only comes in to effect along with your changes.
> 
>    ...
> 
>    >>    So the reason why your patch to swap post_recv -> post_send to
>    post_send
>    >>    -> post_recv makes a difference is because it allows enough trickle
>    of
>    >>    RDMA_WRITEs to make it through, where iser-initiator doesn't attempt
>    to
>    >>    escalate recovery and doesn't attempt session reinstatement.
>    > I dont exactly know if above thing comes into play but the actual reason
>    I did
>    > swap posting RQ and SQ is, unlike SQ, RQ is posted with WRs to the brim
>    during
>    > the intialisation itself. From thereon we post a RQ WR for every RQ
>    completion
>    > That makes it almost full at any point of time.
>    >
>    > Now in our scenario, SQ is miscalulated and too small for few adapters
>    and so
>    > filled gradually as the IO starts. Once SQ is full, according to your
>    patches
>    > isert queues it and tries to repost the command again. Here in iser
>    functions
>    > like isert_post_response(), isert_put_datain() post send is done after
>    post recv.
>    > For the first post send failure in say isert_put_datain(), the
>    corresponding
>    > post recv is already posted, then on queuing the command and trying
>    reposting
>    > an extra recv is again posted which fills up the RQ also.
>    >
>    >  By swapping post recv and send as in my incermental patch, we dont post
>    that
>    > extra recv, and post recv only on successful post send.
>    > Therfore I think this incremental patch is necessary.
> 
>    Reversing the order to recv and send posting will cause problems
>    in stress IO workloads (especially for iWARP). The problem of sending
>    a reply before reposting the recv buffer is that the initiator can send
>    immediately a new request and we don't have a recv buffer waiting for
>    it, which will cause RNR-NAK. This *will* cause performance drops and
>    jitters for sure.
Totally agree with you.
> 
>    How about we just track the rx_desc to know if we already posted it as
>    a start (untested as I don't have access to RDMA HW this week):
>    --
>    diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
>    b/drivers/infiniband/ulp/isert/ib_isert.c
>    index 9b33c0c97468..fcbed35e95a8 100644
>    --- a/drivers/infiniband/ulp/isert/ib_isert.c
>    +++ b/drivers/infiniband/ulp/isert/ib_isert.c
>    @@ -817,6 +817,7 @@ isert_post_recvm(struct isert_conn *isert_conn, u32
>    count)
>                     rx_wr->sg_list = &rx_desc->rx_sg;
>                     rx_wr->num_sge = 1;
>                     rx_wr->next = rx_wr + 1;
>    +               rx_desc->in_use = false;
>             }
>             rx_wr--;
>             rx_wr->next = NULL; /* mark end of work requests list */
>    @@ -835,6 +836,15 @@ isert_post_recv(struct isert_conn *isert_conn,
>    struct iser_rx_desc *rx_desc)
>             struct ib_recv_wr *rx_wr_failed, rx_wr;
>             int ret;
> 
>    +       if (!rx_desc->in_use) {
>    +               /*
>    +                * if the descriptor is not in-use we already reposted it
>    +                * for recv, so just silently return
>    +                */
>    +               return 0;
>    +       }
>    +
>    +       rx_desc->in_use = false;
>             rx_wr.wr_cqe = &rx_desc->rx_cqe;
>             rx_wr.sg_list = &rx_desc->rx_sg;
>             rx_wr.num_sge = 1;
>    @@ -1397,6 +1407,8 @@ isert_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>                     return;
>             }
> 
>    +       rx_desc->in_use = true;
>    +
>             ib_dma_sync_single_for_cpu(ib_dev, rx_desc->dma_addr,
>                             ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
> 
>    diff --git a/drivers/infiniband/ulp/isert/ib_isert.h
>    b/drivers/infiniband/ulp/isert/ib_isert.h
>    index c02ada57d7f5..87d994de8c91 100644
>    --- a/drivers/infiniband/ulp/isert/ib_isert.h
>    +++ b/drivers/infiniband/ulp/isert/ib_isert.h
>    @@ -60,7 +60,7 @@
> 
>      #define ISER_RX_PAD_SIZE       (ISCSI_DEF_MAX_RECV_SEG_LEN + 4096 - \
>                     (ISER_RX_PAYLOAD_SIZE + sizeof(u64) + sizeof(struct
>    ib_sge) + \
>    -                sizeof(struct ib_cqe)))
>    +                sizeof(struct ib_cqe) + sizeof(bool)))
> 
>      #define ISCSI_ISER_SG_TABLESIZE                256
> 
>    @@ -85,6 +85,7 @@ struct iser_rx_desc {
>             u64             dma_addr;
>             struct ib_sge   rx_sg;
>             struct ib_cqe   rx_cqe;
>    +       bool            in_use;
>             char            pad[ISER_RX_PAD_SIZE];
>      } __packed;
>    --
> 
>    We have a lot of room for cleanups in isert... I'll need to
>    make some time to get it going...
> 
>    I'll be waiting to hear from you if it makes your issue go away.
Test runs fine and it solved the issue.
Thanks for the patch!
> 
>    Cheers,
>    Sagi.
Thanks,
Bharat.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2017-03-21 15:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-27  7:01 RQ overflow seen running isert traffic Potnuri Bharat Teja
     [not found] ` <20160927070157.GA13140-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2016-09-29 14:12   ` Steve Wise
2016-10-05  6:14 ` Sagi Grimberg
2016-10-17 11:16   ` Potnuri Bharat Teja
2016-10-17 18:29     ` Steve Wise
2016-10-18  8:04       ` Sagi Grimberg
2016-10-18 11:28         ` SQ " Potnuri Bharat Teja
2016-10-18 13:17           ` Sagi Grimberg
     [not found]             ` <ed7ebb39-be81-00b3-ef23-3f4c0e3afbb1-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-10-18 14:34               ` Steve Wise
2016-10-18 16:13                 ` Jason Gunthorpe
2016-10-18 19:03                   ` Steve Wise
2016-10-20  8:34                   ` Sagi Grimberg
     [not found]                     ` <f7a4b395-1786-3c7a-7639-195e830db5ad-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-20 13:05                       ` Potnuri Bharat Teja
2017-03-20 15:04                         ` Steve Wise
2016-10-31  3:40                 ` Nicholas A. Bellinger
2016-11-02 17:03                   ` Steve Wise
     [not found]                   ` <1477885208.27946.8.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-11-08 10:06                     ` Potnuri Bharat Teja
2017-03-20 10:15                       ` Potnuri Bharat Teja
2017-03-21  6:32                         ` Nicholas A. Bellinger
2017-03-21  7:51                           ` Potnuri Bharat Teja
     [not found]                             ` <20170321075131.GA11565-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 13:52                               ` Sagi Grimberg
     [not found]                                 ` <945e2947-f67a-4202-cd27-d4631fe10f68-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 15:25                                   ` Potnuri Bharat Teja [this message]
     [not found]                                     ` <20170321152506.GA32655-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 16:38                                       ` [SPAMMY (7.002)]Re: " Sagi Grimberg
     [not found]                                         ` <4dab6b43-20d3-86f0-765a-be0851e9f4a0-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 17:50                                           ` Potnuri Bharat Teja

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170321152506.GA32655@chelsio.com \
    --to=bharat-ut6up61k2wzbdgjk7y7tuq@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org \
    --cc=sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org \
    --cc=swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org \
    --cc=target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.