linux-rdma.vger.kernel.org archive mirror
From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Bernard Metzler <BMT@zurich.ibm.com>
Cc: "jgg@ziepe.ca" <jgg@ziepe.ca>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Potnuri Bharat Teja <bharat@chelsio.com>,
	Nirranjan Kirubaharan <nirranjan@chelsio.com>
Subject: Re: [PATCH for-next] RDMA/siw: fix SQ/RQ drain logic to support ib_drain_qp
Date: Tue, 1 Oct 2019 15:22:27 +0530
Message-ID: <20191001095224.GA5448@chelsio.com> (raw)
In-Reply-To: <OFFA5BB431.AD96EB3F-ON00258485.0054053B-00258485.0055D206@notes.na.collabserv.com>

On Monday, September 30, 2019 at 21:07:23 +0530, Bernard Metzler wrote:
> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
> 
> >To: jgg@ziepe.ca, bmt@zurich.ibm.com
> >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >Date: 09/28/2019 12:16AM
> >Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
> >nirranjan@chelsio.com, "Krishnamraju Eraparaju"
> ><krishna2@chelsio.com>
> >Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: fix SQ/RQ drain logic
> >to support ib_drain_qp
> >
> >The storage ULPs (iSER & NVMeOF) use ib_drain_qp() to drain the
> >QP/CQ properly. But siw currently uses its own routines to drain
> >the SQ & RQ, which do not let ULPs determine the last CQE.
> >Failing to wait for the last CQE causes touch-after-free issues:
> 
> Hi Krishna,
> 
> Before reviewing, please let me fully understand what is
> going on here/why we need that patch.
> 
> Is this issue caused since the ulp expects the ib_drain_xx
> driver method to be blocking until all completions are reaped,
> and siw does not block?
Yes, siw currently uses provider-specific drain_qp logic,
i.e. siw_verbs_sq/rq_flush(). With this logic all the SQ & RQ
entries are flushed to the CQ, but ULPs cannot determine when the
processing of all CQEs for the WRs posted prior to ib_drain_xx()
has completed.
Due to this uncertainty, the iSER/NVMeOF drivers sometimes proceed
to release resources/destroy_qp (assuming all the CQEs have been
processed), causing touch-after-free issues.
> 
> Is the ULP expected to call ib_drain_xx only if no other
> work is pending (SQ/RQ/CQ)? If not, shall all previous
> work be completed with FLUSH error?
In error cases (e.g. link-down), I see the iSER/NVMeOF drivers
performing disconnect/drain_qp even though there is still pending
work to be processed. This may be valid given the error.
And all that pending work gets completed with FLUSH error.
> 
> Many thanks!
> Bernard.
> 
> 
> 
> 
> 
> >
> >[  +0.001831] general protection fault: 0000 [#1] SMP PTI
> >[  +0.000002] Call Trace:
> >[  +0.000026]  ? __ib_process_cq+0x7a/0xc0 [ib_core]
> >[  +0.000008]  ? ib_poll_handler+0x2c/0x80 [ib_core]
> >[  +0.000005]  ? irq_poll_softirq+0xae/0x110
> >[  +0.000005]  ? __do_softirq+0xda/0x2a8
> >[  +0.000004]  ? run_ksoftirqd+0x26/0x40
> >[  +0.000005]  ? smpboot_thread_fn+0x10e/0x160
> >[  +0.000004]  ? kthread+0xf8/0x130
> >[  +0.000003]  ? sort_range+0x20/0x20
> >[  +0.000003]  ? kthread_bind+0x10/0x10
> >[  +0.000003]  ? ret_from_fork+0x35/0x40
> >
> >Hence, change the SQ & RQ drain logic to support the current
> >IB/core drain semantics, though this drain method does not
> >naturally align with the iWARP spec (where post_send/recv calls
> >are not allowed in QP ERROR state). More on this is described in
> >the commit below:
> >commit 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
> >
> >Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
> >---
> > drivers/infiniband/sw/siw/siw.h       |  3 +-
> > drivers/infiniband/sw/siw/siw_cm.c    |  4 +-
> > drivers/infiniband/sw/siw/siw_main.c  | 20 ---------
> > drivers/infiniband/sw/siw/siw_verbs.c | 60
> >+++++++++++++++++++++++++++
> > 4 files changed, 64 insertions(+), 23 deletions(-)
> >
> >diff --git a/drivers/infiniband/sw/siw/siw.h
> >b/drivers/infiniband/sw/siw/siw.h
> >index dba4535494ab..ad4f078e4587 100644
> >--- a/drivers/infiniband/sw/siw/siw.h
> >+++ b/drivers/infiniband/sw/siw/siw.h
> >@@ -240,7 +240,8 @@ enum siw_qp_flags {
> > 	SIW_RDMA_READ_ENABLED = (1 << 2),
> > 	SIW_SIGNAL_ALL_WR = (1 << 3),
> > 	SIW_MPA_CRC = (1 << 4),
> >-	SIW_QP_IN_DESTROY = (1 << 5)
> >+	SIW_QP_IN_DESTROY = (1 << 5),
> >+	SIW_QP_DRAINED_FINAL = (1 << 6)
> > };
> > 
> > enum siw_qp_attr_mask {
> >diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> >b/drivers/infiniband/sw/siw/siw_cm.c
> >index 8c1931a57f4a..fb830622d32e 100644
> >--- a/drivers/infiniband/sw/siw/siw_cm.c
> >+++ b/drivers/infiniband/sw/siw/siw_cm.c
> >@@ -857,7 +857,7 @@ static int siw_proc_mpareply(struct siw_cep *cep)
> > 	memset(&qp_attrs, 0, sizeof(qp_attrs));
> > 
> > 	if (rep->params.bits & MPA_RR_FLAG_CRC)
> >-		qp_attrs.flags = SIW_MPA_CRC;
> >+		qp_attrs.flags |= SIW_MPA_CRC;
> > 
> > 	qp_attrs.irq_size = cep->ird;
> > 	qp_attrs.orq_size = cep->ord;
> >@@ -1675,7 +1675,7 @@ int siw_accept(struct iw_cm_id *id, struct
> >iw_cm_conn_param *params)
> > 	qp_attrs.irq_size = cep->ird;
> > 	qp_attrs.sk = cep->sock;
> > 	if (cep->mpa.hdr.params.bits & MPA_RR_FLAG_CRC)
> >-		qp_attrs.flags = SIW_MPA_CRC;
> >+		qp_attrs.flags |= SIW_MPA_CRC;
> > 	qp_attrs.state = SIW_QP_STATE_RTS;
> > 
> > 	siw_dbg_cep(cep, "[QP%u]: moving to rts\n", qp_id(qp));
> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
> >b/drivers/infiniband/sw/siw/siw_main.c
> >index 05a92f997f60..fb01407a310f 100644
> >--- a/drivers/infiniband/sw/siw/siw_main.c
> >+++ b/drivers/infiniband/sw/siw/siw_main.c
> >@@ -248,24 +248,6 @@ static struct ib_qp *siw_get_base_qp(struct
> >ib_device *base_dev, int id)
> > 	return NULL;
> > }
> > 
> >-static void siw_verbs_sq_flush(struct ib_qp *base_qp)
> >-{
> >-	struct siw_qp *qp = to_siw_qp(base_qp);
> >-
> >-	down_write(&qp->state_lock);
> >-	siw_sq_flush(qp);
> >-	up_write(&qp->state_lock);
> >-}
> >-
> >-static void siw_verbs_rq_flush(struct ib_qp *base_qp)
> >-{
> >-	struct siw_qp *qp = to_siw_qp(base_qp);
> >-
> >-	down_write(&qp->state_lock);
> >-	siw_rq_flush(qp);
> >-	up_write(&qp->state_lock);
> >-}
> >-
> > static const struct ib_device_ops siw_device_ops = {
> > 	.owner = THIS_MODULE,
> > 	.uverbs_abi_ver = SIW_ABI_VERSION,
> >@@ -284,8 +266,6 @@ static const struct ib_device_ops siw_device_ops
> >= {
> > 	.destroy_cq = siw_destroy_cq,
> > 	.destroy_qp = siw_destroy_qp,
> > 	.destroy_srq = siw_destroy_srq,
> >-	.drain_rq = siw_verbs_rq_flush,
> >-	.drain_sq = siw_verbs_sq_flush,
> > 	.get_dma_mr = siw_get_dma_mr,
> > 	.get_port_immutable = siw_get_port_immutable,
> > 	.iw_accept = siw_accept,
> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
> >b/drivers/infiniband/sw/siw/siw_verbs.c
> >index 869e02b69a01..5dd62946a649 100644
> >--- a/drivers/infiniband/sw/siw/siw_verbs.c
> >+++ b/drivers/infiniband/sw/siw/siw_verbs.c
> >@@ -596,6 +596,13 @@ int siw_verbs_modify_qp(struct ib_qp *base_qp,
> >struct ib_qp_attr *attr,
> > 
> > 	rv = siw_qp_modify(qp, &new_attrs, siw_attr_mask);
> > 
> >+	/* QP state ERROR here ensures that all the SQ & RQ entries got
> >drained
> >+	 * completely. And henceforth, no more entries will be added to the
> >CQ,
> >+	 * exception is special drain CQEs via ib_drain_qp().
> >+	 */
> >+	if (qp->attrs.state == SIW_QP_STATE_ERROR)
> >+		qp->attrs.flags |= SIW_QP_DRAINED_FINAL;
> >+
> > 	up_write(&qp->state_lock);
> > out:
> > 	return rv;
> >@@ -687,6 +694,44 @@ static int siw_copy_inline_sgl(const struct
> >ib_send_wr *core_wr,
> > 	return bytes;
> > }
> > 
> >+/* SQ final completion routine to support ib_drain_sq(). */
> >+int siw_sq_final_comp(struct siw_qp *qp, const struct ib_send_wr
> >*wr,
> >+		      const struct ib_send_wr **bad_wr)
> >+{
> >+	struct siw_sqe sqe = {};
> >+	int rv = 0;
> >+
> >+	while (wr) {
> >+		sqe.id = wr->wr_id;
> >+		sqe.opcode = wr->opcode;
> >+		rv = siw_sqe_complete(qp, &sqe, 0, SIW_WC_WR_FLUSH_ERR);
> >+		if (rv) {
> >+			*bad_wr = wr;
> >+			break;
> >+		}
> >+		wr = wr->next;
> >+	}
> >+	return rv;
> >+}
> >+
> >+/* RQ final completion routine to support ib_drain_rq(). */
> >+int siw_rq_final_comp(struct siw_qp *qp, const struct ib_recv_wr
> >*wr,
> >+		      const struct ib_recv_wr **bad_wr)
> >+{
> >+	struct siw_rqe rqe = {};
> >+	int rv = 0;
> >+
> >+	while (wr) {
> >+		rqe.id = wr->wr_id;
> >+		rv = siw_rqe_complete(qp, &rqe, 0, 0, SIW_WC_WR_FLUSH_ERR);
> >+		if (rv) {
> >+			*bad_wr = wr;
> >+			break;
> >+		}
> >+		wr = wr->next;
> >+	}
> >+	return rv;
> >+}
> > /*
> >  * siw_post_send()
> >  *
> >@@ -705,6 +750,15 @@ int siw_post_send(struct ib_qp *base_qp, const
> >struct ib_send_wr *wr,
> > 	unsigned long flags;
> > 	int rv = 0;
> > 
> >+	/* Currently there is no way to distinguish between special drain
> >+	 * WRs and normal WRs(?), so we do FLUSH_ERR for all the WRs
> >that've
> >+	 * arrived in the ERROR/SIW_QP_DRAINED_FINAL state, assuming we get
> >+	 * only special drain WRs in this state via ib_drain_sq().
> >+	 */
> >+	if (qp->attrs.flags & SIW_QP_DRAINED_FINAL) {
> >+		rv = siw_sq_final_comp(qp, wr, bad_wr);
> >+		return rv;
> >+	}
> > 	/*
> > 	 * Try to acquire QP state lock. Must be non-blocking
> > 	 * to accommodate kernel clients needs.
> >@@ -919,6 +973,12 @@ int siw_post_receive(struct ib_qp *base_qp,
> >const struct ib_recv_wr *wr,
> > 		*bad_wr = wr;
> > 		return -EOPNOTSUPP; /* what else from errno.h? */
> > 	}
> >+
> >+	if (qp->attrs.flags & SIW_QP_DRAINED_FINAL) {
> >+		rv = siw_rq_final_comp(qp, wr, bad_wr);
> >+		return rv;
> >+	}
> >+
> > 	/*
> > 	 * Try to acquire QP state lock. Must be non-blocking
> > 	 * to accommodate kernel clients needs.
> >-- 
> >2.23.0.rc0
> >
> >
> 

Thread overview: 10+ messages
2019-09-27 22:15 [PATCH for-next] RDMA/siw: fix SQ/RQ drain logic to support ib_drain_qp Krishnamraju Eraparaju
2019-09-30 15:37 ` Bernard Metzler
2019-10-01  9:52   ` Krishnamraju Eraparaju [this message]
2019-10-01 15:56   ` Bernard Metzler
2019-10-01 17:45     ` Krishnamraju Eraparaju
2019-10-02 11:27     ` Bernard Metzler
2019-10-03 10:51       ` Krishnamraju Eraparaju
2019-10-03 14:03       ` Bernard Metzler
2019-10-04  4:57         ` Krishnamraju Eraparaju
2019-10-04 13:47         ` Bernard Metzler
