* [PATCH] nvmet: release the sq ref on rdma read errors
[not found] ` <01cb01d2c446$20cd83a0$62688ae0$@attalasystems.com>
@ 2017-05-03 19:51 ` Vijay Immanuel
2017-05-03 20:23 ` Sagi Grimberg
0 siblings, 1 reply; 9+ messages in thread
From: Vijay Immanuel @ 2017-05-03 19:51 UTC (permalink / raw)
On rdma read errors, complete the req and release the sq ref that
was taken when the req was initialized. This avoids a hang in
nvmet_sq_destroy() when the queue is being freed.
Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com>
---
drivers/nvme/target/rdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe8..cf98f0b 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -567,7 +567,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
rsp->n_rdma = 0;
if (unlikely(wc->status != IB_WC_SUCCESS)) {
- nvmet_rdma_release_rsp(rsp);
+ nvmet_req_complete(&rsp->req, NVME_SC_DATA_XFER_ERROR);
if (wc->status != IB_WC_WR_FLUSH_ERR) {
pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n",
wc->wr_cqe, ib_wc_status_msg(wc->status), wc->status);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-03 19:51 ` [PATCH] nvmet: release the sq ref on rdma read errors Vijay Immanuel
@ 2017-05-03 20:23 ` Sagi Grimberg
2017-05-04 5:50 ` Sagi Grimberg
0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-03 20:23 UTC (permalink / raw)
Nice catch Vijay!
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-03 20:23 ` Sagi Grimberg
@ 2017-05-04 5:50 ` Sagi Grimberg
2017-05-04 8:56 ` Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-04 5:50 UTC (permalink / raw)
> Nice catch Vijay!
>
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Wait... let me take that back.
While it is true that we need to drop the reference on
the nvmet_sq, there is no point in queuing a response
message because the rdma qp is in error state and the response
will never make it to the host.
Moreover, posting a send (and a recv) on a qp in error state can
potentially give us a flush completion after we drained the qp which can
trigger a use-after-free condition. We rely on ib_drain_qp to
guarantee that we'll never see more completions for this queue
and we can safely free the resources.
I think we should explicitly drop the sq reference and release
the rsp and avoid triggering the TX path, and provide a detailed
comment on why we are doing this. Maybe a nicer way to do this,
is to introduce a nvme_req_uninit() that would take care of
it in the right layer (that is nvmet core).
CC'ing Steve and Christoph for their thoughts...
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-04 5:50 ` Sagi Grimberg
@ 2017-05-04 8:56 ` Christoph Hellwig
2017-05-04 10:46 ` Sagi Grimberg
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-05-04 8:56 UTC (permalink / raw)
On Thu, May 04, 2017@08:50:56AM +0300, Sagi Grimberg wrote:
> While it is true that we need to drop the reference on
> the nvmet_sq, there is no point in queuing a response
> message because the rdma qp is in error state and the response
> will never make it to the host.
Yeah.
> I think we should explicitly drop the sq reference and release
> the rsp and avoid triggering the TX path, and provide a detailed
> comment on why we are doing this. Maybe a nicer way to do this,
> is to introduce a nvme_req_uninit() that would take care of
> it in the right layer (that is nvmet core).
Agreed.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-04 8:56 ` Christoph Hellwig
@ 2017-05-04 10:46 ` Sagi Grimberg
2017-05-04 22:51 ` Vijay Immanuel
0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-04 10:46 UTC (permalink / raw)
>> While it is true that we need to drop the reference on
>> the nvmet_sq, there is no point in queuing a response
>> message because the rdma qp is in error state and the response
>> will never make it to the host.
>
> Yeah.
>
>> I think we should explicitly drop the sq reference and release
>> the rsp and avoid triggering the TX path, and provide a detailed
>> comment on why we are doing this. Maybe a nicer way to do this,
>> is to introduce a nvme_req_uninit() that would take care of
>> it in the right layer (that is nvmet core).
>
> Agreed.
Vijay, care to respin the fix?
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-04 10:46 ` Sagi Grimberg
@ 2017-05-04 22:51 ` Vijay Immanuel
2017-05-06 20:31 ` Sagi Grimberg
2017-05-10 16:55 ` Christoph Hellwig
0 siblings, 2 replies; 9+ messages in thread
From: Vijay Immanuel @ 2017-05-04 22:51 UTC (permalink / raw)
Thanks for the feedback. Here's an updated patch.
----
On rdma read errors, release the sq ref that was taken
when the req was initialized. This avoids a hang in
nvmet_sq_destroy() when the queue is being freed.
Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com>
---
drivers/nvme/target/core.c | 6 ++++++
drivers/nvme/target/nvmet.h | 1 +
drivers/nvme/target/rdma.c | 1 +
3 files changed, 8 insertions(+)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 798653b..fcb2906 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -529,6 +529,12 @@ bool nvmet_req_init(struct nvmet_req *req, struct
nvmet_cq *cq,
}
EXPORT_SYMBOL_GPL(nvmet_req_init);
+void nvmet_req_uninit(struct nvmet_req *req)
+{
+ percpu_ref_put(&req->sq->ref);
+}
+EXPORT_SYMBOL_GPL(nvmet_req_uninit);
+
static inline bool nvmet_cc_en(u32 cc)
{
return cc & 0x1;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f7ff15f..dae9ed6 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -261,6 +261,7 @@ struct nvmet_async_event {
bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops);
+void nvmet_req_uninit(struct nvmet_req *req);
void nvmet_req_complete(struct nvmet_req *req, u16 status);
void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe8..8fc245d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -567,6 +567,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq,
struct ib_wc *wc)
rsp->n_rdma = 0;
if (unlikely(wc->status != IB_WC_SUCCESS)) {
+ nvmet_req_uninit(&rsp->req);
nvmet_rdma_release_rsp(rsp);
if (wc->status != IB_WC_WR_FLUSH_ERR) {
pr_info("RDMA READ for CQE 0x%p failed with status
%s (%d).\n",
--
1.8.3.1
-----Original Message-----
From: Sagi Grimberg [mailto:sagi@grimberg.me]
Sent: Thursday, May 4, 2017 3:46 AM
To: Christoph Hellwig <hch at lst.de>; Vijay Immanuel
<vijayi at attalasystems.com>
Cc: linux-nvme at lists.infradead.org; Steve Wise <swise at chelsio.com>
Subject: Re: [PATCH] nvmet: release the sq ref on rdma read errors
>> While it is true that we need to drop the reference on the nvmet_sq,
>> there is no point in queuing a response message because the rdma qp
>> is in error state and the response will never make it to the host.
>
> Yeah.
>
>> I think we should explicitly drop the sq reference and release the
>> rsp and avoid triggering the TX path, and provide a detailed comment
>> on why we are doing this. Maybe a nicer way to do this, is to
>> introduce a nvme_req_uninit() that would take care of it in the right
>> layer (that is nvmet core).
>
> Agreed.
Vijay, care to respin the fix?
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-04 22:51 ` Vijay Immanuel
@ 2017-05-06 20:31 ` Sagi Grimberg
2017-05-10 16:55 ` Christoph Hellwig
2017-05-10 16:55 ` Christoph Hellwig
1 sibling, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-06 20:31 UTC (permalink / raw)
Vijay,
> Thanks for the feedback. Here's an updated patch.
Can you please resubmit it as a proper patch? it'd make
our lives much easier.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-06 20:31 ` Sagi Grimberg
@ 2017-05-10 16:55 ` Christoph Hellwig
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2017-05-10 16:55 UTC (permalink / raw)
On Sat, May 06, 2017@11:31:38PM +0300, Sagi Grimberg wrote:
> Vijay,
>
> > Thanks for the feedback. Here's an updated patch.
>
> Can you please resubmit it as a proper patch? it'd make
> our lives much easier.
What's the problem with it? Except for the ---- instead of the usual
--- it looks like the normal way to submit a patch inside a mail
with other content.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors
2017-05-04 22:51 ` Vijay Immanuel
2017-05-06 20:31 ` Sagi Grimberg
@ 2017-05-10 16:55 ` Christoph Hellwig
1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2017-05-10 16:55 UTC (permalink / raw)
On Thu, May 04, 2017@03:51:09PM -0700, Vijay Immanuel wrote:
> Thanks for the feedback. Here's an updated patch.
> ----
>
> On rdma read errors, release the sq ref that was taken
> when the req was initialized. This avoids a hang in
> nvmet_sq_destroy() when the queue is being freed.
>
> Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com>
Looks good:
Reviewed-by: Christoph Hellwig <hch at lst.de>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-05-10 16:55 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAN9s3O597CB9d2BoPx1gL5a7NnqhxrXYMhE==Q3E8+BoDxRFiQ@mail.gmail.com>
[not found] ` <01cb01d2c446$20cd83a0$62688ae0$@attalasystems.com>
2017-05-03 19:51 ` [PATCH] nvmet: release the sq ref on rdma read errors Vijay Immanuel
2017-05-03 20:23 ` Sagi Grimberg
2017-05-04 5:50 ` Sagi Grimberg
2017-05-04 8:56 ` Christoph Hellwig
2017-05-04 10:46 ` Sagi Grimberg
2017-05-04 22:51 ` Vijay Immanuel
2017-05-06 20:31 ` Sagi Grimberg
2017-05-10 16:55 ` Christoph Hellwig
2017-05-10 16:55 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.