* [PATCH] nvmet: release the sq ref on rdma read errors
       [not found] ` <01cb01d2c446$20cd83a0$62688ae0$@attalasystems.com>
@ 2017-05-03 19:51   ` Vijay Immanuel
  2017-05-03 20:23     ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Vijay Immanuel @ 2017-05-03 19:51 UTC


On rdma read errors, complete the req and release the sq ref that
was taken when the req was initialized. This avoids a hang in
nvmet_sq_destroy() when the queue is being freed.

Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com>
---
 drivers/nvme/target/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe8..cf98f0b 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -567,7 +567,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
        rsp->n_rdma = 0;

        if (unlikely(wc->status != IB_WC_SUCCESS)) {
-               nvmet_rdma_release_rsp(rsp);
+               nvmet_req_complete(&rsp->req, NVME_SC_DATA_XFER_ERROR);
                if (wc->status != IB_WC_WR_FLUSH_ERR) {
                        pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n",
                                wc->wr_cqe, ib_wc_status_msg(wc->status), wc->status);
--
1.8.3.1


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-03 19:51   ` [PATCH] nvmet: release the sq ref on rdma read errors Vijay Immanuel
@ 2017-05-03 20:23     ` Sagi Grimberg
  2017-05-04  5:50       ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-03 20:23 UTC


Nice catch Vijay!

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-03 20:23     ` Sagi Grimberg
@ 2017-05-04  5:50       ` Sagi Grimberg
  2017-05-04  8:56         ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-04  5:50 UTC



> Nice catch Vijay!
>
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


Wait... let me take that back.

While it is true that we need to drop the reference on
the nvmet_sq, there is no point in queuing a response
message because the rdma qp is in error state and the response
will never make it to the host.

Moreover, posting a send (and a recv) on a qp in error state can
potentially give us a flush completion after we drained the qp which can
trigger a use-after-free condition. We rely on ib_drain_qp to
guarantee that we'll never see more completions for this queue
and we can safely free the resources.

I think we should explicitly drop the sq reference and release
the rsp and avoid triggering the TX path, and provide a detailed
comment on why we are doing this. Maybe a nicer way to do this,
is to introduce a nvme_req_uninit() that would take care of
it in the right layer (that is nvmet core).

CC'ing Steve and Christoph for their thoughts...
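
Editor's sketch (not part of the original message): the trade-off
described above can be modeled in a few lines of userspace C. This is
only a toy under stated assumptions: a plain integer stands in for the
percpu_ref on the nvmet_sq, a counter stands in for sends posted on the
qp, and every toy_* name is hypothetical rather than kernel API. It
contrasts three error paths: releasing the rsp only (the ref leaks and
teardown cannot finish), completing the request (the ref is dropped but
a response is queued), and a dedicated uninit helper (the ref is
dropped and nothing is posted).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel objects; all names are hypothetical. */
struct toy_sq {
	int refs;		/* models the ref taken per initialized request */
};

struct toy_req {
	struct toy_sq *sq;
	int sends_posted;	/* models responses queued on the qp */
};

/* Models request init: every initialized request pins the queue. */
static void toy_req_init(struct toy_req *req, struct toy_sq *sq)
{
	req->sq = sq;
	req->sends_posted = 0;
	sq->refs++;
}

/* Models completing the request: queues a response, then drops the ref. */
static void toy_req_complete(struct toy_req *req)
{
	req->sends_posted++;
	assert(req->sq->refs > 0);
	req->sq->refs--;
}

/* Models an uninit helper: drops the ref without touching the TX path. */
static void toy_req_uninit(struct toy_req *req)
{
	assert(req->sq->refs > 0);
	req->sq->refs--;
}

/* Models queue teardown: it can only finish once no refs remain. */
static bool toy_sq_can_destroy(const struct toy_sq *sq)
{
	return sq->refs == 0;
}

enum toy_error_path { TOY_RELEASE_ONLY, TOY_COMPLETE, TOY_UNINIT };

/*
 * Runs one failed read through the chosen error path. Returns true if
 * teardown could proceed afterwards; *sends receives the number of
 * responses that were queued along the way.
 */
static bool toy_read_error(enum toy_error_path path, int *sends)
{
	struct toy_sq sq = { .refs = 0 };
	struct toy_req req;

	toy_req_init(&req, &sq);

	switch (path) {
	case TOY_RELEASE_ONLY:	/* rsp released, ref never dropped */
		break;
	case TOY_COMPLETE:
		toy_req_complete(&req);
		break;
	case TOY_UNINIT:
		toy_req_uninit(&req);
		break;
	}

	*sends = req.sends_posted;
	return toy_sq_can_destroy(&sq);
}
```

In this model only the uninit path both lets teardown proceed and
leaves the send counter at zero, which is the property argued for
above. The real code has to deal with percpu_ref semantics and
completion ordering that the toy ignores.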


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-04  5:50       ` Sagi Grimberg
@ 2017-05-04  8:56         ` Christoph Hellwig
  2017-05-04 10:46           ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-05-04  8:56 UTC


On Thu, May 04, 2017 at 08:50:56AM +0300, Sagi Grimberg wrote:
> While it is true that we need to drop the reference on
> the nvmet_sq, there is no point in queuing a response
> message because the rdma qp is in error state and the response
> will never make it to the host.

Yeah.

> I think we should explicitly drop the sq reference and release
> the rsp and avoid triggering the TX path, and provide a detailed
> comment on why we are doing this. Maybe a nicer way to do this,
> is to introduce a nvme_req_uninit() that would take care of
> it in the right layer (that is nvmet core).

Agreed.


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-04  8:56         ` Christoph Hellwig
@ 2017-05-04 10:46           ` Sagi Grimberg
  2017-05-04 22:51             ` Vijay Immanuel
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-04 10:46 UTC



>> While it is true that we need to drop the reference on
>> the nvmet_sq, there is no point in queuing a response
>> message because the rdma qp is in error state and the response
>> will never make it to the host.
>
> Yeah.
>
>> I think we should explicitly drop the sq reference and release
>> the rsp and avoid triggering the TX path, and provide a detailed
>> comment on why we are doing this. Maybe a nicer way to do this,
>> is to introduce a nvme_req_uninit() that would take care of
>> it in the right layer (that is nvmet core).
>
> Agreed.

Vijay, care to respin the fix?


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-04 10:46           ` Sagi Grimberg
@ 2017-05-04 22:51             ` Vijay Immanuel
  2017-05-06 20:31               ` Sagi Grimberg
  2017-05-10 16:55               ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Vijay Immanuel @ 2017-05-04 22:51 UTC


Thanks for the feedback. Here's an updated patch.
----

On rdma read errors, release the sq ref that was taken
when the req was initialized. This avoids a hang in
nvmet_sq_destroy() when the queue is being freed.

Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com>
---
 drivers/nvme/target/core.c  | 6 ++++++
 drivers/nvme/target/nvmet.h | 1 +
 drivers/nvme/target/rdma.c  | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 798653b..fcb2906 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -529,6 +529,12 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 }
 EXPORT_SYMBOL_GPL(nvmet_req_init);

+void nvmet_req_uninit(struct nvmet_req *req)
+{
+       percpu_ref_put(&req->sq->ref);
+}
+EXPORT_SYMBOL_GPL(nvmet_req_uninit);
+
 static inline bool nvmet_cc_en(u32 cc)
 {
        return cc & 0x1;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f7ff15f..dae9ed6 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -261,6 +261,7 @@ struct nvmet_async_event {

 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
                struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops);
+void nvmet_req_uninit(struct nvmet_req *req);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);

 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe8..8fc245d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -567,6 +567,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
        rsp->n_rdma = 0;

        if (unlikely(wc->status != IB_WC_SUCCESS)) {
+               nvmet_req_uninit(&rsp->req);
                nvmet_rdma_release_rsp(rsp);
                if (wc->status != IB_WC_WR_FLUSH_ERR) {
                        pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n",
--
1.8.3.1




* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-04 22:51             ` Vijay Immanuel
@ 2017-05-06 20:31               ` Sagi Grimberg
  2017-05-10 16:55                 ` Christoph Hellwig
  2017-05-10 16:55               ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2017-05-06 20:31 UTC


Vijay,

> Thanks for the feedback. Here's an updated patch.

Can you please resubmit it as a proper patch? it'd make
our lives much easier.


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-06 20:31               ` Sagi Grimberg
@ 2017-05-10 16:55                 ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2017-05-10 16:55 UTC


On Sat, May 06, 2017 at 11:31:38PM +0300, Sagi Grimberg wrote:
> Vijay,
> 
> > Thanks for the feedback. Here's an updated patch.
> 
> Can you please resubmit it as a proper patch? it'd make
> our lives much easier.

What's the problem with it?  Except for the ---- instead of the usual
--- it looks like the normal way to submit a patch inside a mail
with other content.


* [PATCH] nvmet: release the sq ref on rdma read errors
  2017-05-04 22:51             ` Vijay Immanuel
  2017-05-06 20:31               ` Sagi Grimberg
@ 2017-05-10 16:55               ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2017-05-10 16:55 UTC


On Thu, May 04, 2017 at 03:51:09PM -0700, Vijay Immanuel wrote:
> Thanks for the feedback. Here's an updated patch.
> ----
> 
> On rdma read errors, release the sq ref that was taken
> when the req was initialized. This avoids a hang in
> nvmet_sq_destroy() when the queue is being freed.
> 
> Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch at lst.de>
