* [PATCH] nvmet: release the sq ref on rdma read errors [not found] ` <01cb01d2c446$20cd83a0$62688ae0$@attalasystems.com> @ 2017-05-03 19:51 ` Vijay Immanuel 2017-05-03 20:23 ` Sagi Grimberg 0 siblings, 1 reply; 9+ messages in thread From: Vijay Immanuel @ 2017-05-03 19:51 UTC (permalink / raw) On rdma read errors, complete the req and release the sq ref that was taken when the req was initialized. This avoids a hang in nvmet_sq_destroy() when the queue is being freed. Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com> --- drivers/nvme/target/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index ecc4fe8..cf98f0b 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -567,7 +567,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc) rsp->n_rdma = 0; if (unlikely(wc->status != IB_WC_SUCCESS)) { - nvmet_rdma_release_rsp(rsp); + nvmet_req_complete(&rsp->req, NVME_SC_DATA_XFER_ERROR); if (wc->status != IB_WC_WR_FLUSH_ERR) { pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n", wc->wr_cqe, ib_wc_status_msg(wc->status), wc->status); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-03 19:51 ` [PATCH] nvmet: release the sq ref on rdma read errors Vijay Immanuel @ 2017-05-03 20:23 ` Sagi Grimberg 2017-05-04 5:50 ` Sagi Grimberg 0 siblings, 1 reply; 9+ messages in thread From: Sagi Grimberg @ 2017-05-03 20:23 UTC (permalink / raw) Nice catch Vijay! Reviewed-by: Sagi Grimberg <sagi at grimberg.me> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-03 20:23 ` Sagi Grimberg @ 2017-05-04 5:50 ` Sagi Grimberg 2017-05-04 8:56 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: Sagi Grimberg @ 2017-05-04 5:50 UTC (permalink / raw) > Nice catch Vijay! > > Reviewed-by: Sagi Grimberg <sagi at grimberg.me> Wait... let me take that back. While it is true that we need to drop the reference on the nvmet_sq, there is no point in queuing a response message because the rdma qp is in error state and the response will never make it to the host. Moreover, posting a send (and a recv) on a qp in error state can potentially give us a flush completion after we drained the qp which can trigger a use-after-free condition. We rely on ib_drain_qp to guarantee that we'll never see more completions for this queue and we can safely free the resources. I think we should explicitly drop the sq reference and release the rsp and avoid triggering the TX path, and provide a detailed comment on why we are doing this. Maybe a nicer way to do this, is to introduce a nvme_req_uninit() that would take care of it in the right layer (that is nvmet core). CC'ing Steve and Christoph for their thoughts... ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-04 5:50 ` Sagi Grimberg @ 2017-05-04 8:56 ` Christoph Hellwig 2017-05-04 10:46 ` Sagi Grimberg 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2017-05-04 8:56 UTC (permalink / raw) On Thu, May 04, 2017@08:50:56AM +0300, Sagi Grimberg wrote: > While it is true that we need to drop the reference on > the nvmet_sq, there is no point in queuing a response > message because the rdma qp is in error state and the response > will never make it to the host. Yeah. > I think we should explicitly drop the sq reference and release > the rsp and avoid triggering the TX path, and provide a detailed > comment on why we are doing this. Maybe a nicer way to do this, > is to introduce a nvme_req_uninit() that would take care of > it in the right layer (that is nvmet core). Agreed. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-04 8:56 ` Christoph Hellwig @ 2017-05-04 10:46 ` Sagi Grimberg 2017-05-04 22:51 ` Vijay Immanuel 0 siblings, 1 reply; 9+ messages in thread From: Sagi Grimberg @ 2017-05-04 10:46 UTC (permalink / raw) >> While it is true that we need to drop the reference on >> the nvmet_sq, there is no point in queuing a response >> message because the rdma qp is in error state and the response >> will never make it to the host. > > Yeah. > >> I think we should explicitly drop the sq reference and release >> the rsp and avoid triggering the TX path, and provide a detailed >> comment on why we are doing this. Maybe a nicer way to do this, >> is to introduce a nvme_req_uninit() that would take care of >> it in the right layer (that is nvmet core). > > Agreed. Vijay, care to respin the fix? ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-04 10:46 ` Sagi Grimberg @ 2017-05-04 22:51 ` Vijay Immanuel 2017-05-06 20:31 ` Sagi Grimberg 2017-05-10 16:55 ` Christoph Hellwig 0 siblings, 2 replies; 9+ messages in thread From: Vijay Immanuel @ 2017-05-04 22:51 UTC (permalink / raw) Thanks for the feedback. Here's an updated patch. ---- On rdma read errors, release the sq ref that was taken when the req was initialized. This avoids a hang in nvmet_sq_destroy() when the queue is being freed. Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com> --- drivers/nvme/target/core.c | 6 ++++++ drivers/nvme/target/nvmet.h | 1 + drivers/nvme/target/rdma.c | 1 + 3 files changed, 8 insertions(+) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 798653b..fcb2906 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -529,6 +529,12 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq, } EXPORT_SYMBOL_GPL(nvmet_req_init); +void nvmet_req_uninit(struct nvmet_req *req) +{ + percpu_ref_put(&req->sq->ref); +} +EXPORT_SYMBOL_GPL(nvmet_req_uninit); + static inline bool nvmet_cc_en(u32 cc) { return cc & 0x1; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index f7ff15f..dae9ed6 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -261,6 +261,7 @@ struct nvmet_async_event { bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq, struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops); +void nvmet_req_uninit(struct nvmet_req *req); void nvmet_req_complete(struct nvmet_req *req, u16 status); void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid, diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index ecc4fe8..8fc245d 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -567,6 +567,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc) rsp->n_rdma = 0; if (unlikely(wc->status != IB_WC_SUCCESS)) { + nvmet_req_uninit(&rsp->req); nvmet_rdma_release_rsp(rsp); if (wc->status != IB_WC_WR_FLUSH_ERR) { pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n", -- 1.8.3.1 -----Original Message----- From: Sagi Grimberg [mailto:sagi@grimberg.me] Sent: Thursday, May 4, 2017 3:46 AM To: Christoph Hellwig <hch at lst.de>; Vijay Immanuel <vijayi at attalasystems.com> Cc: linux-nvme at lists.infradead.org; Steve Wise <swise at chelsio.com> Subject: Re: [PATCH] nvmet: release the sq ref on rdma read errors >> While it is true that we need to drop the reference on the nvmet_sq, >> there is no point in queuing a response message because the rdma qp >> is in error state and the response will never make it to the host. > > Yeah. > >> I think we should explicitly drop the sq reference and release the >> rsp and avoid triggering the TX path, and provide a detailed comment >> on why we are doing this. Maybe a nicer way to do this, is to >> introduce a nvme_req_uninit() that would take care of it in the right >> layer (that is nvmet core). > > Agreed. Vijay, care to respin the fix? ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-04 22:51 ` Vijay Immanuel @ 2017-05-06 20:31 ` Sagi Grimberg 2017-05-10 16:55 ` Christoph Hellwig 2017-05-10 16:55 ` Christoph Hellwig 1 sibling, 1 reply; 9+ messages in thread From: Sagi Grimberg @ 2017-05-06 20:31 UTC (permalink / raw) Vijay, > Thanks for the feedback. Here's an updated patch. Can you please resubmit it as a proper patch? it'd make our lives much easier. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-06 20:31 ` Sagi Grimberg @ 2017-05-10 16:55 ` Christoph Hellwig 0 siblings, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2017-05-10 16:55 UTC (permalink / raw) On Sat, May 06, 2017@11:31:38PM +0300, Sagi Grimberg wrote: > Vijay, > > > Thanks for the feedback. Here's an updated patch. > > Can you please resubmit it as a proper patch? it'd make > our lives much easier. What's the problem with it? Except for the ---- instead of the usual --- it looks like the normal way to submit a patch inside a mail with other content. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] nvmet: release the sq ref on rdma read errors 2017-05-04 22:51 ` Vijay Immanuel 2017-05-06 20:31 ` Sagi Grimberg @ 2017-05-10 16:55 ` Christoph Hellwig 1 sibling, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2017-05-10 16:55 UTC (permalink / raw) On Thu, May 04, 2017@03:51:09PM -0700, Vijay Immanuel wrote: > Thanks for the feedback. Here's an updated patch. > ---- > > On rdma read errors, release the sq ref that was taken > when the req was initialized. This avoids a hang in > nvmet_sq_destroy() when the queue is being freed. > > Signed-off-by: Vijay Immanuel <vijayi at attalasystems.com> Looks good: Reviewed-by: Christoph Hellwig <hch at lst.de> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-05-10 16:55 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CAN9s3O597CB9d2BoPx1gL5a7NnqhxrXYMhE==Q3E8+BoDxRFiQ@mail.gmail.com> [not found] ` <01cb01d2c446$20cd83a0$62688ae0$@attalasystems.com> 2017-05-03 19:51 ` [PATCH] nvmet: release the sq ref on rdma read errors Vijay Immanuel 2017-05-03 20:23 ` Sagi Grimberg 2017-05-04 5:50 ` Sagi Grimberg 2017-05-04 8:56 ` Christoph Hellwig 2017-05-04 10:46 ` Sagi Grimberg 2017-05-04 22:51 ` Vijay Immanuel 2017-05-06 20:31 ` Sagi Grimberg 2017-05-10 16:55 ` Christoph Hellwig 2017-05-10 16:55 ` Christoph Hellwig
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.