netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Fix error handle in 'rdma_request()'
@ 2022-11-21  8:00 Ye Bin
  2022-11-21  8:00 ` [PATCH 1/5] 9p: fix miss unmap request " Ye Bin
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Ye Bin @ 2022-11-21  8:00 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba,
	pabeni, v9fs-developer, netdev
  Cc: linux-kernel, yebin10, Ye Bin

Ye Bin (5):
  9p: fix miss unmap request in 'rdma_request()'
  9p: fix miss release semaphore in 'rdma_request()'
  9p: fix error handle in 'post_recv()'
  9p: factor out 'post_send()'
  9p: refactor 'post_recv()'

 net/9p/trans_rdma.c | 181 ++++++++++++++++++++++++--------------------
 1 file changed, 101 insertions(+), 80 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/5] 9p: fix miss unmap request in 'rdma_request()'
  2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
@ 2022-11-21  8:00 ` Ye Bin
  2022-11-21  8:00 ` [PATCH 2/5] 9p: fix miss release semaphore " Ye Bin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ye Bin @ 2022-11-21  8:00 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba,
	pabeni, v9fs-developer, netdev
  Cc: linux-kernel, yebin10

From: Ye Bin <yebin10@huawei.com>

If send request failed or fail to get semaphore will not call 'send_done()',
request may miss to unmap. So add unmap handle above error.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 net/9p/trans_rdma.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 6ff706760676..e498208ed72b 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -500,7 +500,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 
 	if (down_interruptible(&rdma->sq_sem)) {
 		err = -EINTR;
-		goto send_error;
+		goto mapping_error;
 	}
 
 	/* Mark request as `sent' *before* we actually send it,
@@ -510,12 +510,15 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 	req->status = REQ_STATUS_SENT;
 	err = ib_post_send(rdma->qp, &wr, NULL);
 	if (err)
-		goto send_error;
+		goto mapping_error;
 
 	/* Success */
 	return 0;
 
  /* Handle errors that happened during or while preparing the send: */
+ mapping_error:
+	ib_dma_unmap_single(rdma->cm_id->device, c->busa,
+			    c->req->tc.size, DMA_TO_DEVICE);
  send_error:
 	req->status = REQ_STATUS_ERROR;
 	kfree(c);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/5] 9p: fix miss release semaphore in 'rdma_request()'
  2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
  2022-11-21  8:00 ` [PATCH 1/5] 9p: fix miss unmap request " Ye Bin
@ 2022-11-21  8:00 ` Ye Bin
  2022-11-21  8:00 ` [PATCH 3/5] 9p: fix error handle in 'post_recv()' Ye Bin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ye Bin @ 2022-11-21  8:00 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba,
	pabeni, v9fs-developer, netdev
  Cc: linux-kernel, yebin10

From: Ye Bin <yebin10@huawei.com>

If send request failed will not call 'send_done()', but already get
'(&rdma->sq_sem'. So add release '(&rdma->sq_sem' after send request failed.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 net/9p/trans_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index e498208ed72b..ae2bac9bf510 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -509,8 +509,10 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 	 */
 	req->status = REQ_STATUS_SENT;
 	err = ib_post_send(rdma->qp, &wr, NULL);
-	if (err)
+	if (err) {
+		up(&rdma->sq_sem);
 		goto mapping_error;
+	}
 
 	/* Success */
 	return 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/5] 9p: fix error handle in 'post_recv()'
  2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
  2022-11-21  8:00 ` [PATCH 1/5] 9p: fix miss unmap request " Ye Bin
  2022-11-21  8:00 ` [PATCH 2/5] 9p: fix miss release semaphore " Ye Bin
@ 2022-11-21  8:00 ` Ye Bin
  2022-11-21  8:00 ` [PATCH 4/5] 9p: factor out 'post_send()' Ye Bin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ye Bin @ 2022-11-21  8:00 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba,
	pabeni, v9fs-developer, netdev
  Cc: linux-kernel, yebin10

From: Ye Bin <yebin10@huawei.com>

There are two issues in 'post_recv()':
1. Miss unmap request if receive request failed;
2. Miss release 'rdma->rq_sem' if post receive failed or mapping failed;

So add miss release 'rdma->rq_sem' and unmap request when do error handle.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 net/9p/trans_rdma.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index ae2bac9bf510..bcab550c2e2c 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -386,6 +386,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
 	struct p9_trans_rdma *rdma = client->trans;
 	struct ib_recv_wr wr;
 	struct ib_sge sge;
+	int err = -EIO;
 
 	c->busa = ib_dma_map_single(rdma->cm_id->device,
 				    c->rc.sdata, client->msize,
@@ -403,11 +404,17 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
 	wr.wr_cqe = &c->cqe;
 	wr.sg_list = &sge;
 	wr.num_sge = 1;
-	return ib_post_recv(rdma->qp, &wr, NULL);
-
+	err = ib_post_recv(rdma->qp, &wr, NULL);
+	if (err) {
+		ib_dma_unmap_single(rdma->cm_id->device, c->busa,
+				    client->msize, DMA_FROM_DEVICE);
+		goto error;
+	}
+	return 0;
  error:
+	up(&rdma->rq_sem);
 	p9_debug(P9_DEBUG_ERROR, "EIO\n");
-	return -EIO;
+	return err;
 }
 
 static int rdma_request(struct p9_client *client, struct p9_req_t *req)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/5] 9p: factor out 'post_send()'
  2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
                   ` (2 preceding siblings ...)
  2022-11-21  8:00 ` [PATCH 3/5] 9p: fix error handle in 'post_recv()' Ye Bin
@ 2022-11-21  8:00 ` Ye Bin
  2022-11-21  8:00 ` [PATCH 5/5] 9p: refactor 'post_recv()' Ye Bin
  2022-11-22  0:27 ` [PATCH 0/5] Fix error handle in 'rdma_request()' asmadeus
  5 siblings, 0 replies; 7+ messages in thread
From: Ye Bin @ 2022-11-21  8:00 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba,
	pabeni, v9fs-developer, netdev
  Cc: linux-kernel, yebin10

From: Ye Bin <yebin10@huawei.com>

Factor out 'post_send()' to send request. No functional change.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 net/9p/trans_rdma.c | 130 +++++++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index bcab550c2e2c..bb917389adc9 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -417,14 +417,72 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
 	return err;
 }
 
-static int rdma_request(struct p9_client *client, struct p9_req_t *req)
+static int post_send(struct p9_client *client, struct p9_req_t *req)
 {
 	struct p9_trans_rdma *rdma = client->trans;
+	struct p9_rdma_context *c = NULL;
 	struct ib_send_wr wr;
 	struct ib_sge sge;
+	int err;
+
+	c = kmalloc(sizeof *c, GFP_NOFS);
+	if (!c) {
+		err = -ENOMEM;
+		goto error;
+	}
+	c->req = req;
+
+	c->busa = ib_dma_map_single(rdma->cm_id->device,
+				    c->req->tc.sdata, c->req->tc.size,
+				    DMA_TO_DEVICE);
+	if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
+		err = -EIO;
+		goto error;
+	}
+
+	c->cqe.done = send_done;
+
+	sge.addr = c->busa;
+	sge.length = c->req->tc.size;
+	sge.lkey = rdma->pd->local_dma_lkey;
+
+	wr.next = NULL;
+	wr.wr_cqe = &c->cqe;
+	wr.opcode = IB_WR_SEND;
+	wr.send_flags = IB_SEND_SIGNALED;
+	wr.sg_list = &sge;
+	wr.num_sge = 1;
+
+	if (down_interruptible(&rdma->sq_sem)) {
+		err = -EINTR;
+		goto mapping_error;
+	}
+
+	/* Mark request as `sent' *before* we actually send it,
+	 * because doing if after could erase the REQ_STATUS_RCVD
+	 * status in case of a very fast reply.
+	 */
+	req->status = REQ_STATUS_SENT;
+	err = ib_post_send(rdma->qp, &wr, NULL);
+	if (err)
+		goto sem_error;
+
+	return 0;
+sem_error:
+	up(&rdma->sq_sem);
+mapping_error:
+	ib_dma_unmap_single(rdma->cm_id->device, c->busa,
+			    c->req->tc.size, DMA_TO_DEVICE);
+error:
+	kfree(c);
+	return err;
+}
+
+static int rdma_request(struct p9_client *client, struct p9_req_t *req)
+{
+	struct p9_trans_rdma *rdma = client->trans;
 	int err = 0;
 	unsigned long flags;
-	struct p9_rdma_context *c = NULL;
 	struct p9_rdma_context *rpl_context = NULL;
 
 	/* When an error occurs between posting the recv and the send,
@@ -476,67 +534,17 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 	req->rc.sdata = NULL;
 
 dont_need_post_recv:
-	/* Post the request */
-	c = kmalloc(sizeof *c, GFP_NOFS);
-	if (!c) {
-		err = -ENOMEM;
-		goto send_error;
-	}
-	c->req = req;
-
-	c->busa = ib_dma_map_single(rdma->cm_id->device,
-				    c->req->tc.sdata, c->req->tc.size,
-				    DMA_TO_DEVICE);
-	if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
-		err = -EIO;
-		goto send_error;
-	}
-
-	c->cqe.done = send_done;
-
-	sge.addr = c->busa;
-	sge.length = c->req->tc.size;
-	sge.lkey = rdma->pd->local_dma_lkey;
-
-	wr.next = NULL;
-	wr.wr_cqe = &c->cqe;
-	wr.opcode = IB_WR_SEND;
-	wr.send_flags = IB_SEND_SIGNALED;
-	wr.sg_list = &sge;
-	wr.num_sge = 1;
-
-	if (down_interruptible(&rdma->sq_sem)) {
-		err = -EINTR;
-		goto mapping_error;
-	}
-
-	/* Mark request as `sent' *before* we actually send it,
-	 * because doing if after could erase the REQ_STATUS_RCVD
-	 * status in case of a very fast reply.
-	 */
-	req->status = REQ_STATUS_SENT;
-	err = ib_post_send(rdma->qp, &wr, NULL);
+	err = post_send(client, req);
 	if (err) {
-		up(&rdma->sq_sem);
-		goto mapping_error;
+		req->status = REQ_STATUS_ERROR;
+		p9_debug(P9_DEBUG_ERROR, "Error %d in rdma_request()\n", err);
+
+		/* Ach.
+		 *  We did recv_post(), but not send. We have one recv_post
+		 *  in excess.
+		 */
+		atomic_inc(&rdma->excess_rc);
 	}
-
-	/* Success */
-	return 0;
-
- /* Handle errors that happened during or while preparing the send: */
- mapping_error:
-	ib_dma_unmap_single(rdma->cm_id->device, c->busa,
-			    c->req->tc.size, DMA_TO_DEVICE);
- send_error:
-	req->status = REQ_STATUS_ERROR;
-	kfree(c);
-	p9_debug(P9_DEBUG_ERROR, "Error %d in rdma_request()\n", err);
-
-	/* Ach.
-	 *  We did recv_post(), but not send. We have one recv_post in excess.
-	 */
-	atomic_inc(&rdma->excess_rc);
 	return err;
 
  /* Handle errors that happened during or while preparing post_recv(): */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/5] 9p: refactor 'post_recv()'
  2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
                   ` (3 preceding siblings ...)
  2022-11-21  8:00 ` [PATCH 4/5] 9p: factor out 'post_send()' Ye Bin
@ 2022-11-21  8:00 ` Ye Bin
  2022-11-22  0:27 ` [PATCH 0/5] Fix error handle in 'rdma_request()' asmadeus
  5 siblings, 0 replies; 7+ messages in thread
From: Ye Bin @ 2022-11-21  8:00 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, linux_oss, davem, edumazet, kuba,
	pabeni, v9fs-developer, netdev
  Cc: linux-kernel, yebin10

From: Ye Bin <yebin10@huawei.com>

Refactor 'post_recv()', move receive resource request from 'rdma_request()' to
'post_recv()'.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 net/9p/trans_rdma.c | 77 +++++++++++++++++++++++----------------------
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index bb917389adc9..78452c289f35 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -380,19 +380,40 @@ static void rdma_destroy_trans(struct p9_trans_rdma *rdma)
 	kfree(rdma);
 }
 
-static int
-post_recv(struct p9_client *client, struct p9_rdma_context *c)
+static int post_recv(struct p9_client *client, struct p9_req_t *req)
 {
 	struct p9_trans_rdma *rdma = client->trans;
+	struct p9_rdma_context *c = NULL;
 	struct ib_recv_wr wr;
 	struct ib_sge sge;
-	int err = -EIO;
+	int err;
+
+	c = kmalloc(sizeof *c, GFP_NOFS);
+	if (!c) {
+		err = -ENOMEM;
+		goto error;
+	}
+	c->rc.sdata = req->rc.sdata;
+
+	/*
+	 * Post a receive buffer for this request. We need to ensure
+	 * there is a reply buffer available for every outstanding
+	 * request. A flushed request can result in no reply for an
+	 * outstanding request, so we must keep a count to avoid
+	 * overflowing the RQ.
+	 */
+	if (down_interruptible(&rdma->rq_sem)) {
+		err = -EINTR;
+		goto error;
+	}
 
 	c->busa = ib_dma_map_single(rdma->cm_id->device,
 				    c->rc.sdata, client->msize,
 				    DMA_FROM_DEVICE);
-	if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
-		goto error;
+	if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
+		err = -EIO;
+		goto sem_error;
+	}
 
 	c->cqe.done = recv_done;
 
@@ -405,15 +426,18 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
 	wr.sg_list = &sge;
 	wr.num_sge = 1;
 	err = ib_post_recv(rdma->qp, &wr, NULL);
-	if (err) {
-		ib_dma_unmap_single(rdma->cm_id->device, c->busa,
-				    client->msize, DMA_FROM_DEVICE);
-		goto error;
-	}
+	if (err)
+		goto mapping_error;
+
 	return 0;
- error:
+
+mapping_error:
+	ib_dma_unmap_single(rdma->cm_id->device, c->busa,
+			    client->msize, DMA_FROM_DEVICE);
+sem_error:
 	up(&rdma->rq_sem);
-	p9_debug(P9_DEBUG_ERROR, "EIO\n");
+error:
+	kfree(c);
 	return err;
 }
 
@@ -481,9 +505,8 @@ static int post_send(struct p9_client *client, struct p9_req_t *req)
 static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 {
 	struct p9_trans_rdma *rdma = client->trans;
-	int err = 0;
 	unsigned long flags;
-	struct p9_rdma_context *rpl_context = NULL;
+	int err;
 
 	/* When an error occurs between posting the recv and the send,
 	 * there will be a receive context posted without a pending request.
@@ -505,27 +528,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 		}
 	}
 
-	/* Allocate an fcall for the reply */
-	rpl_context = kmalloc(sizeof *rpl_context, GFP_NOFS);
-	if (!rpl_context) {
-		err = -ENOMEM;
-		goto recv_error;
-	}
-	rpl_context->rc.sdata = req->rc.sdata;
-
-	/*
-	 * Post a receive buffer for this request. We need to ensure
-	 * there is a reply buffer available for every outstanding
-	 * request. A flushed request can result in no reply for an
-	 * outstanding request, so we must keep a count to avoid
-	 * overflowing the RQ.
-	 */
-	if (down_interruptible(&rdma->rq_sem)) {
-		err = -EINTR;
-		goto recv_error;
-	}
-
-	err = post_recv(client, rpl_context);
+	err = post_recv(client, req);
 	if (err) {
 		p9_debug(P9_DEBUG_ERROR, "POST RECV failed: %d\n", err);
 		goto recv_error;
@@ -547,9 +550,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
 	}
 	return err;
 
- /* Handle errors that happened during or while preparing post_recv(): */
- recv_error:
-	kfree(rpl_context);
+recv_error:
 	spin_lock_irqsave(&rdma->req_lock, flags);
 	if (err != -EINTR && rdma->state < P9_RDMA_CLOSING) {
 		rdma->state = P9_RDMA_CLOSING;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] Fix error handle in 'rdma_request()'
  2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
                   ` (4 preceding siblings ...)
  2022-11-21  8:00 ` [PATCH 5/5] 9p: refactor 'post_recv()' Ye Bin
@ 2022-11-22  0:27 ` asmadeus
  5 siblings, 0 replies; 7+ messages in thread
From: asmadeus @ 2022-11-22  0:27 UTC (permalink / raw)
  To: Ye Bin
  Cc: ericvh, lucho, linux_oss, davem, edumazet, kuba, pabeni,
	v9fs-developer, netdev, linux-kernel, yebin10

Ye Bin wrote on Mon, Nov 21, 2022 at 04:00:44PM +0800:
> Ye Bin (5):

Thanks for these patches.
The commit messages are a bit difficult, but code changes mostly look
good to me -- I'll do a proper review when I can find time to test.

Just one question first: do you have RDMA hardware at huawei and/or
actually use this, or is it all static analysis fixes ?
(regardless of whether these problems actually happened with that
hardware, I'd like to know if this has had a first round of test or not)
-- 
Dominique

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-11-22  0:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-21  8:00 [PATCH 0/5] Fix error handle in 'rdma_request()' Ye Bin
2022-11-21  8:00 ` [PATCH 1/5] 9p: fix miss unmap request " Ye Bin
2022-11-21  8:00 ` [PATCH 2/5] 9p: fix miss release semaphore " Ye Bin
2022-11-21  8:00 ` [PATCH 3/5] 9p: fix error handle in 'post_recv()' Ye Bin
2022-11-21  8:00 ` [PATCH 4/5] 9p: factor out 'post_send()' Ye Bin
2022-11-21  8:00 ` [PATCH 5/5] 9p: refactor 'post_recv()' Ye Bin
2022-11-22  0:27 ` [PATCH 0/5] Fix error handle in 'rdma_request()' asmadeus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).