linux-rdma.vger.kernel.org archive mirror
* [PATCH] xprtrdma: Ensure connect worker is awoken after connect error
@ 2020-06-21 14:59 Chuck Lever
  2020-06-25 19:19 ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2020-06-21 14:59 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-nfs, linux-rdma

From: Dan Aloni <dan@kernelim.com>

The connect worker sleeps waiting for either successful connection
establishment or an error. Commit e28ce90083f0 ("xprtrdma: kmalloc
rpcrdma_ep separate from rpcrdma_xprt") mistakenly removed the
wake-up in cases when connection establishment fails.

Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Dan Aloni <dan@kernelim.com>
[ cel: rebased on recent fixes to 5.8-rc; patch description rewritten ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 2198c8ec8dff..54c6809bf06e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -296,6 +296,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		rpcrdma_force_disconnect(ep);
+		wake_up_all(&ep->re_connect_wait);
 		return rpcrdma_ep_put(ep);
 	default:
 		break;
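
For context, the waiter that this wake_up_all() pairs with sits in
rpcrdma_xprt_connect(). A rough sketch of that pattern (simplified and
paraphrased here, not the exact upstream code):

    /* The connect worker blocks here until the CM event handler records
     * a connect status; a handler path that sets re_connect_status but
     * never issues a wake-up leaves this wait hanging.
     */
    rc = wait_event_interruptible(ep->re_connect_wait,
                                  ep->re_connect_status != 0);
    if (rc < 0)
        goto out;
    if (ep->re_connect_status <= 0) {
        rc = ep->re_connect_status;
        goto out;
    }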



* Re: [PATCH] xprtrdma: Ensure connect worker is awoken after connect error
  2020-06-21 14:59 [PATCH] xprtrdma: Ensure connect worker is awoken after connect error Chuck Lever
@ 2020-06-25 19:19 ` Chuck Lever
  2020-06-26  7:10   ` [PATCH] xprtrdma: fix EP destruction logic Dan Aloni
  2020-06-26  7:17   ` [PATCH] xprtrdma: Ensure connect worker is awoken after connect error Dan Aloni
  0 siblings, 2 replies; 7+ messages in thread
From: Chuck Lever @ 2020-06-25 19:19 UTC (permalink / raw)
  To: Dan Aloni, Anna Schumaker; +Cc: Linux NFS Mailing List, linux-rdma

Anna, please drop this one. It appears to trigger a particularly nasty
use-after-free. I'll follow up with a more complete fix soon.

(Yes, a wake-up on connect errors is indeed necessary... but the connect
worker needs to be re-organized to deal properly with it).


--
Chuck Lever





* [PATCH] xprtrdma: fix EP destruction logic
  2020-06-25 19:19 ` Chuck Lever
@ 2020-06-26  7:10   ` Dan Aloni
  2020-06-26 12:56     ` Chuck Lever
  2020-06-26  7:17   ` [PATCH] xprtrdma: Ensure connect worker is awoken after connect error Dan Aloni
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Aloni @ 2020-06-26  7:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

This fixes various issues found when testing disconnections and various
other fault injections under a KASAN-enabled kernel.

- Swap the names of `rpcrdma_ep_destroy` and `rpcrdma_ep_put` to
  reflect what the functions are doing.
- In the cleanup path of `rpcrdma_ep_create`, `kfree` the EP only in
  the case where no one else knows about it, and avoid a double free
  (what `rpcrdma_ep_destroy` did, followed by `kfree`). There is no need
  to set `r_xprt->rx_ep` to NULL because it is assigned only in the
  success path.
- Don't take the EP ref in `RDMA_CM_EVENT_ESTABLISHED`; take it earlier,
  before `rdma_connect`. This is needed so that an early wake-up of
  `wait_event_interruptible` in `rpcrdma_xprt_connect` in case of
  a failure does not see a freed EP.
- Still try to follow the rule that the last decref of the EP performs
  `rdma_destroy_id(id)`.
- The only path to disengage an EP is `rpcrdma_xprt_disconnect`, whether
  it is actually connected or not. This makes the error paths of
  `rpcrdma_xprt_connect` clearer.
- Add a mutex in `rpcrdma_ep_destroy` to guard against concurrent calls
  to `rpcrdma_xprt_disconnect` coming from either `rpcrdma_xprt_connect`
  or `xprt_rdma_close`.

Signed-off-by: Dan Aloni <dan@kernelim.com>
---
 net/sunrpc/xprtrdma/transport.c |  2 ++
 net/sunrpc/xprtrdma/verbs.c     | 50 ++++++++++++++++++++++-----------
 net/sunrpc/xprtrdma/xprt_rdma.h |  1 +
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 0c4af7f5e241..50c28be6b694 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -350,6 +350,8 @@ xprt_setup_rdma(struct xprt_create *args)
 	xprt_rdma_format_addresses(xprt, sap);
 
 	new_xprt = rpcx_to_rdmax(xprt);
+	mutex_init(&new_xprt->rx_flush);
+
 	rc = rpcrdma_buffer_create(new_xprt);
 	if (rc) {
 		xprt_rdma_free_addresses(xprt);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 2ae348377806..c66871bbb894 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -84,7 +84,7 @@ static void rpcrdma_rep_destroy(struct rpcrdma_rep *rep);
 static void rpcrdma_reps_unmap(struct rpcrdma_xprt *r_xprt);
 static void rpcrdma_mrs_create(struct rpcrdma_xprt *r_xprt);
 static void rpcrdma_mrs_destroy(struct rpcrdma_xprt *r_xprt);
-static int rpcrdma_ep_destroy(struct rpcrdma_ep *ep);
+static int rpcrdma_ep_put(struct rpcrdma_ep *ep);
 static struct rpcrdma_regbuf *
 rpcrdma_regbuf_alloc(size_t size, enum dma_data_direction direction,
 		     gfp_t flags);
@@ -99,6 +99,9 @@ static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
 {
 	struct rdma_cm_id *id = r_xprt->rx_ep->re_id;
 
+	if (!id->qp)
+		return;
+
 	/* Flush Receives, then wait for deferred Reply work
 	 * to complete.
 	 */
@@ -266,7 +269,6 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		xprt_force_disconnect(xprt);
 		goto disconnected;
 	case RDMA_CM_EVENT_ESTABLISHED:
-		kref_get(&ep->re_kref);
 		ep->re_connect_status = 1;
 		rpcrdma_update_cm_private(ep, &event->param.conn);
 		trace_xprtrdma_inline_thresh(ep);
@@ -289,7 +291,8 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		xprt_force_disconnect(xprt);
-		return rpcrdma_ep_destroy(ep);
+		wake_up_all(&ep->re_connect_wait);
+		return rpcrdma_ep_put(ep);
 	default:
 		break;
 	}
@@ -345,11 +348,11 @@ static struct rdma_cm_id *rpcrdma_create_id(struct rpcrdma_xprt *r_xprt,
 	return ERR_PTR(rc);
 }
 
-static void rpcrdma_ep_put(struct kref *kref)
+static void rpcrdma_ep_destroy(struct kref *kref)
 {
 	struct rpcrdma_ep *ep = container_of(kref, struct rpcrdma_ep, re_kref);
 
-	if (ep->re_id->qp) {
+	if (ep->re_id && ep->re_id->qp) {
 		rdma_destroy_qp(ep->re_id);
 		ep->re_id->qp = NULL;
 	}
@@ -373,9 +376,9 @@ static void rpcrdma_ep_put(struct kref *kref)
  *     %0 if @ep still has a positive kref count, or
  *     %1 if @ep was destroyed successfully.
  */
-static int rpcrdma_ep_destroy(struct rpcrdma_ep *ep)
+static int rpcrdma_ep_put(struct rpcrdma_ep *ep)
 {
-	return kref_put(&ep->re_kref, rpcrdma_ep_put);
+	return kref_put(&ep->re_kref, rpcrdma_ep_destroy);
 }
 
 static int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
@@ -492,11 +495,12 @@ static int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
 	return 0;
 
 out_destroy:
-	rpcrdma_ep_destroy(ep);
+	rpcrdma_ep_put(ep);
 	rdma_destroy_id(id);
+	return rc;
+
 out_free:
 	kfree(ep);
-	r_xprt->rx_ep = NULL;
 	return rc;
 }
 
@@ -510,6 +514,7 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 {
 	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
 	struct rpcrdma_ep *ep;
+	struct rdma_cm_id *id;
 	int rc;
 
 retry:
@@ -518,6 +523,7 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 	if (rc)
 		return rc;
 	ep = r_xprt->rx_ep;
+	id = ep->re_id;
 
 	ep->re_connect_status = 0;
 	xprt_clear_connected(xprt);
@@ -529,7 +535,10 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 	if (rc)
 		goto out;
 
-	rc = rdma_connect(ep->re_id, &ep->re_remote_cma);
+	/* From this point on, CM events can discard our EP */
+	kref_get(&ep->re_kref);
+
+	rc = rdma_connect(id, &ep->re_remote_cma);
 	if (rc)
 		goto out;
 
@@ -546,14 +555,17 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 
 	rc = rpcrdma_reqs_setup(r_xprt);
 	if (rc) {
-		rpcrdma_xprt_disconnect(r_xprt);
+		ep->re_connect_status = rc;
 		goto out;
 	}
+
 	rpcrdma_mrs_create(r_xprt);
+	trace_xprtrdma_connect(r_xprt, 0);
+	return rc;
 
 out:
-	if (rc)
-		ep->re_connect_status = rc;
+	ep->re_connect_status = rc;
+	rpcrdma_xprt_disconnect(r_xprt);
 	trace_xprtrdma_connect(r_xprt, rc);
 	return rc;
 }
@@ -570,12 +582,15 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
  */
 void rpcrdma_xprt_disconnect(struct rpcrdma_xprt *r_xprt)
 {
-	struct rpcrdma_ep *ep = r_xprt->rx_ep;
+	struct rpcrdma_ep *ep;
 	struct rdma_cm_id *id;
 	int rc;
 
+	mutex_lock(&r_xprt->rx_flush);
+
+	ep = r_xprt->rx_ep;
 	if (!ep)
-		return;
+		goto out;
 
 	id = ep->re_id;
 	rc = rdma_disconnect(id);
@@ -587,10 +602,13 @@ void rpcrdma_xprt_disconnect(struct rpcrdma_xprt *r_xprt)
 	rpcrdma_mrs_destroy(r_xprt);
 	rpcrdma_sendctxs_destroy(r_xprt);
 
-	if (rpcrdma_ep_destroy(ep))
+	if (rpcrdma_ep_put(ep))
 		rdma_destroy_id(id);
 
 	r_xprt->rx_ep = NULL;
+
+out:
+	mutex_unlock(&r_xprt->rx_flush);
 }
 
 /* Fixed-size circular FIFO queue. This implementation is wait-free and
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 0a16fdb09b2c..288645a9437f 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -417,6 +417,7 @@ struct rpcrdma_xprt {
 	struct delayed_work	rx_connect_worker;
 	struct rpc_timeout	rx_timeout;
 	struct rpcrdma_stats	rx_stats;
+	struct mutex            rx_flush;
 };
 
 #define rpcx_to_rdmax(x) container_of(x, struct rpcrdma_xprt, rx_xprt)
-- 
2.25.4



* Re: [PATCH] xprtrdma: Ensure connect worker is awoken after connect error
  2020-06-25 19:19 ` Chuck Lever
  2020-06-26  7:10   ` [PATCH] xprtrdma: fix EP destruction logic Dan Aloni
@ 2020-06-26  7:17   ` Dan Aloni
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Aloni @ 2020-06-26  7:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma

On Thu, Jun 25, 2020 at 03:19:39PM -0400, Chuck Lever wrote:
> Anna, please drop this one. It appears to trigger a particularly nasty
> use-after-free. I'll follow up with a more complete fix soon.
> 
> (Yes, a wake-up on connect errors is indeed necessary... but the connect
> worker needs to be re-organized to deal properly with it).

After sending that patch I also noticed more issues with the management
of the EP context. The decref inside the CM handler caused the EP to be
freed while the connect path still held a reference to it. A
KASAN-enabled kernel revealed this easily. I've just sent a more
comprehensive patch to deal with this.
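
As I understand the window, the ordering looks roughly like this (an
illustration only, not real code):

    /*
     * rpcrdma_xprt_connect()              CM event handler
     * ----------------------              ----------------
     * rdma_connect()
     *                                     connect error:
     *                                       ep->re_connect_status = -E...;
     *                                       drop EP ref (last ref -> EP freed)
     * wait_event_interruptible(
     *     ep->re_connect_wait, ...)       waits on, then reads, freed memory
     */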

-- 
Dan Aloni


* Re: [PATCH] xprtrdma: fix EP destruction logic
  2020-06-26  7:10   ` [PATCH] xprtrdma: fix EP destruction logic Dan Aloni
@ 2020-06-26 12:56     ` Chuck Lever
  2020-06-26 15:10       ` Dan Aloni
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2020-06-26 12:56 UTC (permalink / raw)
  To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List



> On Jun 26, 2020, at 3:10 AM, Dan Aloni <dan@kernelim.com> wrote:
> 
> This fixes various issues found when testing disconnections and various
> other fault injections under a KASAN-enabled kernel.
> 
> - Swap the names of `rpcrdma_ep_destroy` and `rpcrdma_ep_put` to
>  reflect what the functions are doing.
> - In the cleanup path of `rpcrdma_ep_create`, `kfree` the EP only in
>  the case where no one else knows about it, and avoid a double free
>  (what `rpcrdma_ep_destroy` did, followed by `kfree`). There is no need
>  to set `r_xprt->rx_ep` to NULL because it is assigned only in the
>  success path.
> - Don't take the EP ref in `RDMA_CM_EVENT_ESTABLISHED`; take it earlier,
>  before `rdma_connect`. This is needed so that an early wake-up of
>  `wait_event_interruptible` in `rpcrdma_xprt_connect` in case of
>  a failure does not see a freed EP.
> - Still try to follow the rule that the last decref of the EP performs
>  `rdma_destroy_id(id)`.
> - The only path to disengage an EP is `rpcrdma_xprt_disconnect`, whether
>  it is actually connected or not. This makes the error paths of
>  `rpcrdma_xprt_connect` clearer.
> - Add a mutex in `rpcrdma_ep_destroy` to guard against concurrent calls
>  to `rpcrdma_xprt_disconnect` coming from either `rpcrdma_xprt_connect`
>  or `xprt_rdma_close`.

NAK. The RPC client provides appropriate exclusion, please let's not
add more serialization that can introduce further deadlocks.

There are already patches pending in this area that make some of these
changes, starting with

https://marc.info/?l=linux-nfs&m=159222724808832&w=2

Yesterday, I found the other problems you mentioned above after I was
finally able to get my server to REJECT connection attempts. I'll submit
patches to clean all this up once I have tested them.



--
Chuck Lever





* Re: [PATCH] xprtrdma: fix EP destruction logic
  2020-06-26 12:56     ` Chuck Lever
@ 2020-06-26 15:10       ` Dan Aloni
  2020-06-26 15:22         ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Aloni @ 2020-06-26 15:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List

On Fri, Jun 26, 2020 at 08:56:41AM -0400, Chuck Lever wrote:
> > On Jun 26, 2020, at 3:10 AM, Dan Aloni <dan@kernelim.com> wrote:
[..]
> > - Add a mutex in `rpcrdma_ep_destroy` to guard against concurrent calls
> >  to `rpcrdma_xprt_disconnect` coming from either `rpcrdma_xprt_connect`
> >  or `xprt_rdma_close`.
> 
> NAK. The RPC client provides appropriate exclusion, please let's not
> add more serialization that can introduce further deadlocks.

It appears to me that this exclusion does not work well. If I am not
mistaken, from analyzing crashes I've seen the following call chains:

   -> xprt_autoclose (running on xprtiod)
     -> xprt->ops->close
       -> xprt_rdma_close
         -> rpcrdma_xprt_disconnect

and:

    -> xprt_rdma_connect_worker (running on xprtiod)
      -> rpcrdma_xprt_connect
	-> rpcrdma_xprt_disconnect

I understand the rationale or at least the aim that `close` and
`connect` ops should not be concurrent on the same `xprt`, however:

* `xprt_force_disconnect`, which is called from various places, queues
  `xprt_autoclose` as an `xprtiod` workqueue item, conditioned on
  `!XPRT_LOCKED`, which is the case for a connect that went to the
  background.
* `xprt_rdma_connect` also queues `xprt_rdma_connect_worker` as an
  `xprtiod` workqueue item, unconditionally.

So we have two work items that can run in parallel, and I don't see
clear gating against this in the code.

Maybe there's a simpler fix for this. Perhaps a
`cancel_delayed_work_sync(&r_xprt->rx_connect_worker);` is appropriate
in `xprt_rdma_close`?
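
A minimal sketch of that suggestion (the surrounding function body is
abbreviated from memory and untested; only the cancel call is the point):

    static void xprt_rdma_close(struct rpc_xprt *xprt)
    {
        struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);

        /* Proposed: wait for a queued or running connect worker to
         * finish before tearing the transport down, so close and
         * connect cannot overlap on the same xprt.
         */
        cancel_delayed_work_sync(&r_xprt->rx_connect_worker);

        rpcrdma_xprt_disconnect(r_xprt);
        xprt_disconnect_done(xprt);
    }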

-- 
Dan Aloni


* Re: [PATCH] xprtrdma: fix EP destruction logic
  2020-06-26 15:10       ` Dan Aloni
@ 2020-06-26 15:22         ` Chuck Lever
  0 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2020-06-26 15:22 UTC (permalink / raw)
  To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List



> On Jun 26, 2020, at 11:10 AM, Dan Aloni <dan@kernelim.com> wrote:
> 
> On Fri, Jun 26, 2020 at 08:56:41AM -0400, Chuck Lever wrote:
>>> On Jun 26, 2020, at 3:10 AM, Dan Aloni <dan@kernelim.com> wrote:
> [..]
>>> - Add a mutex in `rpcrdma_ep_destroy` to guard against concurrent calls
>>> to `rpcrdma_xprt_disconnect` coming from either `rpcrdma_xprt_connect`
>>> or `xprt_rdma_close`.
>> 
>> NAK. The RPC client provides appropriate exclusion, please let's not
>> add more serialization that can introduce further deadlocks.
> 
> It appears to me that this exclusion does not work well. If I am not
> mistaken, from analyzing crashes I've seen the following call chains:
> 
>   -> xprt_autoclose (running on xprtiod)
>     -> xprt->ops->close
>       -> xprt_rdma_close
>         -> rpcrdma_xprt_disconnect
> 
> and:
> 
>    -> xprt_rdma_connect_worker (running on xprtiod)
>      -> rpcrdma_xprt_connect
> 	-> rpcrdma_xprt_disconnect
> 
> I understand the rationale or at least the aim that `close` and
> `connect` ops should not be concurrent on the same `xprt`, however:
> 
> * `xprt_force_disconnect`, which is called from various places, queues
>  `xprt_autoclose` as an `xprtiod` workqueue item, conditioned on
>  `!XPRT_LOCKED`, which is the case for a connect that went to the
>  background.
> * `xprt_rdma_connect` also queues `xprt_rdma_connect_worker` as an
>  `xprtiod` workqueue item, unconditionally.
> 
> So we have two work items that can run in parallel, and I don't see
> clear gating against this in the code.

If close and connect are being called concurrently on the same xprt,
then there is a bug in the generic RPC xprt code. I don't believe
that to be the case here.

If xprtrdma invokes force_disconnect during connect processing,
XPRT_LOCKED should be held and the close should be delayed.
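
The gating being referred to lives in the generic transport code; roughly
(paraphrased from memory, not a verbatim copy of net/sunrpc/xprt.c):

    /* xprt_force_disconnect() only queues the autoclose work if it can
     * take XPRT_LOCKED; a connect in progress is expected to hold that
     * bit, so the close is deferred rather than run concurrently.
     */
    set_bit(XPRT_CLOSE_WAIT, &xprt->state);
    if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
        queue_work(xprtiod_workqueue, &xprt->task_cleanup);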


> Maybe there's a simpler fix for this. Perhaps a
> `cancel_delayed_work_sync(&r_xprt->rx_connect_worker);` is appropriate
> in `xprt_rdma_close`?

There are simpler, less deadlock-prone, and more justifiable fixes.
Please stand by, I will take care of this today.

--
Chuck Lever




