* [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Trond Myklebust @ 2018-12-02 14:42 UTC
  To: Chuck Lever; +Cc: linux-nfs

If an asynchronous connection attempt completes while another task is
in xprt_connect(), then the call to rpc_sleep_on() could end up
racing with the call to xprt_wake_pending_tasks().
So add a second test of the connection state after we've put the
task to sleep and set the XPRT_CONNECTING flag, when we know that there
can be no asynchronous connection attempts still in progress.

Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprt.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 122c91c28e7c..ce927002862a 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
 			return;
 		if (xprt_test_and_set_connecting(xprt))
 			return;
-		xprt->stat.connect_start = jiffies;
-		xprt->ops->connect(xprt, task);
+		/* Race breaker */
+		if (!xprt_connected(xprt)) {
+			xprt->stat.connect_start = jiffies;
+			xprt->ops->connect(xprt, task);
+		} else {
+			xprt_clear_connecting(xprt);
+			task->tk_status = 0;
+			rpc_wake_up_queued_task(&xprt->pending, task);
+		}
 	}
 	xprt_release_write(xprt, task);
 }
-- 
2.19.2
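
For reference, here is the patched flow with the race annotated. This is
an abridged sketch built from the hunk above and the commit message (the
rpc_sleep_on() call and its arguments are paraphrased), not a verbatim
copy of net/sunrpc/xprt.c:

	if (!xprt_connected(xprt)) {
		/* An async connect attempt may complete and fire its
		 * wake-up on xprt->pending at any point after this test */
		rpc_sleep_on(&xprt->pending, task, NULL);
		if (xprt_test_and_set_connecting(xprt))
			return;		/* another task owns the connect */
		/* Race breaker: XPRT_CONNECTING is now held, so no async
		 * attempt can still be in flight when we re-test */
		if (!xprt_connected(xprt)) {
			xprt->stat.connect_start = jiffies;
			xprt->ops->connect(xprt, task);
		} else {
			/* The connection came up while we were queueing
			 * ourselves: deliver the wake-up we may have missed */
			xprt_clear_connecting(xprt);
			task->tk_status = 0;
			rpc_wake_up_queued_task(&xprt->pending, task);
		}
	}
	xprt_release_write(xprt, task);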



* Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Chuck Lever @ 2018-12-02 17:07 UTC
  To: Trond Myklebust; +Cc: Linux NFS Mailing List

Test report.

I applied this on top of the three v4 patches from Friday.

With TCP, I no longer see the soft IRQ warnings or bdev leaks.
However, at some point during my test (sec=krb5p,vers=3) the
mount stops responding and the client becomes idle. ^C leaves
the iozone processes in D state.


With RDMA, I can see a few issues:
 - A lot of -74 and server disconnects. I suspect this is due
   to GSS sequence window overruns
 - xprt_connect puts the calling task to sleep if the credit
   limit has been reached, and that can result in a deadlock
 - The reconnect logic appears to busy-wait until something
   else (eg, an RTO) forces a reconnect
 - A connect or disconnect wake-up that occurs while an RPC
   reply is being handled (ie, is still on xprt->pending)
   results in that RPC being retried unnecessarily.

I'm not sure how to address that last one. Sometimes RDMA has
to invalidate MRs, and that involves a wait/context switch which
opens the race window significantly.


> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com> wrote:
> 
> If an asynchronous connection attempt completes while another task is
> in xprt_connect(), then the call to rpc_sleep_on() could end up
> racing with the call to xprt_wake_pending_tasks().
> So add a second test of the connection state after we've put the
> task to sleep and set the XPRT_CONNECTING flag, when we know that there
> can be no asynchronous connection attempts still in progress.
> 
> Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> net/sunrpc/xprt.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 122c91c28e7c..ce927002862a 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
> 			return;
> 		if (xprt_test_and_set_connecting(xprt))
> 			return;
> -		xprt->stat.connect_start = jiffies;
> -		xprt->ops->connect(xprt, task);
> +		/* Race breaker */
> +		if (!xprt_connected(xprt)) {
> +			xprt->stat.connect_start = jiffies;
> +			xprt->ops->connect(xprt, task);
> +		} else {
> +			xprt_clear_connecting(xprt);
> +			task->tk_status = 0;
> +			rpc_wake_up_queued_task(&xprt->pending, task);
> +		}
> 	}
> 	xprt_release_write(xprt, task);
> }
> -- 
> 2.19.2
> 

--
Chuck Lever





* Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Trond Myklebust @ 2018-12-02 17:26 UTC
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
> Test report.
> 
> I applied this on top of the three v4 patches from Friday.
> 
> With TCP, I no longer see the soft IRQ warnings or bdev leaks.
> However, at some point during my test (sec=krb5p,vers=3) the
> mount stops responding and the client becomes idle. ^C leaves
> the iozone processes in D state.

Does /proc/*/stack show any tasks with interesting stack behaviour?

> 
> With RDMA, I can see a few issues:
>  - A lot of -74 and server disconnects. I suspect this is due
>    to GSS sequence window overruns
>  - xprt_connect puts the calling task to sleep if the credit
>    limit has been reached, and that can result in a deadlock
>  - The reconnect logic appears to busy-wait until something
>    else (eg, an RTO) forces a reconnect
>  - A connect or disconnect wake-up that occurs while an RPC
>    reply is being handled (ie, is still on xprt->pending)
>    results in that RPC being retried unnecessarily.
> 
> I'm not sure how to address that last one. Sometimes RDMA has
> to invalidate MRs, and that involves a wait/context swith which
> opens the race window significantly.

In principle the latter issue is supposed to be handled by the connect
cookies. Are the updates perhaps being ordered incorrectly w.r.t. the
xprt->pending wakeup? If so, then that could cause races.

I'd expect the cookie would need to be updated before the wake-up call
on xprt->pending, both on connect and disconnect.
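
Schematically, the ordering described above looks like the following;
the helper name is illustrative, not an actual upstream call site:

	/* Bump the cookie before waking xprt->pending, on both the
	 * connect and the disconnect paths, so that a woken task can
	 * tell the transport state really changed under it. */
	static void transport_state_changed(struct rpc_xprt *xprt, int status)
	{
		xprt->connect_cookie++;			/* 1: update cookie */
		smp_wmb();				/* publish before waking */
		xprt_wake_pending_tasks(xprt, status);	/* 2: then wake */
	}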

> 
> > On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com>
> > wrote:
> > 
> > If an asynchronous connection attempt completes while another task
> > is
> > in xprt_connect(), then the call to rpc_sleep_on() could end up
> > racing with the call to xprt_wake_pending_tasks().
> > So add a second test of the connection state after we've put the
> > task to sleep and set the XPRT_CONNECTING flag, when we know that
> > there
> > can be no asynchronous connection attempts still in progress.
> > 
> > Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING
> > into...")
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> > net/sunrpc/xprt.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index 122c91c28e7c..ce927002862a 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
> > 			return;
> > 		if (xprt_test_and_set_connecting(xprt))
> > 			return;
> > -		xprt->stat.connect_start = jiffies;
> > -		xprt->ops->connect(xprt, task);
> > +		/* Race breaker */
> > +		if (!xprt_connected(xprt)) {
> > +			xprt->stat.connect_start = jiffies;
> > +			xprt->ops->connect(xprt, task);
> > +		} else {
> > +			xprt_clear_connecting(xprt);
> > +			task->tk_status = 0;
> > +			rpc_wake_up_queued_task(&xprt->pending, task);
> > +		}
> > 	}
> > 	xprt_release_write(xprt, task);
> > }
> > -- 
> > 2.19.2
> > 
> 
> --
> Chuck Lever
> 
> 
> 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com





* Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Chuck Lever @ 2018-12-03 15:56 UTC
  To: Trond Myklebust; +Cc: Linux NFS Mailing List



> On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@gmail.com> wrote:
> 
> On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
>> Test report.
>> 
>> I applied this on top of the three v4 patches from Friday.
>> 
>> With TCP, I no longer see the soft IRQ warnings or bdev leaks.
>> However, at some point during my test (sec=krb5p,vers=3) the
>> mount stops responding and the client becomes idle. ^C leaves
>> the iozone processes in D state.
> 
> Does /proc/*/stack show any tasks with interesting stack behaviour?
> 
>> 
>> With RDMA, I can see a few issues:
>> - A lot of -74 and server disconnects. I suspect this is due
>>   to GSS sequence window overruns
>> - xprt_connect puts the calling task to sleep if the credit
>>   limit has been reached, and that can result in a deadlock
>> - The reconnect logic appears to busy-wait until something
>>   else (eg, an RTO) forces a reconnect
>> - A connect or disconnect wake-up that occurs while an RPC
>>   reply is being handled (ie, is still on xprt->pending)
>>   results in that RPC being retried unnecessarily.
>> 
>> I'm not sure how to address that last one. Sometimes RDMA has
>> to invalidate MRs, and that involves a wait/context switch which
>> opens the race window significantly.
> 
> In principle the latter issue is supposed to be handled by the connect
> cookies. Are the updates perhaps being ordered incorrectly w.r.t. the
> xprt->pending wakeup? If so, then that could cause races.
> 
> I'd expect the cookie would need to be updated before the wake-up call
> on xprt->pending, both on connect and disconnect.

I've found one peculiar cookie-related behavior.

At some point, xprt_request_retransmit_after_disconnect() starts
to claim that all requests have to be retransmitted and that the
connection is closed. Closer inspection shows that the cookie
check here is failing because it seems to be testing a request
that hasn't been transmitted yet.
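
For illustration, the shape of the check involved is a cookie comparison
that an as-yet-untransmitted request can never satisfy (a simplified
sketch; the exact upstream body may differ):

	/* A request that has never been transmitted still carries its
	 * original rq_connect_cookie; after any reconnect it compares
	 * unequal and so looks like it needs a retransmit. */
	static bool needs_retransmit_after_disconnect(struct rpc_rqst *req)
	{
		return req->rq_connect_cookie != req->rq_xprt->connect_cookie ||
		       !xprt_connected(req->rq_xprt);
	}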


>>> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com>
>>> wrote:
>>> 
>>> If an asynchronous connection attempt completes while another task
>>> is
>>> in xprt_connect(), then the call to rpc_sleep_on() could end up
>>> racing with the call to xprt_wake_pending_tasks().
>>> So add a second test of the connection state after we've put the
>>> task to sleep and set the XPRT_CONNECTING flag, when we know that
>>> there
>>> can be no asynchronous connection attempts still in progress.
>>> 
>>> Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING
>>> into...")
>>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>>> ---
>>> net/sunrpc/xprt.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index 122c91c28e7c..ce927002862a 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
>>> 			return;
>>> 		if (xprt_test_and_set_connecting(xprt))
>>> 			return;
>>> -		xprt->stat.connect_start = jiffies;
>>> -		xprt->ops->connect(xprt, task);
>>> +		/* Race breaker */
>>> +		if (!xprt_connected(xprt)) {
>>> +			xprt->stat.connect_start = jiffies;
>>> +			xprt->ops->connect(xprt, task);
>>> +		} else {
>>> +			xprt_clear_connecting(xprt);
>>> +			task->tk_status = 0;
>>> +			rpc_wake_up_queued_task(&xprt->pending, task);
>>> +		}
>>> 	}
>>> 	xprt_release_write(xprt, task);
>>> }
>>> -- 
>>> 2.19.2
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

--
Chuck Lever





* Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Trond Myklebust @ 2018-12-03 17:15 UTC
  To: chuck.lever; +Cc: linux-nfs

On Mon, 2018-12-03 at 10:56 -0500, Chuck Lever wrote:
> > On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@gmail.com>
> > wrote:
> > 
> > On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
> > > Test report.
> > > 
> > > I applied this on top of the three v4 patches from Friday.
> > > 
> > > With TCP, I no longer see the soft IRQ warnings or bdev leaks.
> > > However, at some point during my test (sec=krb5p,vers=3) the
> > > mount stops responding and the client becomes idle. ^C leaves
> > > the iozone processes in D state.
> > 
> > Does /proc/*/stack show any tasks with interesting stack
> > behaviour?
> > 
> > > With RDMA, I can see a few issues:
> > > - A lot of -74 and server disconnects. I suspect this is due
> > >   to GSS sequence window overruns
> > > - xprt_connect puts the calling task to sleep if the credit
> > >   limit has been reached, and that can result in a deadlock
> > > - The reconnect logic appears to busy-wait until something
> > >   else (eg, an RTO) forces a reconnect
> > > - A connect or disconnect wake-up that occurs while an RPC
> > >   reply is being handled (ie, is still on xprt->pending)
> > >   results in that RPC being retried unnecessarily.
> > > 
> > > I'm not sure how to address that last one. Sometimes RDMA has
> > > to invalidate MRs, and that involves a wait/context switch which
> > > opens the race window significantly.
> > 
> > In principle the latter issue is supposed to be handled by the
> > connect
> > cookies. Are the updates perhaps being ordered incorrectly w.r.t.
> > the
> > xprt->pending wakeup? If so, then that could cause races.
> > 
> > I'd expect the cookie would need to be updated before the wake-up
> > call on xprt->pending, both on connect and disconnect.
> 
> I've found one peculiar cookie-related behavior.
> 
> At some point, xprt_request_retransmit_after_disconnect() starts
> to claim that all requests have to be retransmitted and that the
> connection is closed. Closer inspection shows that the cookie
> check here is failing because it seems to be testing a request
> that hasn't been transmitted yet.

Right, but as far as the transport layer is concerned, there should be
no difference between 'need transmit' and 'need retransmit'. It is up
to the higher RPC client layer to decide whether or not there is a
policy difference in behaviour, and AFAICS rpc_task_need_encode()
should be doing the right thing.

> > > > On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com>
> > > > wrote:
> > > > 
> > > > If an asynchronous connection attempt completes while another
> > > > task
> > > > is
> > > > in xprt_connect(), then the call to rpc_sleep_on() could end up
> > > > racing with the call to xprt_wake_pending_tasks().
> > > > So add a second test of the connection state after we've put
> > > > the
> > > > task to sleep and set the XPRT_CONNECTING flag, when we know
> > > > that
> > > > there
> > > > can be no asynchronous connection attempts still in progress.
> > > > 
> > > > Fixes: 0b9e79431377d ("SUNRPC: Move the test for
> > > > XPRT_CONNECTING
> > > > into...")
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > ---
> > > > net/sunrpc/xprt.c | 11 +++++++++--
> > > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > index 122c91c28e7c..ce927002862a 100644
> > > > --- a/net/sunrpc/xprt.c
> > > > +++ b/net/sunrpc/xprt.c
> > > > @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
> > > > 			return;
> > > > 		if (xprt_test_and_set_connecting(xprt))
> > > > 			return;
> > > > -		xprt->stat.connect_start = jiffies;
> > > > -		xprt->ops->connect(xprt, task);
> > > > +		/* Race breaker */
> > > > +		if (!xprt_connected(xprt)) {
> > > > +			xprt->stat.connect_start = jiffies;
> > > > +			xprt->ops->connect(xprt, task);
> > > > +		} else {
> > > > +			xprt_clear_connecting(xprt);
> > > > +			task->tk_status = 0;
> > > > +			rpc_wake_up_queued_task(&xprt->pending, task);
> > > > +		}
> > > > 	}
> > > > 	xprt_release_write(xprt, task);
> > > > }
> > > > -- 
> > > > 2.19.2
> > > > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Chuck Lever @ 2018-12-03 19:00 UTC
  To: Trond Myklebust; +Cc: Linux NFS Mailing List



> On Dec 3, 2018, at 12:15 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2018-12-03 at 10:56 -0500, Chuck Lever wrote:
>>> On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@gmail.com>
>>> wrote:
>>> 
>>> On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
>>>> Test report.
>>>> 
>>>> I applied this on top of the three v4 patches from Friday.
>>>> 
>>>> With TCP, I no longer see the soft IRQ warnings or bdev leaks.
>>>> However, at some point during my test (sec=krb5p,vers=3) the
>>>> mount stops responding and the client becomes idle. ^C leaves
>>>> the iozone processes in D state.
>>> 
>>> Does /proc/*/stack show any tasks with interesting stack
>>> behaviour?
>>> 
>>>> With RDMA, I can see a few issues:
>>>> - A lot of -74 and server disconnects. I suspect this is due
>>>>  to GSS sequence window overruns
>>>> - xprt_connect puts the calling task to sleep if the credit
>>>>  limit has been reached, and that can result in a deadlock
>>>> - The reconnect logic appears to busy-wait until something
>>>>  else (eg, an RTO) forces a reconnect
>>>> - A connect or disconnect wake-up that occurs while an RPC
>>>>  reply is being handled (ie, is still on xprt->pending)
>>>>  results in that RPC being retried unnecessarily.
>>>> 
>>>> I'm not sure how to address that last one. Sometimes RDMA has
>>>> to invalidate MRs, and that involves a wait/context switch which
>>>> opens the race window significantly.
>>> 
>>> In principle the latter issue is supposed to be handled by the
>>> connect
>>> cookies. Are the updates perhaps being ordered incorrectly w.r.t.
>>> the
>>> xprt->pending wakeup? If so, then that could cause races.
>>> 
>>> I'd expect the cookie would need to be updated before the wake-up
>>> call on xprt->pending, both on connect and disconnect.
>> 
>> I've found one peculiar cookie-related behavior.
>> 
>> At some point, xprt_request_retransmit_after_disconnect() starts
>> to claim that all requests have to be retransmitted and that the
>> connection is closed. Closer inspection shows that the cookie
>> check here is failing because it seems to be testing a request
>> that hasn't been transmitted yet.
> 
> Right, but as far as the transport layer is concerned, there should be
> no difference between 'need transmit' and 'need retransmit'. It is up
> to the higher RPC client layer to decide whether or not there is a
> policy difference in behaviour, and AFAICS rpc_task_need_encode()
> should be doing the right thing.

OK, thanks for clearing that up. A couple of observations:

- The sleep followed immediately by a wake-up tells me you don't
expect this to happen frequently. But with sec=krb5 flavors, I
do see this path taken quite a bit. Sounds like I need to figure
out why that is.

- returning EAGAIN would make a whole lot more sense in these
cases since the connection is frequently still whole.


>>>>> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> If an asynchronous connection attempt completes while another
>>>>> task
>>>>> is
>>>>> in xprt_connect(), then the call to rpc_sleep_on() could end up
>>>>> racing with the call to xprt_wake_pending_tasks().
>>>>> So add a second test of the connection state after we've put
>>>>> the
>>>>> task to sleep and set the XPRT_CONNECTING flag, when we know
>>>>> that
>>>>> there
>>>>> can be no asynchronous connection attempts still in progress.
>>>>> 
>>>>> Fixes: 0b9e79431377d ("SUNRPC: Move the test for
>>>>> XPRT_CONNECTING
>>>>> into...")
>>>>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>> ---
>>>>> net/sunrpc/xprt.c | 11 +++++++++--
>>>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>>>> index 122c91c28e7c..ce927002862a 100644
>>>>> --- a/net/sunrpc/xprt.c
>>>>> +++ b/net/sunrpc/xprt.c
>>>>> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
>>>>> 			return;
>>>>> 		if (xprt_test_and_set_connecting(xprt))
>>>>> 			return;
>>>>> -		xprt->stat.connect_start = jiffies;
>>>>> -		xprt->ops->connect(xprt, task);
>>>>> +		/* Race breaker */
>>>>> +		if (!xprt_connected(xprt)) {
>>>>> +			xprt->stat.connect_start = jiffies;
>>>>> +			xprt->ops->connect(xprt, task);
>>>>> +		} else {
>>>>> +			xprt_clear_connecting(xprt);
>>>>> +			task->tk_status = 0;
>>>>> +			rpc_wake_up_queued_task(&xprt->pending, task);
>>>>> +		}
>>>>> 	}
>>>>> 	xprt_release_write(xprt, task);
>>>>> }
>>>>> -- 
>>>>> 2.19.2
>>>>> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

--
Chuck Lever





* Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
From: Chuck Lever @ 2018-12-04 20:04 UTC
  To: Trond Myklebust; +Cc: Linux NFS Mailing List



> On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@gmail.com> wrote:
> 
> On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
>> Test report.
>> 
>> I applied this on top of the three v4 patches from Friday.
>> 
>> With TCP, I no longer see the soft IRQ warnings or bdev leaks.
>> However, at some point during my test (sec=krb5p,vers=3) the
>> mount stops responding and the client becomes idle. ^C leaves
>> the iozone processes in D state.
> 
> Does /proc/*/stack show any tasks with interesting stack behaviour?
> 
>> 
>> With RDMA, I can see a few issues:
>> - A lot of -74 and server disconnects. I suspect this is due
>>   to GSS sequence window overruns
>> - xprt_connect puts the calling task to sleep if the credit
>>   limit has been reached, and that can result in a deadlock
>> - The reconnect logic appears to busy-wait until something
>>   else (eg, an RTO) forces a reconnect
>> - A connect or disconnect wake-up that occurs while an RPC
>>   reply is being handled (ie, is still on xprt->pending)
>>   results in that RPC being retried unnecessarily.
>> 
>> I'm not sure how to address that last one. Sometimes RDMA has
>> to invalidate MRs, and that involves a wait/context switch which
>> opens the race window significantly.
> 
> In principle the latter issue is supposed to be handled by the connect
> cookies. Are the updates perhaps being ordered incorrectly w.r.t. the
> xprt->pending wakeup? If so, then that could cause races.
> 
> I'd expect the cookie would need to be updated before the wake-up call
> on xprt->pending, both on connect and disconnect.

xprtrdma does this in rpcrdma_cm_event_handler.

I'm at a loss to understand how connect cookies can help. During a
disconnect, xprt_wake_pending_tasks sets tk_status to -EAGAIN, but a
subsequent xprt_complete_rqst does not set tk_status. When
xprt_complete_rqst wakes up the waiting RPC task, that task sees
-EAGAIN rather than a successful RPC completion.
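
Spelled out as a timeline (schematic; the ordering is the point):

	disconnect path                        reply path
	---------------                        ----------
	xprt_wake_pending_tasks(xprt, -EAGAIN)
	  -> sleeping task's tk_status = -EAGAIN
	                                       xprt_complete_rqst(task, copied)
	                                         -> marks the reply received,
	                                            leaves tk_status untouched
	                                       task runs, sees -EAGAIN and
	                                       retries instead of completing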


>>> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com>
>>> wrote:
>>> 
>>> If an asynchronous connection attempt completes while another task
>>> is
>>> in xprt_connect(), then the call to rpc_sleep_on() could end up
>>> racing with the call to xprt_wake_pending_tasks().
>>> So add a second test of the connection state after we've put the
>>> task to sleep and set the XPRT_CONNECTING flag, when we know that
>>> there
>>> can be no asynchronous connection attempts still in progress.
>>> 
>>> Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING
>>> into...")
>>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>>> ---
>>> net/sunrpc/xprt.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index 122c91c28e7c..ce927002862a 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
>>> 			return;
>>> 		if (xprt_test_and_set_connecting(xprt))
>>> 			return;
>>> -		xprt->stat.connect_start = jiffies;
>>> -		xprt->ops->connect(xprt, task);
>>> +		/* Race breaker */
>>> +		if (!xprt_connected(xprt)) {
>>> +			xprt->stat.connect_start = jiffies;
>>> +			xprt->ops->connect(xprt, task);
>>> +		} else {
>>> +			xprt_clear_connecting(xprt);
>>> +			task->tk_status = 0;
>>> +			rpc_wake_up_queued_task(&xprt->pending, task);
>>> +		}
>>> 	}
>>> 	xprt_release_write(xprt, task);
>>> }
>>> -- 
>>> 2.19.2
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

--
Chuck Lever




