* [PATCH] nfsd: set RPC_CLNT_CREATE_NO_IDLE_TIMEOUT on callback client
@ 2021-02-21 18:27 Timo Rothenpieler
  2021-02-21 19:26 ` Chuck Lever
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Rothenpieler @ 2021-02-21 18:27 UTC (permalink / raw)
  To: Linux NFS Mailing List; +Cc: Chuck Lever, Olga Kornievskaia, Timo Rothenpieler

This tackles an issue where the callback client times out from
inactivity, causing operations like server side copy to never return on
the client side.
I was observing that issue frequently on my RDMA connected clients; it
does not seem to manifest on TCP connected clients.

However, it does not fix the actual issue of the callback channel
not getting re-established and the client being stuck in the call
forever. It just makes it a lot less likely to occur, as long as no
other circumstances cause the callback channel to be disconnected.
---
 fs/nfsd/nfs4callback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 052be5bf9ef5..75dacb7878b8 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -897,7 +897,7 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
 		.timeout	= &timeparms,
 		.program	= &cb_program,
 		.version	= 1,
-		.flags		= (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET),
+		.flags		= (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET | RPC_CLNT_CREATE_NO_IDLE_TIMEOUT),
 		.cred		= current_cred(),
 	};
 	struct rpc_clnt *client;
-- 
2.25.1



* Re: [PATCH] nfsd: set RPC_CLNT_CREATE_NO_IDLE_TIMEOUT on callback client
  2021-02-21 18:27 [PATCH] nfsd: set RPC_CLNT_CREATE_NO_IDLE_TIMEOUT on callback client Timo Rothenpieler
@ 2021-02-21 19:26 ` Chuck Lever
  2021-02-21 21:13   ` Timo Rothenpieler
       [not found]   ` <3701466e-6c0a-93e1-1953-f2839b6fa37d@rothenpieler.org>
  0 siblings, 2 replies; 9+ messages in thread
From: Chuck Lever @ 2021-02-21 19:26 UTC (permalink / raw)
  To: Timo Rothenpieler; +Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo

Thanks, Timo. A handful of nits below:

> On Feb 21, 2021, at 1:27 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> This tackles an issue where the callback client times out from
> inactivity, causing operations like server side copy to never return on
> the client side.
> I was observing that issue frequently on my RDMA connected clients, it
> does not seem to manifest on tcp connected clients.

Indeed, it is curious that the COPY issue does not occur on TCP
connections. You could try using the same tracing technique to
collect some data on TCP to see what is different.


> However, it does not fix the actual issue of the callback channel
> not getting re-established and the client being stuck in the call
> forever. It just makes it a lot less likely to occur, as long as no
> other circumstances cause the callback channel to be disconnected.

Agreed. I'm hoping Olga or Dai will look further into why recovery
is failing in this case (and whether that missing recovery action
is also observed only on RDMA transports!).

Please add a Signed-off-by: tag. See the "Sign your work" section
in Documentation/process/submitting-patches.rst


> ---
> fs/nfsd/nfs4callback.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 052be5bf9ef5..75dacb7878b8 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -897,7 +897,7 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> 		.timeout	= &timeparms,
> 		.program	= &cb_program,
> 		.version	= 1,
> -		.flags		= (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET),
> +		.flags		= (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET | RPC_CLNT_CREATE_NO_IDLE_TIMEOUT),

Kernel coding style keeps lines at 80 characters or fewer. Please
find a way to keep the replacement line under 80 characters.
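
One way to stay within the column limit is to break the expression after each `|` operator and align the continuation lines. The sketch below is a standalone model, not the patch that was eventually applied: `struct create_args_model` and `example_args` are hypothetical names, and the flag values are placeholders standing in for the real definitions in include/linux/sunrpc/clnt.h.

```c
/*
 * Placeholder flag values for this sketch only; the real definitions
 * live in include/linux/sunrpc/clnt.h and may differ.
 */
#define RPC_CLNT_CREATE_NOPING		(1UL << 4)
#define RPC_CLNT_CREATE_QUIET		(1UL << 6)
#define RPC_CLNT_CREATE_NO_IDLE_TIMEOUT	(1UL << 8)

/* Hypothetical stand-in for struct rpc_create_args (flags field only). */
struct create_args_model {
	unsigned long flags;
};

/* Same flag set as the patch, wrapped so no line exceeds 80 columns. */
static const struct create_args_model example_args = {
	.flags		= RPC_CLNT_CREATE_NOPING |
			  RPC_CLNT_CREATE_QUIET |
			  RPC_CLNT_CREATE_NO_IDLE_TIMEOUT,
};
```

The outer parentheses of the original initializer are not needed around a single OR expression, so dropping them also saves two columns.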


> 		.cred		= current_cred(),
> 	};
> 	struct rpc_clnt *client;
> -- 
> 2.25.1
> 

Once you have received other review comments, you might wish to
submit this patch again. Be sure to update the Subject: line to
say "[PATCH v2]".

--
Chuck Lever





* [PATCH] nfsd: set RPC_CLNT_CREATE_NO_IDLE_TIMEOUT on callback client
  2021-02-21 19:26 ` Chuck Lever
@ 2021-02-21 21:13   ` Timo Rothenpieler
       [not found]   ` <3701466e-6c0a-93e1-1953-f2839b6fa37d@rothenpieler.org>
  1 sibling, 0 replies; 9+ messages in thread
From: Timo Rothenpieler @ 2021-02-21 21:13 UTC (permalink / raw)
  To: Linux NFS Mailing List; +Cc: Chuck Lever, Olga Kornievskaia, Timo Rothenpieler

This tackles an issue where the callback client times out from
inactivity, causing operations like server side copy to never return on
the client side.
I was observing that issue frequently on my RDMA connected clients; it
does not seem to manifest on TCP connected clients.

However, it does not fix the actual issue of the callback channel
not getting re-established and the client being stuck in the call
forever. It just makes it a lot less likely to occur, as long as no
other circumstances cause the callback channel to be disconnected.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
---
 fs/nfsd/nfs4callback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 052be5bf9ef5..75dacb7878b8 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -897,7 +897,7 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
 		.timeout	= &timeparms,
 		.program	= &cb_program,
 		.version	= 1,
-		.flags		= (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET),
+		.flags		= (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET | RPC_CLNT_CREATE_NO_IDLE_TIMEOUT),
 		.cred		= current_cred(),
 	};
 	struct rpc_clnt *client;
-- 
2.25.1



* Re: [PATCH] nfsd: set RPC_CLNT_CREATE_NO_IDLE_TIMEOUT on callback client
       [not found]   ` <3701466e-6c0a-93e1-1953-f2839b6fa37d@rothenpieler.org>
@ 2021-02-22 21:47     ` Chuck Lever
  2021-02-22 23:36       ` [PATCH] svcrdma: disable timeouts on rdma backchannel Timo Rothenpieler
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2021-02-22 21:47 UTC (permalink / raw)
  To: Timo Rothenpieler; +Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo


> On Feb 22, 2021, at 4:03 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> On 21.02.2021 20:26, Chuck Lever wrote:
>> Thanks, Timo. A handful of nits below:
>>> On Feb 21, 2021, at 1:27 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
>>> 
>>> This tackles an issue where the callback client times out from
>>> inactivity, causing operations like server side copy to never return on
>>> the client side.
>>> I was observing that issue frequently on my RDMA connected clients, it
>>> does not seem to manifest on tcp connected clients.
>> Indeed, it is curious that the COPY issue does not occur on TCP
>> connections. You could try using the same tracing technique to
>> collect some data on TCP to see what is different.
> 
> I did pretty much the same procedure, mount, copy, wait 10 minutes, copy again, just with it mounted via TCP instead of RDMA.
> (And I made sure to boot the server on an unpatched kernel).
> Worked perfectly fine.
> The pcap and trace are attached.
> 
> To me it looks like it just never hits xprt_disconnect_auto, despite the no timeout flag not being set.
> Maybe it gets set implicitly somewhere for TCP? Or there is some keep alive going on for TCP, that never gives it a chance to time out.

net/sunrpc/xprtsock.c:

	/* backchannel */
	xprt_set_bound(xprt);
	xprt->bind_timeout = 0;
	xprt->reestablish_timeout = 0;
	xprt->idle_timeout = 0;

But there is no similar setting in net/sunrpc/xprtrdma/svc_rdma_backchannel.c:

	xprt_set_bound(xprt);
	xprt_set_connected(xprt);
	xprt->bind_timeout = RPCRDMA_BIND_TO;
	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;

And perhaps these have been wrong since 5d252f90a800 ("svcrdma: Add
class for RDMA backwards direction transport").

Try replacing your previous patch with one that changes all these
backchannel settings to 0. It should have the same effect as using
NO_IDLE_TIMEOUT.
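
To illustrate why a zero idle_timeout would have the same effect, here is a minimal userspace sketch of the idea. It assumes the transport only arms its idle-disconnect timer when idle_timeout is non-zero; `struct xprt_model` and both helper functions are hypothetical names for this example and are not the kernel's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct rpc_xprt fields discussed above. */
struct xprt_model {
	unsigned long last_used;	/* time of last activity, in ticks */
	unsigned long idle_timeout;	/* 0 is assumed to mean "no idle timer" */
};

/* Assumption: an idle timer exists only for a non-zero idle_timeout. */
static bool model_has_idle_timer(const struct xprt_model *xprt)
{
	return xprt->idle_timeout != 0;
}

/* Decide whether an idle transport would be disconnected at time 'now'. */
static bool model_should_autodisconnect(const struct xprt_model *xprt,
					unsigned long now)
{
	if (!model_has_idle_timer(xprt))
		return false;
	return now - xprt->last_used >= xprt->idle_timeout;
}
```

Under this model, a backchannel set up with a non-zero idle_timeout is dropped once it has been quiet for long enough, while one set up with idle_timeout = 0 is never dropped for inactivity, which matches the difference observed between the RDMA and TCP mounts.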


>>> However, it does not fix the actual issue of the callback channel
>>> not getting re-established and the client being stuck in the call
>>> forever. It just makes it a lot less likely to occur, as long as no
>>> other circumstances cause the callback channel to be disconnected.
>> Agreed. I'm hoping Olga or Dai will look further into why recovery
>> is failing in this case (and whether that missing recovery action
>> is also observed only on RDMA transports!).
>> Please add a Signed-off-by: tag. See the "Sign your work" section
>> in Documentation/process/submitting-patches.rst
> 
> <trace.dat.xz><traffic.pcap.xz>

--
Chuck Lever





* [PATCH] svcrdma: disable timeouts on rdma backchannel
  2021-02-22 21:47     ` Chuck Lever
@ 2021-02-22 23:36       ` Timo Rothenpieler
  2021-02-24 14:18         ` Chuck Lever
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Rothenpieler @ 2021-02-22 23:36 UTC (permalink / raw)
  To: Linux NFS Mailing List
  Cc: Chuck Lever, Olga Kornievskaia, Dai Ngo, Timo Rothenpieler

This brings it in line with the regular tcp backchannel, which also has
all those timeouts disabled.

This prevents the backchannel from timing out, which would leave some async
operations, like server side copy, stuck indefinitely on the client side.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
---
I did the same testing with this applied as before, and could not
observe it getting stuck, same as with the previous patch, which I
removed before testing this one.

This obviously still does not fix the issue of it being seemingly unable
to reestablish the disconnected backchannel.
An event that disconnects the backchannel but leaves the main connection
intact seems a pretty rare occurrence though, outside of this issue.

 net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 63f8be974df2..8186ab6f99f1 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
 	xprt->timeout = &xprt_rdma_bc_timeout;
 	xprt_set_bound(xprt);
 	xprt_set_connected(xprt);
-	xprt->bind_timeout = RPCRDMA_BIND_TO;
-	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
-	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
+	xprt->bind_timeout = 0;
+	xprt->reestablish_timeout = 0;
+	xprt->idle_timeout = 0;
 
 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
 	xprt->ops = &xprt_rdma_bc_procs;
-- 
2.25.1



* Re: [PATCH] svcrdma: disable timeouts on rdma backchannel
  2021-02-22 23:36       ` [PATCH] svcrdma: disable timeouts on rdma backchannel Timo Rothenpieler
@ 2021-02-24 14:18         ` Chuck Lever
  2021-02-24 20:02           ` J. Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2021-02-24 14:18 UTC (permalink / raw)
  To: Timo Rothenpieler; +Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo



> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> This brings it in line with the regular tcp backchannel, which also has
> all those timeouts disabled.
> 
> Prevents the backchannel from timing out, getting some async operations
> like server side copying getting stuck indefinitely on the client side.
> 
> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>

Thanks for your patch! I've included it in the for-rc branch at

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git


> ---
> Did the same testing with this applied than before, and could not
> observe it getting stuck, same as with the previous patch, which I
> removed before testing this one.
> 
> This obviously still does not fix the issue of it being seemingly unable
> to reestablish the disconnected backchannel.
> An event that disconnects the backchannel but leaves the main connection
> intact seems a pretty rare occurrence though, outside of this issue.
> 
> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> index 63f8be974df2..8186ab6f99f1 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
> 	xprt->timeout = &xprt_rdma_bc_timeout;
> 	xprt_set_bound(xprt);
> 	xprt_set_connected(xprt);
> -	xprt->bind_timeout = RPCRDMA_BIND_TO;
> -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
> -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
> +	xprt->bind_timeout = 0;
> +	xprt->reestablish_timeout = 0;
> +	xprt->idle_timeout = 0;
> 
> 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
> 	xprt->ops = &xprt_rdma_bc_procs;
> -- 
> 2.25.1
> 

--
Chuck Lever





* Re: [PATCH] svcrdma: disable timeouts on rdma backchannel
  2021-02-24 14:18         ` Chuck Lever
@ 2021-02-24 20:02           ` J. Bruce Fields
  2021-02-24 20:03             ` Chuck Lever
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2021-02-24 20:02 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Timo Rothenpieler, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo

On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote:
> 
> 
> > On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> > 
> > This brings it in line with the regular tcp backchannel, which also has
> > all those timeouts disabled.
> > 
> > Prevents the backchannel from timing out, getting some async operations
> > like server side copying getting stuck indefinitely on the client side.
> > 
> > Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
> 
> Thanks for your patch! I've included it in the for-rc branch at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

So, I'm sure this patch makes sense.

But I'm also curious why it's not recovering.

What I think should happen:

	- clp->cl_cb_state should be set to NFSD4_CB_DOWN.
	- This should cause the next SEQUENCE reply to have
	  SEQ4_STATUS_CB_PATH_DOWN set.
	- That should poke the client to recover.  (Maybe by sending a
	  BIND_CONN_TO_SESSION call?)

I'd be curious whether any of that's actually happening.

--b.

> 
> 
> > ---
> > Did the same testing with this applied than before, and could not
> > observe it getting stuck, same as with the previous patch, which I
> > removed before testing this one.
> > 
> > This obviously still does not fix the issue of it being seemingly unable
> > to reestablish the disconnected backchannel.
> > An event that disconnects the backchannel but leaves the main connection
> > intact seems a pretty rare occurrence though, outside of this issue.
> > 
> > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> > index 63f8be974df2..8186ab6f99f1 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> > @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
> > 	xprt->timeout = &xprt_rdma_bc_timeout;
> > 	xprt_set_bound(xprt);
> > 	xprt_set_connected(xprt);
> > -	xprt->bind_timeout = RPCRDMA_BIND_TO;
> > -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
> > -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
> > +	xprt->bind_timeout = 0;
> > +	xprt->reestablish_timeout = 0;
> > +	xprt->idle_timeout = 0;
> > 
> > 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
> > 	xprt->ops = &xprt_rdma_bc_procs;
> > -- 
> > 2.25.1
> > 
> 
> --
> Chuck Lever
> 
> 


* Re: [PATCH] svcrdma: disable timeouts on rdma backchannel
  2021-02-24 20:02           ` J. Bruce Fields
@ 2021-02-24 20:03             ` Chuck Lever
  2021-02-24 20:08               ` Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2021-02-24 20:03 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Timo Rothenpieler, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo



> On Feb 24, 2021, at 3:02 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote:
>> 
>> 
>>> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
>>> 
>>> This brings it in line with the regular tcp backchannel, which also has
>>> all those timeouts disabled.
>>> 
>>> Prevents the backchannel from timing out, getting some async operations
>>> like server side copying getting stuck indefinitely on the client side.
>>> 
>>> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
>> 
>> Thanks for your patch! I've included it in the for-rc branch at
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
> 
> So, I'm sure this patch makes sense.
> 
> But I'm also curious why it's not recovering.

Agreed. This patch is not a substitute for proper callback channel recovery.


> What I think should happen:
> 
> 	- clp->cl_cb_state should be set to NFSD4_CB_DOWN.

I think it's set to FAULT.


> 	- This should cause the next SEQUENCE reply to have
> 	  SEQ4_STATUS_CB_PATH_DOWN set.
> 	- That should poke the client to recover.  (Maybe by sending a
> 	  BIND_CONN_TO_SESSION call?)
> 
> I'd be curious whether any of that's actually happening.
> 
> --b.
> 
>> 
>> 
>>> ---
>>> Did the same testing with this applied than before, and could not
>>> observe it getting stuck, same as with the previous patch, which I
>>> removed before testing this one.
>>> 
>>> This obviously still does not fix the issue of it being seemingly unable
>>> to reestablish the disconnected backchannel.
>>> An event that disconnects the backchannel but leaves the main connection
>>> intact seems a pretty rare occurrence though, outside of this issue.
>>> 
>>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>>> index 63f8be974df2..8186ab6f99f1 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>>> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
>>> 	xprt->timeout = &xprt_rdma_bc_timeout;
>>> 	xprt_set_bound(xprt);
>>> 	xprt_set_connected(xprt);
>>> -	xprt->bind_timeout = RPCRDMA_BIND_TO;
>>> -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
>>> -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
>>> +	xprt->bind_timeout = 0;
>>> +	xprt->reestablish_timeout = 0;
>>> +	xprt->idle_timeout = 0;
>>> 
>>> 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
>>> 	xprt->ops = &xprt_rdma_bc_procs;
>>> -- 
>>> 2.25.1
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever





* Re: [PATCH] svcrdma: disable timeouts on rdma backchannel
  2021-02-24 20:03             ` Chuck Lever
@ 2021-02-24 20:08               ` Bruce Fields
  0 siblings, 0 replies; 9+ messages in thread
From: Bruce Fields @ 2021-02-24 20:08 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Timo Rothenpieler, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo

On Wed, Feb 24, 2021 at 08:03:54PM +0000, Chuck Lever wrote:
> 
> 
> > On Feb 24, 2021, at 3:02 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote:
> >> 
> >> 
> >>> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> >>> 
> >>> This brings it in line with the regular tcp backchannel, which also has
> >>> all those timeouts disabled.
> >>> 
> >>> Prevents the backchannel from timing out, getting some async operations
> >>> like server side copying getting stuck indefinitely on the client side.
> >>> 
> >>> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
> >> 
> >> Thanks for your patch! I've included it in the for-rc branch at
> >> 
> >> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
> > 
> > So, I'm sure this patch makes sense.
> > 
> > But I'm also curious why it's not recovering.
> 
> Agreed. This patch is not a substitute for proper callback channel recovery.
> 
> 
> > What I think should happen:
> > 
> > 	- clp->cl_cb_state should be set to NFSD4_CB_DOWN.
> 
> I think it's set to FAULT.

OK.  The result should be similar in that case, but SEQUENCE gets the
SEQ4_STATUS_BACKCHANNEL_FAULT flag set instead.
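
The expected mapping from callback channel state to SEQUENCE status flag can be sketched as a small standalone model. The flag values are the ones defined for NFSv4.1 in RFC 5661; `enum cb_state_model` and `model_sequence_status()` are hypothetical names for this example and are not taken from the nfsd source.

```c
#include <stdint.h>

/* SEQUENCE status flag bits as defined by NFSv4.1 (RFC 5661). */
#define SEQ4_STATUS_CB_PATH_DOWN	0x00000001u
#define SEQ4_STATUS_BACKCHANNEL_FAULT	0x00000400u

/* Hypothetical model of the callback channel states discussed above. */
enum cb_state_model {
	CB_MODEL_UP,
	CB_MODEL_UNKNOWN,
	CB_MODEL_DOWN,
	CB_MODEL_FAULT,
};

/* Pick the status flag the next SEQUENCE reply is expected to carry. */
static uint32_t model_sequence_status(enum cb_state_model state)
{
	switch (state) {
	case CB_MODEL_DOWN:
		return SEQ4_STATUS_CB_PATH_DOWN;
	case CB_MODEL_FAULT:
		return SEQ4_STATUS_BACKCHANNEL_FAULT;
	default:
		return 0;
	}
}
```

Either flag is a hint to the client that the backchannel needs attention; whether the client actually reacts to it is the open question in this thread.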

--b.

> 
> 
> > 	- This should cause the next SEQUENCE reply to have
> > 	  SEQ4_STATUS_CB_PATH_DOWN set.
> > 	- That should poke the client to recover.  (Maybe by sending a
> > 	  BIND_CONN_TO_SESSION call?)
> > 
> > I'd be curious whether any of that's actually happening.
> > 
> > --b.
> > 
> >> 
> >> 
> >>> ---
> >>> Did the same testing with this applied than before, and could not
> >>> observe it getting stuck, same as with the previous patch, which I
> >>> removed before testing this one.
> >>> 
> >>> This obviously still does not fix the issue of it being seemingly unable
> >>> to reestablish the disconnected backchannel.
> >>> An event that disconnects the backchannel but leaves the main connection
> >>> intact seems a pretty rare occurrence though, outside of this issue.
> >>> 
> >>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>> 
> >>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> >>> index 63f8be974df2..8186ab6f99f1 100644
> >>> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> >>> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> >>> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
> >>> 	xprt->timeout = &xprt_rdma_bc_timeout;
> >>> 	xprt_set_bound(xprt);
> >>> 	xprt_set_connected(xprt);
> >>> -	xprt->bind_timeout = RPCRDMA_BIND_TO;
> >>> -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
> >>> -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
> >>> +	xprt->bind_timeout = 0;
> >>> +	xprt->reestablish_timeout = 0;
> >>> +	xprt->idle_timeout = 0;
> >>> 
> >>> 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
> >>> 	xprt->ops = &xprt_rdma_bc_procs;
> >>> -- 
> >>> 2.25.1
> >>> 
> >> 
> >> --
> >> Chuck Lever
> 
> --
> Chuck Lever
> 
> 


end of thread, other threads:[~2021-02-24 20:09 UTC | newest]

Thread overview: 9+ messages
2021-02-21 18:27 [PATCH] nfsd: set RPC_CLNT_CREATE_NO_IDLE_TIMEOUT on callback client Timo Rothenpieler
2021-02-21 19:26 ` Chuck Lever
2021-02-21 21:13   ` Timo Rothenpieler
     [not found]   ` <3701466e-6c0a-93e1-1953-f2839b6fa37d@rothenpieler.org>
2021-02-22 21:47     ` Chuck Lever
2021-02-22 23:36       ` [PATCH] svcrdma: disable timeouts on rdma backchannel Timo Rothenpieler
2021-02-24 14:18         ` Chuck Lever
2021-02-24 20:02           ` J. Bruce Fields
2021-02-24 20:03             ` Chuck Lever
2021-02-24 20:08               ` Bruce Fields
