* [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" @ 2016-08-03 23:33 NeilBrown 2016-08-04 0:01 ` Trond Myklebust 0 siblings, 1 reply; 7+ messages in thread From: NeilBrown @ 2016-08-03 23:33 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing list [-- Attachment #1: Type: text/plain, Size: 989 bytes --] This reverts commit 4b0ab51db32eba0f48b7618254742f143364a28d. This change causes 'rpc.nfsd' to hang for long time if rpcbind is not available. If --no-nfs-version 2 --no-nfs-version 3 is given, the delay is about 6.5 minutes. When trying to register all versions, the delay is over half an hour. Before this commit, and after reverting it, nfsd fails (when v3 is requested) or succeeds (when only v4 is requested) immediately. Signed-off-by: NeilBrown <neilb@suse.com> --- net/sunrpc/xprtsock.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 111767ab124a..2a938055e95b 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -795,6 +795,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) xs_sock_reset_connection_flags(xprt); /* Mark transport as closed and wake up all pending tasks */ xprt_disconnect_done(xprt); + xprt_force_disconnect(xprt); } /** -- 2.9.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" 2016-08-03 23:33 [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" NeilBrown @ 2016-08-04 0:01 ` Trond Myklebust 2016-08-04 2:35 ` NeilBrown 2016-08-10 2:05 ` NeilBrown 0 siblings, 2 replies; 7+ messages in thread From: Trond Myklebust @ 2016-08-04 0:01 UTC (permalink / raw) To: NeilBrown; +Cc: Linux List > On Aug 3, 2016, at 19:33, NeilBrown <neilb@suse.com> wrote: > > > This reverts commit 4b0ab51db32eba0f48b7618254742f143364a28d. > > This change causes 'rpc.nfsd' to hang for long time if rpcbind is not > available. > If > --no-nfs-version 2 --no-nfs-version 3 > is given, the delay is about 6.5 minutes. When trying to register > all versions, the delay is over half an hour. > Before this commit, and after reverting it, nfsd fails (when v3 is > requested) or succeeds (when only v4 is requested) immediately. > > Signed-off-by: NeilBrown <neilb@suse.com> > --- > net/sunrpc/xprtsock.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 111767ab124a..2a938055e95b 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -795,6 +795,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) > xs_sock_reset_connection_flags(xprt); > /* Mark transport as closed and wake up all pending tasks */ > xprt_disconnect_done(xprt); > + xprt_force_disconnect(xprt); > } > If there is an outstanding request, then _that_ is supposed to redrive the connection. Why is the xprt_disconnect_done() not functioning as per the comment above it? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" 2016-08-04 0:01 ` Trond Myklebust @ 2016-08-04 2:35 ` NeilBrown 2016-08-10 2:05 ` NeilBrown 1 sibling, 0 replies; 7+ messages in thread From: NeilBrown @ 2016-08-04 2:35 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux List [-- Attachment #1: Type: text/plain, Size: 1727 bytes --] On Thu, Aug 04 2016, Trond Myklebust wrote: >> On Aug 3, 2016, at 19:33, NeilBrown <neilb@suse.com> wrote: >> >> >> This reverts commit 4b0ab51db32eba0f48b7618254742f143364a28d. >> >> This change causes 'rpc.nfsd' to hang for long time if rpcbind is not >> available. >> If >> --no-nfs-version 2 --no-nfs-version 3 >> is given, the delay is about 6.5 minutes. When trying to register >> all versions, the delay is over half an hour. >> Before this commit, and after reverting it, nfsd fails (when v3 is >> requested) or succeeds (when only v4 is requested) immediately. >> >> Signed-off-by: NeilBrown <neilb@suse.com> >> --- >> net/sunrpc/xprtsock.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> index 111767ab124a..2a938055e95b 100644 >> --- a/net/sunrpc/xprtsock.c >> +++ b/net/sunrpc/xprtsock.c >> @@ -795,6 +795,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) >> xs_sock_reset_connection_flags(xprt); >> /* Mark transport as closed and wake up all pending tasks */ >> xprt_disconnect_done(xprt); >> + xprt_force_disconnect(xprt); >> } >> > > If there is an outstanding request, then _that_ is supposed to redrive the connection. Why is the xprt_disconnect_done() not functioning as per the comment above it? No idea, sorry. I haven't tried to follow though exactly what is happening. The symptom is trivial to reproduce: rpc.nfsd 0 killall rpcbind rpc.nfsd 8 This will either complete immediately with kernel log messages, or block for a long time. I might try to figure out what is happening, but I suspect there are others who know this code a lot better than I do. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" 2016-08-04 0:01 ` Trond Myklebust 2016-08-04 2:35 ` NeilBrown @ 2016-08-10 2:05 ` NeilBrown 2016-11-18 4:48 ` NeilBrown 2016-11-18 14:31 ` Trond Myklebust 1 sibling, 2 replies; 7+ messages in thread From: NeilBrown @ 2016-08-10 2:05 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux List [-- Attachment #1: Type: text/plain, Size: 5453 bytes --] On Thu, Aug 04 2016, Trond Myklebust wrote: >> On Aug 3, 2016, at 19:33, NeilBrown <neilb@suse.com> wrote: >> >> >> This reverts commit 4b0ab51db32eba0f48b7618254742f143364a28d. >> >> This change causes 'rpc.nfsd' to hang for long time if rpcbind is not >> available. >> If >> --no-nfs-version 2 --no-nfs-version 3 >> is given, the delay is about 6.5 minutes. When trying to register >> all versions, the delay is over half an hour. >> Before this commit, and after reverting it, nfsd fails (when v3 is >> requested) or succeeds (when only v4 is requested) immediately. >> >> Signed-off-by: NeilBrown <neilb@suse.com> >> --- >> net/sunrpc/xprtsock.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> index 111767ab124a..2a938055e95b 100644 >> --- a/net/sunrpc/xprtsock.c >> +++ b/net/sunrpc/xprtsock.c >> @@ -795,6 +795,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) >> xs_sock_reset_connection_flags(xprt); >> /* Mark transport as closed and wake up all pending tasks */ >> xprt_disconnect_done(xprt); >> + xprt_force_disconnect(xprt); >> } >> > > If there is an outstanding request, then _that_ is supposed to redrive the connection. Why is the xprt_disconnect_done() not functioning as per the comment above it? I made some time to look at this and have a somewhat clearer understanding. The rpcbind code makes about 24 requests, include NULL pings, UNSET requests to clear any old registrations, and SET requests. The first (a NULL) fails immediately due to ECONNREFUSED. Others aren't even attempted for a growing number of seconds because xs_connect() finds that transport->sock is not NULL, so it delays before trying. Once it tries, it failed immediately and we move on to the next request. transport->sock is set to NULL by xs_reset_transport() which xs_connect() calls as part of the the timeout, but ->reestablish_timeout is not reset. The next time xs_connect is called, ->sock will be non-NULL again as transport->reestablish_timeout will be longer. transport->sock is only set to NULL *before* the test in xs_connect() by a call to xs_close() or ->close(). But nothing in triggering this. xs_sock_mark_closed() currently doesn't actually mark the socket as closed (XPRT_CLOSE_WAITING), or close it directly. static void xs_sock_mark_closed(struct rpc_xprt *xprt) { xs_sock_reset_connection_flags(xprt); /* Mark transport as closed and wake up all pending tasks */ xprt_disconnect_done(xprt); } xs_sock_reset_connection_flags() clears XPRT_CLOSE_WAITING, XPRT_CLOSING, and XPRT_SOCK_DATA_READY, and adds some memory barrier. xprt_disconnect_done() clears XPRT_CONNECTED and wakes up the task. So none of this will "Mark transport as closed" like the comment suggests. They just mark it as disconnected. xs_sock_reset_connection_flags() even *clears* XPRT_CLOSE_WAITING, which we really want to be set. The commit said: Under all conditions, it should be quite sufficient just to mark the socket as disconnected. It will then be closed by the transport shutdown or reconnect code. The "reconnect code" is presumably xprt_connect()? This will close the socket if XPRT_CLOSE_WAIT is set., but not if the socket is marked as disconnected. When marked as disconnected, it tries to reconnect - which incurs a timeout. The "transport shutdown" code is presumably xprt_destroy()? Yes, that would close the socket, but that doesn't straight away. The rpcbind client can try 10 UNSET calls before destroying the transport. It is possible that __svc_unregister should return an error so that svc_unregister() can abort the loop on error. That would reduce the timeouts to some extent, but not completely as we would still get a timeout when trying to register. It might make sense that if the svc_unregister() in svc_rpcb_setup() fails due to ECONNREFUSED, then it should fail so we never even try to register. Plumbing that status through nfsd so that it doesn't abort in the V4-only case when rpcbind isn't available could be done, but seems like a lot of work for little gain. Also, the error that comes through to __svc_unregister is "-EAGAIN" rather then -ECONNREFUSED. I think that the easiest solution is to make sure reconnection attempts on an RPC_IS_SOFTCONN tasks do not trigger a delay in xs_connect(). call_connect_status() already adds a 3second delay for other tasks, so simply forcing a close in xs_sock_marked_close() should be sufficient. The revert I posted would do that, or we could just add set_bit(XPRT_CLOSE_WAIT, &xprt->state); so that the comment would be correct. That is sufficient. Another approach which works is: diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 7f79fb7dc6a0..f80b135b4830 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1936,8 +1936,10 @@ call_connect_status(struct rpc_task *task) case -EADDRINUSE: case -ENOBUFS: case -EPIPE: - if (RPC_IS_SOFTCONN(task)) + if (RPC_IS_SOFTCONN(task)) { + xprt_force_disconnect(task->tk_rqstp->rq_xprt); break; + } /* retry with existing socket, after a delay */ rpc_delay(task, 3*HZ); case -EAGAIN: so that we force a disconnect precisely when a connection attempt fails on a SOFTCONN task. Which of these solutions do you prefer? Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" 2016-08-10 2:05 ` NeilBrown @ 2016-11-18 4:48 ` NeilBrown 2016-11-18 14:31 ` Trond Myklebust 1 sibling, 0 replies; 7+ messages in thread From: NeilBrown @ 2016-11-18 4:48 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: Linux List [-- Attachment #1: Type: text/plain, Size: 5851 bytes --] Ping. This is still outstanding. Is there some way we can reach a resolution? Thanks, NeilBrown On Wed, Aug 10 2016, NeilBrown wrote: > [ Unknown signature status ] > On Thu, Aug 04 2016, Trond Myklebust wrote: > >>> On Aug 3, 2016, at 19:33, NeilBrown <neilb@suse.com> wrote: >>> >>> >>> This reverts commit 4b0ab51db32eba0f48b7618254742f143364a28d. >>> >>> This change causes 'rpc.nfsd' to hang for long time if rpcbind is not >>> available. >>> If >>> --no-nfs-version 2 --no-nfs-version 3 >>> is given, the delay is about 6.5 minutes. When trying to register >>> all versions, the delay is over half an hour. >>> Before this commit, and after reverting it, nfsd fails (when v3 is >>> requested) or succeeds (when only v4 is requested) immediately. >>> >>> Signed-off-by: NeilBrown <neilb@suse.com> >>> --- >>> net/sunrpc/xprtsock.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>> index 111767ab124a..2a938055e95b 100644 >>> --- a/net/sunrpc/xprtsock.c >>> +++ b/net/sunrpc/xprtsock.c >>> @@ -795,6 +795,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) >>> xs_sock_reset_connection_flags(xprt); >>> /* Mark transport as closed and wake up all pending tasks */ >>> xprt_disconnect_done(xprt); >>> + xprt_force_disconnect(xprt); >>> } >>> >> >> If there is an outstanding request, then _that_ is supposed to redrive the connection. Why is the xprt_disconnect_done() not functioning as per the comment above it? > > I made some time to look at this and have a somewhat clearer > understanding. > > The rpcbind code makes about 24 requests, include NULL pings, UNSET > requests to clear any old registrations, and SET requests. > > The first (a NULL) fails immediately due to ECONNREFUSED. Others aren't > even attempted for a growing number of seconds because xs_connect() > finds that transport->sock is not NULL, so it delays before trying. > Once it tries, it failed immediately and we move on to the next request. > > transport->sock is set to NULL by xs_reset_transport() which xs_connect() calls > as part of the the timeout, but ->reestablish_timeout is not reset. > The next time xs_connect is called, ->sock will be non-NULL again as > transport->reestablish_timeout will be longer. > > transport->sock is only set to NULL *before* the test in xs_connect() by a > call to xs_close() or ->close(). But nothing in triggering this. > > xs_sock_mark_closed() currently doesn't actually mark the socket as > closed (XPRT_CLOSE_WAITING), or close it directly. > > static void xs_sock_mark_closed(struct rpc_xprt *xprt) > { > xs_sock_reset_connection_flags(xprt); > /* Mark transport as closed and wake up all pending tasks */ > xprt_disconnect_done(xprt); > } > > xs_sock_reset_connection_flags() clears XPRT_CLOSE_WAITING, > XPRT_CLOSING, and XPRT_SOCK_DATA_READY, and adds some memory barrier. > > xprt_disconnect_done() clears XPRT_CONNECTED and wakes up the task. > > So none of this will "Mark transport as closed" like the comment > suggests. They just mark it as disconnected. > xs_sock_reset_connection_flags() even *clears* XPRT_CLOSE_WAITING, which > we really want to be set. > > The commit said: > > Under all conditions, it should be quite sufficient just to mark > the socket as disconnected. It will then be closed by the > transport shutdown or reconnect code. > > The "reconnect code" is presumably xprt_connect()? This will close the > socket if XPRT_CLOSE_WAIT is set., but not if the socket is marked as > disconnected. When marked as disconnected, it tries to reconnect - > which incurs a timeout. > > The "transport shutdown" code is presumably xprt_destroy()? Yes, that > would close the socket, but that doesn't straight away. The rpcbind > client can try 10 UNSET calls before destroying the transport. > > > It is possible that __svc_unregister should return an error so that > svc_unregister() can abort the loop on error. That would reduce the > timeouts to some extent, but not completely as we would still get a > timeout when trying to register. It might make sense that if the > svc_unregister() in svc_rpcb_setup() fails due to ECONNREFUSED, then it > should fail so we never even try to register. Plumbing that status > through nfsd so that it doesn't abort in the V4-only case when rpcbind > isn't available could be done, but seems like a lot of work for little gain. > > Also, the error that comes through to __svc_unregister is "-EAGAIN" > rather then -ECONNREFUSED. > > I think that the easiest solution is to make sure reconnection attempts > on an RPC_IS_SOFTCONN tasks do not trigger a delay in xs_connect(). > call_connect_status() already adds a 3second delay for other tasks, so > simply forcing a close in xs_sock_marked_close() should be sufficient. > The revert I posted would do that, or we could just add > > set_bit(XPRT_CLOSE_WAIT, &xprt->state); > > so that the comment would be correct. That is sufficient. > > Another approach which works is: > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index 7f79fb7dc6a0..f80b135b4830 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -1936,8 +1936,10 @@ call_connect_status(struct rpc_task *task) > case -EADDRINUSE: > case -ENOBUFS: > case -EPIPE: > - if (RPC_IS_SOFTCONN(task)) > + if (RPC_IS_SOFTCONN(task)) { > + xprt_force_disconnect(task->tk_rqstp->rq_xprt); > break; > + } > /* retry with existing socket, after a delay */ > rpc_delay(task, 3*HZ); > case -EAGAIN: > > so that we force a disconnect precisely when a connection attempt fails > on a SOFTCONN task. > > Which of these solutions do you prefer? > > Thanks, > NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 800 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" 2016-08-10 2:05 ` NeilBrown 2016-11-18 4:48 ` NeilBrown @ 2016-11-18 14:31 ` Trond Myklebust 2016-11-21 3:24 ` NeilBrown 1 sibling, 1 reply; 7+ messages in thread From: Trond Myklebust @ 2016-11-18 14:31 UTC (permalink / raw) To: NeilBrown; +Cc: List Linux NFS Mailing DQo+IE9uIEF1ZyA5LCAyMDE2LCBhdCAyMjowNSwgTmVpbEJyb3duIDxuZWlsYkBzdXNlLmNvbT4g d3JvdGU6DQo+IA0KPiBPbiBUaHUsIEF1ZyAwNCAyMDE2LCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6 DQo+IA0KPj4+IE9uIEF1ZyAzLCAyMDE2LCBhdCAxOTozMywgTmVpbEJyb3duIDxuZWlsYkBzdXNl LmNvbT4gd3JvdGU6DQo+Pj4gDQo+Pj4gDQo+Pj4gVGhpcyByZXZlcnRzIGNvbW1pdCA0YjBhYjUx ZGIzMmViYTBmNDhiNzYxODI1NDc0MmYxNDMzNjRhMjhkLg0KPj4+IA0KPj4+IFRoaXMgY2hhbmdl IGNhdXNlcyAncnBjLm5mc2QnIHRvIGhhbmcgZm9yIGxvbmcgdGltZSBpZiBycGNiaW5kIGlzIG5v dA0KPj4+IGF2YWlsYWJsZS4NCj4+PiBJZg0KPj4+ICAtLW5vLW5mcy12ZXJzaW9uIDIgLS1uby1u ZnMtdmVyc2lvbiAzDQo+Pj4gaXMgZ2l2ZW4sIHRoZSBkZWxheSBpcyBhYm91dCA2LjUgbWludXRl cy4gIFdoZW4gdHJ5aW5nIHRvIHJlZ2lzdGVyDQo+Pj4gYWxsIHZlcnNpb25zLCB0aGUgZGVsYXkg aXMgb3ZlciBoYWxmIGFuIGhvdXIuDQo+Pj4gQmVmb3JlIHRoaXMgY29tbWl0LCBhbmQgYWZ0ZXIg cmV2ZXJ0aW5nIGl0LCBuZnNkIGZhaWxzICh3aGVuIHYzIGlzDQo+Pj4gcmVxdWVzdGVkKSBvciBz dWNjZWVkcyAod2hlbiBvbmx5IHY0IGlzIHJlcXVlc3RlZCkgaW1tZWRpYXRlbHkuDQo+Pj4gDQo+ Pj4gU2lnbmVkLW9mZi1ieTogTmVpbEJyb3duIDxuZWlsYkBzdXNlLmNvbT4NCj4+PiAtLS0NCj4+ PiBuZXQvc3VucnBjL3hwcnRzb2NrLmMgfCAxICsNCj4+PiAxIGZpbGUgY2hhbmdlZCwgMSBpbnNl cnRpb24oKykNCj4+PiANCj4+PiBkaWZmIC0tZ2l0IGEvbmV0L3N1bnJwYy94cHJ0c29jay5jIGIv bmV0L3N1bnJwYy94cHJ0c29jay5jDQo+Pj4gaW5kZXggMTExNzY3YWIxMjRhLi4yYTkzODA1NWU5 NWIgMTAwNjQ0DQo+Pj4gLS0tIGEvbmV0L3N1bnJwYy94cHJ0c29jay5jDQo+Pj4gKysrIGIvbmV0 L3N1bnJwYy94cHJ0c29jay5jDQo+Pj4gQEAgLTc5NSw2ICs3OTUsNyBAQCBzdGF0aWMgdm9pZCB4 c19zb2NrX21hcmtfY2xvc2VkKHN0cnVjdCBycGNfeHBydCAqeHBydCkNCj4+PiAJeHNfc29ja19y ZXNldF9jb25uZWN0aW9uX2ZsYWdzKHhwcnQpOw0KPj4+IAkvKiBNYXJrIHRyYW5zcG9ydCBhcyBj bG9zZWQgYW5kIHdha2UgdXAgYWxsIHBlbmRpbmcgdGFza3MgKi8NCj4+PiAJeHBydF9kaXNjb25u ZWN0X2RvbmUoeHBydCk7DQo+Pj4gKwl4cHJ0X2ZvcmNlX2Rpc2Nvbm5lY3QoeHBydCk7DQo+Pj4g fQ0KPj4+IA0KPj4gDQo+PiBJZiB0aGVyZSBpcyBhbiBvdXRzdGFuZGluZyByZXF1ZXN0LCB0aGVu IF90aGF0XyBpcyBzdXBwb3NlZCB0byByZWRyaXZlIHRoZSBjb25uZWN0aW9uLiBXaHkgaXMgdGhl IHhwcnRfZGlzY29ubmVjdF9kb25lKCkgbm90IGZ1bmN0aW9uaW5nIGFzIHBlciB0aGUgY29tbWVu dCBhYm92ZSBpdD8NCj4gDQo+IEkgbWFkZSBzb21lIHRpbWUgdG8gbG9vayBhdCB0aGlzIGFuZCBo YXZlIGEgc29tZXdoYXQgY2xlYXJlcg0KPiB1bmRlcnN0YW5kaW5nLg0KPiANCj4gVGhlIHJwY2Jp bmQgY29kZSBtYWtlcyBhYm91dCAyNCByZXF1ZXN0cywgaW5jbHVkZSBOVUxMIHBpbmdzLCBVTlNF VA0KPiByZXF1ZXN0cyB0byBjbGVhciBhbnkgb2xkIHJlZ2lzdHJhdGlvbnMsIGFuZCBTRVQgcmVx dWVzdHMuDQo+IA0KPiBUaGUgZmlyc3QgKGEgTlVMTCkgZmFpbHMgaW1tZWRpYXRlbHkgZHVlIHRv IEVDT05OUkVGVVNFRC4gIE90aGVycyBhcmVuJ3QNCj4gZXZlbiBhdHRlbXB0ZWQgZm9yIGEgZ3Jv d2luZyBudW1iZXIgb2Ygc2Vjb25kcyBiZWNhdXNlIHhzX2Nvbm5lY3QoKQ0KPiBmaW5kcyB0aGF0 IHRyYW5zcG9ydC0+c29jayBpcyBub3QgTlVMTCwgc28gaXQgZGVsYXlzIGJlZm9yZSB0cnlpbmcu DQo+IE9uY2UgaXQgdHJpZXMsIGl0IGZhaWxlZCBpbW1lZGlhdGVseSBhbmQgd2UgbW92ZSBvbiB0 byB0aGUgbmV4dCByZXF1ZXN0Lg0KPiANCj4gdHJhbnNwb3J0LT5zb2NrIGlzIHNldCB0byBOVUxM IGJ5IHhzX3Jlc2V0X3RyYW5zcG9ydCgpIHdoaWNoIHhzX2Nvbm5lY3QoKSBjYWxscw0KPiBhcyBw YXJ0IG9mIHRoZSB0aGUgdGltZW91dCwgYnV0IC0+cmVlc3RhYmxpc2hfdGltZW91dCBpcyBub3Qg cmVzZXQuDQo+IFRoZSBuZXh0IHRpbWUgeHNfY29ubmVjdCBpcyBjYWxsZWQsIC0+c29jayB3aWxs IGJlIG5vbi1OVUxMIGFnYWluIGFzDQo+IHRyYW5zcG9ydC0+cmVlc3RhYmxpc2hfdGltZW91dCB3 aWxsIGJlIGxvbmdlci4NCj4gDQo+IHRyYW5zcG9ydC0+c29jayBpcyBvbmx5IHNldCB0byBOVUxM ICpiZWZvcmUqIHRoZSB0ZXN0IGluIHhzX2Nvbm5lY3QoKSBieSBhDQo+IGNhbGwgdG8geHNfY2xv c2UoKSBvciAtPmNsb3NlKCkuICBCdXQgbm90aGluZyBpbiB0cmlnZ2VyaW5nIHRoaXMuDQo+IA0K PiB4c19zb2NrX21hcmtfY2xvc2VkKCkgY3VycmVudGx5IGRvZXNuJ3QgYWN0dWFsbHkgbWFyayB0 aGUgc29ja2V0IGFzDQo+IGNsb3NlZCAoWFBSVF9DTE9TRV9XQUlUSU5HKSwgb3IgY2xvc2UgaXQg ZGlyZWN0bHkuDQo+IA0KPiBzdGF0aWMgdm9pZCB4c19zb2NrX21hcmtfY2xvc2VkKHN0cnVjdCBy cGNfeHBydCAqeHBydCkNCj4gew0KPiAJeHNfc29ja19yZXNldF9jb25uZWN0aW9uX2ZsYWdzKHhw cnQpOw0KPiAJLyogTWFyayB0cmFuc3BvcnQgYXMgY2xvc2VkIGFuZCB3YWtlIHVwIGFsbCBwZW5k aW5nIHRhc2tzICovDQo+IAl4cHJ0X2Rpc2Nvbm5lY3RfZG9uZSh4cHJ0KTsNCj4gfQ0KPiANCj4g eHNfc29ja19yZXNldF9jb25uZWN0aW9uX2ZsYWdzKCkgY2xlYXJzIFhQUlRfQ0xPU0VfV0FJVElO RywNCj4gWFBSVF9DTE9TSU5HLCBhbmQgWFBSVF9TT0NLX0RBVEFfUkVBRFksIGFuZCBhZGRzIHNv bWUgbWVtb3J5IGJhcnJpZXIuDQo+IA0KPiB4cHJ0X2Rpc2Nvbm5lY3RfZG9uZSgpIGNsZWFycyBY UFJUX0NPTk5FQ1RFRCBhbmQgd2FrZXMgdXAgdGhlIHRhc2suDQo+IA0KPiBTbyBub25lIG9mIHRo aXMgd2lsbCAiTWFyayB0cmFuc3BvcnQgYXMgY2xvc2VkIiBsaWtlIHRoZSBjb21tZW50DQo+IHN1 Z2dlc3RzLiAgVGhleSBqdXN0IG1hcmsgaXQgYXMgZGlzY29ubmVjdGVkLg0KPiB4c19zb2NrX3Jl c2V0X2Nvbm5lY3Rpb25fZmxhZ3MoKSBldmVuICpjbGVhcnMqIFhQUlRfQ0xPU0VfV0FJVElORywg d2hpY2gNCj4gd2UgcmVhbGx5IHdhbnQgdG8gYmUgc2V0Lg0KPiANCj4gVGhlIGNvbW1pdCBzYWlk Og0KPiANCj4gICAgVW5kZXIgYWxsIGNvbmRpdGlvbnMsIGl0IHNob3VsZCBiZSBxdWl0ZSBzdWZm aWNpZW50IGp1c3QgdG8gbWFyaw0KPiAgICB0aGUgc29ja2V0IGFzIGRpc2Nvbm5lY3RlZC4gSXQg d2lsbCB0aGVuIGJlIGNsb3NlZCBieSB0aGUNCj4gICAgdHJhbnNwb3J0IHNodXRkb3duIG9yIHJl Y29ubmVjdCBjb2RlLg0KPiANCj4gVGhlICJyZWNvbm5lY3QgY29kZSIgaXMgcHJlc3VtYWJseSB4 cHJ0X2Nvbm5lY3QoKT8gIFRoaXMgd2lsbCBjbG9zZSB0aGUNCj4gc29ja2V0IGlmIFhQUlRfQ0xP U0VfV0FJVCBpcyBzZXQuLCBidXQgbm90IGlmIHRoZSBzb2NrZXQgaXMgbWFya2VkIGFzDQo+IGRp c2Nvbm5lY3RlZC4gIFdoZW4gbWFya2VkIGFzIGRpc2Nvbm5lY3RlZCwgaXQgdHJpZXMgdG8gcmVj b25uZWN0IC0NCj4gd2hpY2ggaW5jdXJzIGEgdGltZW91dC4NCj4gDQo+IFRoZSAidHJhbnNwb3J0 IHNodXRkb3duIiBjb2RlIGlzIHByZXN1bWFibHkgeHBydF9kZXN0cm95KCk/ICBZZXMsIHRoYXQN Cj4gd291bGQgY2xvc2UgdGhlIHNvY2tldCwgYnV0IHRoYXQgZG9lc24ndCBzdHJhaWdodCBhd2F5 LiAgVGhlIHJwY2JpbmQNCj4gY2xpZW50IGNhbiB0cnkgMTAgVU5TRVQgY2FsbHMgYmVmb3JlIGRl c3Ryb3lpbmcgdGhlIHRyYW5zcG9ydC4NCj4gDQo+IA0KPiBJdCBpcyBwb3NzaWJsZSB0aGF0IF9f c3ZjX3VucmVnaXN0ZXIgc2hvdWxkIHJldHVybiBhbiBlcnJvciBzbyB0aGF0DQo+IHN2Y191bnJl Z2lzdGVyKCkgY2FuIGFib3J0IHRoZSBsb29wIG9uIGVycm9yLiAgVGhhdCB3b3VsZCByZWR1Y2Ug dGhlDQo+IHRpbWVvdXRzIHRvIHNvbWUgZXh0ZW50LCBidXQgbm90IGNvbXBsZXRlbHkgYXMgd2Ug d291bGQgc3RpbGwgZ2V0IGENCj4gdGltZW91dCB3aGVuIHRyeWluZyB0byByZWdpc3Rlci4gIEl0 IG1pZ2h0IG1ha2Ugc2Vuc2UgdGhhdCBpZiB0aGUNCj4gc3ZjX3VucmVnaXN0ZXIoKSBpbiBzdmNf cnBjYl9zZXR1cCgpIGZhaWxzIGR1ZSB0byBFQ09OTlJFRlVTRUQsIHRoZW4gaXQNCj4gc2hvdWxk IGZhaWwgc28gd2UgbmV2ZXIgZXZlbiB0cnkgdG8gcmVnaXN0ZXIuICBQbHVtYmluZyB0aGF0IHN0 YXR1cw0KPiB0aHJvdWdoIG5mc2Qgc28gdGhhdCBpdCBkb2Vzbid0IGFib3J0IGluIHRoZSBWNC1v bmx5IGNhc2Ugd2hlbiBycGNiaW5kDQo+IGlzbid0IGF2YWlsYWJsZSBjb3VsZCBiZSBkb25lLCBi dXQgc2VlbXMgbGlrZSBhIGxvdCBvZiB3b3JrIGZvciBsaXR0bGUgZ2Fpbi4NCj4gDQo+IEFsc28s IHRoZSBlcnJvciB0aGF0IGNvbWVzIHRocm91Z2ggdG8gX19zdmNfdW5yZWdpc3RlciBpcyAiLUVB R0FJTiINCj4gcmF0aGVyIHRoZW4gLUVDT05OUkVGVVNFRC4NCj4gDQo+IEkgdGhpbmsgdGhhdCB0 aGUgZWFzaWVzdCBzb2x1dGlvbiBpcyB0byBtYWtlIHN1cmUgcmVjb25uZWN0aW9uIGF0dGVtcHRz DQo+IG9uIGFuIFJQQ19JU19TT0ZUQ09OTiB0YXNrcyBkbyBub3QgdHJpZ2dlciBhIGRlbGF5IGlu IHhzX2Nvbm5lY3QoKS4NCj4gY2FsbF9jb25uZWN0X3N0YXR1cygpIGFscmVhZHkgYWRkcyBhIDNz ZWNvbmQgZGVsYXkgZm9yIG90aGVyIHRhc2tzLCBzbw0KPiBzaW1wbHkgZm9yY2luZyBhIGNsb3Nl IGluIHhzX3NvY2tfbWFya2VkX2Nsb3NlKCkgc2hvdWxkIGJlIHN1ZmZpY2llbnQuDQo+IFRoZSBy ZXZlcnQgSSBwb3N0ZWQgd291bGQgZG8gdGhhdCwgb3Igd2UgY291bGQganVzdCBhZGQNCj4gDQo+ IAlzZXRfYml0KFhQUlRfQ0xPU0VfV0FJVCwgJnhwcnQtPnN0YXRlKTsNCj4gDQo+IHNvIHRoYXQg dGhlIGNvbW1lbnQgd291bGQgYmUgY29ycmVjdC4gIFRoYXQgaXMgc3VmZmljaWVudC4NCj4gDQo+ IEFub3RoZXIgYXBwcm9hY2ggd2hpY2ggd29ya3MgaXM6DQo+IGRpZmYgLS1naXQgYS9uZXQvc3Vu cnBjL2NsbnQuYyBiL25ldC9zdW5ycGMvY2xudC5jDQo+IGluZGV4IDdmNzlmYjdkYzZhMC4uZjgw YjEzNWI0ODMwIDEwMDY0NA0KPiAtLS0gYS9uZXQvc3VucnBjL2NsbnQuYw0KPiArKysgYi9uZXQv c3VucnBjL2NsbnQuYw0KPiBAQCAtMTkzNiw4ICsxOTM2LDEwIEBAIGNhbGxfY29ubmVjdF9zdGF0 dXMoc3RydWN0IHJwY190YXNrICp0YXNrKQ0KPiAJY2FzZSAtRUFERFJJTlVTRToNCj4gCWNhc2Ug LUVOT0JVRlM6DQo+IAljYXNlIC1FUElQRToNCj4gLQkJaWYgKFJQQ19JU19TT0ZUQ09OTih0YXNr KSkNCj4gKwkJaWYgKFJQQ19JU19TT0ZUQ09OTih0YXNrKSkgew0KPiArCQkJeHBydF9mb3JjZV9k aXNjb25uZWN0KHRhc2stPnRrX3Jxc3RwLT5ycV94cHJ0KTsNCj4gCQkJYnJlYWs7DQo+ICsJCX0N Cj4gCQkvKiByZXRyeSB3aXRoIGV4aXN0aW5nIHNvY2tldCwgYWZ0ZXIgYSBkZWxheSAqLw0KPiAJ CXJwY19kZWxheSh0YXNrLCAzKkhaKTsNCj4gCWNhc2UgLUVBR0FJTjoNCj4gDQo+IHNvIHRoYXQg d2UgZm9yY2UgYSBkaXNjb25uZWN0IHByZWNpc2VseSB3aGVuIGEgY29ubmVjdGlvbiBhdHRlbXB0 IGZhaWxzDQo+IG9uIGEgU09GVENPTk4gdGFzay4NCj4gDQo+IFdoaWNoIG9mIHRoZXNlIHNvbHV0 aW9ucyBkbyB5b3UgcHJlZmVyPw0KDQoNCkluc3RlYWQgb2YgdGhlIGFib3ZlLCBjb3VsZCB3ZSBy YXRoZXIgbG9vayBhdCBhZGRpbmcgYSBjYWxsIHRvIHhwcnRfY29uZGl0aW9uYWxfZGlzY29ubmVj dCgpIHRvIGVuc3VyZSB0aGF0IHdlIGRvbuKAmXQga2VlcCBkaXNjb25uZWN0aW5nIG11bHRpcGxl IHRpbWVzPyBJdCBtZWFucyB3ZeKAmWQgaGF2ZSB0byBzZXQgcmVxLT5ycV9jb25uZWN0X2Nvb2tp ZSBpbiB4cHJ0X2Nvbm5lY3QoKSBiZWZvcmUgc2xlZXBpbmcgb24geHBydC0+cGVuZGluZywgYnV0 IEkgZG9u4oCZdCB0aGluayB0aGF0IGlzIGEgbWFqb3IgaW1wb3NpdGlvbi4NCg0KSSBhbHNvIHRo aW5rIHdlIG1pZ2h0IG5vdCB3YW50IHRvIG1ha2UgdGhhdCBjYWxsIGNvbmRpdGlvbmFsIG9uIFJQ Q19JU19TT0ZUQ09OTi4NCg0KDQo= ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" 2016-11-18 14:31 ` Trond Myklebust @ 2016-11-21 3:24 ` NeilBrown 0 siblings, 0 replies; 7+ messages in thread From: NeilBrown @ 2016-11-21 3:24 UTC (permalink / raw) To: Trond Myklebust; +Cc: List Linux NFS Mailing [-- Attachment #1: Type: text/plain, Size: 2915 bytes --] On Sat, Nov 19 2016, Trond Myklebust wrote: >> On Aug 9, 2016, at 22:05, NeilBrown <neilb@suse.com> wrote: .... >> >> Another approach which works is: >> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c >> index 7f79fb7dc6a0..f80b135b4830 100644 >> --- a/net/sunrpc/clnt.c >> +++ b/net/sunrpc/clnt.c >> @@ -1936,8 +1936,10 @@ call_connect_status(struct rpc_task *task) >> case -EADDRINUSE: >> case -ENOBUFS: >> case -EPIPE: >> - if (RPC_IS_SOFTCONN(task)) >> + if (RPC_IS_SOFTCONN(task)) { >> + xprt_force_disconnect(task->tk_rqstp->rq_xprt); >> break; >> + } >> /* retry with existing socket, after a delay */ >> rpc_delay(task, 3*HZ); >> case -EAGAIN: >> >> so that we force a disconnect precisely when a connection attempt fails >> on a SOFTCONN task. >> >> Which of these solutions do you prefer? > > > Instead of the above, could we rather look at adding a call to xprt_conditional_disconnect() to ensure that we don’t keep disconnecting multiple times? It means we’d have to set req->rq_connect_cookie in xprt_connect() before sleeping on xprt->pending, but I don’t think that is a major imposition. > > I also think we might not want to make that call conditional on RPC_IS_SOFTCONN. Something like this maybe? Just using xprt_conditional_disconnect() doesn't work as it does nothing when !xprt_connected(xprt). I commented out that test and then the change works. I don't know if that is what you want though. Thanks, NeilBrown diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index b31ca97e754e..53acd4e3a317 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1926,6 +1926,8 @@ call_connect_status(struct rpc_task *task) case -EADDRINUSE: case -ENOBUFS: case -EPIPE: + xprt_conditional_disconnect(task->tk_rqstp->rq_xprt, + task->tk_rqstp->rq_connect_cookie); if (RPC_IS_SOFTCONN(task)) break; /* retry with existing socket, after a delay */ diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 685e6d225414..52007d018135 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -669,7 +669,7 @@ void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie) spin_lock_bh(&xprt->transport_lock); if (cookie != xprt->connect_cookie) goto out; - if (test_bit(XPRT_CLOSING, &xprt->state) || !xprt_connected(xprt)) + if (test_bit(XPRT_CLOSING, &xprt->state) /*|| !xprt_connected(xprt)*/) goto out; set_bit(XPRT_CLOSE_WAIT, &xprt->state); /* Try to schedule an autoclose RPC call */ @@ -772,6 +772,7 @@ void xprt_connect(struct rpc_task *task) if (!xprt_connected(xprt)) { task->tk_rqstp->rq_bytes_sent = 0; task->tk_timeout = task->tk_rqstp->rq_timeout; + task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie; rpc_sleep_on(&xprt->pending, task, xprt_connect_status); if (test_bit(XPRT_CLOSING, &xprt->state)) [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 800 bytes --] ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-11-21 3:24 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-03 23:33 [PATCH] Revert "SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose" NeilBrown 2016-08-04 0:01 ` Trond Myklebust 2016-08-04 2:35 ` NeilBrown 2016-08-10 2:05 ` NeilBrown 2016-11-18 4:48 ` NeilBrown 2016-11-18 14:31 ` Trond Myklebust 2016-11-21 3:24 ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).