* [PATCH 0/2] SUNRPC: Clean up port reuse behavior on reconnects
@ 2018-05-10  6:12 Vallish Vaidyeshwara
  2018-05-10  6:12 ` [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect Vallish Vaidyeshwara
  2018-05-10  6:12 ` [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination Vallish Vaidyeshwara
  0 siblings, 2 replies; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10  6:12 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, jlayton, bfields, linux-nfs; +Cc: jsstraus

We have seen examples of NFSv4 client reconnections failing due to
client source port reuse combined with stateful TCP connection
tracking in middleboxes.

Commit 0f7a622ca616 ("rpc: xs_bind - do not bind when requesting a
random ephemeral port"), commit 4dda9c8a5e34 ("SUNRPC: Set SO_REUSEPORT
socket option for TCP connections"), and commit 1f4c17a03ba7 ("SUNRPC:
Handle EADDRNOTAVAIL on connection failures") had the unintended side
effect of making port reuse dependent on whether the client uses
reserved or non-reserved source ports.

This series fixes the observed NFSv4 client reconnect failures by
choosing a new source port, regardless of which port range is in use,
when the prior connection was terminated by an RST or FIN. It does not
restore the original always-reuse behavior that existed prior to the
commits mentioned above.

Patch 1:
--------
SUNRPC: Need to reuse non-reserved port for reconnect

This patch restores the original behavior of port reuse in all
scenarios.

Patch 2:
--------
SUNRPC: Reconnect with new port on server initiated connection
    termination

The client uses a new port only in cases where the server has
explicitly terminated the connection with an RST or FIN.

Proposed outcome with this patch series:
----------------------------------------
------------------------------------------------------------------
| Port type | Connection termination | Current    | With this    |
|           |         type           | behavior   | patch series |
------------------------------------------------------------------
|  Reserved | Network partition-DROP | Reuse port | Reuse port   |
------------------------------------------------------------------
|  Reserved | FIN from server        | Reuse port | New port     |
------------------------------------------------------------------
|  Reserved | RST from server        | Reuse port | New port     |
------------------------------------------------------------------
|  Non-Resv | Network partition-DROP | New port   | Reuse port   |
------------------------------------------------------------------
|  Non-Resv | FIN from server        | New port   | New port     |
------------------------------------------------------------------
|  Non-Resv | RST from server        | New port   | New port     |
------------------------------------------------------------------
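
To illustrate what "Reuse port" versus "New port" means in the table above,
the following is a minimal user-space sketch, not kernel code: it remembers
the ephemeral source port reported by getsockname() after a successful
connect and bind()s to it before reconnecting, while zeroing the remembered
port lets the kernel pick a fresh one. The destination address and port
below are arbitrary examples.

/* User-space sketch of source-port reuse across reconnects, analogous to
 * the srcport caching in the SUNRPC TCP transport. Not kernel code.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int connect_with_srcport(const struct sockaddr_in *dst,
				unsigned short *srcport)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int one = 1;
	struct sockaddr_in src;
	socklen_t len = sizeof(src);

	if (fd < 0)
		return -1;
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

	if (*srcport != 0) {
		/* Reuse the remembered source port ("Reuse port" rows). */
		memset(&src, 0, sizeof(src));
		src.sin_family = AF_INET;
		src.sin_addr.s_addr = htonl(INADDR_ANY);
		src.sin_port = htons(*srcport);
		if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
			close(fd);
			return -1;
		}
	}
	if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0) {
		close(fd);
		return -1;
	}
	/* Remember the port actually in use for the next reconnect. */
	if (getsockname(fd, (struct sockaddr *)&src, &len) == 0)
		*srcport = ntohs(src.sin_port);
	return fd;
}

int main(void)
{
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(2049) };
	unsigned short srcport = 0;	/* 0 forces a fresh ephemeral port */
	int fd;

	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr); /* example address */
	fd = connect_with_srcport(&dst, &srcport);
	if (fd >= 0) {
		printf("connected from source port %u\n", srcport);
		close(fd);
	}
	return 0;
}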

 net/sunrpc/xprtsock.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

-- 
2.7.3.AMZN



* [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect
  2018-05-10  6:12 [PATCH 0/2] SUNRPC: Clean up port reuse behavior on reconnects Vallish Vaidyeshwara
@ 2018-05-10  6:12 ` Vallish Vaidyeshwara
  2018-05-10 21:18   ` Vallish Vaidyeshwara
  2018-05-10  6:12 ` [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination Vallish Vaidyeshwara
  1 sibling, 1 reply; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10  6:12 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, jlayton, bfields, linux-nfs; +Cc: jsstraus

A seemingly innocent optimization related to xs_bind() broke TCP port
reuse by causing a non-reserved ephemeral socket port to not be saved
in "struct sock_xprt" (srcport). In the non-reserved case, the port is
allocated as part of kernel_connect() inside of
xs_tcp_finish_connecting(). kernel_connect() returns -EINPROGRESS,
and the code skips stashing srcport in sock_xprt for reconnects.
This affects the server's DRC in case of a network partition, where the
client's RPC recovery would try reconnecting with a different port.

Reported-by: Alexey Kuznetsov <alexeyk@amazon.com>
Reviewed-by: Jacob Strauss <jsstraus@amazon.com>
Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
Fixes: 0f7a622ca616 ("rpc: xs_bind - do not bind when requesting a random ephemeral port")
---
 net/sunrpc/xprtsock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index c8902f1..5bf75b3 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2393,9 +2393,11 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
 	switch (ret) {
 	case 0:
-		xs_set_srcport(transport, sock);
 		/* fall through */
 	case -EINPROGRESS:
+		/* Allocated port saved for reconnect */
+		xs_set_srcport(transport, sock);
+
 		/* SYN_SENT! */
 		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
 			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
-- 
2.7.3.AMZN
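
The premise of the change above can also be demonstrated from user space:
even while a non-blocking connect() is still in progress (EINPROGRESS), the
ephemeral source port has already been assigned and is visible through
getsockname(), so it is safe to record it at that point rather than only on
an immediate success. The sketch below is an illustration only, using an
arbitrary example address, not kernel code.

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(2049) };
	struct sockaddr_in src;
	socklen_t len = sizeof(src);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return 1;
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr); /* example address */
	fcntl(fd, F_SETFL, O_NONBLOCK);

	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0 &&
	    errno == EINPROGRESS) {
		/* The connection is not established yet, but the ephemeral
		 * source port has already been chosen and can be recorded
		 * for later reuse on reconnect.
		 */
		if (getsockname(fd, (struct sockaddr *)&src, &len) == 0)
			printf("connect in progress, source port %u assigned\n",
			       ntohs(src.sin_port));
	}
	close(fd);
	return 0;
}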



* [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10  6:12 [PATCH 0/2] SUNRPC: Clean up port reuse behavior on reconnects Vallish Vaidyeshwara
  2018-05-10  6:12 ` [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect Vallish Vaidyeshwara
@ 2018-05-10  6:12 ` Vallish Vaidyeshwara
  2018-05-10 15:25   ` Trond Myklebust
  1 sibling, 1 reply; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10  6:12 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, jlayton, bfields, linux-nfs; +Cc: jsstraus

Server initiated socket close can corrupt connection state tracking
table in conjunction with other network middle boxes. In situations
like these, client connection hangs till connection state tracking
table entries age out and get purged. Client reconnection with a new
port in such a situation will avoid connection hang.

Reviewed-by: Jacob Strauss <jsstraus@amazon.com>
Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
---
 net/sunrpc/xprtsock.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5bf75b3..d293c8d 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1629,6 +1629,8 @@ static void xs_tcp_state_change(struct sock *sk)
 		/* The server initiated a shutdown of the socket */
 		xprt->connect_cookie++;
 		clear_bit(XPRT_CONNECTED, &xprt->state);
+		/* Server sent FIN, reconnect with a new port */
+		transport->srcport = 0;
 		xs_tcp_force_close(xprt);
 		/* fall through */
 	case TCP_CLOSING:
@@ -1650,6 +1652,9 @@ static void xs_tcp_state_change(struct sock *sk)
 					&transport->sock_state))
 			xprt_clear_connecting(xprt);
 		clear_bit(XPRT_CLOSING, &xprt->state);
+		/* Server sent RST, reconnect with a new port */
+		if (sk->sk_err == ECONNRESET)
+			transport->srcport = 0;
 		if (sk->sk_err)
 			xprt_wake_pending_tasks(xprt, -sk->sk_err);
 		/* Trigger the socket release */
-- 
2.7.3.AMZN
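
For context on the two termination cases the patch above distinguishes, here
is a user-space illustration of how a client can tell an orderly server
close (FIN, read() returns 0) from an abort (RST, read() fails with
ECONNRESET); the kernel transport sees the same distinction as
TCP_CLOSE_WAIT versus sk->sk_err == ECONNRESET. The socketpair() in main()
is only a stand-in to simulate the FIN case.

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* 0: data arrived, 1: orderly shutdown (FIN), 2: reset (RST), -1: other error */
static int classify_termination(int fd)
{
	char buf[512];
	ssize_t n = read(fd, buf, sizeof(buf));

	if (n > 0)
		return 0;
	if (n == 0)
		return 1;		/* EOF: peer sent FIN */
	if (errno == ECONNRESET)
		return 2;		/* peer aborted with RST */
	return -1;
}

int main(void)
{
	int sv[2];

	/* Closing one end of a socketpair makes read() on the other end
	 * return 0, like a TCP FIN from the server.
	 */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;
	close(sv[1]);
	printf("classification: %d (1 == orderly shutdown)\n",
	       classify_termination(sv[0]));
	close(sv[0]);
	return 0;
}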



* Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10  6:12 ` [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination Vallish Vaidyeshwara
@ 2018-05-10 15:25   ` Trond Myklebust
  2018-05-10 16:22     ` Vallish Vaidyeshwara
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2018-05-10 15:25 UTC (permalink / raw)
  To: bfields, vallish, anna.schumaker, linux-nfs, jlayton; +Cc: jsstraus

On Thu, 2018-05-10 at 06:12 +0000, Vallish Vaidyeshwara wrote:
> Server initiated socket close can corrupt connection state tracking
> table in conjunction with other network middle boxes. In situations
> like these, client connection hangs till connection state tracking
> table entries age out and get purged. Client reconnection with a new
> port in such a situation will avoid connection hang.
> 
> Reviewed-by: Jacob Strauss <jsstraus@amazon.com>
> Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
> Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
> ---
>  net/sunrpc/xprtsock.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 5bf75b3..d293c8d 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1629,6 +1629,8 @@ static void xs_tcp_state_change(struct sock
> *sk)
>  		/* The server initiated a shutdown of the socket */
>  		xprt->connect_cookie++;
>  		clear_bit(XPRT_CONNECTED, &xprt->state);
> +		/* Server sent FIN, reconnect with a new port */
> +		transport->srcport = 0;
>  		xs_tcp_force_close(xprt);
>  		/* fall through */
>  	case TCP_CLOSING:
> @@ -1650,6 +1652,9 @@ static void xs_tcp_state_change(struct sock
> *sk)
>  					&transport->sock_state))
>  			xprt_clear_connecting(xprt);
>  		clear_bit(XPRT_CLOSING, &xprt->state);
> +		/* Server sent RST, reconnect with a new port */
> +		if (sk->sk_err == ECONNRESET)
> +			transport->srcport = 0;
>  		if (sk->sk_err)
>  			xprt_wake_pending_tasks(xprt, -sk->sk_err);
>  		/* Trigger the socket release */

NACK. This will utterly break NFSv2, NFSv3 and NFSv4.0 duplicate replay
cache semantics.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


* Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10 15:25   ` Trond Myklebust
@ 2018-05-10 16:22     ` Vallish Vaidyeshwara
  2018-05-10 17:26       ` Trond Myklebust
  2018-05-10 17:37       ` bfields
  0 siblings, 2 replies; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10 16:22 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bfields, anna.schumaker, linux-nfs, jlayton, jsstraus

On Thu, May 10, 2018 at 03:25:14PM +0000, Trond Myklebust wrote:
> On Thu, 2018-05-10 at 06:12 +0000, Vallish Vaidyeshwara wrote:
> > Server initiated socket close can corrupt connection state tracking
> > table in conjunction with other network middle boxes. In situations
> > like these, client connection hangs till connection state tracking
> > table entries age out and get purged. Client reconnection with a new
> > port in such a situation will avoid connection hang.
> > 
> > Reviewed-by: Jacob Strauss <jsstraus@amazon.com>
> > Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
> > Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
> > ---
> >  net/sunrpc/xprtsock.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 5bf75b3..d293c8d 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1629,6 +1629,8 @@ static void xs_tcp_state_change(struct sock
> > *sk)
> >  		/* The server initiated a shutdown of the socket */
> >  		xprt->connect_cookie++;
> >  		clear_bit(XPRT_CONNECTED, &xprt->state);
> > +		/* Server sent FIN, reconnect with a new port */
> > +		transport->srcport = 0;
> >  		xs_tcp_force_close(xprt);
> >  		/* fall through */
> >  	case TCP_CLOSING:
> > @@ -1650,6 +1652,9 @@ static void xs_tcp_state_change(struct sock
> > *sk)
> >  					&transport->sock_state))
> >  			xprt_clear_connecting(xprt);
> >  		clear_bit(XPRT_CLOSING, &xprt->state);
> > +		/* Server sent RST, reconnect with a new port */
> > +		if (sk->sk_err == ECONNRESET)
> > +			transport->srcport = 0;
> >  		if (sk->sk_err)
> >  			xprt_wake_pending_tasks(xprt, -sk->sk_err);
> >  		/* Trigger the socket release */
> 
> NACK. This will utterly break NFSv2, NFSv3 and NFSv4.0 duplicate replay
> cache semantics. 
> 
> Cheers
>   Trond
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

Hello Trond,

The first patch in this series is actually helping restore DRC behavior in
cases like network partition where packets are dropped:
[PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect
Patch 1 starts reusing port in all cases.

The second patch is still not breaking DRC semantics, to quote from source
code:

<code snip>
  /**
   * xs_close - close a socket
   * @xprt: transport
   *
   * This is used when all requests are complete; ie, no DRC state remains
   * on the server we want to save.
   *
   * The caller _must_ be holding XPRT_LOCKED in order to avoid issues with
   * xs_reset_transport() zeroing the socket from underneath a writer.
   */
  static void xs_close(struct rpc_xprt *xprt)
<code snip>

If the server has closed a connection, then no DRC state remains on the server
we want to use.

The second patch is exploiting this semantics and using a new port in following
2 cases:
a) RST from server implies the connection was torn down & no useful DRC exists
on server
b) FIN from server implies that server is shutting down the connection as part
of close and no DRC state remains

Please let me know if I have missed something obvious, I definitely do not want
to break DRC as that is not the intention of this patch series. Is there a
situation where server can close a connection and still keep DRC?

Thanks.
-Vallish


* Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10 16:22     ` Vallish Vaidyeshwara
@ 2018-05-10 17:26       ` Trond Myklebust
  2018-05-10 21:12         ` Vallish Vaidyeshwara
  2018-05-10 17:37       ` bfields
  1 sibling, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2018-05-10 17:26 UTC (permalink / raw)
  To: vallish; +Cc: bfields, anna.schumaker, jsstraus, linux-nfs, jlayton

On Thu, 2018-05-10 at 16:22 +0000, Vallish Vaidyeshwara wrote:
> On Thu, May 10, 2018 at 03:25:14PM +0000, Trond Myklebust wrote:
[...]
> Please let me know if I have missed something obvious, I definitely
> do not want
> to break DRC as that is not the intention of this patch series. Is
> there a
> situation where server can close a connection and still keep DRC?
> 

The DRC does not exist for the benefit of the server, but for the
benefit of the client. It is there to ensure that if the client replays
the request, then it gets the exact same reply as it should have
received when the first request was sent. Bearing that in mind:

   1. What guarantees that all servers behave correctly w.r.t. ensuring
      that they have sent all replies to any outstanding RPC call before
      shutting down the connection. I'm not sure that even the Linux
      server does that.
   2. How would a server even know whether or not the client may need to
      replay a request?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
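
To make the objection concrete, here is a toy model of a duplicate reply
cache keyed on client address, client source port, and RPC XID. This is a
simplification for illustration, not the Linux server's actual DRC
implementation, but it shows why a retransmission from a new source port
misses the cache and gets executed a second time.

/* Toy reply cache: a request replayed from a new source port is a miss. */
#include <stdint.h>
#include <stdio.h>

struct drc_entry {
	uint32_t client_ip;
	uint16_t client_port;
	uint32_t xid;
	char     cached_reply[64];
	int      valid;
};

#define DRC_SLOTS 128
static struct drc_entry drc[DRC_SLOTS];

static const char *drc_lookup(uint32_t ip, uint16_t port, uint32_t xid)
{
	for (int i = 0; i < DRC_SLOTS; i++)
		if (drc[i].valid && drc[i].client_ip == ip &&
		    drc[i].client_port == port && drc[i].xid == xid)
			return drc[i].cached_reply;
	return NULL;	/* miss: the request would be re-executed */
}

static void drc_insert(uint32_t ip, uint16_t port, uint32_t xid,
		       const char *reply)
{
	static int next;
	struct drc_entry *e = &drc[next++ % DRC_SLOTS];

	e->client_ip = ip;
	e->client_port = port;
	e->xid = xid;
	snprintf(e->cached_reply, sizeof(e->cached_reply), "%s", reply);
	e->valid = 1;
}

int main(void)
{
	drc_insert(0xc0000201, 1023, 42, "reply to xid 42");

	/* Same source port: the retransmission is answered from the cache. */
	printf("same port: %s\n", drc_lookup(0xc0000201, 1023, 42) ?
	       "hit" : "miss");
	/* New source port: the same XID misses and is re-executed. */
	printf("new port:  %s\n", drc_lookup(0xc0000201, 1024, 42) ?
	       "hit" : "miss");
	return 0;
}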


* Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10 16:22     ` Vallish Vaidyeshwara
  2018-05-10 17:26       ` Trond Myklebust
@ 2018-05-10 17:37       ` bfields
  2018-05-10 21:15         ` Vallish Vaidyeshwara
  1 sibling, 1 reply; 10+ messages in thread
From: bfields @ 2018-05-10 17:37 UTC (permalink / raw)
  To: Vallish Vaidyeshwara
  Cc: Trond Myklebust, anna.schumaker, linux-nfs, jlayton, jsstraus

On Thu, May 10, 2018 at 04:22:02PM +0000, Vallish Vaidyeshwara wrote:
> If the server has closed a connection, then no DRC state remains on the server
> we want to use.

That's not a rule, as far as I know.  Certainly the Linux server doesn't
destroy DRC state on closing a connection.  (Should it? I'm not sure
that's a good idea.)

--b.


* Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10 17:26       ` Trond Myklebust
@ 2018-05-10 21:12         ` Vallish Vaidyeshwara
  0 siblings, 0 replies; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10 21:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bfields, anna.schumaker, jsstraus, linux-nfs, jlayton

On Thu, May 10, 2018 at 05:26:12PM +0000, Trond Myklebust wrote:
> On Thu, 2018-05-10 at 16:22 +0000, Vallish Vaidyeshwara wrote:
> > On Thu, May 10, 2018 at 03:25:14PM +0000, Trond Myklebust wrote:
> > > On Thu, 2018-05-10 at 06:12 +0000, Vallish Vaidyeshwara wrote:
> > > > Server initiated socket close can corrupt connection state
> > > > tracking
> > > > table in conjunction with other network middle boxes. In
> > > > situations
> > > > like these, client connection hangs till connection state
> > > > tracking
> > > > table entries age out and get purged. Client reconnection with a
> > > > new
> > > > port in such a situation will avoid connection hang.
> > > > 
> > > > Reviewed-by: Jacob Strauss <jsstraus@amazon.com>
> > > > Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
> > > > Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
> > > > ---
> > > >  net/sunrpc/xprtsock.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > index 5bf75b3..d293c8d 100644
> > > > --- a/net/sunrpc/xprtsock.c
> > > > +++ b/net/sunrpc/xprtsock.c
> > > > @@ -1629,6 +1629,8 @@ static void xs_tcp_state_change(struct sock
> > > > *sk)
> > > >  		/* The server initiated a shutdown of the socket
> > > > */
> > > >  		xprt->connect_cookie++;
> > > >  		clear_bit(XPRT_CONNECTED, &xprt->state);
> > > > +		/* Server sent FIN, reconnect with a new port */
> > > > +		transport->srcport = 0;
> > > >  		xs_tcp_force_close(xprt);
> > > >  		/* fall through */
> > > >  	case TCP_CLOSING:
> > > > @@ -1650,6 +1652,9 @@ static void xs_tcp_state_change(struct sock
> > > > *sk)
> > > >  					&transport->sock_state))
> > > >  			xprt_clear_connecting(xprt);
> > > >  		clear_bit(XPRT_CLOSING, &xprt->state);
> > > > +		/* Server sent RST, reconnect with a new port */
> > > > +		if (sk->sk_err == ECONNRESET)
> > > > +			transport->srcport = 0;
> > > >  		if (sk->sk_err)
> > > >  			xprt_wake_pending_tasks(xprt, -sk-
> > > > >sk_err);
> > > >  		/* Trigger the socket release */
> > > 
> > > NACK. This will utterly break NFSv2, NFSv3 and NFSv4.0 duplicate
> > > replay
> > > cache semantics. 
> > > 
> > > Cheers
> > >   Trond
> > > -- 
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > 
> > Hello Trond,
> > 
> > The first patch in this series is actually helping restore DRC
> > behavior in
> > cases like network partition where packets are dropped:
> > [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect
> > Patch 1 starts reusing port in all cases.
> > 
> > The second patch is still not breaking DRC semantics, to quote from
> > source
> > code:
> > 
> > <code snip>
> >   /**
> >    * xs_close - close a socket
> >    * @xprt: transport
> >    *
> >    * This is used when all requests are complete; ie, no DRC state
> > remains
> >    * on the server we want to save.
> >    *
> >    * The caller _must_ be holding XPRT_LOCKED in order to avoid
> > issues with
> >    * xs_reset_transport() zeroing the socket from underneath a
> > writer.
> >    */
> >   static void xs_close(struct rpc_xprt *xprt)
> > <code snip>
> > 
> > If the server has closed a connection, then no DRC state remains on
> > the server
> > we want to use.
> > 
> > The second patch is exploiting this semantics and using a new port in
> > following
> > 2 cases:
> > a) RST from server implies the connection was torn down & no useful
> > DRC exists
> > on server
> > b) FIN from server implies that server is shutting down the
> > connection as part
> > of close and no DRC state remains
> > 
> > Please let me know if I have missed something obvious, I definitely
> > do not want
> > to break DRC as that is not the intention of this patch series. Is
> > there a
> > situation where server can close a connection and still keep DRC?
> > 
> 
> The DRC does not exist for the benefit of the server, but for the
> benefit of the client. It is there to ensure that if the client replays
> the request, then it gets the exact same reply as it should have
> received when the first request was sent. Bearing that in mind:
> 
>    1. What guarantees that all servers behave correctly w.r.t. ensuring
>       that they have sent all replies to any outstanding RPC call before
>       shutting down the connection. I'm not sure that even the Linux
>       server does that.
>    2. How would a server even know whether or not the client may need to
>       replay a request?
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

Hello Trond,

Thanks for the explanation, I was kind of misled by the code comments in
xs_close. I will probably submit a different patch to clean up these code
comments, which can mislead others as well.

Regards,
-Vallish


* Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination
  2018-05-10 17:37       ` bfields
@ 2018-05-10 21:15         ` Vallish Vaidyeshwara
  0 siblings, 0 replies; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10 21:15 UTC (permalink / raw)
  To: bfields; +Cc: Trond Myklebust, anna.schumaker, linux-nfs, jlayton, jsstraus

On Thu, May 10, 2018 at 01:37:28PM -0400, bfields@fieldses.org wrote:
> On Thu, May 10, 2018 at 04:22:02PM +0000, Vallish Vaidyeshwara wrote:
> > If the server has closed a connection, then no DRC state remains on the server
> > we want to use.
> 
> That's not a rule, as far as I know.  Certainly the Linux server doesn't
> destroy DRC state on closing a connection.  (Should it? I'm not sure
> that's a good idea.)
> 
> --b.
> 

Thanks for the clarification and I agree patch 2 should not be applied.


* Re: [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect
  2018-05-10  6:12 ` [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect Vallish Vaidyeshwara
@ 2018-05-10 21:18   ` Vallish Vaidyeshwara
  0 siblings, 0 replies; 10+ messages in thread
From: Vallish Vaidyeshwara @ 2018-05-10 21:18 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, jlayton, bfields, linux-nfs; +Cc: jsstraus

On Thu, May 10, 2018 at 06:12:53AM +0000, Vallish Vaidyeshwara wrote:
> A seemingly innocent optimization related to xs_bind() broke TCP port
> reuse by causing a non-reserved ephemeral socket port to not be saved
> in "struct sock_xprt" (srcport). In the non-reserved case, the port is
> allocated as part of kernel_connect() inside of
> xs_tcp_finish_connecting(). kernel_connect() returns -EINPROGRESS,
> and the code skips stashing srcport in sock_xprt for reconnects.
> This affects the server's DRC in case of a network partition, where the
> client's RPC recovery would try reconnecting with a different port.
> 
> Reported-by: Alexey Kuznetsov <alexeyk@amazon.com>
> Reviewed-by: Jacob Strauss <jsstraus@amazon.com>
> Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
> Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
> Fixes: 0f7a622c ("rpc: xs_bind - do not bind when requesting a random ephemeral port")
> ---
>  net/sunrpc/xprtsock.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index c8902f1..5bf75b3 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2393,9 +2393,11 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>  	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
>  	switch (ret) {
>  	case 0:
> -		xs_set_srcport(transport, sock);
>  		/* fall through */
>  	case -EINPROGRESS:
> +		/* Allocated port saved for reconnect */
> +		xs_set_srcport(transport, sock);
> +
>  		/* SYN_SENT! */
>  		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
>  			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> -- 
> 2.7.3.AMZN
>

Hello Trond and Bruce,

This patch actually restores the previously existing port reuse behavior on
which the server's DRC depends, and which is currently broken. Can you folks
let me know your feedback on this patch as well?

Thanks.
-Vallish


Thread overview: 10+ messages
2018-05-10  6:12 [PATCH 0/2] SUNRPC: Clean up port reuse behavior on reconnects Vallish Vaidyeshwara
2018-05-10  6:12 ` [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect Vallish Vaidyeshwara
2018-05-10 21:18   ` Vallish Vaidyeshwara
2018-05-10  6:12 ` [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination Vallish Vaidyeshwara
2018-05-10 15:25   ` Trond Myklebust
2018-05-10 16:22     ` Vallish Vaidyeshwara
2018-05-10 17:26       ` Trond Myklebust
2018-05-10 21:12         ` Vallish Vaidyeshwara
2018-05-10 17:37       ` bfields
2018-05-10 21:15         ` Vallish Vaidyeshwara
