Linux-NFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH 0/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
@ 2018-12-12 13:51 Dave Wysochanski
  2018-12-12 13:51 ` [PATCH 1/1] " Dave Wysochanski
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Wysochanski @ 2018-12-12 13:51 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker; +Cc: linux-nfs

Recently a customer reported a regression which was the result of a small
portion of 9b30889c548a which changed the handling of TCP_CLOSE inside
xs_tcp_state_change.  A simple reproducer was shown where an unexpected RST
injected during a simple write test would cause the NFS client to hang, a
kworker thread spinning at 100% cpu, and the only recovery was to reboot.

Upon investigation we found that the XPRT_CONNECTED bit was never cleared on
the rpc_xprt and as a result, rpc_tasks would loop around with the following
rpcdebug, which shows the rpc_xprt still has the XPRT_CONNECTED bit set even
though EPIPE (32) is coming back from the TCP layer:

kernel: RPC:    92 xprt_transmit(524516)
kernel: RPC:       xs_tcp_send_request(524516) = -32
kernel: RPC:    92 call_status (status -32)
kernel: RPC:    92 call_bind (status 0)
kernel: RPC:    92 call_connect xprt ffff8ff8abf44000 is connected
kernel: RPC:    92 call_transmit (status 0)
kernel: RPC:    92 xprt_prepare_transmit
kernel: RPC:    92 rpc_xdr_encode (status 0)
kernel: RPC:    92 marshaling UNIX cred ffff8ff835db5000
kernel: RPC:    92 using AUTH_UNIX cred ffff8ff835db5000 to wrap rpc data
kernel: RPC:    92 xprt_transmit(524516)
kernel: RPC:       xs_tcp_send_request(524516) = -32

The explanation of the above is as follows.  Prior to 9b30889c548a,
XPRT_CONNECTED was unconditionally cleared while handling TCP_CLOSE inside
xs_tcp_state_change through the following call chain:
	xs_sock_mark_closed -> xprt_disconnect_done
After the patch, while handling TCP_CLOSE, we have the following chain:
	xs_tcp_force_close -> xprt_force_disconnect
The latter will queue a work xprt_autoclose based on the XPRT_LOCKED bit.
If the xprt is locked at this point (for example due to another rpc_task in
the middle of transmission), the work won't get queued and the XPRT_CONNECTED
bit never gets cleared.

The below reproducer will trigger the problem on 4.16 - 4.19 and the 1-line
patch that follows fixes the problem.
So far I have not been able to reproduce the problem on 4.20-rc6.  However,
the patch makes handling of TCP_CLOSE consistent with that of TCP_CLOSE_WAIT
inside xs_tcp_state_change.  On the basis of consistency / correctness, and
given the impact to 4.16 - 4.19, I submit the patch that follows for 
consideration for upstream as well as stable branches of 4.16 - 4.19.


#!/bin/bash
# Reproduces NFS client hang due to sunrpc state machine spinning in 
# transmit side due to incorrect handling of XPRT_CONNECTED in presence
# of unsolicited RST.
# Tested (fails) on: 4.16, 4.17, 4.18, 4.19, 3.10.0-957.el7
#
# Author: Frank Sorenson <sorenson@redhat.com>
# From: Deependra Shekhawat <deepens@amazon.com>
# Modifications: Dave Wysochanski <dwysocha@redhat.com>
#
install_dependency() {
	/usr/bin/which $1 > /dev/null 2>&1
	[ $? -ne 0 ] && echo "Installing dependency $1" && yum install -y $2
}
install_dependency tshark wireshark
install_dependency screen screen

exports_dir=/exports2
nfsmp=/mnt/tmp
blocked_sleep_time=10
test_sleep_time1=10
test_sleep_time2=10

LOG_OVER_NET=1
# whether to log by sending to udp/514 (syslog) on HOST1 (either
#  enable remote syslog, or just add 'or udp port 514' to the packet
#  capture filter.  Goes to /dev/null, if set to 0.
#

pcap_pid=""

if [[ $LOG_OVER_NET -ne 0 ]] ; then
# probably best to be the network address of some real network, but not a real host address
# FIXME: un-comment the following in /etc/rsyslog.conf
#$ModLoad imudp
#$UDPServerRun 514
	systemctl start rsyslog
	exec 6<>/dev/udp/127.0.0.1/514
else
	exec 6<>/dev/null
fi
trap exit_cleanup EXIT
exit_cleanup() {
	iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
	[[ -n $pcap_pid ]] && kill $pcap_pid
	screen -S dd_loop -X kill >/dev/null 2>&1

	rpcdebug -m rpc -c all >/dev/null
	rpcdebug -m nfs -c all >/dev/null

	exec 6>&-
}
output() {
	echo -en "$*"
#	echo -en "$0: $*" | tr -d '\n\t' | tee >(logger -t nfs_hang_test) >&6 
	echo -en "$0: $*" | tr -d '\n\t' | logger -t nfs_hang_test
	echo -en "$0: $*" | tr -d '\n\t' >&6 
}


tshark -i lo -Qw /tmp/trace.pcap tcp port 2049 or udp port 514 &
pcap_pid=$!

systemctl start nfs-server
sleep 2

output "performing setup and mount\n"
rpcdebug -m rpc -s all
rpcdebug -m nfs -s all

# Either add nfs to firewall or use 'insecure' on mount and 'noresvport' on mount
firewall-cmd --permanent --add-service=nfs
systemctl restart firewalld
mkdir -p $exports_dir
exportfs -o rw,sync,no_root_squash 127.0.0.1:$exports_dir
mkdir -p $nfsmp
mount -overs=4.1 127.0.0.1:$exports_dir $nfsmp
mkdir -p $nfsmp$exports_dir/dir



output "starting dd loop\n"
sleep 1
screen -d -m -S dd_loop /bin/bash -c "cd $nfsmp ; while true ; do date ; time dd if=/dev/zero oflag=direct of=$nfsmp$exports_dir/dir/file bs=16M count=100 2>/dev/null ; done"
output "dd loop started\n"
sleep 2


output "enabling firewall rule\n"
iptables -A OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule added; sleeping $blocked_sleep_time\n"

sleep $blocked_sleep_time

output "removing firewall rule\n"
iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule removed\n"

output "sleeping $test_sleep_time1\n"
sleep $test_sleep_time1

output "checking response\n"
timeout -k 10 10 /bin/ls $nfsmp$exports_dir/dir/file >/dev/null 2>&1
res=$?

if [[ $res -eq 0 ]] ; then
	output "OK on `uname -r`: ls appears okay\n"
else
	output "WARNING on `uname -r`: ls appears hung\n"
fi

output "killing dd loop\n"
screen -S dd_loop -X kill

output "sleeping $test_sleep_time2\n"
sleep $test_sleep_time2

output "killing pcap\n"
kill $pcap_pid

output "stopping rpcdebug\n"
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all

output "checking umount\n"
timeout -k 10 10 umount $nfsmp >/dev/null 2>&1
res=$?

if [[ $res -eq 0 ]] ; then
	output "TEST PASS on `uname -r`: umount ok\n"
else
	output "TEST FAIL on `uname -r`: umount hung\n"
fi

output "test complete\n"



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 13:51 [PATCH 0/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST Dave Wysochanski
@ 2018-12-12 13:51 ` " Dave Wysochanski
  2018-12-12 16:56   ` Trond Myklebust
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Wysochanski @ 2018-12-12 13:51 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker; +Cc: linux-nfs

Commit 9b30889c548a changed the handling of TCP_CLOSE inside
xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED bit
was cleared unconditionally inside xprt_disconnect_done, similar
to the handling of TCP_CLOSE_WAIT.  After the change the clearing
of XPRT_CONNECTED depends on successfully queueing a work based
xprt_autoclose which depends on XPRT_LOCKED and may not happen.
This is significant in the case of an unexpected RST from the
server, as the client will only see xs_tcp_state_change called with
sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
with handling TCP_CLOSE_WAIT.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 net/sunrpc/xprtsock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 8a5e823e0b33..b9789036051d 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock *sk)
 		if (sk->sk_err)
 			xprt_wake_pending_tasks(xprt, -sk->sk_err);
 		/* Trigger the socket release */
+		clear_bit(XPRT_CONNECTED, &xprt->state);
 		xs_tcp_force_close(xprt);
 	}
  out:
-- 
2.14.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 13:51 ` [PATCH 1/1] " Dave Wysochanski
@ 2018-12-12 16:56   ` Trond Myklebust
  2018-12-12 17:47     ` Dave Wysochanski
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Trond Myklebust @ 2018-12-12 16:56 UTC (permalink / raw)
  To: anna.schumaker, dwysocha; +Cc: linux-nfs

On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
> Commit 9b30889c548a changed the handling of TCP_CLOSE inside
> xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED bit
> was cleared unconditionally inside xprt_disconnect_done, similar
> to the handling of TCP_CLOSE_WAIT.  After the change the clearing
> of XPRT_CONNECTED depends on successfully queueing a work based
> xprt_autoclose which depends on XPRT_LOCKED and may not happen.
> This is significant in the case of an unexpected RST from the
> server, as the client will only see xs_tcp_state_change called with
> sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
> XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
> with handling TCP_CLOSE_WAIT.
> 
> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> ---
>  net/sunrpc/xprtsock.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 8a5e823e0b33..b9789036051d 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock
> *sk)
>  		if (sk->sk_err)
>  			xprt_wake_pending_tasks(xprt, -sk->sk_err);
>  		/* Trigger the socket release */
> +		clear_bit(XPRT_CONNECTED, &xprt->state);
>  		xs_tcp_force_close(xprt);
>  	}
>   out:

Hi Dave,

This isn't needed for 4.20 or newer because call_transmit() will now
always call xprt_end_transmit(). I suggest that a stable fix do
something similar (perhaps conditional on the error returned by
xprt_transmit()?).

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 16:56   ` Trond Myklebust
@ 2018-12-12 17:47     ` Dave Wysochanski
  2018-12-12 18:02       ` Trond Myklebust
  2018-12-14 13:48     ` Dave Wysochanski
  2018-12-14 18:29     ` Scott Mayhew
  2 siblings, 1 reply; 9+ messages in thread
From: Dave Wysochanski @ 2018-12-12 17:47 UTC (permalink / raw)
  To: Trond Myklebust, anna.schumaker; +Cc: linux-nfs

On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote:
> On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
> > Commit 9b30889c548a changed the handling of TCP_CLOSE inside
> > xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED bit
> > was cleared unconditionally inside xprt_disconnect_done, similar
> > to the handling of TCP_CLOSE_WAIT.  After the change the clearing
> > of XPRT_CONNECTED depends on successfully queueing a work based
> > xprt_autoclose which depends on XPRT_LOCKED and may not happen.
> > This is significant in the case of an unexpected RST from the
> > server, as the client will only see xs_tcp_state_change called with
> > sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
> > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
> > with handling TCP_CLOSE_WAIT.
> > 
> > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > ---
> >  net/sunrpc/xprtsock.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 8a5e823e0b33..b9789036051d 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock
> > *sk)
> >  		if (sk->sk_err)
> >  			xprt_wake_pending_tasks(xprt, -sk-
> > >sk_err);
> >  		/* Trigger the socket release */
> > +		clear_bit(XPRT_CONNECTED, &xprt->state);
> >  		xs_tcp_force_close(xprt);
> >  	}
> >   out:
> 
> Hi Dave,
> 
> This isn't needed for 4.20 or newer because call_transmit() will now
> always call xprt_end_transmit(). I suggest that a stable fix do
> something similar (perhaps conditional on the error returned by
> xprt_transmit()?).
> 

Can you explain the handling in xs_tcp_state_change - why
XPRT_CONNECTED would need to remain set longer in the case of handling
TCP_CLOSE case rather than TCP_CLOSE_WAIT?  It seems like if we get an
RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED bit
getting cleared need to be delayed in that case?

I will look into the approach you mention though at the moment I do not
see how it helps because the underlying issue seems to be clearing of
the XPRT_CONNECTED bit.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 17:47     ` Dave Wysochanski
@ 2018-12-12 18:02       ` Trond Myklebust
  2018-12-12 19:56         ` Dave Wysochanski
  2019-01-08 12:46         ` Benjamin Coddington
  0 siblings, 2 replies; 9+ messages in thread
From: Trond Myklebust @ 2018-12-12 18:02 UTC (permalink / raw)
  To: anna.schumaker, dwysocha; +Cc: linux-nfs

On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote:
> On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote:
> > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
> > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside
> > > xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED
> > > bit
> > > was cleared unconditionally inside xprt_disconnect_done, similar
> > > to the handling of TCP_CLOSE_WAIT.  After the change the clearing
> > > of XPRT_CONNECTED depends on successfully queueing a work based
> > > xprt_autoclose which depends on XPRT_LOCKED and may not happen.
> > > This is significant in the case of an unexpected RST from the
> > > server, as the client will only see xs_tcp_state_change called
> > > with
> > > sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
> > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
> > > with handling TCP_CLOSE_WAIT.
> > > 
> > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > ---
> > >  net/sunrpc/xprtsock.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index 8a5e823e0b33..b9789036051d 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock
> > > *sk)
> > >  		if (sk->sk_err)
> > >  			xprt_wake_pending_tasks(xprt, -sk-
> > > > sk_err);
> > >  		/* Trigger the socket release */
> > > +		clear_bit(XPRT_CONNECTED, &xprt->state);
> > >  		xs_tcp_force_close(xprt);
> > >  	}
> > >   out:
> > 
> > Hi Dave,
> > 
> > This isn't needed for 4.20 or newer because call_transmit() will
> > now
> > always call xprt_end_transmit(). I suggest that a stable fix do
> > something similar (perhaps conditional on the error returned by
> > xprt_transmit()?).
> > 
> 
> Can you explain the handling in xs_tcp_state_change - why
> XPRT_CONNECTED would need to remain set longer in the case of
> handling
> TCP_CLOSE case rather than TCP_CLOSE_WAIT?  It seems like if we get
> an
> RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED
> bit
> getting cleared need to be delayed in that case?
> 
> I will look into the approach you mention though at the moment I do
> not
> see how it helps because the underlying issue seems to be clearing of
> the XPRT_CONNECTED bit.
> 

See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically
trigger autoclose if  XPRT_CLOSE_WAIT is set, regardless of whether or
not there are other tasks waiting for the lock.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 18:02       ` Trond Myklebust
@ 2018-12-12 19:56         ` Dave Wysochanski
  2019-01-08 12:46         ` Benjamin Coddington
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Wysochanski @ 2018-12-12 19:56 UTC (permalink / raw)
  To: Trond Myklebust, anna.schumaker; +Cc: linux-nfs

On Wed, 2018-12-12 at 18:02 +0000, Trond Myklebust wrote:
> On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote:
> > On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote:
> > > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
> > > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside
> > > > xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED
> > > > bit
> > > > was cleared unconditionally inside xprt_disconnect_done,
> > > > similar
> > > > to the handling of TCP_CLOSE_WAIT.  After the change the
> > > > clearing
> > > > of XPRT_CONNECTED depends on successfully queueing a work based
> > > > xprt_autoclose which depends on XPRT_LOCKED and may not happen.
> > > > This is significant in the case of an unexpected RST from the
> > > > server, as the client will only see xs_tcp_state_change called
> > > > with
> > > > sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
> > > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
> > > > with handling TCP_CLOSE_WAIT.
> > > > 
> > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > ---
> > > >  net/sunrpc/xprtsock.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > index 8a5e823e0b33..b9789036051d 100644
> > > > --- a/net/sunrpc/xprtsock.c
> > > > +++ b/net/sunrpc/xprtsock.c
> > > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct
> > > > sock
> > > > *sk)
> > > >  		if (sk->sk_err)
> > > >  			xprt_wake_pending_tasks(xprt, -sk-
> > > > > sk_err);
> > > > 
> > > >  		/* Trigger the socket release */
> > > > +		clear_bit(XPRT_CONNECTED, &xprt->state);
> > > >  		xs_tcp_force_close(xprt);
> > > >  	}
> > > >   out:
> > > 
> > > Hi Dave,
> > > 
> > > This isn't needed for 4.20 or newer because call_transmit() will
> > > now
> > > always call xprt_end_transmit(). I suggest that a stable fix do
> > > something similar (perhaps conditional on the error returned by
> > > xprt_transmit()?).
> > > 
> > 
> > Can you explain the handling in xs_tcp_state_change - why
> > XPRT_CONNECTED would need to remain set longer in the case of
> > handling
> > TCP_CLOSE case rather than TCP_CLOSE_WAIT?  It seems like if we get
> > an
> > RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED
> > bit
> > getting cleared need to be delayed in that case?
> > 
> > I will look into the approach you mention though at the moment I do
> > not
> > see how it helps because the underlying issue seems to be clearing
> > of
> > the XPRT_CONNECTED bit.
> > 
> 
> See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically
> trigger autoclose if  XPRT_CLOSE_WAIT is set, regardless of whether
> or
> not there are other tasks waiting for the lock.
> 
> 

Ok thanks for pointing me in that direction I will investigate.

I am actually seeing a hang now with the reproducer on 4.20-rc6 but
it's not CPU spinning.  Investigating as it was not easy to reproduce
but there must still be a race somewhere.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 16:56   ` Trond Myklebust
  2018-12-12 17:47     ` Dave Wysochanski
@ 2018-12-14 13:48     ` Dave Wysochanski
  2018-12-14 18:29     ` Scott Mayhew
  2 siblings, 0 replies; 9+ messages in thread
From: Dave Wysochanski @ 2018-12-14 13:48 UTC (permalink / raw)
  To: Trond Myklebust, anna.schumaker; +Cc: linux-nfs

On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote:
> On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
> > Commit 9b30889c548a changed the handling of TCP_CLOSE inside
> > xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED bit
> > was cleared unconditionally inside xprt_disconnect_done, similar
> > to the handling of TCP_CLOSE_WAIT.  After the change the clearing
> > of XPRT_CONNECTED depends on successfully queueing a work based
> > xprt_autoclose which depends on XPRT_LOCKED and may not happen.
> > This is significant in the case of an unexpected RST from the
> > server, as the client will only see xs_tcp_state_change called with
> > sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
> > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
> > with handling TCP_CLOSE_WAIT.
> > 
> > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > ---
> >  net/sunrpc/xprtsock.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 8a5e823e0b33..b9789036051d 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock
> > *sk)
> >  		if (sk->sk_err)
> >  			xprt_wake_pending_tasks(xprt, -sk-
> > >sk_err);
> >  		/* Trigger the socket release */
> > +		clear_bit(XPRT_CONNECTED, &xprt->state);
> >  		xs_tcp_force_close(xprt);
> >  	}
> >   out:
> 
> Hi Dave,
> 
> This isn't needed for 4.20 or newer because call_transmit() will now
> always call xprt_end_transmit(). I suggest that a stable fix do
> something similar (perhaps conditional on the error returned by
> xprt_transmit()?).
> 
> 

It appears the "Special cases" comment inside call_transmit_status()
should be removed now, do you agree?

Also I guess you are not concerned about this issue anymore that
created the comment and the cases?
https://lore.kernel.org/patchwork/patch/165428/

Also for a stable fix, would you prefer handling all of the codes on
lines 39-43 the same and removing that comment, or just EPIPE we are
seeing in the testcase?

After some discussion with Scott we're testing variants on the above
(see below for possible fixes).  Any preferences on option 1 or option
2?

Thanks.




Upstream (4.20-rc6) has a comment but it looks wrong due to code in call_transmit():
   1971 static void
   1972 call_transmit(struct rpc_task *task)
   1973 {
   1974         dprint_status(task);
   1975 
   1976         task->tk_status = 0;
   1977         if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) {
   1978                 if (!xprt_prepare_transmit(task))
   1979                         return;
   1980                 xprt_transmit(task);
   1981         }
   1982         task->tk_action = call_transmit_status;
   1983         xprt_end_transmit(task);   <--------------------- Always drop the XPRT_LOCK
   1984 }
...
   1989 static void
   1990 call_transmit_status(struct rpc_task *task)
   1991 {
   1992         task->tk_action = call_status;
   1993 
   1994         /*
   1995          * Common case: success.  Force the compiler to put this
   1996          * test first.
   1997          */
   1998         if (task->tk_status == 0) {
   1999                 xprt_request_wait_receive(task);
   2000                 return;
   2001         }
   2002 
   2003         switch (task->tk_status) {
   2004         default:
   2005                 dprint_status(task);
   2006                 break;
   2007         case -EBADMSG:
   2008                 task->tk_status = 0;
   2009                 task->tk_action = call_encode;
   2010                 break;
   2011                 /*
   2012                  * Special cases: if we've been waiting on the  <---- WRONG
   2013                  * socket's write_space() callback, or if the  <----- WRONG
   2014                  * socket just returned a connection error,  <------- WRONG
   2015                  * then hold onto the transport lock.    <----------- WRONG 
   2016                  */
   2017         case -ENOBUFS:
   2018                 rpc_delay(task, HZ>>2);
   2019                 /* fall through */
   2020         case -EBADSLT:
   2021         case -EAGAIN:
   2022                 task->tk_action = call_transmit;
   2023                 task->tk_status = 0;
   2024                 break;
   2025         case -ECONNREFUSED:
   2026         case -EHOSTDOWN:
   2027         case -ENETDOWN:
   2028         case -EHOSTUNREACH:
   2029         case -ENETUNREACH:
   2030         case -EPERM:
   2031                 if (RPC_IS_SOFTCONN(task)) {
   2032                         if (!task->tk_msg.rpc_proc->p_proc)
   2033                                 trace_xprt_ping(task->tk_xprt,
   2034                                                 task->tk_status);
   2035                         rpc_exit(task, task->tk_status);
   2036                         break;
   2037                 }
   2038                 /* fall through */
   2039         case -ECONNRESET:
   2040         case -ECONNABORTED:
   2041         case -EADDRINUSE:
   2042         case -ENOTCONN:
   2043         case -EPIPE:
   2044                 break;
   2045         }
   2046 }



For 4.16 - 4.19 possible fixes


Possible fix #1 - just remove EPIPE and/or all the codes and move these to 'default':


2002 static void
2003 call_transmit_status(struct rpc_task *task)
2004 {
2005         task->tk_action = call_status;
2006 
2007         /*
2008          * Common case: success.  Force the compiler to put this
2009          * test first.
2010          */
2011         if (task->tk_status == 0) {
2012                 xprt_end_transmit(task);
2013                 rpc_task_force_reencode(task);
2014                 return;
2015         }
2016 
2017         switch (task->tk_status) {
2018         case -EAGAIN:
2019         case -ENOBUFS:
2020                 break;
2021         default:
2022                 dprint_status(task);
2023                 xprt_end_transmit(task);
2024                 rpc_task_force_reencode(task);
2025                 break;
2026                 /*
2027                  * Special cases: if we've been waiting on the
2028                  * socket's write_space() callback, or if the
2029                  * socket just returned a connection error,  <----------- WRONG ???
2030                  * then hold onto the transport lock.     <-------------- WRONG ???
2031                  */
2032         case -ECONNREFUSED:
2033         case -EHOSTDOWN:
2034         case -EHOSTUNREACH:
2035         case -ENETUNREACH:
2036         case -EPERM:
2037                 if (RPC_IS_SOFTCONN(task)) {
2038                         xprt_end_transmit(task);
2039                         rpc_exit(task, task->tk_status);
2040                         break;
2041                 }
2042         case -ECONNRESET:  <------- ????
2043         case -ECONNABORTED: <------ ????
2044         case -EADDRINUSE:  <------- ???? 
2045         case -ENOTCONN:   <-------- ???? What about these others ???
2046         case -EPIPE:  <------------------------------ remove this line to fix bug
2047                 rpc_task_force_reencode(task);
2048         }
2049 }



Possible fix #2: make calling xprt_end_transmit() conditional on both EPIPE and xprt_connected()


 	case -EPIPE:
+		if (xprt_connected(task->tk_xprt))
+			xprt_end_transmit(task);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 16:56   ` Trond Myklebust
  2018-12-12 17:47     ` Dave Wysochanski
  2018-12-14 13:48     ` Dave Wysochanski
@ 2018-12-14 18:29     ` Scott Mayhew
  2 siblings, 0 replies; 9+ messages in thread
From: Scott Mayhew @ 2018-12-14 18:29 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna.schumaker, dwysocha, linux-nfs

On Wed, 12 Dec 2018, Trond Myklebust wrote:

> On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
> > Commit 9b30889c548a changed the handling of TCP_CLOSE inside
> > xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED bit
> > was cleared unconditionally inside xprt_disconnect_done, similar
> > to the handling of TCP_CLOSE_WAIT.  After the change the clearing
> > of XPRT_CONNECTED depends on successfully queueing a work based
> > xprt_autoclose which depends on XPRT_LOCKED and may not happen.
> > This is significant in the case of an unexpected RST from the
> > server, as the client will only see xs_tcp_state_change called with
> > sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
> > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
> > with handling TCP_CLOSE_WAIT.
> > 
> > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > ---
> >  net/sunrpc/xprtsock.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 8a5e823e0b33..b9789036051d 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock
> > *sk)
> >  		if (sk->sk_err)
> >  			xprt_wake_pending_tasks(xprt, -sk->sk_err);
> >  		/* Trigger the socket release */
> > +		clear_bit(XPRT_CONNECTED, &xprt->state);
> >  		xs_tcp_force_close(xprt);
> >  	}
> >   out:
> 
> Hi Dave,
> 
> This isn't needed for 4.20 or newer because call_transmit() will now
> always call xprt_end_transmit().

Dave's script can also produce a hang on 4.20, but for a different
reason... if xs_reset_transport() is called on an xprt that has
XPRT_WRITE_SPACE set, then no further RPC tasks will be able to lock the
transport or connect it.  Clearing XPRT_WRITE_SPACE in
xs_sock_reset_connection_flags() seems to fix that. I don't know if
XPRT_WRITE_SPACE counts as a "connection flag" though.

-Scott

> I suggest that a stable fix do
> something similar (perhaps conditional on the error returned by
> xprt_transmit()?).
> 
> Cheers
>   Trond
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
  2018-12-12 18:02       ` Trond Myklebust
  2018-12-12 19:56         ` Dave Wysochanski
@ 2019-01-08 12:46         ` Benjamin Coddington
  1 sibling, 0 replies; 9+ messages in thread
From: Benjamin Coddington @ 2019-01-08 12:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna.schumaker, dwysocha, linux-nfs

On 12 Dec 2018, at 13:02, Trond Myklebust wrote:

> On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote:
>> On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote:
>>> On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote:
>>>> Commit 9b30889c548a changed the handling of TCP_CLOSE inside
>>>> xs_tcp_state_change.  Prior to this change, the XPRT_CONNECTED
>>>> bit
>>>> was cleared unconditionally inside xprt_disconnect_done, similar
>>>> to the handling of TCP_CLOSE_WAIT.  After the change the clearing
>>>> of XPRT_CONNECTED depends on successfully queueing a work based
>>>> xprt_autoclose which depends on XPRT_LOCKED and may not happen.
>>>> This is significant in the case of an unexpected RST from the
>>>> server, as the client will only see xs_tcp_state_change called
>>>> with
>>>> sk_state == TCP_CLOSE.  Restore the unconditional clear_bit on
>>>> XPRT_CONNECTED while handling TCP_CLOSE and make it consistent
>>>> with handling TCP_CLOSE_WAIT.
>>>>
>>>> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
>>>> ---
>>>>  net/sunrpc/xprtsock.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>>> index 8a5e823e0b33..b9789036051d 100644
>>>> --- a/net/sunrpc/xprtsock.c
>>>> +++ b/net/sunrpc/xprtsock.c
>>>> @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock
>>>> *sk)
>>>>  		if (sk->sk_err)
>>>>  			xprt_wake_pending_tasks(xprt, -sk-
>>>>> sk_err);
>>>>  		/* Trigger the socket release */
>>>> +		clear_bit(XPRT_CONNECTED, &xprt->state);
>>>>  		xs_tcp_force_close(xprt);
>>>>  	}
>>>>   out:
>>>
>>> Hi Dave,
>>>
>>> This isn't needed for 4.20 or newer because call_transmit() will
>>> now
>>> always call xprt_end_transmit(). I suggest that a stable fix do
>>> something similar (perhaps conditional on the error returned by
>>> xprt_transmit()?).
>>>
>>
>> Can you explain the handling in xs_tcp_state_change - why
>> XPRT_CONNECTED would need to remain set longer in the case of
>> handling
>> TCP_CLOSE case rather than TCP_CLOSE_WAIT?  It seems like if we get
>> an
>> RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED
>> bit
>> getting cleared need to be delayed in that case?
>>
>> I will look into the approach you mention though at the moment I do
>> not
>> see how it helps because the underlying issue seems to be clearing of
>> the XPRT_CONNECTED bit.
>>
>
> See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically
> trigger autoclose if  XPRT_CLOSE_WAIT is set, regardless of whether or
> not there are other tasks waiting for the lock.

It seems to make sense to drop the XPRT_LOCK in call_transmit_status if
XPRT_CLOSE_WAIT is set.. something like:

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 336fd1a19cca..f30bf500888d 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -443,6 +443,11 @@ static inline int xprt_test_and_set_connecting(struct
rpc_xprt *xprt)
        return test_and_set_bit(XPRT_CONNECTING, &xprt->state);
 }

+static inline int xprt_close_wait(struct rpc_xprt *xprt)
+{
+       return test_bit(XPRT_CLOSE_WAIT, &xprt->state);
+}
+
 static inline void xprt_set_bound(struct rpc_xprt *xprt)
 {
        test_and_set_bit(XPRT_BOUND, &xprt->state);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8ea2f5fadd96..1fc812ba9871 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1992,13 +1992,15 @@ call_transmit(struct rpc_task *task)
 static void
 call_transmit_status(struct rpc_task *task)
 {
+       struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
        task->tk_action = call_status;

        /*
         * Common case: success.  Force the compiler to put this
-        * test first.
+        * test first.  Or, if any error and xprt_close_wait,
+        * release the xprt lock so the socket can close.
         */
-       if (task->tk_status == 0) {
+       if (task->tk_status == 0 || xprt_close_wait(xprt)) {
                xprt_end_transmit(task);
                rpc_task_force_reencode(task);
                return;

Hmm.. I guess I have to figure out how to get a stable patch submitted that
will never have a mainline commit, since we don't have this problem in the
mainline.

Ben

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 13:51 [PATCH 0/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST Dave Wysochanski
2018-12-12 13:51 ` [PATCH 1/1] " Dave Wysochanski
2018-12-12 16:56   ` Trond Myklebust
2018-12-12 17:47     ` Dave Wysochanski
2018-12-12 18:02       ` Trond Myklebust
2018-12-12 19:56         ` Dave Wysochanski
2019-01-08 12:46         ` Benjamin Coddington
2018-12-14 13:48     ` Dave Wysochanski
2018-12-14 18:29     ` Scott Mayhew

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org linux-nfs@archiver.kernel.org
	public-inbox-index linux-nfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox