All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Wysochanski <dwysocha@redhat.com>
To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH 0/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
Date: Wed, 12 Dec 2018 08:51:56 -0500	[thread overview]
Message-ID: <20181212135157.4489-1-dwysocha@redhat.com> (raw)

Recently a customer reported a regression which was the result of a small
portion of 9b30889c548a which changed the handling of TCP_CLOSE inside
xs_tcp_state_change.  A simple reproducer was shown where an unexpected RST
injected during a simple write test would cause the NFS client to hang, a
kworker thread spinning at 100% cpu, and the only recovery was to reboot.

Upon investigation we found that the XPRT_CONNECTED bit was never cleared on
the rpc_xprt and as a result, rpc_tasks would loop around with the following
rpcdebug, which shows the rpc_xprt still has the XPRT_CONNECTED bit set even
though EPIPE (32) is coming back from the TCP layer:

kernel: RPC:    92 xprt_transmit(524516)
kernel: RPC:       xs_tcp_send_request(524516) = -32
kernel: RPC:    92 call_status (status -32)
kernel: RPC:    92 call_bind (status 0)
kernel: RPC:    92 call_connect xprt ffff8ff8abf44000 is connected
kernel: RPC:    92 call_transmit (status 0)
kernel: RPC:    92 xprt_prepare_transmit
kernel: RPC:    92 rpc_xdr_encode (status 0)
kernel: RPC:    92 marshaling UNIX cred ffff8ff835db5000
kernel: RPC:    92 using AUTH_UNIX cred ffff8ff835db5000 to wrap rpc data
kernel: RPC:    92 xprt_transmit(524516)
kernel: RPC:       xs_tcp_send_request(524516) = -32

The explanation of the above is as follows.  Prior to 9b30889c548a,
XPRT_CONNECTED was unconditionally cleared while handling TCP_CLOSE inside
xs_tcp_state_change through the following call chain:
	xs_sock_mark_closed -> xprt_disconnect_done
After the patch, while handling TCP_CLOSE, we have the following chain:
	xs_tcp_force_close -> xprt_force_disconnect
The latter will queue a work xprt_autoclose based on the XPRT_LOCKED bit.
If the xprt is locked at this point (for example due to another rpc_task in
the middle of transmission), the work won't get queued and the XPRT_CONNECTED
bit never gets cleared.

The below reproducer will trigger the problem on 4.16 - 4.19 and the 1-line
patch that follows fixes the problem.
So far I have not been able to reproduce the problem on 4.20-rc6.  However,
the patch makes handling of TCP_CLOSE consistent with that of TCP_CLOSE_WAIT
inside xs_tcp_state_change.  On the basis of consistency / correctness, and
given the impact to 4.16 - 4.19, I submit the patch that follows for 
consideration for upstream as well as stable branches of 4.16 - 4.19.


#!/bin/bash
# Reproduces NFS client hang due to sunrpc state machine spinning in 
# transmit side due to incorrect handling of XPRT_CONNECTED in presence
# of unsolicited RST.
# Tested (fails) on: 4.16, 4.17, 4.18, 4.19, 3.10.0-957.el7
#
# Author: Frank Sorenson <sorenson@redhat.com>
# From: Deependra Shekhawat <deepens@amazon.com>
# Modifications: Dave Wysochanski <dwysocha@redhat.com>
#
install_dependency() {
	/usr/bin/which $1 > /dev/null 2>&1
	[ $? -ne 0 ] && echo "Installing dependency $1" && yum install -y $2
}
install_dependency tshark wireshark
install_dependency screen screen

exports_dir=/exports2
nfsmp=/mnt/tmp
blocked_sleep_time=10
test_sleep_time1=10
test_sleep_time2=10

LOG_OVER_NET=1
# whether to log by sending to udp/514 (syslog) on HOST1 (either
#  enable remote syslog, or just add 'or udp port 514' to the packet
#  capture filter.  Goes to /dev/null, if set to 0.
#

pcap_pid=""

if [[ $LOG_OVER_NET -ne 0 ]] ; then
# probably best to be the network address of some real network, but not a real host address
# FIXME: un-comment the following in /etc/rsyslog.conf
#$ModLoad imudp
#$UDPServerRun 514
	systemctl start rsyslog
	exec 6<>/dev/udp/127.0.0.1/514
else
	exec 6<>/dev/null
fi
trap exit_cleanup EXIT
exit_cleanup() {
	iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
	[[ -n $pcap_pid ]] && kill $pcap_pid
	screen -S dd_loop -X kill >/dev/null 2>&1

	rpcdebug -m rpc -c all >/dev/null
	rpcdebug -m nfs -c all >/dev/null

	exec 6>&-
}
output() {
	echo -en "$*"
#	echo -en "$0: $*" | tr -d '\n\t' | tee >(logger -t nfs_hang_test) >&6 
	echo -en "$0: $*" | tr -d '\n\t' | logger -t nfs_hang_test
	echo -en "$0: $*" | tr -d '\n\t' >&6 
}


tshark -i lo -Qw /tmp/trace.pcap tcp port 2049 or udp port 514 &
pcap_pid=$!

systemctl start nfs-server
sleep 2

output "performing setup and mount\n"
rpcdebug -m rpc -s all
rpcdebug -m nfs -s all

# Either add nfs to firewall or use 'insecure' on mount and 'noresvport' on mount
firewall-cmd --permanent --add-service=nfs
systemctl restart firewalld
mkdir -p $exports_dir
exportfs -o rw,sync,no_root_squash 127.0.0.1:$exports_dir
mkdir -p $nfsmp
mount -overs=4.1 127.0.0.1:$exports_dir $nfsmp
mkdir -p $nfsmp$exports_dir/dir



output "starting dd loop\n"
sleep 1
screen -d -m -S dd_loop /bin/bash -c "cd $nfsmp ; while true ; do date ; time dd if=/dev/zero oflag=direct of=$nfsmp$exports_dir/dir/file bs=16M count=100 2>/dev/null ; done"
output "dd loop started\n"
sleep 2


output "enabling firewall rule\n"
iptables -A OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule added; sleeping $blocked_sleep_time\n"

sleep $blocked_sleep_time

output "removing firewall rule\n"
iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule removed\n"

output "sleeping $test_sleep_time1\n"
sleep $test_sleep_time1

output "checking response\n"
timeout -k 10 10 /bin/ls $nfsmp$exports_dir/dir/file >/dev/null 2>&1
res=$?

if [[ $res -eq 0 ]] ; then
	output "OK on `uname -r`: ls appears okay\n"
else
	output "WARNING on `uname -r`: ls appears hung\n"
fi

output "killing dd loop\n"
screen -S dd_loop -X kill

output "sleeping $test_sleep_time2\n"
sleep $test_sleep_time2

output "killing pcap\n"
kill $pcap_pid

output "stopping rpcdebug\n"
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all

output "checking umount\n"
timeout -k 10 10 umount $nfsmp >/dev/null 2>&1
res=$?

if [[ $res -eq 0 ]] ; then
	output "TEST PASS on `uname -r`: umount ok\n"
else
	output "TEST FAIL on `uname -r`: umount hung\n"
fi

output "test complete\n"



             reply	other threads:[~2018-12-12 13:52 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-12 13:51 Dave Wysochanski [this message]
2018-12-12 13:51 ` [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST Dave Wysochanski
2018-12-12 16:56   ` Trond Myklebust
2018-12-12 17:47     ` Dave Wysochanski
2018-12-12 18:02       ` Trond Myklebust
2018-12-12 19:56         ` Dave Wysochanski
2019-01-08 12:46         ` Benjamin Coddington
2018-12-14 13:48     ` Dave Wysochanski
2018-12-14 18:29     ` Scott Mayhew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181212135157.4489-1-dwysocha@redhat.com \
    --to=dwysocha@redhat.com \
    --cc=anna.schumaker@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.