From: Dave Wysochanski <dwysocha@redhat.com>
To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH 0/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
Date: Wed, 12 Dec 2018 08:51:56 -0500 [thread overview]
Message-ID: <20181212135157.4489-1-dwysocha@redhat.com> (raw)
Recently a customer reported a regression which was the result of a small
portion of 9b30889c548a which changed the handling of TCP_CLOSE inside
xs_tcp_state_change. A simple reproducer was shown where an unexpected RST
injected during a simple write test would cause the NFS client to hang, a
kworker thread spinning at 100% cpu, and the only recovery was to reboot.
Upon investigation we found that the XPRT_CONNECTED bit was never cleared on
the rpc_xprt and as a result, rpc_tasks would loop around with the following
rpcdebug, which shows the rpc_xprt still has the XPRT_CONNECTED bit set even
though EPIPE (32) is coming back from the TCP layer:
kernel: RPC: 92 xprt_transmit(524516)
kernel: RPC: xs_tcp_send_request(524516) = -32
kernel: RPC: 92 call_status (status -32)
kernel: RPC: 92 call_bind (status 0)
kernel: RPC: 92 call_connect xprt ffff8ff8abf44000 is connected
kernel: RPC: 92 call_transmit (status 0)
kernel: RPC: 92 xprt_prepare_transmit
kernel: RPC: 92 rpc_xdr_encode (status 0)
kernel: RPC: 92 marshaling UNIX cred ffff8ff835db5000
kernel: RPC: 92 using AUTH_UNIX cred ffff8ff835db5000 to wrap rpc data
kernel: RPC: 92 xprt_transmit(524516)
kernel: RPC: xs_tcp_send_request(524516) = -32
The explanation of the above is as follows. Prior to 9b30889c548a,
XPRT_CONNECTED was unconditionally cleared while handling TCP_CLOSE inside
xs_tcp_state_change through the following call chain:
xs_sock_mark_closed -> xprt_disconnect_done
After the patch, while handling TCP_CLOSE, we have the following chain:
xs_tcp_force_close -> xprt_force_disconnect
The latter will queue a work xprt_autoclose based on the XPRT_LOCKED bit.
If the xprt is locked at this point (for example due to another rpc_task in
the middle of transmission), the work won't get queued and the XPRT_CONNECTED
bit never gets cleared.
The below reproducer will trigger the problem on 4.16 - 4.19 and the 1-line
patch that follows fixes the problem.
So far I have not been able to reproduce the problem on 4.20-rc6. However,
the patch makes handling of TCP_CLOSE consistent with that of TCP_CLOSE_WAIT
inside xs_tcp_state_change. On the basis of consistency / correctness, and
given the impact to 4.16 - 4.19, I submit the patch that follows for
consideration for upstream as well as stable branches of 4.16 - 4.19.
#!/bin/bash
# Reproduces NFS client hang due to sunrpc state machine spinning in
# transmit side due to incorrect handling of XPRT_CONNECTED in presence
# of unsolicited RST.
# Tested (fails) on: 4.16, 4.17, 4.18, 4.19, 3.10.0-957.el7
#
# Author: Frank Sorenson <sorenson@redhat.com>
# From: Deependra Shekhawat <deepens@amazon.com>
# Modifications: Dave Wysochanski <dwysocha@redhat.com>
#
install_dependency() {
/usr/bin/which $1 > /dev/null 2>&1
[ $? -ne 0 ] && echo "Installing dependency $1" && yum install -y $2
}
install_dependency tshark wireshark
install_dependency screen screen
exports_dir=/exports2
nfsmp=/mnt/tmp
blocked_sleep_time=10
test_sleep_time1=10
test_sleep_time2=10
LOG_OVER_NET=1
# whether to log by sending to udp/514 (syslog) on HOST1 (either
# enable remote syslog, or just add 'or udp port 514' to the packet
# capture filter. Goes to /dev/null, if set to 0.
#
pcap_pid=""
if [[ $LOG_OVER_NET -ne 0 ]] ; then
# probably best to be the network address of some real network, but not a real host address
# FIXME: un-comment the following in /etc/rsyslog.conf
#$ModLoad imudp
#$UDPServerRun 514
systemctl start rsyslog
exec 6<>/dev/udp/127.0.0.1/514
else
exec 6<>/dev/null
fi
trap exit_cleanup EXIT
exit_cleanup() {
iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
[[ -n $pcap_pid ]] && kill $pcap_pid
screen -S dd_loop -X kill >/dev/null 2>&1
rpcdebug -m rpc -c all >/dev/null
rpcdebug -m nfs -c all >/dev/null
exec 6>&-
}
output() {
echo -en "$*"
# echo -en "$0: $*" | tr -d '\n\t' | tee >(logger -t nfs_hang_test) >&6
echo -en "$0: $*" | tr -d '\n\t' | logger -t nfs_hang_test
echo -en "$0: $*" | tr -d '\n\t' >&6
}
tshark -i lo -Qw /tmp/trace.pcap tcp port 2049 or udp port 514 &
pcap_pid=$!
systemctl start nfs-server
sleep 2
output "performing setup and mount\n"
rpcdebug -m rpc -s all
rpcdebug -m nfs -s all
# Either add nfs to firewall or use 'insecure' on mount and 'noresvport' on mount
firewall-cmd --permanent --add-service=nfs
systemctl restart firewalld
mkdir -p $exports_dir
exportfs -o rw,sync,no_root_squash 127.0.0.1:$exports_dir
mkdir -p $nfsmp
mount -overs=4.1 127.0.0.1:$exports_dir $nfsmp
mkdir -p $nfsmp$exports_dir/dir
output "starting dd loop\n"
sleep 1
screen -d -m -S dd_loop /bin/bash -c "cd $nfsmp ; while true ; do date ; time dd if=/dev/zero oflag=direct of=$nfsmp$exports_dir/dir/file bs=16M count=100 2>/dev/null ; done"
output "dd loop started\n"
sleep 2
output "enabling firewall rule\n"
iptables -A OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule added; sleeping $blocked_sleep_time\n"
sleep $blocked_sleep_time
output "removing firewall rule\n"
iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule removed\n"
output "sleeping $test_sleep_time1\n"
sleep $test_sleep_time1
output "checking response\n"
timeout -k 10 10 /bin/ls $nfsmp$exports_dir/dir/file >/dev/null 2>&1
res=$?
if [[ $res -eq 0 ]] ; then
output "OK on `uname -r`: ls appears okay\n"
else
output "WARNING on `uname -r`: ls appears hung\n"
fi
output "killing dd loop\n"
screen -S dd_loop -X kill
output "sleeping $test_sleep_time2\n"
sleep $test_sleep_time2
output "killing pcap\n"
kill $pcap_pid
output "stopping rpcdebug\n"
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all
output "checking umount\n"
timeout -k 10 10 umount $nfsmp >/dev/null 2>&1
res=$?
if [[ $res -eq 0 ]] ; then
output "TEST PASS on `uname -r`: umount ok\n"
else
output "TEST FAIL on `uname -r`: umount hung\n"
fi
output "test complete\n"
next reply other threads:[~2018-12-12 13:52 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-12 13:51 Dave Wysochanski [this message]
2018-12-12 13:51 ` [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST Dave Wysochanski
2018-12-12 16:56 ` Trond Myklebust
2018-12-12 17:47 ` Dave Wysochanski
2018-12-12 18:02 ` Trond Myklebust
2018-12-12 19:56 ` Dave Wysochanski
2019-01-08 12:46 ` Benjamin Coddington
2018-12-14 13:48 ` Dave Wysochanski
2018-12-14 18:29 ` Scott Mayhew
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181212135157.4489-1-dwysocha@redhat.com \
--to=dwysocha@redhat.com \
--cc=anna.schumaker@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=trond.myklebust@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).