From: Dave Wysochanski <dwysocha@redhat.com>
To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH 0/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST
Date: Wed, 12 Dec 2018 08:51:56 -0500 [thread overview]
Message-ID: <20181212135157.4489-1-dwysocha@redhat.com> (raw)
Recently a customer reported a regression which was the result of a small
portion of 9b30889c548a which changed the handling of TCP_CLOSE inside
xs_tcp_state_change. A simple reproducer was shown where an unexpected RST
injected during a simple write test would cause the NFS client to hang, a
kworker thread spinning at 100% cpu, and the only recovery was to reboot.
Upon investigation we found that the XPRT_CONNECTED bit was never cleared on
the rpc_xprt and as a result, rpc_tasks would loop around with the following
rpcdebug, which shows the rpc_xprt still has the XPRT_CONNECTED bit set even
though EPIPE (32) is coming back from the TCP layer:
kernel: RPC: 92 xprt_transmit(524516)
kernel: RPC: xs_tcp_send_request(524516) = -32
kernel: RPC: 92 call_status (status -32)
kernel: RPC: 92 call_bind (status 0)
kernel: RPC: 92 call_connect xprt ffff8ff8abf44000 is connected
kernel: RPC: 92 call_transmit (status 0)
kernel: RPC: 92 xprt_prepare_transmit
kernel: RPC: 92 rpc_xdr_encode (status 0)
kernel: RPC: 92 marshaling UNIX cred ffff8ff835db5000
kernel: RPC: 92 using AUTH_UNIX cred ffff8ff835db5000 to wrap rpc data
kernel: RPC: 92 xprt_transmit(524516)
kernel: RPC: xs_tcp_send_request(524516) = -32
The explanation of the above is as follows. Prior to 9b30889c548a,
XPRT_CONNECTED was unconditionally cleared while handling TCP_CLOSE inside
xs_tcp_state_change through the following call chain:
xs_sock_mark_closed -> xprt_disconnect_done
After the patch, while handling TCP_CLOSE, we have the following chain:
xs_tcp_force_close -> xprt_force_disconnect
The latter will queue a work xprt_autoclose based on the XPRT_LOCKED bit.
If the xprt is locked at this point (for example due to another rpc_task in
the middle of transmission), the work won't get queued and the XPRT_CONNECTED
bit never gets cleared.
The below reproducer will trigger the problem on 4.16 - 4.19 and the 1-line
patch that follows fixes the problem.
So far I have not been able to reproduce the problem on 4.20-rc6. However,
the patch makes handling of TCP_CLOSE consistent with that of TCP_CLOSE_WAIT
inside xs_tcp_state_change. On the basis of consistency / correctness, and
given the impact to 4.16 - 4.19, I submit the patch that follows for
consideration for upstream as well as stable branches of 4.16 - 4.19.
#!/bin/bash
# Reproduces NFS client hang due to sunrpc state machine spinning in
# transmit side due to incorrect handling of XPRT_CONNECTED in presence
# of unsolicited RST.
# Tested (fails) on: 4.16, 4.17, 4.18, 4.19, 3.10.0-957.el7
#
# Author: Frank Sorenson <sorenson@redhat.com>
# From: Deependra Shekhawat <deepens@amazon.com>
# Modifications: Dave Wysochanski <dwysocha@redhat.com>
#
install_dependency() {
/usr/bin/which $1 > /dev/null 2>&1
[ $? -ne 0 ] && echo "Installing dependency $1" && yum install -y $2
}
install_dependency tshark wireshark
install_dependency screen screen
exports_dir=/exports2
nfsmp=/mnt/tmp
blocked_sleep_time=10
test_sleep_time1=10
test_sleep_time2=10
LOG_OVER_NET=1
# whether to log by sending to udp/514 (syslog) on HOST1 (either
# enable remote syslog, or just add 'or udp port 514' to the packet
# capture filter. Goes to /dev/null, if set to 0.
#
pcap_pid=""
if [[ $LOG_OVER_NET -ne 0 ]] ; then
# probably best to be the network address of some real network, but not a real host address
# FIXME: un-comment the following in /etc/rsyslog.conf
#$ModLoad imudp
#$UDPServerRun 514
systemctl start rsyslog
exec 6<>/dev/udp/127.0.0.1/514
else
exec 6<>/dev/null
fi
trap exit_cleanup EXIT
exit_cleanup() {
iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
[[ -n $pcap_pid ]] && kill $pcap_pid
screen -S dd_loop -X kill >/dev/null 2>&1
rpcdebug -m rpc -c all >/dev/null
rpcdebug -m nfs -c all >/dev/null
exec 6>&-
}
output() {
echo -en "$*"
# echo -en "$0: $*" | tr -d '\n\t' | tee >(logger -t nfs_hang_test) >&6
echo -en "$0: $*" | tr -d '\n\t' | logger -t nfs_hang_test
echo -en "$0: $*" | tr -d '\n\t' >&6
}
tshark -i lo -Qw /tmp/trace.pcap tcp port 2049 or udp port 514 &
pcap_pid=$!
systemctl start nfs-server
sleep 2
output "performing setup and mount\n"
rpcdebug -m rpc -s all
rpcdebug -m nfs -s all
# Either add nfs to firewall or use 'insecure' on mount and 'noresvport' on mount
firewall-cmd --permanent --add-service=nfs
systemctl restart firewalld
mkdir -p $exports_dir
exportfs -o rw,sync,no_root_squash 127.0.0.1:$exports_dir
mkdir -p $nfsmp
mount -overs=4.1 127.0.0.1:$exports_dir $nfsmp
mkdir -p $nfsmp$exports_dir/dir
output "starting dd loop\n"
sleep 1
screen -d -m -S dd_loop /bin/bash -c "cd $nfsmp ; while true ; do date ; time dd if=/dev/zero oflag=direct of=$nfsmp$exports_dir/dir/file bs=16M count=100 2>/dev/null ; done"
output "dd loop started\n"
sleep 2
output "enabling firewall rule\n"
iptables -A OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule added; sleeping $blocked_sleep_time\n"
sleep $blocked_sleep_time
output "removing firewall rule\n"
iptables -D OUTPUT -p tcp -m tcp --dport 2049 -j REJECT --reject-with tcp-reset
output "firewall rule removed\n"
output "sleeping $test_sleep_time1\n"
sleep $test_sleep_time1
output "checking response\n"
timeout -k 10 10 /bin/ls $nfsmp$exports_dir/dir/file >/dev/null 2>&1
res=$?
if [[ $res -eq 0 ]] ; then
output "OK on `uname -r`: ls appears okay\n"
else
output "WARNING on `uname -r`: ls appears hung\n"
fi
output "killing dd loop\n"
screen -S dd_loop -X kill
output "sleeping $test_sleep_time2\n"
sleep $test_sleep_time2
output "killing pcap\n"
kill $pcap_pid
output "stopping rpcdebug\n"
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all
output "checking umount\n"
timeout -k 10 10 umount $nfsmp >/dev/null 2>&1
res=$?
if [[ $res -eq 0 ]] ; then
output "TEST PASS on `uname -r`: umount ok\n"
else
output "TEST FAIL on `uname -r`: umount hung\n"
fi
output "test complete\n"
next reply other threads:[~2018-12-12 13:52 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-12 13:51 Dave Wysochanski [this message]
2018-12-12 13:51 ` [PATCH 1/1] SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST Dave Wysochanski
2018-12-12 16:56 ` Trond Myklebust
2018-12-12 17:47 ` Dave Wysochanski
2018-12-12 18:02 ` Trond Myklebust
2018-12-12 19:56 ` Dave Wysochanski
2019-01-08 12:46 ` Benjamin Coddington
2018-12-14 13:48 ` Dave Wysochanski
2018-12-14 18:29 ` Scott Mayhew
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181212135157.4489-1-dwysocha@redhat.com \
--to=dwysocha@redhat.com \
--cc=anna.schumaker@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=trond.myklebust@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.