Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: James Harvey <jamespharvey20@gmail.com>
To: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Cc: bcodding@redhat.com
Subject: 5.3.0 Regression: rpc.nfsd v4 uninterruptible sleep for 5+ minutes w/o rpc-statd/etc
Date: Thu, 19 Sep 2019 09:00:13 -0400
Message-ID: <CA+X5Wn60sGi+za48Lj-y1fcHHw7kdzEUsw8nj+Xc0U90mONz5w@mail.gmail.com> (raw)

For a really long time (years?) if you forced NFS v4 only, you could
mask a lot of unnecessary services.

In /etc/nfs.conf, in "[nfsd] I've been able to set "vers3=n", and then
mask the following services:
* gssproxy
* nfs-blkmap
* rpc-statd
* rpcbind (service & socket)

Upgrading from 5.2.14 to 5.3.0, nfs-server.service (rpc.nfsd) has
exactly a 5 minute delay, and sometimes longer.

nfs-utils 2.4.1 (latest).  Downgrading only linux back to 5.2.14 makes
it work again.

Unmasking the above services/socket is a temporary workaround.

I'll mention I do include "ipv6.disable=1" in my kernel arguments,
since I see in strace it attempts using IPV6 and fails right after the
hang.  Don't think that's related, but just in case.

ALL DIAGNOSTICS BELOW are with everything commented in "/etc/exports",
because it reproduces the problem as well.  They are also with
everything commented in "/etc/nfs.conf" except for in "[nfsd]"
"vers3=n".

In IRC, bcodding said he's been looking at a similar bug, but that
probably isn't exactly like mine.  Potentially having to do with
changes of handling of ENOTCON and EAGAIN in net/sunrpc/clnt.c, not
sure if that refers to mine or his.

The default nfs.conf includes "[nfsd]... # debug=0" which sort of
suggests to get debugging information to set "debug=1".  But, doing
this gave "rpc.nfsd: Invalid debug facility: 1".  I also tried
"debug=0x7FFF" from debug.h for NFSDDBG_ALL, but it doesn't like that
either.  "man rpc.nfsd" shows the --debug option.  I disabled
nfs-server.service and rebooted to manually start it with this option,
and it gave a whopping 3 lines at start, and nothing during or after
the 5 minute delay:

=====
rpc.nfsd: knfsd is currently down
rpc.nfsd: Writing version string to kernel: -2 -3
rpc.nfsd: Created AF_INET TCP socket.
=====

The first time I ran strace, it goes into uninterruptible sleep and shows:

=====
openat(AT_FDCWD, "/proc/fs/nfsd/versions", O_RDONLY) = 3
read(3, <<the massive delay is here>>"-2 -3 +4 +4.1 +4.2\n", 128)    = 19
=====

After re-running it after speaking with bcodding, strace showed:

=====
bind(4, (sa_family=AF_INET, sin_port=htons(2049),
sin_addr=inet_addr("0.0.0.0")), 16) = 0
listen(4, 64) = 0
write(3 "4\n", 2<<the massive delay is here>>) = 2
=====

Not sure why the change in the strace hang spot.

After speaking with bcodding, he gave me a list of events to set,
visible here: http://ix.io/1nTX

I ran "strace rpc.nfsd --debug &> log" (so the strace and the --debug
is mixed) which is viewable here: http://ix.io/1o2s

And, the contents of /sys/kernel/debug/tracing/trace after it finally
completed was 143MB, compared to 5.2.14's 17K!  This trace including
the first and last 1000 lines, skipping the hundreds of thousands of
lines between so it's only 325K is viewable here: http://ix.io/1o2y

This trace only goes from kernel time 990.504680 to 995.938377.  I
think it must have hit a maximum MB or number of lines, because it
didn't come out of that for for 5 minutes.



He also asked to see the trace of it working (which I hope meant on
5.2.14, rather than with services unmasked.)

The 5.2.14 working strace with --debug is viewable here: http://ix.io/1o8V

And the 5.2.14 working trace is viewable here: http://ix.io/1ofY

             reply index

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 13:00 James Harvey [this message]
2019-09-19 13:17 ` James Harvey
2019-09-26 19:51   ` bfields
2019-10-01 18:21 ` Benjamin Coddington

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+X5Wn60sGi+za48Lj-y1fcHHw7kdzEUsw8nj+Xc0U90mONz5w@mail.gmail.com \
    --to=jamespharvey20@gmail.com \
    --cc=bcodding@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git