* Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-01-24 17:32 UTC
To: linux-nfs

I could use some help figuring out the cause of some serious NFS client
issues I'm having with the 4.20.3 kernel which I did not see under
4.19.15.

I have a network of about 130 desktops (plus a bunch of other machines,
VMs and the like) running Fedora 29 connecting to six NFS servers
running CentOS 7.6 (with the heavily patched vendor kernel
3.10.0-957.1.3).  All machines involved are x86_64.  We use kerberized
NFSv4, generally with sec=krb5i.  The exports are generally made with
"(rw,async,sec=krb5i:krb5p)".

Since I booted those clients into 4.20.3 I've started seeing processes
getting stuck in the D state.  The system itself will seem OK (except
for the high load average) as long as I don't touch the hung NFS mount.
Nothing was logged to dmesg or to the journal.  So far, booting back
into the 4.19.15 kernel has cleared up the problem.  I cannot yet
reproduce this on demand; I've tried, but it is probably related to
some specific usage pattern.

Has anyone else seen issues like this?  Can anyone help me to get more
useful information that might point to the problem?  I still haven't
learned how to debug NFS issues properly.  And if there's a stress test
tool I could easily run that might help to reproduce the issue, I'd be
happy to run it.

I note that 4.20.4 is out; I see one sunrpc fix which I guess could be
related (sunrpc: handle ENOMEM in rpcb_getport_async) but the systems
involved have plenty of free memory, so I doubt that's it.  I'll
certainly try it anyway.

Various package versions:
kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
nfs-utils-2.3.3-1.rc2.fc29.x86_64
gssproxy-0.8.0-6.fc29.x86_64
krb5-libs-1.16.1-25.fc29.i686

Thanks in advance for any help or advice,

 - J<
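For context, the setup described above boils down to something like the
following; the hostnames and paths here are illustrative assumptions,
not details taken from this report:

    # Server side, /etc/exports (CentOS 7):
    /export/home  *.example.edu(rw,async,sec=krb5i:krb5p)

    # Client side, a kerberized NFSv4 mount (Fedora 29):
    mount -t nfs4 -o sec=krb5i nas00.example.edu:/export/home /home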
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-01-24 19:28 UTC
To: linux-nfs

Just had another machine get into this state.  This time I thought to
at least do alt-sysrq-w.  The result is a big pile of backtraces, but
it looks like basically every NFS operation is hung up.  I'm not sure
how useful it will be, but I tried to categorize the various
backtraces.  Most of them are like the following, going through
nfs_file_fsync:

tcsh            D    0  9511      1 0x80000006
Call Trace:
 ? __schedule+0x253/0x850
 schedule+0x28/0x80
 io_schedule+0x12/0x40
 wait_on_page_bit_common+0x11a/0x270
 ? file_check_and_advance_wb_err+0xc0/0xc0
 __filemap_fdatawait_range+0xe2/0x130
 ? __filemap_fdatawrite_range+0xc8/0xf0
 filemap_write_and_wait_range+0x45/0x80
 nfs_file_fsync+0x44/0x1e0 [nfs]
 filp_close+0x31/0x60
 put_files_struct+0x6c/0xc0
 do_exit+0x2b4/0xb30
 ? __fput+0x151/0x220
 do_group_exit+0x3a/0xa0
 get_signal+0x276/0x590
 do_signal+0x36/0x610
 ? do_sys_open+0x128/0x210
 exit_to_usermode_loop+0x71/0xe0
 do_syscall_64+0x14d/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4760d46ca2
Code: Bad RIP value.
RSP: 002b:00007ffc29c788d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: fffffffffffffe00 RBX: 00005652381bf990 RCX: 00007f4760d46ca2
RDX: 0000000000000000 RSI: 00005652381d4b40 RDI: 00000000ffffff9c
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffc29c78920
R10: 0000000000000000 R11: 0000000000000246 R12: 00005652381d4b40
R13: 0000000000000000 R14: 00005652381dcd30 R15: 0000000000000000

There are a few others which are hung on locking operations:

QThread         D    0  9740      1 0x80000006
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 nfs4_proc_lock+0x1a9/0x320 [nfsv4]
 ? do_unlk+0x98/0xe0 [nfs]
 locks_remove_posix+0xb1/0x140
 ? __nfs_commit_inode+0xc4/0x1b0 [nfs]
 filp_close+0x50/0x60
 put_files_struct+0x6c/0xc0
 do_exit+0x2b4/0xb30
 ? mem_cgroup_try_charge+0xe6/0x180
 do_group_exit+0x3a/0xa0
 get_signal+0x276/0x590
 do_signal+0x36/0x610
 exit_to_usermode_loop+0x71/0xe0
 do_syscall_64+0x14d/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff3e9321421
Code: Bad RIP value.
RSP: 002b:00007ff3d7ffeac0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 00007ff3c8004db0 RCX: 00007ff3e9321421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 00007ff3c8004db0
RBP: 0000000000000001 R08: 0000000000000000 R09: 00007ff3c8000c78
R10: 00007ff3d0003480 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007ff3e6e48f10 R14: 00000000ffffffff R15: 0000000000000001

gvfsd-metadata  D    0 24382      1 0x80000006
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 nfs4_do_close+0x2bf/0x2f0 [nfsv4]
 __put_nfs_open_context+0xa9/0x100 [nfs]
 nfs_file_release+0x39/0x40 [nfs]
 __fput+0xb4/0x220
 task_work_run+0x84/0xa0
 do_exit+0x2d2/0xb30
 do_group_exit+0x3a/0xa0
 get_signal+0x276/0x590
 do_signal+0x36/0x610
 ? handle_mm_fault+0xda/0x200
 exit_to_usermode_loop+0x71/0xe0
 do_syscall_64+0x14d/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3cf843d421
Code: Bad RIP value.
RSP: 002b:00007ffc224ea250 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 000055c9db8fa2a0 RCX: 00007f3cf843d421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 000055c9db8fa2a0
RBP: 0000000000000001 R08: 0000000000000000 R09: 000055c9db8ea488
R10: 000055c9db8dbae0 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007f3cf85bff10 R14: 00000000ffffffff R15: 0000000000000001

And some others waiting on metadata operations:

localStorage DB D    0 24794   9621 0x00000004
Call Trace:
 ? __schedule+0x253/0x850
 schedule+0x28/0x80
 io_schedule+0x12/0x40
 wait_on_page_bit_common+0x11a/0x270
 ? file_check_and_advance_wb_err+0xc0/0xc0
 __filemap_fdatawait_range+0xe2/0x130
 filemap_write_and_wait+0x47/0x70
 nfs_wb_all+0x1a/0x120 [nfs]
 nfs_setattr+0x1cb/0x1d0 [nfs]
 notify_change+0x2da/0x440
 do_truncate+0x74/0xc0
 do_sys_ftruncate+0xb6/0x100
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1cd1382b
Code: Bad RIP value.
RSP: 002b:00007eff0c3787c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 0000000000bd0000 RCX: 00007eff1cd1382b
RDX: 0002000100000000 RSI: 0000000000bd0000 RDI: 000000000000005c
RBP: 00007eff034d61a0 R08: 00007ffc7d3b40b0 R09: 00007ffc7d3b4080
R10: 000000000274ded2 R11: 0000000000000246 R12: 000000000000005c
R13: 00007eff02f081c0 R14: 00007eff025fd8e0 R15: 00007eff034832e0

df              D    0 25187  25186 0x00000000
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 ? __rpc_atrun+0x20/0x20 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 ? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 __rpc_execute+0xe0/0x350 [sunrpc]
 ? recalibrate_cpu_khz+0x10/0x10
 ? ktime_get+0x36/0xa0
 rpc_run_task+0x121/0x180 [sunrpc]
 nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
 _nfs4_proc_getattr+0xf8/0x120 [nfsv4]
 nfs4_proc_getattr+0x75/0x100 [nfsv4]
 __nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
 nfs_getattr+0x118/0x2b0 [nfs]
 ? security_inode_getattr+0x30/0x50
 vfs_statx+0x89/0xe0
 __do_sys_newstat+0x39/0x70
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f7dc6195579
Code: Bad RIP value.
RSP: 002b:00007ffd44e725a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 000055ad9832ef80 RCX: 00007f7dc6195579
RDX: 00007ffd44e72650 RSI: 00007ffd44e72650 RDI: 000055ad9832e380
RBP: 000055ad9832e310 R08: 0000000000000003 R09: aaaaaaaaaaaaaaab
R10: 000055ad98326010 R11: 0000000000000246 R12: 000055ad9832e3c0
R13: 0000000000000000 R14: 000055ad9832b9b0 R15: 0000000000000000

And others on writes:

mozStorage #1   D    0 24816   9621 0x00000004
Call Trace:
 ? __schedule+0x253/0x850
 ? __switch_to_asm+0x40/0x70
 schedule+0x28/0x80
 io_schedule+0x12/0x40
 __lock_page+0x13c/0x230
 ? file_check_and_advance_wb_err+0xc0/0xc0
 pagecache_get_page+0x1ae/0x340
 grab_cache_page_write_begin+0x1c/0x40
 nfs_write_begin+0x63/0x340 [nfs]
 generic_perform_write+0xf4/0x1b0
 ? __handle_mm_fault+0xb9d/0x1380
 nfs_file_write+0xdc/0x200 [nfs]
 __vfs_write+0x136/0x1a0
 vfs_write+0xa5/0x1a0
 ksys_write+0x4f/0xb0
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1d131d57
Code: Bad RIP value.
RSP: 002b:00007eff00546380 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000002d RCX: 00007eff1d131d57
RDX: 0000000000008000 RSI: 00007efef8085000 RDI: 000000000000002d
RBP: 00007efef8085000 R08: 0000000000000000 R09: 00007ffc7d3b4080
R10: 000000000274d526 R11: 0000000000000293 R12: 0000000000008000
R13: 00007eff08604868 R14: 00000000000300c8 R15: 00007efef8085000

Not sure what this one is waiting on:

kdm-uh          D    0 25357  25345 0x00000000
Call Trace:
 ? __schedule+0x253/0x850
 ? __switch_to_asm+0x34/0x70
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 ? __rpc_atrun+0x20/0x20 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 ? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 __rpc_execute+0xe0/0x350 [sunrpc]
 ? recalibrate_cpu_khz+0x10/0x10
 ? ktime_get+0x36/0xa0
 rpc_run_task+0x121/0x180 [sunrpc]
 nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
 _nfs4_proc_getattr+0xf8/0x120 [nfsv4]
 nfs4_proc_getattr+0x75/0x100 [nfsv4]
 __nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
 nfs_do_access+0x2cd/0x400 [nfs]
 ? rpcauth_lookupcred+0x99/0xc0 [sunrpc]
 nfs_permission+0x1b9/0x1e0 [nfs]
 inode_permission+0xbe/0x180
 link_path_walk.part.49+0x3f2/0x520
 path_openat+0x9f/0x1610
 ? page_counter_try_charge+0x57/0xc0
 ? memcg_kmem_charge_memcg+0x76/0xa0
 ? get_random_u32+0x3e/0xd0
 do_filp_open+0x93/0x100
 ? __check_object_size+0x15d/0x189
 do_sys_open+0x186/0x210
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3f0fc5dca2
Code: Bad RIP value.
RSP: 002b:00007ffc18c62840 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 000055a70650bfa0 RCX: 00007f3f0fc5dca2
RDX: 0000000000000800 RSI: 000055a70650bcf0 RDI: 00000000ffffff9c
RBP: 0000000000001e34 R08: 000055a70650c3b0 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc18c62a30 R14: 000055a70650bcf0 R15: 00007ffc18c62afc

And there's one kernel thread, I assume for the mount since that's the
IP of the NFS server:

172.21.86.74-ma D    0 25099      2 0x80000000
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 nfs4_proc_sequence+0x3c/0x50 [nfsv4]
 nfs4_run_state_manager+0x257/0x760 [nfsv4]
 ? nfs4_do_reclaim+0x680/0x680 [nfsv4]
 kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x35/0x40
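For the record, the same blocked-task dump can be requested without a
console keyboard; this is standard sysrq usage rather than anything
specific to this report:

    echo 1 > /proc/sys/kernel/sysrq    # enable all sysrq functions
    echo w > /proc/sysrq-trigger       # dump stacks of tasks in the D state
    dmesg                              # the backtraces land in the kernel log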
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Trond Myklebust @ 2019-01-24 19:58 UTC
To: tibbs, Anna.Schumaker, linux-nfs, Chuck.Lever

On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
> I could use some help figuring out the cause of some serious NFS
> client issues I'm having with the 4.20.3 kernel which I did not see
> under 4.19.15.
>
> I have a network of about 130 desktops (plus a bunch of other
> machines, VMs and the like) running Fedora 29 connecting to six NFS
> servers running CentOS 7.6 (with the heavily patched vendor kernel
> 3.10.0-957.1.3).  All machines involved are x86_64.  We use
> kerberized NFSv4, generally with sec=krb5i.  The exports are
> generally made with "(rw,async,sec=krb5i:krb5p)".
>
> Since I booted those clients into 4.20.3 I've started seeing
> processes getting stuck in the D state.  The system itself will seem
> OK (except for the high load average) as long as I don't touch the
> hung NFS mount.  Nothing was logged to dmesg or to the journal.  So
> far, booting back into the 4.19.15 kernel has cleared up the problem.
> I cannot yet reproduce this on demand; I've tried, but it is probably
> related to some specific usage pattern.
>
> Has anyone else seen issues like this?  Can anyone help me to get
> more useful information that might point to the problem?  I still
> haven't learned how to debug NFS issues properly.  And if there's a
> stress test tool I could easily run that might help to reproduce the
> issue, I'd be happy to run it.
>
> I note that 4.20.4 is out; I see one sunrpc fix which I guess could
> be related (sunrpc: handle ENOMEM in rpcb_getport_async) but the
> systems involved have plenty of free memory, so I doubt that's it.
> I'll certainly try it anyway.
>
> Various package versions:
> kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
> kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
> nfs-utils-2.3.3-1.rc2.fc29.x86_64
> gssproxy-0.8.0-6.fc29.x86_64
> krb5-libs-1.16.1-25.fc29.i686
>
> Thanks in advance for any help or advice,
>
> - J<

Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
regression") was supposed to be marked for stable as a fix. Chuck &
Anna?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
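A quick way to check whether a given stable release already carries
that commit, assuming a checkout of the linux-stable tree:

    git tag --contains deaa5c96c2f7 | grep '^v4\.20'
    # no output means no tagged v4.20.x release contains the fix yet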
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Schumaker, Anna @ 2019-01-25 19:13 UTC
To: tibbs, stable, trondmy, Chuck.Lever, linux-nfs

On Thu, 2019-01-24 at 19:58 +0000, Trond Myklebust wrote:
> On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
> > I could use some help figuring out the cause of some serious NFS
> > client issues I'm having with the 4.20.3 kernel which I did not
> > see under 4.19.15.
> >
> > [... full report trimmed; see the first message in the thread ...]
>
> Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
> regression") was supposed to be marked for stable as a fix. Chuck &
> Anna?

Looks like I missed that, sorry!

Stable folks, can you please backport deaa5c96c2f7 ("SUNRPC: Address
Kerberos performance/behavior regression") to v4.20?

Thanks,
Anna

> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Sasha Levin @ 2019-01-26 17:59 UTC
To: Schumaker, Anna
Cc: tibbs, stable, trondmy, Chuck.Lever, linux-nfs

On Fri, Jan 25, 2019 at 07:13:27PM +0000, Schumaker, Anna wrote:
>On Thu, 2019-01-24 at 19:58 +0000, Trond Myklebust wrote:
>> On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
>> > I could use some help figuring out the cause of some serious NFS
>> > client issues I'm having with the 4.20.3 kernel which I did not
>> > see under 4.19.15.
>> >
>> > [... full report trimmed; see the first message in the thread ...]
>>
>> Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
>> regression") was supposed to be marked for stable as a fix. Chuck &
>> Anna?
>
>Looks like I missed that, sorry!
>
>Stable folks, can you please backport deaa5c96c2f7 ("SUNRPC: Address
>Kerberos performance/behavior regression") to v4.20?

Queued for 4.20, thank you.

--
Thanks,
Sasha
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-01-25 19:51 UTC
To: Trond Myklebust
Cc: Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "TM" == Trond Myklebust <trondmy@hammerspace.com> writes:

TM> Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
TM> regression") was supposed to be marked for stable as a fix.

I wonder, though; is that likely to be the root of the problem I'm
seeing?  The commit description talks about this as a performance
regression, but I'm seeing a complete loss of NFS functionality.

Sadly I still don't have a reproducer, so outside of just deploying the
patch and hoping, I have no way to actually test this.  So far I've
been running things like:

  stress-ng --all 0 --class 'filesystem' -t 10m --times

in an NFS4-krb5p mounted directory without being able to reproduce the
problem.  That drives the load up close to 200, but everything seems to
make progress.  So it must be some specific sequence that causes it; I
just don't know which.

I did get this to show up in the kernel log when I typed "df" while
that stress-ng command was running:

  [94547.656419] NFS: server nas00 error: fileid changed fsid 0:57: expected fileid 0x6bc27eb, got 0x9996c6

I've never seen that one before, but it doesn't seem to hurt anything.
(This is still with 4.20.3.)

 - J<
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason Tibbitts @ 2019-02-05 18:12 UTC
To: Trond Myklebust
Cc: Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "JLT" == Jason L Tibbitts <tibbs@math.uh.edu> writes:

JLT> I wonder, though; is that likely to be the root of the problem I'm
JLT> seeing?  The commit description talks about this as a performance
JLT> regression, but I'm seeing a complete loss of NFS functionality.

It turns out that 4.20.6 (which has the patch in question) does not
appear to solve my issue.  At least, I've still had a few complete NFS
hangups on machines running 4.20.6, though I haven't had a chance to
closely inspect one yet.

As before, any hints about how I could best debug this issue would be
appreciated.

 - J<
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Benjamin Coddington @ 2019-02-06 12:05 UTC
To: Jason Tibbitts
Cc: Trond Myklebust, Anna.Schumaker, linux-nfs, Chuck.Lever

On 5 Feb 2019, at 13:12, Jason Tibbitts wrote:

>>>>>> "JLT" == Jason L Tibbitts <tibbs@math.uh.edu> writes:
>
> JLT> I wonder, though; is that likely to be the root of the problem
> JLT> I'm seeing?  The commit description talks about this as a
> JLT> performance regression, but I'm seeing a complete loss of NFS
> JLT> functionality.
>
> It turns out that 4.20.6 (which has the patch in question) does not
> appear to solve my issue.  At least, I've still had a few complete
> NFS hangups on machines running 4.20.6, though I haven't had a chance
> to closely inspect one yet.
>
> As before, any hints about how I could best debug this issue would be
> appreciated.

Hi Jason, you can enable the NFS and SUNRPC tracepoints to get a better
idea of what the client was doing right before the hang, or what the
client is trying to do during the hang:

echo "sunrpc:*" >> /sys/kernel/debug/tracing/set_event
echo "nfs:*" >> /sys/kernel/debug/tracing/set_event

Examine or provide the contents of /sys/kernel/debug/tracing/trace.

Ben
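Equivalently, if the raw ftrace files are awkward to work with, the
same tracepoints can be captured with trace-cmd; a minimal sketch,
assuming the trace-cmd package is installed:

    trace-cmd record -e sunrpc -e nfs   # leave running until the hang reproduces
    trace-cmd report | less             # decode and inspect the recorded events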
[parent not found: <87imxwab12.fsf@hippogriff.math.uh.edu>]
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Benjamin Coddington @ 2019-02-07 11:13 UTC
To: Jason Tibbitts
Cc: Trond Myklebust, Anna.Schumaker, linux-nfs, Chuck.Lever

On 6 Feb 2019, at 15:59, Jason Tibbitts wrote:

> This machine has a load of 93; a bunch of those are just df commands
> started by a monitoring process.  So far the user hasn't complained,
> so I can still poke at the machine for a while.

-10063 is -NFS4ERR_SEQ_MISORDERED..  I wonder why the
trace_nfs4_sequence_done() in nfs41_sequence_process() isn't showing up
in the trace?  Ah.. my fault - add "nfs4:*" to the set_events.

Ben
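In concrete terms, that is one more line in the same style as the
earlier two (assuming the same debugfs mount point):

    echo "nfs4:*" >> /sys/kernel/debug/tracing/set_event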
[parent not found: <87d0o3aadg.fsf@hippogriff.math.uh.edu>]
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Benjamin Coddington @ 2019-02-08 12:01 UTC
To: Jason Tibbitts
Cc: Trond Myklebust, Anna.Schumaker, linux-nfs, Chuck.Lever

On 7 Feb 2019, at 10:25, Jason Tibbitts wrote:

>>>>>> "BC" == Benjamin Coddington <bcodding@redhat.com> writes:
>
> BC> -10063 is -NFS4ERR_SEQ_MISORDERED..  I wonder why the
> BC> trace_nfs4_sequence_done() in nfs41_sequence_process() isn't
> BC> showing up in the trace?  Ah.. my fault - add "nfs4:*" to the
> BC> set_events.
>
> OK, attached is another trace.  Here's the same sequence I snipped
> previously:

So the client is calling SEQ over and over..  xs_stream_read_data sees
-EAGAIN..  I'm not an expert here, and not seeing what's going wrong.

Hmm.. commit c443305529d1d3d3bee0d68fdd14ae89835e091f changed
xs_read_stream_reply() to return recv.copied instead of "ret" to
xprt_complete_rqst()..

You could try reverting that commit and see if the problem goes away..

Ben
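One way to run that experiment, assuming a git checkout of the stable
tree at the kernel version being tested; on a 4.20 stable branch the
revert may need small conflict fixes by hand:

    git checkout -b revert-test v4.20.7
    git revert c443305529d1d3d3bee0d68fdd14ae89835e091f
    make olddefconfig && make -j$(nproc) binrpm-pkg   # build installable packages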
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Chuck Lever @ 2019-02-08 15:19 UTC
To: Benjamin Coddington
Cc: Jason Tibbitts, Trond Myklebust, Anna Schumaker, Linux NFS Mailing List

> On Feb 8, 2019, at 7:01 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 7 Feb 2019, at 10:25, Jason Tibbitts wrote:
>
>>>>>>> "BC" == Benjamin Coddington <bcodding@redhat.com> writes:
>>
>> BC> -10063 is -NFS4ERR_SEQ_MISORDERED..  I wonder why the
>> BC> trace_nfs4_sequence_done() in nfs41_sequence_process() isn't
>> BC> showing up in the trace?  Ah.. my fault - add "nfs4:*" to the
>> BC> set_events.
>>
>> OK, attached is another trace.  Here's the same sequence I snipped
>> previously:
>
> So the client is calling SEQ over and over..  xs_stream_read_data
> sees -EAGAIN..  I'm not an expert here, and not seeing what's going
> wrong.

<wag>
The server is returning SEQ_MISORDERED to a singleton SEQUENCE.  That
suggests the client is trying to re-establish its lease but is sending
a slot nr the server doesn't recognize for the virtual slot used for
this purpose.  Could be a problem on either side, and I don't know
enough to say how this loop could have started.
</wag>

> Hmm.. commit c443305529d1d3d3bee0d68fdd14ae89835e091f changed
> xs_read_stream_reply() to return recv.copied instead of "ret" to
> xprt_complete_rqst()..
>
> You could try reverting that commit and see if the problem goes
> away..
>
> Ben

--
Chuck Lever
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-02-08 17:17 UTC
To: Chuck Lever
Cc: Benjamin Coddington, Trond Myklebust, Anna Schumaker, Linux NFS Mailing List

>>>>> "CL" == Chuck Lever <chuck.lever@oracle.com> writes:

CL> The server is returning SEQ_MISORDERED to a singleton SEQUENCE.
CL> That suggests the client is trying to re-establish its lease but is
CL> sending a slot nr the server doesn't recognize for the virtual slot
CL> used for this purpose.  Could be a problem on either side, and I
CL> don't know enough to say how this loop could have started.

In case I didn't say so earlier, the server is just CentOS 7.6 running
that heavily modified 3.10.0-957.1.3.el7.x86_64.  I was not able to
reproduce anything like this throughout the 4.19 cycle, but it started
cropping up pretty quickly on user desktops after booting into 4.20.3
(the first of the 4.20 series I tried).  And it does not recur after
rebooting back to 4.19.15.  Of course I know that doesn't absolve the
server.

It is possible for me to try to push a mainline kernel to all of the
relevant file servers, but that obviously carries other overhead and
risks that I would certainly prefer to avoid.  I just wish I knew how
to reproduce this other than sticking it on a bunch of machines and
waiting until people complain.

 - J<
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-02-15 20:33 UTC
To: Benjamin Coddington
Cc: Trond Myklebust, Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "BC" == Benjamin Coddington <bcodding@redhat.com> writes:

BC> Hmm.. commit c443305529d1d3d3bee0d68fdd14ae89835e091f changed
BC> xs_read_stream_reply() to return recv.copied instead of "ret" to
BC> xprt_complete_rqst()..

BC> You could try reverting that commit and see if the problem goes
BC> away..

So patching a revert of that into 4.20.7 was beyond me, but I received
some help from Jeremy Cline on IRC (in #fedora-kernel) and ended up
with a patch I'll include at the end.  So far it does seem to be
better, but because of secure boot annoyances I haven't been able to
roll it out more generally.  However, it has been stable for a week on
a few hosts which have been problematic with stock 4.20.6.

I will continue to test, but hopefully this helps folks to understand
what's happening.

 - J<

From 322f581f514ffedb8884656f136bd6a812a53714 Mon Sep 17 00:00:00 2001
From: Jeremy Cline <jcline@redhat.com>
Date: Fri, 8 Feb 2019 13:09:41 -0500
Subject: [PATCH] Revert "SUNRPC: Fix RPC receive hangs"

This reverts commit c443305529d1d3d3bee0d68fdd14ae89835e091f.

Signed-off-by: Jeremy Cline <jcline@redhat.com>
---
 net/sunrpc/xprtsock.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9cdbb6d6e7f5..2d9f0326d55b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -417,7 +417,7 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 		if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC))
 			goto out;
 		if (ret != want)
-			goto out;
+			goto eagain;
 		seek = 0;
 	} else {
 		seek -= buf->head[0].iov_len;
@@ -439,7 +439,7 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 		if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC))
 			goto out;
 		if (ret != want)
-			goto out;
+			goto eagain;
 		seek = 0;
 	} else {
 		seek -= want;
@@ -455,13 +455,16 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 		if (offset == count || msg->msg_flags & (MSG_EOR|MSG_TRUNC))
 			goto out;
 		if (ret != want)
-			goto out;
+			goto eagain;
 	} else
 		offset += buf->tail[0].iov_len;
 	ret = -EMSGSIZE;
 out:
 	*read = offset - seek_init;
 	return ret;
+eagain:
+	ret = -EAGAIN;
+	goto out;
 sock_err:
 	offset += seek;
 	goto out;
@@ -504,20 +507,21 @@ xs_read_stream_request(struct sock_xprt *transport, struct msghdr *msg,
 	if (transport->recv.offset == transport->recv.len) {
 		if (xs_read_stream_request_done(transport))
 			msg->msg_flags |= MSG_EOR;
-		return read;
+		return transport->recv.copied;
 	}
 
 	switch (ret) {
-	default:
-		break;
 	case -EFAULT:
 	case -EMSGSIZE:
 		msg->msg_flags |= MSG_TRUNC;
-		return read;
+		return transport->recv.copied;
 	case 0:
 		return -ESHUTDOWN;
+	default:
+		if (ret < 0)
+			return ret;
 	}
-	return ret < 0 ? ret : read;
+	return -EAGAIN;
 }
 
 static size_t
@@ -556,7 +560,7 @@ xs_read_stream_call(struct sock_xprt *transport, struct msghdr *msg, int flags)
 
 	ret = xs_read_stream_request(transport, msg, flags, req);
 	if (msg->msg_flags & (MSG_EOR|MSG_TRUNC))
-		xprt_complete_bc_request(req, transport->recv.copied);
+		xprt_complete_bc_request(req, ret);
 
 	return ret;
 }
@@ -589,7 +593,7 @@ xs_read_stream_reply(struct sock_xprt *transport, struct msghdr *msg, int flags)
 
 	spin_lock(&xprt->queue_lock);
 	if (msg->msg_flags & (MSG_EOR|MSG_TRUNC))
-		xprt_complete_rqst(req->rq_task, transport->recv.copied);
+		xprt_complete_rqst(req->rq_task, ret);
 	xprt_unpin_rqst(req);
 out:
 	spin_unlock(&xprt->queue_lock);
@@ -610,8 +614,10 @@ xs_read_stream(struct sock_xprt *transport, int flags)
 		if (ret <= 0)
 			goto out_err;
 		transport->recv.offset = ret;
-		if (transport->recv.offset != want)
-			return transport->recv.offset;
+		if (ret != want) {
+			ret = -EAGAIN;
+			goto out_err;
+		}
 		transport->recv.len = be32_to_cpu(transport->recv.fraghdr) &
 			RPC_FRAGMENT_SIZE_MASK;
 		transport->recv.offset -= sizeof(transport->recv.fraghdr);
@@ -619,9 +625,6 @@ xs_read_stream(struct sock_xprt *transport, int flags)
 	}
 
 	switch (be32_to_cpu(transport->recv.calldir)) {
-	default:
-		msg.msg_flags |= MSG_TRUNC;
-		break;
 	case RPC_CALL:
 		ret = xs_read_stream_call(transport, &msg, flags);
 		break;
@@ -636,8 +639,6 @@ xs_read_stream(struct sock_xprt *transport, int flags)
 		goto out_err;
 	read += ret;
 	if (transport->recv.offset < transport->recv.len) {
-		if (!(msg.msg_flags & MSG_TRUNC))
-			return read;
 		msg.msg_flags = 0;
 		ret = xs_read_discard(transport->sock, &msg, flags,
 				transport->recv.len - transport->recv.offset);
@@ -646,7 +647,7 @@ xs_read_stream(struct sock_xprt *transport, int flags)
 		transport->recv.offset += ret;
 		read += ret;
 		if (transport->recv.offset != transport->recv.len)
-			return read;
+			return -EAGAIN;
 	}
 	if (xs_read_stream_request_done(transport)) {
 		trace_xs_stream_read_request(transport);
@@ -670,7 +671,7 @@ static void xs_stream_data_receive(struct sock_xprt *transport)
 		goto out;
 	for (;;) {
 		ret = xs_read_stream(transport, MSG_DONTWAIT);
-		if (ret < 0)
+		if (ret <= 0)
 			break;
 		read += ret;
 		cond_resched();
-- 
2.20.1
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Trond Myklebust @ 2019-02-16 14:46 UTC
To: tibbs, bcodding
Cc: Anna.Schumaker, linux-nfs, Chuck.Lever

On Fri, 2019-02-15 at 14:33 -0600, Jason L Tibbitts III wrote:
> >>>>> "BC" == Benjamin Coddington <bcodding@redhat.com> writes:
>
> BC> Hmm.. commit c443305529d1d3d3bee0d68fdd14ae89835e091f changed
> BC> xs_read_stream_reply() to return recv.copied instead of "ret" to
> BC> xprt_complete_rqst()..

That's a good thing. xprt_complete_rqst() really dislikes negative
error values.

> BC> You could try reverting that commit and see if the problem goes
> BC> away..
>
> So patching a revert of that into 4.20.7 was beyond me, but I
> received some help from Jeremy Cline on IRC (in #fedora-kernel) and
> ended up with a patch I'll include at the end.  So far it does seem
> to be better, but because of secure boot annoyances I haven't been
> able to roll it out more generally.  However, it has been stable for
> a week on a few hosts which have been problematic with stock 4.20.6.
>
> I will continue to test, but hopefully this helps folks to understand
> what's happening.

Hmm... Does the following patch help at all?

---
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7754aa3e434f..a1a8903ae5d0 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -404,8 +404,8 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 	size_t want, seek_init = seek, offset = 0;
 	ssize_t ret;
 
-	if (seek < buf->head[0].iov_len) {
-		want = min_t(size_t, count, buf->head[0].iov_len);
+	want = min_t(size_t, count, buf->head[0].iov_len);
+	if (seek < want) {
 		ret = xs_read_kvec(sock, msg, flags, &buf->head[0], want, seek);
 		if (ret <= 0)
 			goto sock_err;
@@ -416,8 +416,8 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 			goto out;
 		seek = 0;
 	} else {
-		seek -= buf->head[0].iov_len;
-		offset += buf->head[0].iov_len;
+		seek -= want;
+		offset += want;
 	}
 
 	want = xs_alloc_sparse_pages(buf,
@@ -442,8 +442,8 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 		offset += want;
 	}
 
-	if (seek < buf->tail[0].iov_len) {
-		want = min_t(size_t, count - offset, buf->tail[0].iov_len);
+	want = min_t(size_t, count - offset, buf->tail[0].iov_len);
+	if (seek < want) {
 		ret = xs_read_kvec(sock, msg, flags, &buf->tail[0], want, seek);
 		if (ret <= 0)
 			goto sock_err;
@@ -453,7 +453,7 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 		if (ret != want)
 			goto out;
 	} else
-		offset += buf->tail[0].iov_len;
+		offset = seek_init;
 	ret = -EMSGSIZE;
 out:
 	*read = offset - seek_init;
@@ -497,11 +497,9 @@ xs_read_stream_request(struct sock_xprt *transport, struct msghdr *msg,
 			&read);
 	transport->recv.offset += read;
 	transport->recv.copied += read;
-	if (transport->recv.offset == transport->recv.len) {
-		if (xs_read_stream_request_done(transport))
-			msg->msg_flags |= MSG_EOR;
-		return read;
-	}
+	if (transport->recv.offset == transport->recv.len &&
+	    xs_read_stream_request_done(transport))
+		msg->msg_flags |= MSG_EOR;
 
 	switch (ret) {
 	default:
@@ -671,6 +669,8 @@ static void xs_stream_data_receive(struct sock_xprt *transport)
 		read += ret;
 		cond_resched();
 	}
+	if (ret == -ESHUTDOWN)
+		kernel_sock_shutdown(transport->sock, SHUT_RDWR);
 out:
 	mutex_unlock(&transport->recv_mutex);
 	trace_xs_stream_read_data(&transport->xprt, ret, read);

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-02-20 2:13 UTC
To: Trond Myklebust
Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "TM" == Trond Myklebust <trondmy@hammerspace.com> writes:

TM> Hmm... Does the following patch help at all?

I un-wrapped it and applied it over 4.20.10 and pushed it out to a
couple of problematic machines.  So far it's been OK there, so I've
rolled it out to 50 desktops (because they happen to have secure boot
disabled and thus allow me to boot custom kernels) and I guess we'll
see how that goes.

 - J<
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Jason L Tibbitts III @ 2019-02-20 15:25 UTC
To: Trond Myklebust
Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

Sadly, after a bit more testing I find that 4.20.10 with that patch
applied still exhibits the problem.  Just found a machine where NFS
hung up six minutes after the user logged in.  As always: nothing is
logged on the client or the server.  I turned on the tracing as
requested previously and see what I think is just the same sequence as
before.  Please let me know if there is any other information I can
provide.  Right now I'm going to build 4.20.10 with my previously
posted revert patch applied, roll that out to the same 50 hosts, and
see how it fares.

 - J<

kworker/u8:0-12608 [001] .... 47884.661073: rpc_request: task:20469@1 nfsv4 SEQUENCE (async)
kworker/u8:0-12608 [001] .... 47884.661073: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_reserve [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661074: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_reserveresult [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661074: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_refresh [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661075: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_refreshresult [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661076: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_allocate [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661076: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_encode [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661091: rpc_task_run_action: task:20469@1 flags=5281 state=001d status=0 action=call_bind [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661091: rpc_task_run_action: task:20469@1 flags=5281 state=001d status=0 action=call_connect [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661092: rpc_task_run_action: task:20469@1 flags=5281 state=001d status=0 action=call_transmit [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661106: xprt_transmit: peer=[172.21.86.74]:2049 xid=0x0d7ca068 status=0
kworker/u8:0-12608 [001] .... 47884.661107: rpc_task_run_action: task:20469@1 flags=5a81 state=0015 status=0 action=call_transmit_status [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661107: rpc_task_sleep: task:20469@1 flags=5a81 state=0015 status=0 timeout=60000 queue=xprt_pending
kworker/u9:0-12884 [002] .... 47884.661371: xprt_lookup_rqst: peer=[172.21.86.74]:2049 xid=0x0d7ca068 status=0
kworker/u9:0-12884 [002] .... 47884.661372: xprt_complete_rqst: peer=[172.21.86.74]:2049 xid=0x0d7ca068 status=112
kworker/u9:0-12884 [002] .... 47884.661373: rpc_task_wakeup: task:20469@1 flags=5a81 state=0006 status=0 timeout=60000 queue=xprt_pending
kworker/u9:0-12884 [002] .... 47884.661375: xs_stream_read_request: peer=[172.21.86.74]:2049 xid=0x0d7ca068 copied=112 reclen=112 offset=112
kworker/u9:0-12884 [002] .... 47884.661376: xs_stream_read_data: peer=[172.21.86.74]:2049 err=-11 total=116
kworker/u8:0-12608 [001] .... 47884.661388: rpc_task_run_action: task:20469@1 flags=5a81 state=0005 status=0 action=xprt_timer [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661389: rpc_task_run_action: task:20469@1 flags=5a81 state=0005 status=0 action=call_status [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661389: rpc_task_run_action: task:20469@1 flags=5a81 state=0005 status=0 action=call_decode [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661408: rpc_task_run_action: task:20469@1 flags=5a81 state=0005 status=-10063 action=rpc_exit_task [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661409: nfs4_sequence_done: error=-10063 (SEQ_MISORDERED) session=0x4480e725 slot_nr=0 seq_nr=1 highest_slotid=0 target_highest_slotid=0 status_flags=0 ()
kworker/u8:0-12608 [001] .... 47884.661410: rpc_stats_latency: task:20469@1 xid=0x0d7ca068 nfsv4 SEQUENCE backlog=21 rtt=278 execute=339
kworker/u8:0-12608 [001] .... 47884.661412: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=rpc_prepare_task [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661413: nfs4_setup_sequence: session=0x4480e725 slot_nr=0 seq_nr=1 highest_used_slotid=0
kworker/u8:0-12608 [001] .... 47884.661413: rpc_task_run_action: task:20469@1 flags=5281 state=0005 status=0 action=call_start [sunrpc]
kworker/u8:0-12608 [001] .... 47884.661414: rpc_request: task:20469@1 nfsv4 SEQUENCE (async)
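A crude way to confirm this retry loop on a live machine, using the
same ftrace buffer (standard paths, nothing specific to this setup); a
steadily growing count means the client is stuck replaying the same
slot:

    grep -c 'SEQ_MISORDERED' /sys/kernel/debug/tracing/trace
    grep 'nfs4_sequence_done' /sys/kernel/debug/tracing/trace | tail -5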
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Trond Myklebust @ 2019-02-20 15:37 UTC
To: tibbs
Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

On Wed, 2019-02-20 at 09:25 -0600, Jason L Tibbitts III wrote:
> Sadly, after a bit more testing I find that 4.20.10 with that patch
> applied still exhibits the problem.  Just found a machine where NFS
> hung up six minutes after the user logged in.  As always: nothing is
> logged on the client or the server.  I turned on the tracing as
> requested previously and see what I think is just the same sequence
> as before.  Please let me know if there is any other information I
> can provide.  Right now I'm going to build 4.20.10 with my previously
> posted revert patch applied, roll that out to the same 50 hosts, and
> see how it fares.
>
> - J<
>
> kworker/u8:0-12608 [001] .... 47884.661073: rpc_request: task:20469@1 nfsv4 SEQUENCE (async)
> [... rest of the trace from the previous message trimmed ...]

This is not an RPC layer issue. It is a SEQ_MISORDERED error on slot 0.
If the client can't recover then that will hang your NFSv4.1 session
against that server.

Are you sure this isn't what was happening previously?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: Need help debugging NFS issues new to 4.20 kernel

From: Chuck Lever @ 2019-02-20 15:39 UTC
To: Trond Myklebust
Cc: tibbs, bcodding, Anna Schumaker, Linux NFS Mailing List

> On Feb 20, 2019, at 10:37 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Wed, 2019-02-20 at 09:25 -0600, Jason L Tibbitts III wrote:
>> Sadly, after a bit more testing I find that 4.20.10 with that patch
>> applied still exhibits the problem.  Just found a machine where NFS
>> hung up six minutes after the user logged in.
>>
>> [... quoted trace trimmed ...]
>
> This is not an RPC layer issue. It is a SEQ_MISORDERED error on slot
> 0.  If the client can't recover then that will hang your NFSv4.1
> session against that server.
>
> Are you sure this isn't what was happening previously?

This is exactly what was happening earlier.  See my e-mail to the list
on Feb 8.

--
Chuck Lever
* Re: Need help debugging NFS issues new to 4.20 kernel 2019-02-20 15:37 ` Trond Myklebust 2019-02-20 15:39 ` Chuck Lever @ 2019-02-20 15:41 ` Trond Myklebust 2019-02-21 18:19 ` Jason L Tibbitts III 2019-02-20 16:25 ` Jason L Tibbitts III 2 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2019-02-20 15:41 UTC (permalink / raw) To: tibbs; +Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever On Wed, 2019-02-20 at 15:37 +0000, Trond Myklebust wrote: > > This is not an RPC layer issue. It is a SEQ_MISORDERED error on slot > 0. > If the client can't recover then that will hang your NFSv4.1 session > against that server. Can you please try the following patch? It uses a different approach to dealing with interrupted NFSv4.1 RPC calls. 8<----------------------------------- From d72ad5fe4c2206458d127cc31e690318cf2e2731 Mon Sep 17 00:00:00 2001 From: Trond Myklebust <trond.myklebust@hammerspace.com> Date: Wed, 20 Jun 2018 17:53:34 -0400 Subject: [PATCH] NFSv4.1: Avoid false retries when RPC calls are interrupted A 'false retry' in NFSv4.1 occurs when the client attempts to transmit a new RPC call using a slot+sequence number combination that references an already cached one. Currently, the Linux NFS client will do this if a user process interrupts an RPC call that is in progress. The problem with doing so is that we defeat the main mechanism used by the server to differentiate between a new call and a replayed one. Even if the server is able to perfectly cache the arguments of the old call, it cannot know if the client intended to replay or send a new call. The obvious fix is to bump the sequence number pre-emptively if an RPC call is interrupted, but in order to deal with the corner cases where the interrupted call is not actually received and processed by the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or locate a correct sequence number that lies between the value we sent, and the last value that was acked by a SEQUENCE call on that slot. 
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c    | 105 ++++++++++++++++++++-----------------------
 fs/nfs/nfs4session.c |   5 ++-
 fs/nfs/nfs4session.h |   5 ++-
 3 files changed, 55 insertions(+), 60 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1dbdc85e6885..77c6e2d3f3fc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -730,13 +730,25 @@ static void nfs41_sequence_free_slot(struct nfs4_sequence_res *res)
 	res->sr_slot = NULL;
 }
 
+static void nfs4_slot_sequence_record_sent(struct nfs4_slot *slot,
+		u32 seqnr)
+{
+	if ((s32)(seqnr - slot->seq_nr_highest_sent) > 0)
+		slot->seq_nr_highest_sent = seqnr;
+}
+static void nfs4_slot_sequence_acked(struct nfs4_slot *slot,
+		u32 seqnr)
+{
+	slot->seq_nr_highest_sent = seqnr;
+	slot->seq_nr_last_acked = seqnr;
+}
+
 static int nfs41_sequence_process(struct rpc_task *task,
 		struct nfs4_sequence_res *res)
 {
 	struct nfs4_session *session;
 	struct nfs4_slot *slot = res->sr_slot;
 	struct nfs_client *clp;
-	bool interrupted = false;
 	int ret = 1;
 
 	if (slot == NULL)
@@ -747,16 +759,12 @@ static int nfs41_sequence_process(struct rpc_task *task,
 
 	session = slot->table->session;
 
-	if (slot->interrupted) {
-		if (res->sr_status != -NFS4ERR_DELAY)
-			slot->interrupted = 0;
-		interrupted = true;
-	}
-
 	trace_nfs4_sequence_done(session, res);
 	/* Check the SEQUENCE operation status */
 	switch (res->sr_status) {
 	case 0:
+		/* Mark this sequence number as having been acked */
+		nfs4_slot_sequence_acked(slot, slot->seq_nr);
 		/* Update the slot's sequence and clientid lease timer */
 		slot->seq_done = 1;
 		clp = session->clp;
@@ -771,9 +779,9 @@
 		 * sr_status remains 1 if an RPC level error occurred.
 		 * The server may or may not have processed the sequence
 		 * operation..
-		 * Mark the slot as having hosted an interrupted RPC call.
 		 */
-		slot->interrupted = 1;
+		nfs4_slot_sequence_record_sent(slot, slot->seq_nr);
+		slot->seq_done = 1;
 		goto out;
 	case -NFS4ERR_DELAY:
 		/* The server detected a resend of the RPC call and
@@ -784,6 +792,7 @@ static int nfs41_sequence_process(struct rpc_task *task,
 			__func__,
 			slot->slot_nr,
 			slot->seq_nr);
+		nfs4_slot_sequence_acked(slot, slot->seq_nr);
 		goto out_retry;
 	case -NFS4ERR_RETRY_UNCACHED_REP:
 	case -NFS4ERR_SEQ_FALSE_RETRY:
@@ -791,6 +800,7 @@
 		 * The server thinks we tried to replay a request.
 		 * Retry the call after bumping the sequence ID.
 		 */
+		nfs4_slot_sequence_acked(slot, slot->seq_nr);
 		goto retry_new_seq;
 	case -NFS4ERR_BADSLOT:
 		/*
@@ -801,21 +811,28 @@
 			goto session_recover;
 		goto retry_nowait;
 	case -NFS4ERR_SEQ_MISORDERED:
+		nfs4_slot_sequence_record_sent(slot, slot->seq_nr);
 		/*
-		 * Was the last operation on this sequence interrupted?
-		 * If so, retry after bumping the sequence number.
+		 * Were one or more calls using this slot interrupted?
+		 * If the server never received the request, then our
+		 * transmitted slot sequence number may be too high.
 		 */
-		if (interrupted)
-			goto retry_new_seq;
-		/*
-		 * Could this slot have been previously retired?
-		 * If so, then the server may be expecting seq_nr = 1!
-		 */
-		if (slot->seq_nr != 1) {
-			slot->seq_nr = 1;
+		if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) {
+			slot->seq_nr--;
 			goto retry_nowait;
 		}
-		goto session_recover;
+		/*
+		 * RFC5661:
+		 * A retry might be sent while the original request is
+		 * still in progress on the replier. The replier SHOULD
+		 * deal with the issue by returning NFS4ERR_DELAY as the
+		 * reply to SEQUENCE or CB_SEQUENCE operation, but
+		 * implementations MAY return NFS4ERR_SEQ_MISORDERED.
+		 *
+		 * Restart the search after a delay.
+		 */
+		slot->seq_nr = slot->seq_nr_highest_sent;
+		goto out_retry;
 	default:
 		/* Just update the slot sequence no. */
 		slot->seq_done = 1;
@@ -906,17 +923,6 @@ static const struct rpc_call_ops nfs41_call_sync_ops = {
 	.rpc_call_done = nfs41_call_sync_done,
 };
 
-static void
-nfs4_sequence_process_interrupted(struct nfs_client *client,
-		struct nfs4_slot *slot, const struct cred *cred)
-{
-	struct rpc_task *task;
-
-	task = _nfs41_proc_sequence(client, cred, slot, true);
-	if (!IS_ERR(task))
-		rpc_put_task_async(task);
-}
-
 #else /* !CONFIG_NFS_V4_1 */
 static int nfs4_sequence_process(struct rpc_task *task,
 		struct nfs4_sequence_res *res)
@@ -937,14 +943,6 @@ int nfs4_sequence_done(struct rpc_task *task,
 }
 EXPORT_SYMBOL_GPL(nfs4_sequence_done);
 
-static void
-nfs4_sequence_process_interrupted(struct nfs_client *client,
-		struct nfs4_slot *slot, const struct cred *cred)
-{
-	WARN_ON_ONCE(1);
-	slot->interrupted = 0;
-}
-
 #endif /* !CONFIG_NFS_V4_1 */
 
 static
@@ -982,26 +980,19 @@ int nfs4_setup_sequence(struct nfs_client *client,
 		task->tk_timeout = 0;
 	}
 
-	for (;;) {
-		spin_lock(&tbl->slot_tbl_lock);
-		/* The state manager will wait until the slot table is empty */
-		if (nfs4_slot_tbl_draining(tbl) && !args->sa_privileged)
-			goto out_sleep;
-
-		slot = nfs4_alloc_slot(tbl);
-		if (IS_ERR(slot)) {
-			/* Try again in 1/4 second */
-			if (slot == ERR_PTR(-ENOMEM))
-				task->tk_timeout = HZ >> 2;
-			goto out_sleep;
-		}
-		spin_unlock(&tbl->slot_tbl_lock);
+	spin_lock(&tbl->slot_tbl_lock);
+	/* The state manager will wait until the slot table is empty */
+	if (nfs4_slot_tbl_draining(tbl) && !args->sa_privileged)
+		goto out_sleep;
 
-		if (likely(!slot->interrupted))
-			break;
-		nfs4_sequence_process_interrupted(client,
-				slot, task->tk_msg.rpc_cred);
+	slot = nfs4_alloc_slot(tbl);
+	if (IS_ERR(slot)) {
+		/* Try again in 1/4 second */
+		if (slot == ERR_PTR(-ENOMEM))
+			task->tk_timeout = HZ >> 2;
+		goto out_sleep;
 	}
+	spin_unlock(&tbl->slot_tbl_lock);
 
 	nfs4_sequence_attach_slot(args, res, slot);
diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c
index 25c5255a395c..bcb532def9e2 100644
--- a/fs/nfs/nfs4session.c
+++ b/fs/nfs/nfs4session.c
@@ -110,6 +110,8 @@ static struct nfs4_slot *nfs4_new_slot(struct nfs4_slot_table *tbl,
 		slot->table = tbl;
 		slot->slot_nr = slotid;
 		slot->seq_nr = seq_init;
+		slot->seq_nr_highest_sent = seq_init;
+		slot->seq_nr_last_acked = seq_init - 1;
 	}
 	return slot;
 }
@@ -276,7 +278,8 @@ static void nfs4_reset_slot_table(struct nfs4_slot_table *tbl,
 	p = &tbl->slots;
 	while (*p) {
 		(*p)->seq_nr = ivalue;
-		(*p)->interrupted = 0;
+		(*p)->seq_nr_highest_sent = ivalue;
+		(*p)->seq_nr_last_acked = ivalue - 1;
 		p = &(*p)->next;
 	}
 	tbl->highest_used_slotid = NFS4_NO_SLOT;
diff --git a/fs/nfs/nfs4session.h b/fs/nfs/nfs4session.h
index 3c550f297561..230509b77121 100644
--- a/fs/nfs/nfs4session.h
+++ b/fs/nfs/nfs4session.h
@@ -23,8 +23,9 @@ struct nfs4_slot {
 	unsigned long	generation;
 	u32		slot_nr;
 	u32		seq_nr;
-	unsigned int	interrupted : 1,
-			privileged : 1,
+	u32		seq_nr_last_acked;
+	u32		seq_nr_highest_sent;
+	unsigned int	privileged : 1,
 			seq_done : 1;
 };
 
-- 
2.20.1

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply related	[flat|nested] 25+ messages in thread
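A note on an idiom the patch leans on: seq_nr values are unsigned 32-bit counters that can wrap, so nfs4_slot_sequence_record_sent() compares them with (s32)(seqnr - slot->seq_nr_highest_sent) > 0 rather than a plain >. Casting the unsigned difference to a signed 32-bit value gives RFC 1982-style serial-number arithmetic, which stays correct across the wrap point. A small self-contained demonstration of the idiom (ordinary userspace C, not the kernel code itself):

    #include <assert.h>
    #include <stdint.h>

    /* True if a is "after" b in wrapping 32-bit sequence space. */
    static int seq_after(uint32_t a, uint32_t b)
    {
            return (int32_t)(a - b) > 0;
    }

    int main(void)
    {
            assert(seq_after(2, 1));
            assert(!seq_after(1, 2));
            /* Across the wrap: 2 is "newer" than 0xfffffffe. */
            assert(seq_after(UINT32_C(2), UINT32_C(0xfffffffe)));
            assert(!seq_after(UINT32_C(0xfffffffe), UINT32_C(2)));
            return 0;
    }

The same arithmetic is what lets the NFS4ERR_SEQ_MISORDERED arm test (s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1 to decide whether there is still room to step slot->seq_nr back toward the last value the server acknowledged.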
* Re: Need help debugging NFS issues new to 4.20 kernel
  2019-02-20 15:41               ` Trond Myklebust
@ 2019-02-21 18:19                 ` Jason L Tibbitts III
  2019-02-25 19:24                   ` Jason L Tibbitts III
  0 siblings, 1 reply; 25+ messages in thread
From: Jason L Tibbitts III @ 2019-02-21 18:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "TM" == Trond Myklebust <trondmy@hammerspace.com> writes:

TM> Can you please try the following patch? It uses a different approach
TM> to dealing with interrupted NFSv4.1 RPC calls.

I can say that so far, after applying this last night to 50 machines,
with users logged in for a bit over five hours now, things appear to be
working properly. I haven't seen any issues this morning, whereas with
an unpatched kernel I would have expected to see several hangs (or at
least user-initiated reboots) by now.

I will of course continue to monitor things, but at this point, feel
free to add

Tested-by: Jason Tibbitts <tibbs@math.uh.edu>

 - J<

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Need help debugging NFS issues new to 4.20 kernel
  2019-02-21 18:19                 ` Jason L Tibbitts III
@ 2019-02-25 19:24                   ` Jason L Tibbitts III
  2019-02-25 23:15                     ` Benjamin Coddington
  0 siblings, 1 reply; 25+ messages in thread
From: Jason L Tibbitts III @ 2019-02-25 19:24 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

So I've now been running this patch ("NFSv4.1: Avoid false retries when
RPC calls are interrupted") for several days with no hangs at all and no
other regressions noted. What would the next step be? Will this be sent
upstream? I'm not sure how to check whether it is queued for submission
in someone's tree.

I doubt there is anything I can do to help the process, but please let
me know if there is.

 - J<

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Need help debugging NFS issues new to 4.20 kernel
  2019-02-25 19:24                   ` Jason L Tibbitts III
@ 2019-02-25 23:15                     ` Benjamin Coddington
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Coddington @ 2019-02-25 23:15 UTC (permalink / raw)
  To: Jason L Tibbitts III
  Cc: Trond Myklebust, Anna.Schumaker, linux-nfs, Chuck.Lever

Hi Jason,

It looks like Trond has this patch on his "linux-next" and on his
"testing" branch:

http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next

commit 3453d5708b33efe76f40eca1c0ed60923094b971
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Wed Jun 20 17:53:34 2018 -0400

    NFSv4.1: Avoid false retries when RPC calls are interrupted

    A 'false retry' in NFSv4.1 occurs when the client attempts to
    transmit a new RPC call using a slot+sequence number combination
    that references an already cached one. Currently, the Linux NFS
    client will do this if a user process interrupts an RPC call that
    is in progress.

    The problem with doing so is that we defeat the main mechanism used
    by the server to differentiate between a new call and a replayed
    one. Even if the server is able to perfectly cache the arguments of
    the old call, it cannot know if the client intended to replay or
    send a new call.

    The obvious fix is to bump the sequence number pre-emptively if an
    RPC call is interrupted, but in order to deal with the corner cases
    where the interrupted call is not actually received and processed
    by the server, we need to interpret the error
    NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or
    locate a correct sequence number that lies between the value we
    sent, and the last value that was acked by a SEQUENCE call on that
    slot.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Tested-by: Jason Tibbitts <tibbs@math.uh.edu>

That's usually a good sign that he will include it in a pull request
when the 5.1 merge window opens. Anna and Trond trade off sending
patches for each Linux release; this time around Trond will send them.
Without your having to take any further steps, I expect this will land
in mainline for 5.1, and Fedora will eventually pick it up.

Ben

On 25 Feb 2019, at 14:24, Jason L Tibbitts III wrote:

> So I've now been running this patch ("NFSv4.1: Avoid false retries
> when RPC calls are interrupted") for several days with no hangs at all
> and no other regressions noted. What would the next step be? Will this
> be sent upstream? I'm not sure how to check whether it is queued for
> submission in someone's tree.
>
> I doubt there is anything I can do to help the process, but please let
> me know if there is.
>
> - J<

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Need help debugging NFS issues new to 4.20 kernel
  2019-02-20 15:37             ` Trond Myklebust
  2019-02-20 15:39               ` Chuck Lever
  2019-02-20 15:41               ` Trond Myklebust
@ 2019-02-20 16:25               ` Jason L Tibbitts III
  2019-02-20 16:45                 ` Trond Myklebust
  2 siblings, 1 reply; 25+ messages in thread
From: Jason L Tibbitts III @ 2019-02-20 16:25 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "TM" == Trond Myklebust <trondmy@hammerspace.com> writes:

TM> This is not an RPC layer issue. It is a SEQ_MISORDERED error on slot
TM> 0. If the client can't recover then that will hang your NFSv4.1
TM> session against that server.

TM> Are you sure this isn't what was happening previously?

I'm sorry if I wasn't clear. I applied your patch and it doesn't seem
to make any difference. The behavior appears to be exactly what was
happening previously, though I posted the trace again just so that
those who know more than I do could verify that.

So currently I'm falling back to the previous patch I was testing,
which reverts c443305529d1d3d3bee0d68fdd14ae89835e091f on top of the
stable stream.

 - J<

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Need help debugging NFS issues new to 4.20 kernel
  2019-02-20 16:25               ` Jason L Tibbitts III
@ 2019-02-20 16:45                 ` Trond Myklebust
  2019-02-20 16:49                   ` Jason L Tibbitts III
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2019-02-20 16:45 UTC (permalink / raw)
  To: tibbs; +Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

On Wed, 2019-02-20 at 10:25 -0600, Jason L Tibbitts III wrote:
> >>>>> "TM" == Trond Myklebust <trondmy@hammerspace.com> writes:
>
> TM> This is not an RPC layer issue. It is a SEQ_MISORDERED error on
> TM> slot 0. If the client can't recover then that will hang your
> TM> NFSv4.1 session against that server.
>
> TM> Are you sure this isn't what was happening previously?
>
> I'm sorry if I wasn't clear. I applied your patch and it doesn't seem
> to make any difference. The behavior appears to be exactly what was
> happening previously, though I posted the trace again just so that
> those who know more than I do could verify that.

That's why I sent you a patch that addresses the SEQ_MISORDERED issue.
Are you saying that failed too?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Need help debugging NFS issues new to 4.20 kernel
  2019-02-20 16:45                 ` Trond Myklebust
@ 2019-02-20 16:49                   ` Jason L Tibbitts III
  0 siblings, 0 replies; 25+ messages in thread
From: Jason L Tibbitts III @ 2019-02-20 16:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bcodding, Anna.Schumaker, linux-nfs, Chuck.Lever

>>>>> "TM" == Trond Myklebust <trondmy@hammerspace.com> writes:

TM> That's why I sent you a patch that addresses the SEQ_MISORDERED
TM> issue. Are you saying that failed too?

If you're referring to d72ad5fe4c2206458d127cc31e690318cf2e2731
("NFSv4.1: Avoid false retries when RPC calls are interrupted"), which
you just sent a few minutes ago, I'm building it now. It will take me a
while to roll this out to user desktops and then see if anyone hits it.

 - J<

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2019-02-25 23:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-24 17:32 Need help debugging NFS issues new to 4.20 kernel Jason L Tibbitts III
2019-01-24 19:28 ` Jason L Tibbitts III
2019-01-24 19:58 ` Trond Myklebust
2019-01-25 19:13   ` Schumaker, Anna
2019-01-26 17:59     ` Sasha Levin
2019-01-25 19:51   ` Jason L Tibbitts III
2019-02-05 18:12     ` Jason Tibbitts
2019-02-06 12:05       ` Benjamin Coddington
[not found]         ` <87imxwab12.fsf@hippogriff.math.uh.edu>
2019-02-07 11:13           ` Benjamin Coddington
[not found]             ` <87d0o3aadg.fsf@hippogriff.math.uh.edu>
2019-02-08 12:01               ` Benjamin Coddington
2019-02-08 15:19                 ` Chuck Lever
2019-02-08 17:17                   ` Jason L Tibbitts III
2019-02-15 20:33                     ` Jason L Tibbitts III
2019-02-16 14:46                       ` Trond Myklebust
2019-02-20  2:13                         ` Jason L Tibbitts III
2019-02-20 15:25                           ` Jason L Tibbitts III
2019-02-20 15:37                             ` Trond Myklebust
2019-02-20 15:39                               ` Chuck Lever
2019-02-20 15:41                               ` Trond Myklebust
2019-02-21 18:19                                 ` Jason L Tibbitts III
2019-02-25 19:24                                   ` Jason L Tibbitts III
2019-02-25 23:15                                     ` Benjamin Coddington
2019-02-20 16:25                               ` Jason L Tibbitts III
2019-02-20 16:45                                 ` Trond Myklebust
2019-02-20 16:49                                   ` Jason L Tibbitts III