linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Vasko <kvasko@gmail.com>
To: Peter Eriksson <pen@lysator.liu.se>, linux-nfs@vger.kernel.org
Subject: Re: NFSv4 client locks up on larger writes with Kerberos enabled
Date: Thu, 19 Sep 2019 09:19:45 -0500	[thread overview]
Message-ID: <280ADE78-D617-4A97-9B92-18DCE3063B93@gmail.com> (raw)
In-Reply-To: <E172CA50-EC89-4072-9C1D-1B825DC3FE8B@lysator.liu.se>

Unfortunately, I haven’t seen rpc.gssd crash anywhere in the logs.

Just to confirm though I stopped rpc-gssd and started it manually in the foreground with

sudo rpc.gssd -f -vvv -rrr

Unmounted the share, mounted it again. Saw a bunch of logs and authentications seemingly

I then produced a lockup. It did some authentication in the logs and did not crash. The only warning is 

WARNING: gssd_clnt_gssd_cb: failed reading request

Googling that warning doesn’t produce much information and seems unrelated.

My keys when I do klist 

This shows they expire 24h in the future for both  my user and the NFS server. Not sure how refreshing the tickets would help in this case. In addition, if it’s locked, I restart the nfs server and then the client will immediately finish that write. I can then immediately transfer a 1MB file and it will complete successfully immediately. So if it was a kerberos ticket was expiring I would assume it wouldn’t work with the small file immediately after it just failed.

I’m using sssd so it refreshes the tickets basically every time I login to the NFS client.

-Kevin

> On Sep 19, 2019, at 6:38 AM, Peter Eriksson <pen@lysator.liu.se> wrote:
> 
> Have you verified that the “rpc.gssd” daemon doesn’t crash while the write is under way? We’ve seen problems possibly related to it (or gssproxy) with Linux clients connecting to our NFS Servers (FreeBSD-based) with NFSv4 & sec=krb5. Sometimes you might not notice it crashing due to it being restarted automatically… 
> 
> When things go south for us the Linux client “locks” up and processes end up in “device wait” and are unkillable. Only way out so far has been to reboot the client. _Sometimes_ it might be related to Kerberos tickets expiring - things _seems_ to work better here when we created a hack that renews them automatically (for as long as they can).
> 
> - Peter
> 
> 
>> On 19 Sep 2019, at 01:36, Kevin Vasko <kvasko@gmail.com> wrote:
>> 
>> Hello,
>> 
>> 
>> We have a new Dell EMC Unity 300 acting as NAS Server that is
>> presenting a NFSv4 NFS Share. Our clients are mostly Ubuntu 18.04.3
>> but issue is also present on CentOS 7.6 systems. We have been
>> struggling with this issue for over a week now and not sure how to
>> resolve it.
>> 
>> 
>> 
>> We are having trouble with NFS Clients completing their writes to the
>> Dell EMC Unity 300 NFS Server when Kerberos is enabled on the NFS
>> Share. I created the NFS Share on the U300, associated it with our
>> FreeIPA (Kerberos/LDAP server) and everything shows successful.
>> 
>> 
>> 
>> I can mount the share without a problem on multiple Ubuntu/CentOS
>> machines without issues. I can ls around and move around on the share
>> without issue (seemingly). I can copy files as well and that seem to
>> work (haven’t tested it extensively as I have been heavily focused on
>> this issue). However, clients won't complete their writes to the NFS
>> share randomly once the file gets to > 2GB or so in file size.
>> _sometimes_ they do, but I would say anything > 3GB, the client locks
>> up and won’t ever return 80% of the time I try it.
>> 
>> 
>> 
>> For example, I can run dd and write ten 1MB files to the NFS share no
>> issues, all back to back to back. However, when I try to do a single
>> 3GB file, it _might_ or might not finish. If it doesn’t finish it just
>> locks up the client. Sometimes it will finish though so its random.
>> However, I would say 70-80% of the time it is reproducible with files
>>> 3GB or so.
>> 
>> 
>> 
>> Here is some information on the client and what I’m doing to reproduce
>> the issue. I have tried krb5, and krb5p with the same results.
>> Removing the requirement for krb5 in the NFS server and mounting with
>> sec=none, I don’t seem to hit the issue.
>> 
>> 
>> 
>> $uname -a
>> 
>> 5.0.0-27-generic #28~18.04.1-Ubuntu SMP Thu Aug 22 03:00:32 UTC 2019
>> x86_64 x86_64 x86_64 GNU/Linux
>> 
>> 
>> 
>> $mount
>> 
>> nfs-server.example.com:/ds1 on /data type nfs4
>> (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,clientaddr=10.75.40.254,local_lock=none,addr=10.75.37.2)
>> 
>> 
>> 
>> $ dd if=/dev/zero of=/data/test.img count=1000 bs=1000
>> 
>> 1000+0 records in
>> 
>> 1000+0 records out
>> 
>> 1000000 bytes (1.0 MB, 977 KiB) copied, 0.00613671 s, 163 MB/s
>> 
>> $ dd if=/dev/zero of=/data/test.img bs=1000M count=6
>> 
>> ---- HUNG ----
>> 
>> 
>> 
>> After it hangs it will NEVER return. The only way I've been able to
>> get the client to return is to go into the U300 Control panel and
>> change the "SP Owner:" from SPA to SPB (effectively restarting the NFS
>> server). Once that happens the client returns.
>> 
>> 
>> 
>> If I disable Kerberos on the NFS Share I have no problems writing
>> larger files it seems.
>> 
>> 
>> 
>> I did some straces of the dd and this is what I saw.
>> 
>> 
>> 
>> This was a successful write….
>> 
>> 
>> 
>> $ strace dd if=/dev/zero of=/data/test3.img bs=100M count=1
>> 
>> execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test3.img",
>> "bs=100M", "count=1"], 0x7fff8c03c7f0 /* 21 vars */) = 0
>> 
>> brk(NULL)                               = 0x559b8b39a000
>> 
>> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
>> 
>> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
>> 
>> fstat(3, {st_mode=S_IFREG|0644, st_size=93482, ...}) = 0
>> 
>> mmap(NULL, 93482, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f62b6180000
>> 
>> close(3)                                = 0
>> 
>> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
>> 
>> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
>> 832) = 832
>> 
>> fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
>> 
>> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x7f62b617e000
>> 
>> mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
>> 0) = 0x7f62b5b7f000
>> 
>> mprotect(0x7f62b5d66000, 2097152, PROT_NONE) = 0
>> 
>> mmap(0x7f62b5f66000, 24576, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f62b5f66000
>> 
>> mmap(0x7f62b5f6c000, 15072, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f62b5f6c000
>> 
>> close(3)                                = 0
>> 
>> arch_prctl(ARCH_SET_FS, 0x7f62b617f540) = 0
>> 
>> mprotect(0x7f62b5f66000, 16384, PROT_READ) = 0
>> 
>> mprotect(0x559b8978b000, 4096, PROT_READ) = 0
>> 
>> mprotect(0x7f62b6197000, 4096, PROT_READ) = 0
>> 
>> munmap(0x7f62b6180000, 93482)           = 0
>> 
>> rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
>> 
>> rt_sigaction(SIGUSR1, {sa_handler=0x559b8957e0e0, sa_mask=[INT USR1],
>> sa_flags=SA_RESTORER, sa_restorer=0x7f62b5bbdf20}, NULL, 8) = 0
>> 
>> rt_sigaction(SIGINT, {sa_handler=0x559b8957e0d0, sa_mask=[INT USR1],
>> sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND,
>> sa_restorer=0x7f62b5bbdf20}, NULL, 8) = 0
>> 
>> brk(NULL)                               = 0x559b8b39a000
>> 
>> brk(0x559b8b3bb000)                     = 0x559b8b3bb000
>> 
>> openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
>> 
>> fstat(3, {st_mode=S_IFREG|0644, st_size=4326016, ...}) = 0
>> 
>> mmap(NULL, 4326016, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f62b575e000
>> 
>> close(3)                                = 0
>> 
>> openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
>> 
>> dup2(3, 0)                              = 0
>> 
>> close(3)                                = 0
>> 
>> lseek(0, 0, SEEK_CUR)                   = 0
>> 
>> openat(AT_FDCWD, "/data/test3.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
>> 
>> dup2(3, 1)                              = 1
>> 
>> close(3)                                = 0
>> 
>> mmap(NULL, 104869888, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
>> -1, 0) = 0x7f62af35b000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 104857600) = 104857600
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 104857600) = 104857600
>> 
>> close(0)                                = 0
>> 
>> close(1)                                = 0
>> 
>> openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0
>> 
>> fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
>> 
>> read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
>> 
>> read(0, "", 4096)                       = 0
>> 
>> close(0)                                = 0
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
>> -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
>> -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1
>> ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY)
>> = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD,
>> "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) =
>> 0
>> 
>> fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0
>> 
>> mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f62b6196000
>> 
>> close(0)                                = 0
>> 
>> write(2, "1+0 records in\n1+0 records out\n", 311+0 records in
>> 
>> 1+0 records out
>> 
>> ) = 31
>> 
>> write(2, "104857600 bytes (105 MB, 100 MiB"..., 61104857600 bytes (105
>> MB, 100 MiB) copied, 1.01552 s, 103 MB/s) = 61
>> 
>> write(2, "\n", 1
>> 
>> )                       = 1
>> 
>> close(2)                                = 0
>> 
>> exit_group(0)                           = ?
>> 
>> +++ exited with 0 +++
>> 
>> 
>> 
>> This one was unsuccessful write and hung until I restarted the NFS server.
>> 
>> 
>> 
>> strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=6
>> 
>> execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img",
>> "bs=1000M", "count=6"], 0x7ffc8ee7e8b8 /* 30 vars */) = 0
>> 
>> brk(NULL)                               = 0x5648bd26a000
>> 
>> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
>> 
>> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
>> 
>> fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0
>> 
>> mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4f8dda1000
>> 
>> close(3)                                = 0
>> 
>> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
>> 
>> openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
>> 
>> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
>> 832) = 832
>> 
>> fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
>> 
>> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x7f4f8dd9f000
>> 
>> mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
>> 0) = 0x7f4f8d7a8000
>> 
>> mprotect(0x7f4f8d98f000, 2097152, PROT_NONE) = 0
>> 
>> mmap(0x7f4f8db8f000, 24576, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f4f8db8f000
>> 
>> mmap(0x7f4f8db95000, 15072, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4f8db95000
>> 
>> close(3)                                = 0
>> 
>> arch_prctl(ARCH_SET_FS, 0x7f4f8dda0540) = 0
>> 
>> mprotect(0x7f4f8db8f000, 16384, PROT_READ) = 0
>> 
>> mprotect(0x5648bb739000, 4096, PROT_READ) = 0
>> 
>> mprotect(0x7f4f8ddc0000, 4096, PROT_READ) = 0
>> 
>> munmap(0x7f4f8dda1000, 125550)          = 0
>> 
>> rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
>> 
>> rt_sigaction(SIGUSR1, {sa_handler=0x5648bb52c0e0, sa_mask=[INT USR1],
>> sa_flags=SA_RESTORER, sa_restorer=0x7f4f8d7e6f20}, NULL, 8) = 0
>> 
>> rt_sigaction(SIGINT, {sa_handler=0x5648bb52c0d0, sa_mask=[INT USR1],
>> sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND,
>> sa_restorer=0x7f4f8d7e6f20}, NULL, 8) = 0
>> 
>> brk(NULL)                               = 0x5648bd26a000
>> 
>> brk(0x5648bd28b000)                     = 0x5648bd28b000
>> 
>> openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
>> 
>> fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0
>> 
>> mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4f8cc97000
>> 
>> close(3)                                = 0
>> 
>> openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
>> 
>> dup2(3, 0)                              = 0
>> 
>> close(3)                                = 0
>> 
>> lseek(0, 0, SEEK_CUR)                   = 0
>> 
>> openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
>> 
>> dup2(3, 1)                              = 1
>> 
>> close(3)                                = 0
>> 
>> mmap(NULL, 1048588288, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4f4e494000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>> 1048576000) = 1048576000
>> 
>> close(0)                                = 0
>> 
>> close(1
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> I did some packet tracing and on the smaller writes, I would see a
>> bunch of writes and replies to those writes, the client would COMMIT
>> and then NFS server would reply with an ack COMMIT, client would then
>> issue a CLOSE, and server would respond with a CLOSE. However, on
>> larger writes where it would hang it would do…client would do a bunch
>> of writes packets, NFS server would respond to the write packets,
>> client would issue a COMMIT command, server would respond with a
>> COMMIT, and then nothing else would happen. No CLOSE would seemingly
>> be sent from the client, so I don’t know if the client was waiting on
>> the server to send back some type of message saying that the system
>> could issue a CLOSE or the client was never sending a CLOSE.
>> 
>> 
>> 
>> Once it is in the locked state, if I restart the NFS server as I
>> mentioned above the close(1…will continue and finish the write and
>> exit normally.
>> 
>> 
>> 
>> I did a kernel Stack Traceback on the NFS client (the light weight
>> version of just printing out /proc/*/wchan) while the dd command was
>> stuck in the “close(1” and the process for the dd command just says.
>> 
>> 
>> 
>> Process /proc/32161/wchan
>> 
>> ep_poll
>> 
>> 
>> 
>> I also enabled the NFS debugging and while its stuck it just keep
>> repeating the bottom section.
>> 
>> 
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270750] NFS:       commit
>> (0:52/9439 4096@4609744896) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270751] NFS:       commit
>> (0:52/9439 4096@4609748992) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270754] NFS:       commit
>> (0:52/9439 4096@4609753088) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270757] NFS:       commit
>> (0:52/9439 4096@4609757184) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270759] NFS:       commit
>> (0:52/9439 4096@4609761280) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270761] NFS:       commit
>> (0:52/9439 4096@4609765376) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270763] NFS:       commit
>> (0:52/9439 4096@4609769472) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270767] NFS:       commit
>> (0:52/9439 4096@4609773568) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270769] NFS:       commit
>> (0:52/9439 4096@4609777664) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270772] NFS:       commit
>> (0:52/9439 4096@4609781760) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270773] NFS:       commit
>> (0:52/9439 4096@4609785856) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270775] NFS:       commit
>> (0:52/9439 4096@4609789952) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270777] NFS:       commit
>> (0:52/9439 4096@4609794048) OK
>> 
>> Sep 19 06:25:19 examplehost kernel: [772614.270781] NFS:       commit
>> (0:52/9439 4096@4609798144) OK
>> 
>> Sep 19 06:25:31 examplehost kernel: [772626.175927] <--
>> nfs41_proc_async_sequence status=0
>> 
>> Sep 19 06:25:31 examplehost kernel: [772626.176018] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=2907 slotid=0 max_slotid=35
>> cache_this=0
>> 
>> Sep 19 06:25:31 examplehost kernel: [772626.176572]
>> nfs41_sequence_process: Error 0 free the slot
>> 
>> Sep 19 06:25:31 examplehost kernel: [772626.176576]
>> nfs41_sequence_call_done rpc_cred 000000004b01718a
>> 
>> Sep 19 06:25:31 examplehost kernel: [772626.176577] <-- nfs41_sequence_call_done
>> 
>> Sep 19 06:25:43 examplehost kernel: [772638.207976] <--
>> nfs41_proc_async_sequence status=0
>> 
>> Sep 19 06:25:43 examplehost kernel: [772638.208074] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=2908 slotid=0 max_slotid=35
>> cache_this=0
>> 
>> Sep 19 06:25:43 examplehost kernel: [772638.208642]
>> nfs41_sequence_process: Error 0 free the slot
>> 
>> Sep 19 06:25:43 examplehost kernel: [772638.208646]
>> nfs41_sequence_call_done rpc_cred 000000004b01718a
>> 
>> Sep 19 06:25:43 examplehost kernel: [772638.208648] <-- nfs41_sequence_call_done
>> 
>> Sep 19 06:25:55 examplehost kernel: [772650.240046] <--
>> nfs41_proc_async_sequence status=0
>> 
>> Sep 19 06:25:55 examplehost kernel: [772650.240129] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=2909 slotid=0 max_slotid=35
>> cache_this=0
>> 
>> Sep 19 06:25:55 examplehost kernel: [772650.240904]
>> nfs41_sequence_process: Error 0 free the slot
>> 
>> Sep 19 06:25:55 examplehost kernel: [772650.240908]
>> nfs41_sequence_call_done rpc_cred 000000004b01718a
>> 
>> Sep 19 06:25:55 examplehost kernel: [772650.240909] <-- nfs41_sequence_call_done
>> 
>> 
>> 
>> 
>> 
>> On a small file that actually completes it looks like this…
>> 
>> 
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532059] NFS:       commit
>> (0:52/9439 4096@7409315840) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532061] NFS:       commit
>> (0:52/9439 4096@7409319936) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532064] NFS:       commit
>> (0:52/9439 4096@7409324032) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532066] NFS:       commit
>> (0:52/9439 4096@7409328128) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532069] NFS:       commit
>> (0:52/9439 4096@7409332224) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532071] NFS:       commit
>> (0:52/9439 4096@7409336320) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532073] NFS:       commit
>> (0:52/9439 4096@7409340416) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532076] NFS:       commit
>> (0:52/9439 4096@7409344512) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532078] NFS:       commit
>> (0:52/9439 4096@7409348608) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532081] NFS:       commit
>> (0:52/9439 4096@7409352704) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532083] NFS:       commit
>> (0:52/9439 4096@7409356800) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532086] NFS:       commit
>> (0:52/9439 4096@7409360896) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532088] NFS:       commit
>> (0:52/9439 4096@7409364992) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532090] NFS:       commit
>> (0:52/9439 4096@7409369088) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532093] NFS:       commit
>> (0:52/9439 4096@7409373184) OK
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532171] NFS: release(/test.img)
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532193] nfs4_close_prepare: begin!
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532197] nfs4_close_prepare: done!
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532235] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=6 slotid=0 max_slotid=0
>> cache_this=1
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532525] nfs4_close_done: begin!
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532528]
>> nfs41_sequence_process: Error 0 free the slot
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532531] nfs4_close_done:
>> done, ret = 0!
>> 
>> Sep 19 06:13:29 examplehost kernel: [771904.532555] NFS:
>> dentry_delete(/test.img, 48084c)
>> 
>> Sep 19 06:13:41 examplehost kernel: [771916.540491] <--
>> nfs41_proc_async_sequence status=0
>> 
>> Sep 19 06:13:41 examplehost kernel: [771916.540586] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=7 slotid=0 max_slotid=0
>> cache_this=0
>> 
>> Sep 19 06:13:41 examplehost kernel: [771916.541182]
>> nfs41_sequence_process: Error 0 free the slot
>> 
>> Sep 19 06:13:41 examplehost kernel: [771916.541186]
>> nfs41_sequence_call_done rpc_cred 000000004b01718a
>> 
>> Sep 19 06:13:41 examplehost kernel: [771916.541187] <-- nfs41_sequence_call_done
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.572599] <--
>> nfs41_proc_async_sequence status=0
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.572675] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=8 slotid=0 max_slotid=0
>> cache_this=0
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.572895] encode_sequence:
>> sessionid=61:1568410703:305419896:61 seqid=2 slotid=1 max_slotid=1
>> cache_this=1
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573163]
>> nfs41_sequence_process: Error 0 free the slot
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573166]
>> nfs41_sequence_call_done rpc_cred 000000004b01718a
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573167] <-- nfs41_sequence_call_done
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573236] decode_attr_type: type=00
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573238]
>> decode_attr_change: change attribute=6738121685800950864
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573240] decode_attr_size:
>> file size=8388608000
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573242] decode_attr_fsid:
>> fsid=(0x0/0x0)
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573243] decode_attr_fileid: fileid=0
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573245]
>> decode_attr_fs_locations: fs_locations done, error = 0
>> 
>> Sep 19 06:13:53 examplehost kernel: [771928.573246] decode_attr_mode:
>> file mode=00
>> 
>> 
>> 
>> As you can see the NFS: release(/test.img) is immediately after all of
>> the writes on the successful version. In the version that gets stuck
>> it never responds with that “release”.
>> 
>> 
>> 
>> It’s hard for me to do any debugging on the NFS Server sinces its more
>> of a “managed” box (e.g I don’t have root and only have access to
>> certain debugging utilities). I do have a ticket open with Dell and
>> their engineering team is looking at this but I’m trying anything I
>> can do to figure this out as soon as I can.
>> 
>> 
>> 
>> 
>> Any help on this matter would be appreciated.
> 

  parent reply	other threads:[~2019-09-19 14:19 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-18 23:36 NFSv4 client locks up on larger writes with Kerberos enabled Kevin Vasko
     [not found] ` <E172CA50-EC89-4072-9C1D-1B825DC3FE8B@lysator.liu.se>
2019-09-19 14:19   ` Kevin Vasko [this message]
     [not found]     ` <72E82F62-C743-4783-B81E-98E6A8E35738@lysator.liu.se>
2019-09-19 14:58       ` Kevin Vasko
2019-09-25 16:48 ` J. Bruce Fields
2019-09-25 17:06   ` Chuck Lever
2019-09-25 18:44     ` Kevin Vasko
2019-09-25 18:49       ` Chuck Lever
2019-09-25 19:10         ` Kevin Vasko
2019-09-25 20:07         ` Bruce Fields
2019-09-26 15:55           ` Chuck Lever
2019-09-26 16:05             ` Bruce Fields
2019-09-26 19:55             ` Bruce Fields
2019-09-30 14:51               ` Kevin Vasko
2019-09-30 16:19                 ` Bruce Fields
2019-09-26  7:30       ` Daniel Kobras
2019-09-26 16:25 Kevin Vasko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=280ADE78-D617-4A97-9B92-18DCE3063B93@gmail.com \
    --to=kvasko@gmail.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=pen@lysator.liu.se \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).