* copy_file_range() infinitely hangs on NFSv4.2 over RDMA
From: Timo Rothenpieler @ 2021-02-14  3:31 UTC (permalink / raw)
  To: linux-rdma, Linux NFS Mailing List

On our file server, running a few-weeks-old 5.10 kernel, we are running
into a weird issue with NFS 4.2 server-side copy and RDMA (and ZFS,
though I'm not sure how relevant that is to the issue).
The servers are connected via InfiniBand, using Mellanox ConnectX-4
cards with the mlx5 driver.

Anything using the copy_file_range() syscall to copy stuff just hangs.
In strace, the syscall never returns.

Simple way to reproduce on the client:
 > xfs_io -fc "pwrite 0 1M" testfile
 > xfs_io -fc "copy_range testfile" testfile.copy

The second call just never exits. It sits in S+ state, with no CPU
usage, and can easily be killed via Ctrl+C.
I let it sit for a couple of hours as well; it never completes.
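
For anyone who wants to check this without xfs_io, here is a minimal
Python sketch of the same reproduction. It is only an illustration under
assumptions: the names copy_completes and CHILD and the 60 second
timeout are made up for this example, and it needs Python 3.8+ on Linux
for os.copy_file_range(). The copy runs in a child process so that a
hanging syscall can be detected and the child killed.

```python
import subprocess
import sys

# Child script: copy argv[1] to argv[2] using copy_file_range() only,
# so there is no silent fallback to read()/write().
CHILD = """
import os, sys
with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
    remaining = os.fstat(src.fileno()).st_size
    while remaining > 0:
        n = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
        if n == 0:
            break
        remaining -= n
"""


def copy_completes(src_path, dst_path, timeout_s=60.0):
    """Return True if the copy returned in time, False if it timed out.

    The timeout is an arbitrary choice for this sketch. On timeout,
    subprocess.run() kills the child before raising TimeoutExpired.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD, src_path, dst_path],
            check=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    ok = copy_completes(sys.argv[1], sys.argv[2])
    print("copy returned" if ok else "copy_file_range() did not return in time")
```

Run against a file on the RDMA mount, the expectation from the behaviour
described here would be a timeout; on a TCP mount it should return
normally.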

Some more observations about it:

If I do a fresh reboot of the client, the operation works fine for a
short while (about 10 to 15 minutes). There is no load on the system
during that time; it's effectively idle.

The operation actually does successfully copy all data. The size and
checksum of the target file are as expected. The syscall just never
returns.

This only happens when mounting via RDMA. With the same NFS share
mounted via plain TCP, the operation works reliably.

I already had this issue with kernel 5.4, and had hoped that 5.10 might
have fixed it, but unfortunately it didn't.

I tried two servers and 30 different client machines; they all exhibit
the exact same behaviour. So I'd tentatively rule out a hardware issue.


Any pointers on how to debug or maybe even fix this?



Thanks,
Timo
