From: Olga Kornievskaia <aglo@umich.edu>
To: Timo Rothenpieler <timo@rothenpieler.org>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: copy_file_range() infinitely hangs on NFSv4.2 over RDMA
Date: Tue, 16 Feb 2021 15:12:45 -0500
Message-ID: <CAN-5tyE4OyNOZRXGnyONcdGsHaRAF39LSE5416Kj964m-+_C2A@mail.gmail.com>
In-Reply-To: <57f67888-160f-891c-6217-69e174d7e42b@rothenpieler.org>

Hi Timo,

Can you get a network trace? Also, you say that copy_file_range()
never returns (after what looks like a successful copy) and the
application hangs. Can you get a sysrq dump of the process's stack?
Run "echo t > /proc/sysrq-trigger", look at what gets dumped into
/var/log/messages, locate your application there, and report what its
stack says.
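
In case it helps, here is a rough sketch of pulling a single task's
entry out of the sysrq dump. It assumes each entry starts with a
"task:<comm>" header line, which is what recent kernels (5.10
included) print; older kernels use a different header, so the pattern
would need adjusting there. The helper name extract_task_stack is made
up for illustration, and xfs_io is just the process name from the
reproducer.

```sh
#!/bin/sh
# Print the sysrq-t entry (header line plus call trace) for one task name.
# Reads kernel log text on stdin.
# Assumption: every task entry begins with a "task:<comm> " header line.
extract_task_stack() {
    awk -v comm="$1" '
        # A new task header starts here; keep printing only if it is ours.
        /task:/ { show = (index($0, "task:" comm " ") > 0) }
        show    { print }
    '
}

# As root: dump all task states, then filter for the hung process.
#   echo t > /proc/sysrq-trigger
#   dmesg | extract_task_stack xfs_io
```

If the dmesg ring buffer has already wrapped, the same filter can be
fed from /var/log/messages instead. If the task happens to be the last
one in the dump, some trailing scheduler output may be printed as well.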

On Sat, Feb 13, 2021 at 10:41 PM Timo Rothenpieler
<timo@rothenpieler.org> wrote:
>
> On our file server, running a 5.10 kernel that is a few weeks old, we
> are running into a weird issue with NFS 4.2 Server-Side Copy and RDMA
> (and ZFS, though I'm not sure how relevant that is to the issue).
> The servers are connected via InfiniBand, on a Mellanox ConnectX-4 card,
> using the mlx5 driver.
>
> Anything using the copy_file_range() syscall to copy stuff just hangs.
> In strace, the syscall never returns.
>
> Simple way to reproduce on the client:
>  > xfs_io -fc "pwrite 0 1M" testfile
>  > xfs_io -fc "copy_range testfile" testfile.copy
>
> The second call just never exits. It sits in S+ state, with no CPU
> usage, and can easily be killed via Ctrl+C.
> I let it sit for a couple hours as well, it does not seem to ever complete.
>
> Some more observations about it:
>
> If I do a fresh reboot of the client, the operation works fine for a
> short while (like, 10~15 minutes). No load is on the system during that
> time, it's effectively idle.
>
> The operation actually does successfully copy all data. The size and
> checksum of the target file is as expected. It just never returns.
>
> This only happens when mounting via RDMA. Mounting the same NFS share
> via plain TCP has the operation work reliably.
>
> Had this issue with Kernel 5.4 already, and had hoped that 5.10 might
> have fixed it, but unfortunately it didn't.
>
> I tried two servers and 30 different client machines, and they all
> exhibit the exact same behaviour, so I'd tentatively rule out a
> hardware issue.
>
>
> Any pointers on how to debug or maybe even fix this?
>
>
>
> Thanks,
> Timo
