From: Timo Rothenpieler <timo@rothenpieler.org>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: copy_file_range() infinitely hangs on NFSv4.2 over RDMA
Date: Thu, 18 Feb 2021 14:28:29 +0100	[thread overview]
Message-ID: <3f946e6b-6872-641c-8828-35ddd5c8fed0@rothenpieler.org> (raw)
In-Reply-To: <CAN-5tyFT4+kkqk6E0Jxe-vMYm7q5mHyTeq0Ht7AEYasA30ZaGw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3941 bytes --]

On 18.02.2021 04:52, Olga Kornievskaia wrote:
> On Wed, Feb 17, 2021 at 8:12 PM Timo Rothenpieler <timo@rothenpieler.org> wrote:
>>
>> On 17.02.2021 23:37, Olga Kornievskaia wrote:
>>> On Tue, Feb 16, 2021 at 5:27 PM Timo Rothenpieler <timo@rothenpieler.org> wrote:
>>>>
>>>> On 16.02.2021 21:37, Timo Rothenpieler wrote:
>>>>> I can't get a network trace (I assume just TCP/20049 is fine, and not also
>>>>> some RDMA trace?) right now, but I will once a user has finished their
>>>>> work on the machine.
>>>>
>>>> There wasn't any TCP traffic to dump on the NFSoRDMA port, probably
>>>> because everything is handled via RDMA/IB.
>>>
>>> Yeah, I'm not sure if tcpdump can snoop on the IB traffic. I know that
>>> upstream tcpdump can snoop on a Mellanox RDMA card (but I only know
>>> about the RoCE mode).
>>
>> I managed to get https://github.com/Mellanox/ibdump working. Attached is
>> what it records when I run the xfs_io copy_range command that gets
>> stuck (sniffer.pcap).
>> Additionally, I rebooted the client machine and captured the traffic
>> when it does a copy that succeeds, during the first few minutes of
>> uptime (sniffer2.pcap).
>>
>> Both those commands were run on the same 500M file.
>>
>>>> But I recorded a trace log of rpcrdma and sunrpc observing the situation.
>>>>
>>>> To me it looks like the COPY task (task:15886@7) completes successfully?
>>>> The compressed trace.dat is attached.
>>>
>>> I'm having a hard time reproducing the problem. But I only tried
>>> xfs, btrfs and ext4 (the first two send a CLONE since the file system
>>> supports it; the last one exercises a copy). In all my tries your
>>
>> I can also reproduce this on a test NFS share from an ext4 filesystem.
>> Have not tested xfs yet.
>>
>>> xfs_io commands succeed. The differences between our environments are
>>> (1) ZFS vs (xfs, etc.) and (2) IB vs RoCE. The question is: does any
>>> copy_file_range work over RDMA/IB? One thing to try is a synchronous
>>
>> It works, for any file size, when the client machine is freshly booted
>> (within its first 10-30 minutes of uptime).
> 
> Reboot of the client or the server machine?

Just the client. The server is in production use, so I can't freely 
reboot it without organizing a maintenance window a couple days ahead.

>>> copy: create a small 10-byte file and do a copy. Is this a case
>>> where the copy and the callback are racing? To check, also do a
>>> really large copy: create a >=1GB file and do a copy. That will be an
>>> async copy but will not have a racy condition. Can you try those 2
>>> examples for me?
>>
>> I have observed in the past that the xfs_io copy is more likely to
>> succeed the smaller the file is, though I did not make out a definite
>> pattern.
> 
> That's because small files are done with a synchronous copy. Your
> network captures (while not fully decodable by Wireshark, because the
> trace lacks the connection establishment Wireshark needs to parse the
> RDMA replies) show that no callback is sent from the server, and thus
> the client is stuck waiting for it. So the focus should be on why the
> server is not sending it. There are some error conditions on the server
> under which it will not be able to send a callback. What springs to
> mind looking through the code is that somehow the server thinks there
> is no callback channel. Can you turn on nfsd tracepoints please? I
> wonder if we can see something of interest there.

On the server, I guess?
I'll have to figure out a way to do that while it's not in active use.
Otherwise the trace log will be enormous and contain a lot of noise from
the general system use.

I'll report back once I have a trace log.
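One way to keep such a trace small is to clear the buffer and enable only the relevant event systems right before reproducing the hang, then disable them right after. A minimal Python sketch using the tracefs control files, assuming root privileges and tracefs mounted at /sys/kernel/tracing (older setups expose it under /sys/kernel/debug/tracing); the helper names are hypothetical. The same helper works for the sunrpc and rpcrdma event systems traced earlier.

```python
import os

TRACEFS = "/sys/kernel/tracing"  # assumed mount point; older: /sys/kernel/debug/tracing


def set_event_systems(systems, enabled=True, tracefs=TRACEFS):
    """Enable/disable every event in each named system, e.g. "nfsd"."""
    for system in systems:
        path = os.path.join(tracefs, "events", system, "enable")
        with open(path, "w") as f:
            f.write("1" if enabled else "0")


def clear_trace(tracefs=TRACEFS):
    """Empty the ring buffer: opening "trace" with O_TRUNC (like `echo > trace`) clears it."""
    with open(os.path.join(tracefs, "trace"), "w"):
        pass


def snapshot_trace(tracefs=TRACEFS):
    """Return the buffer contents without consuming them (unlike trace_pipe)."""
    with open(os.path.join(tracefs, "trace")) as f:
        return f.read()
```

Usage: `clear_trace()`, `set_event_systems(["nfsd"])`, run the copy from the client, `set_event_systems(["nfsd"], enabled=False)`, then save `snapshot_trace()`. From the command line, `trace-cmd record -e nfsd` achieves something similar.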

> The logic for determining whether the copy is sent sync or async
> depends on the server's rsize: if the file is smaller than 2 RPC
> payloads, it's sent synchronously.
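As a toy model of the behaviour described here (not the actual Linux client or server code): the client picks sync or async from the copy length and the server's rsize, and in the async case waits for the server's CB_OFFLOAD callback, which a server that believes there is no callback channel never sends. All names are hypothetical; treating a length of exactly 2 * rsize as async is an assumption, and the timeout exists only so the hang is observable in a test.

```python
import threading


def copy_is_synchronous(length: int, rsize: int) -> bool:
    """Rule as paraphrased in this thread: under 2 RPC payloads -> sync COPY."""
    return length < 2 * rsize


class ToyServer:
    """Hypothetical stand-in for the server side of an async COPY."""

    def __init__(self, has_callback_channel: bool):
        self.has_callback_channel = has_callback_channel

    def start_async_copy(self, done: threading.Event) -> None:
        # The data copy itself is assumed to succeed; the completion notice
        # (CB_OFFLOAD) is only sent if the server believes it has a
        # callback channel to the client.
        if self.has_callback_channel:
            threading.Timer(0.01, done.set).start()


def client_copy(server: ToyServer, length: int, rsize: int, timeout=None) -> str:
    if copy_is_synchronous(length, rsize):
        return "sync: result carried in the COPY reply"
    done = threading.Event()
    server.start_async_copy(done)
    # timeout=None models an unbounded wait, i.e. the reported hang.
    if done.wait(timeout):
        return "async: CB_OFFLOAD received"
    return "async: still waiting for CB_OFFLOAD"
```

With an assumed rsize of 1 MiB, a 10-byte copy takes the sync path even against a server without a callback channel, while the 500M file from this thread takes the async path; that would be consistent with smaller copies being more likely to succeed.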



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4494 bytes --]

Thread overview: 20+ messages
2021-02-14  3:31 copy_file_range() infinitely hangs on NFSv4.2 over RDMA Timo Rothenpieler
2021-02-16 20:12 ` Olga Kornievskaia
2021-02-16 20:37   ` Timo Rothenpieler
2021-02-16 22:27     ` Timo Rothenpieler
2021-02-17 22:37       ` Olga Kornievskaia
2021-02-18  1:12         ` Timo Rothenpieler
2021-02-18  3:52           ` Olga Kornievskaia
2021-02-18 13:28             ` Timo Rothenpieler [this message]
2021-02-18 15:55               ` Timo Rothenpieler
2021-02-18 18:30                 ` Olga Kornievskaia
2021-02-18 20:22                   ` Timo Rothenpieler
2021-02-19 17:38                     ` Olga Kornievskaia
2021-02-19 17:48                       ` Chuck Lever
2021-02-19 18:01                         ` Timo Rothenpieler
2021-02-19 18:48                           ` Chuck Lever
2021-02-19 20:37                             ` Timo Rothenpieler
2021-02-19 20:43                             ` Olga Kornievskaia
2021-02-19 20:55                               ` Chuck Lever
2021-02-20 21:03                             ` Timo Rothenpieler
2021-02-21 17:45                               ` Chuck Lever
