On 17.02.2021 23:37, Olga Kornievskaia wrote:
> On Tue, Feb 16, 2021 at 5:27 PM Timo Rothenpieler wrote:
>>
>> On 16.02.2021 21:37, Timo Rothenpieler wrote:
>>> I can't get a network (I assume just TCP/20049 is fine, and not also
>>> some RDMA trace?) right now, but I will once a user has finished their
>>> work on the machine.
>>
>> There wasn't any TCP traffic to dump on the NFSoRDMA Port, probably
>> because everything is handled via RDMA/IB.
>
> Yeah, I'm not sure if tcpdump can snoop on the IB traffic. I know that
> upstream tcpdump can snoop on RDMA mellanox card (but I only know
> about the Roce mode).

I managed to get https://github.com/Mellanox/ibdump working.
Attached is what it records when I run the xfs_io copy_range command
that gets stuck (sniffer.pcap).
Additionally, I rebooted the client machine and captured the traffic
of a copy that succeeds, as copies do during the first few minutes of
uptime (sniffer2.pcap).
Both commands were run on the same 500M file.

>> But I recorded a trace log of rpcrdma and sunrpc observing the situation.
>>
>> To me it looks like the COPY task (task:15886@7) completes successfully?
>> The compressed trace.dat is attached.
>
> I'm having a hard time reproducing the problem. But I only tried
> "xfs", "btrfs", "ext4" (first two send a CLONE since the file system
> supports it), the last one exercises a copy. In all my tries your

I can also reproduce this on a test NFS share exported from an ext4
filesystem. I have not tested xfs yet.

> xfs_io commands succeed. The differences between our environments are
> (1) ZFS vs (xfs, etc) and (2) IB vs RoCE. Question is: does any
> copy_file_range work over RDMA/IB. One thing to try a synchronous

It works, on any size of file, when the client machine is freshly
booted (within its first 10~30 minutes of uptime).

> copy: create a small file 10bytes and do a copy. Is this the case
> where we have copy and the callback racing, so instead do a really
> large copy: create a >=1GB file and do a copy. that will be an async
> copy but will not have a racy condition. Can you try those 2 examples
> for me?

I have observed in the past that the xfs_io copy is more likely to
succeed the smaller the file is, though I could not make out a definite
pattern.

I did some bisecting on the number of bytes and came up with the
following:
A file of 2097153 bytes gets stuck, while one of 2097152 (=2^21) bytes
still works.
It's been stable at that cutoff point for a while now, so I think
that's actually the point where it starts happening, and the different
behaviour I saw in the past was an issue in my testing.

> Not sure how useful tracepoints here are. The results of the COPY
> isn't interesting as this is an async copy. The server should have
> sent a CB_COMPOUND with the copy's results. The process stack tells me
> that COPY is waiting for the results (waiting for the callback). So
> the question is there a problem of sending a callback over RDMA/IB? Or
> did the client receive it and missed it somehow? We really do need
> some better tracepoints in the copy (but we don't have them
> currently).
>
> Would you be willing to install the upstream libpcap/tcpdump to see if
> it can capture RDMA/IB traffic or perhaps Chunk knows that it doesn't
> work for sure?

Managed to get ibdump working, as stated above.
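
For anyone who wants to repeat the size bisection mentioned above, a
rough sketch could look like the following. This is not the script that
produced the 2^21 cutoff; the function names, the scratch file names,
the 60 second timeout and the 8 MiB upper bound are all assumptions made
for illustration. It also assumes that a copy which works at one size
works at every smaller size, and it only detects hangs, not copy errors.

```python
import os
import subprocess


def largest_working_size(works, lo, hi):
    """Bisect for the largest size n in [lo, hi) for which works(n) is true.

    Assumes works(lo) is true, works(hi) is false, and that the result is
    monotonic in between (every size below the cutoff works).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid   # mid still completes, cutoff is at or above mid
        else:
            hi = mid   # mid gets stuck, cutoff is below mid
    return lo


def copy_completes(size, directory, timeout=60):
    """Return True if an xfs_io copy_range of a `size`-byte file returns in time.

    `directory` is assumed to be on the NFS mount under test. The file
    names are hypothetical scratch names.
    """
    src = os.path.join(directory, "copytest.src")
    dst = os.path.join(directory, "copytest.dst")
    with open(src, "wb") as f:
        f.write(b"\0" * size)
    try:
        # A hang is the only failure detected here; the exit status is ignored.
        # If the stuck process cannot be killed, this call may block as well.
        subprocess.run(
            ["xfs_io", "-f", "-c", f"copy_range {src}", dst],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        return False


# Example use (path is a placeholder for the NFS mount point):
#   cutoff = largest_working_size(
#       lambda n: copy_completes(n, "/mnt/nfs"), 1, 8 * 1024 * 1024)
#   print(cutoff)
```

Keeping the bisection separate from the copy check makes it possible to
try the search logic with a fake predicate before pointing it at a
mount where a stuck copy may tie up the client.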