From: Chuck Lever <chuck.lever@oracle.com>
To: Timo Rothenpieler <timo@rothenpieler.org>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Leon Romanovsky <leon@kernel.org>
Subject: Re: NFS over RDMA issues on Linux 5.4
Date: Tue, 4 Aug 2020 08:49:14 -0400
Message-ID: <DAF6EFDA-5863-4887-B495-0BE3CA714209@oracle.com>
In-Reply-To: <20200804122557.GB4432@unreal>



> On Aug 4, 2020, at 8:25 AM, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Tue, Aug 04, 2020 at 12:52:27PM +0200, Timo Rothenpieler wrote:
>> On 04.08.2020 11:36, Leon Romanovsky wrote:
>>> On Mon, Aug 03, 2020 at 12:24:21PM -0400, Chuck Lever wrote:
>>>> Hi Timo-
>>>> 
>>>>> On Aug 3, 2020, at 11:05 AM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I have just deployed a new system with Mellanox ConnectX-4 VPI EDR IB cards and wanted to set up NFS over RDMA on it.
>>>>> 
>>>>> However, while mounting the FS over RDMA works fine, actually using it results in the following messages absolutely hammering dmesg on both client and server:
>>>>> 
>>>>>> https://gist.github.com/BtbN/9582e597b6581f552fa15982b0285b80#file-server-log
>>>>> 
>>>>> The spam only stops once I forcibly reboot the client. The filesystem gets nowhere during all this. The retrans counter in nfsstat just keeps going up, nothing actually gets done.
>>>>> 
>>>>> This is on Linux 5.4.54, using nfs-utils 2.4.3.
>>>>> The mlx5 driver had enhanced-mode disabled in order to enable IPoIB connected mode with an MTU of 65520.
>>>>> 
>>>>> Normal NFS 4.2 over tcp works perfectly fine on this setup, it's only when I mount via rdma that things go wrong.
>>>>> 
>>>>> Is this an issue on my end, or did I run into a bug somewhere here?
>>>>> Any pointers, patches and solutions to test are welcome.
>>>> 
>>>> I haven't seen that failure mode here, so the best I can recommend is
>>>> to keep investigating. I've copied linux-rdma in case they have any
>>>> advice.
>>> 
>>> The mention of IPoIB is slightly confusing in the context of NFS-over-RDMA.
>>> Are you running NFS over IPoIB?
>> 
>> As far as I'm aware, NFS over RDMA still needs an IP address and port to
>> target, so IPoIB is mandatory?
>> At least the admin guide in the kernel says so.
>> 
>> Right now I am actually running NFS over IPoIB (without RDMA) because of
>> the issue at hand, and would like to turn on RDMA for better performance.
>> 
>>> From a brief look at the CQE error syndrome (local length error), the client is sending an incorrect WQE.
>> 
>> Does that point at an issue in the kernel code, or something I did wrong?
>> 
>> The fstab entries for these mounts look like this:
>> 
>> 10.110.10.200:/home /home nfs4 rw,rdma,port=20049,noatime,async,vers=4.2,_netdev 0 0
>> 
>> Is there anything more I can investigate? I tried turning connected mode off
>> and lowering the mtu in turn, but that did not have any effect.
> 
> Chuck,
> 
> You probably know which traces Timo should enable on the client.
> The fact that NFS over (non-enhanced) IPoIB works makes
> driver/FW issues much less likely.

Timo, I tend to think this is not a configuration issue.
Do you know of a known working kernel?


--
Chuck Lever



