* Question about nfs in infiniband environment
@ 2018-08-28  7:45 Volker Lieder
  2018-08-28 12:37 ` Volker Lieder
  0 siblings, 1 reply; 11+ messages in thread
From: Volker Lieder @ 2018-08-28  7:45 UTC (permalink / raw)
  To: linux-nfs

Hi list,

we have a setup with roughly 15 CentOS 7.5 servers.

All are connected via 56 Gbit InfiniBand and run the current Mellanox driver.
One server (4 cores, 8 threads, 16 GB RAM) is the NFS server for a disk shelf with roughly 500 TB of data.

The server exports 4-6 mounts to each client.

Since we added 3 further nodes to the setup, we have been receiving the following messages:

On nfs-server:
[Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
[Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
[Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
[Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
[Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
[Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
[Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
[Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
[Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
[Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket

on nfs-clients:
[229903.273435] nfs: server 172.16.55.221 not responding, still trying
[229903.523455] nfs: server 172.16.55.221 OK
[229939.080276] nfs: server 172.16.55.221 OK
[236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
[248874.777322] RPC: Could not send backchannel reply error: -105
[249484.823793] RPC: Could not send backchannel reply error: -105
[250382.497448] RPC: Could not send backchannel reply error: -105
[250671.054112] RPC: Could not send backchannel reply error: -105
[251284.622707] RPC: Could not send backchannel reply error: -105

File requests or "df -h" also sometimes end in a stale NFS state, which clears up after a minute.

I googled all of the messages and tried different things, without success.
We are now going to upgrade the CPU power on the NFS server.

Do you have any hints or pointers on what I can look for?

Best regards,
Volker

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28  7:45 Question about nfs in infiniband environment Volker Lieder
@ 2018-08-28 12:37 ` Volker Lieder
  2018-08-28 15:26   ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: Volker Lieder @ 2018-08-28 12:37 UTC (permalink / raw)
  To: linux-nfs

Hi,

a short update from our side.

We resized CPU and RAM on the NFS server; the performance is good now and the error messages are gone.

Is there a guide on the hardware requirements for a fast NFS server?

Or any information on how many nfsd processes are needed for a given number of NFS clients?

Best regards,
Volker

> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
> 
> Hi list,
> 
> we have a setup with round about 15 centos 7.5 server.
> 
> All are connected via infiniband 56Gbit and installed with new mellanox driver.
> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
> 
> The server exports 4-6 mounts to each client.
> 
> Since we added 3 further nodes to the setup, we recieve following messages:
> 
> On nfs-server:
> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
> 
> on nfs-clients:
> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
> [229903.523455] nfs: server 172.16.55.221 OK
> [229939.080276] nfs: server 172.16.55.221 OK
> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
> [248874.777322] RPC: Could not send backchannel reply error: -105
> [249484.823793] RPC: Could not send backchannel reply error: -105
> [250382.497448] RPC: Could not send backchannel reply error: -105
> [250671.054112] RPC: Could not send backchannel reply error: -105
> [251284.622707] RPC: Could not send backchannel reply error: -105
> 
> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
> 
> I googled all messages and tried different things without success.
> We are now going on to upgrade cpu power on nfs server. 
> 
> Do you also have any hints or points i can look for?
> 
> Best regards,
> Volker

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28 12:37 ` Volker Lieder
@ 2018-08-28 15:26   ` Chuck Lever
  2018-08-28 15:31     ` Volker Lieder
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2018-08-28 15:26 UTC (permalink / raw)
  To: Volker Lieder; +Cc: Linux NFS Mailing List

Hi Volker-


> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
> 
> Hi,
> 
> a short update from our site.
> 
> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
> 
> Is there a guide what hardware requirements a fast nfs server has?
> 
> Or an information, how many nfs prozesses are needed for x nfs clients?

The nfsd thread count depends on the number of clients _and_ their workload.
There isn't a hard and fast rule.

The default thread count is probably too low for your workload. You can
edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
64, and restart your NFS server.
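
For example, on CentOS 7 that could look roughly like the following
(a sketch; the exact sysconfig layout depends on your nfs-utils packaging):

  # /etc/sysconfig/nfs
  RPCNFSDCOUNT=64

  # apply it
  systemctl restart nfs-server

  # or bump the thread count on the fly and verify
  rpc.nfsd 64
  cat /proc/fs/nfsd/threads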

With InfiniBand you also have the option of using NFS/RDMA. Mount with
"proto=rdma,port=20049" to try it.

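A minimal sketch of what that could look like, assuming the in-kernel
RDMA transport modules from the distro (server name and export path are
placeholders):

  # on the server: listen for NFS/RDMA on port 20049
  modprobe svcrdma
  echo "rdma 20049" > /proc/fs/nfsd/portlist

  # on a client: mount the export over RDMA
  modprobe xprtrdma
  mount -t nfs -o vers=4.1,proto=rdma,port=20049 nfsserver:/export /mnt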

> Best regards,
> Volker
> 
>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
>> 
>> Hi list,
>> 
>> we have a setup with round about 15 centos 7.5 server.
>> 
>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
>> 
>> The server exports 4-6 mounts to each client.
>> 
>> Since we added 3 further nodes to the setup, we recieve following messages:
>> 
>> On nfs-server:
>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
>> 
>> on nfs-clients:
>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
>> [229903.523455] nfs: server 172.16.55.221 OK
>> [229939.080276] nfs: server 172.16.55.221 OK
>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
>> [248874.777322] RPC: Could not send backchannel reply error: -105
>> [249484.823793] RPC: Could not send backchannel reply error: -105
>> [250382.497448] RPC: Could not send backchannel reply error: -105
>> [250671.054112] RPC: Could not send backchannel reply error: -105
>> [251284.622707] RPC: Could not send backchannel reply error: -105
>> 
>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
>> 
>> I googled all messages and tried different things without success.
>> We are now going on to upgrade cpu power on nfs server.
>> 
>> Do you also have any hints or points i can look for?
>> 
>> Best regards,
>> Volker
> 

--
Chuck Lever
chucklever@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28 15:26   ` Chuck Lever
@ 2018-08-28 15:31     ` Volker Lieder
  2018-08-28 15:40       ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: Volker Lieder @ 2018-08-28 15:31 UTC (permalink / raw)
  To: Linux NFS Mailing List

Hi Chuck,

> Am 28.08.2018 um 17:26 schrieb Chuck Lever <chucklever@gmail.com>:
> 
> Hi Volker-
> 
> 
>> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
>> 
>> Hi,
>> 
>> a short update from our site.
>> 
>> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
>> 
>> Is there a guide what hardware requirements a fast nfs server has?
>> 
>> Or an information, how many nfs prozesses are needed for x nfs clients?
> 
> The nfsd thread count depends on number of clients _and_ their workload.
> There isn't a hard and fast rule.
> 
> The default thread count is probably too low for your workload. You can
> edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
> 64, and restart your NFS server.

I tried this, but then the load on the "small" server was too high to serve further requests, so that was the reason to scale it up.

> 
> With InfiniBand you also have the option of using NFS/RDMA. Mount with
> "proto=rdma,port=20049" to try it.

Yes, that's true, but in the Mellanox driver set they disabled NFSoRDMA in version 3.4.
It should work with the CentOS driver, but we haven't tested that in newer setups yet.

One more question, since other problems seem to be solved:

What about this message?

[Tue Aug 28 15:10:44 2018] NFSD: client 172.16.YY.XXX testing state ID with incorrect client ID


> 
> 
>> Best regards,
>> Volker
>> 
>>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
>>> 
>>> Hi list,
>>> 
>>> we have a setup with round about 15 centos 7.5 server.
>>> 
>>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
>>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
>>> 
>>> The server exports 4-6 mounts to each client.
>>> 
>>> Since we added 3 further nodes to the setup, we recieve following messages:
>>> 
>>> On nfs-server:
>>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
>>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
>>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
>>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
>>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
>>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
>>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
>>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
>>> 
>>> on nfs-clients:
>>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
>>> [229903.523455] nfs: server 172.16.55.221 OK
>>> [229939.080276] nfs: server 172.16.55.221 OK
>>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
>>> [248874.777322] RPC: Could not send backchannel reply error: -105
>>> [249484.823793] RPC: Could not send backchannel reply error: -105
>>> [250382.497448] RPC: Could not send backchannel reply error: -105
>>> [250671.054112] RPC: Could not send backchannel reply error: -105
>>> [251284.622707] RPC: Could not send backchannel reply error: -105
>>> 
>>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
>>> 
>>> I googled all messages and tried different things without success.
>>> We are now going on to upgrade cpu power on nfs server. 
>>> 
>>> Do you also have any hints or points i can look for?
>>> 
>>> Best regards,
>>> Volker
>> 
> 
> --
> Chuck Lever
> chucklever@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28 15:31     ` Volker Lieder
@ 2018-08-28 15:40       ` Chuck Lever
  2018-08-28 17:00         ` Jeff Becker
  2018-08-28 19:10         ` Olga Kornievskaia
  0 siblings, 2 replies; 11+ messages in thread
From: Chuck Lever @ 2018-08-28 15:40 UTC (permalink / raw)
  To: Volker Lieder; +Cc: Linux NFS Mailing List



> On Aug 28, 2018, at 11:31 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
> 
> Hi Chuck,
> 
>> Am 28.08.2018 um 17:26 schrieb Chuck Lever <chucklever@gmail.com>:
>> 
>> Hi Volker-
>> 
>> 
>>> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
>>> 
>>> Hi,
>>> 
>>> a short update from our site.
>>> 
>>> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
>>> 
>>> Is there a guide what hardware requirements a fast nfs server has?
>>> 
>>> Or an information, how many nfs prozesses are needed for x nfs clients?
>> 
>> The nfsd thread count depends on number of clients _and_ their workload.
>> There isn't a hard and fast rule.
>> 
>> The default thread count is probably too low for your workload. You can
>> edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
>> 64, and restart your NFS server.
> 
> I tried this, but then the load on the "small" server was to high to serve further requests, so that was the idea to grow this up.

That rather suggests the disks are slow. A deeper performance
analysis might help.
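
As a first pass, something like this might help narrow it down (standard
tools from the sysstat and nfs-utils packages):

  iostat -x 5              # per-disk utilization and latency on the server
  nfsstat -s               # server-side NFS operation counts
  cat /proc/net/rpc/nfsd   # includes nfsd thread utilization ("th" line)
  nfsiostat 5              # on a client: per-mount RPC latency and throughput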


>> With InfiniBand you also have the option of using NFS/RDMA. Mount with
>> "proto=rdma,port=20049" to try it.
> 
> Yes, thats true, but in the mellanox driver set they disabled nfsordma in Version 3.4.

Not quite sure what you mean by "mellanox driver". Do you
mean MOFED? My impression of the stock CentOS 7.5 code is
that it is close to upstream, and you shouldn't need to
replace it except in some very special circumstances (e.g.
a high-end database).


> It should work with centos driver, but we didnt tested it right now in newer setups.
> 
> One more question, since other problems seem to be solved:
> 
> What about this message?
> 
> [Tue Aug 28 15:10:44 2018] NFSD: client 172.16.YY.XXX testing state ID with incorrect client ID

Looks like an NFS bug. Someone else on the list should be able
to comment.


>>> Best regards,
>>> Volker
>>> 
>>>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
>>>> 
>>>> Hi list,
>>>> 
>>>> we have a setup with round about 15 centos 7.5 server.
>>>> 
>>>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
>>>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
>>>> 
>>>> The server exports 4-6 mounts to each client.
>>>> 
>>>> Since we added 3 further nodes to the setup, we recieve following messages:
>>>> 
>>>> On nfs-server:
>>>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
>>>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
>>>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
>>>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
>>>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
>>>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
>>>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
>>>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
>>>> 
>>>> on nfs-clients:
>>>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
>>>> [229903.523455] nfs: server 172.16.55.221 OK
>>>> [229939.080276] nfs: server 172.16.55.221 OK
>>>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
>>>> [248874.777322] RPC: Could not send backchannel reply error: -105
>>>> [249484.823793] RPC: Could not send backchannel reply error: -105
>>>> [250382.497448] RPC: Could not send backchannel reply error: -105
>>>> [250671.054112] RPC: Could not send backchannel reply error: -105
>>>> [251284.622707] RPC: Could not send backchannel reply error: -105
>>>> 
>>>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
>>>> 
>>>> I googled all messages and tried different things without success.
>>>> We are now going on to upgrade cpu power on nfs server.
>>>> 
>>>> Do you also have any hints or points i can look for?
>>>> 
>>>> Best regards,
>>>> Volker
>>> 
>> 
>> --
>> Chuck Lever
>> chucklever@gmail.com

--
Chuck Lever

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28 15:40       ` Chuck Lever
@ 2018-08-28 17:00         ` Jeff Becker
  2018-08-28 19:10         ` Olga Kornievskaia
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff Becker @ 2018-08-28 17:00 UTC (permalink / raw)
  To: Chuck Lever, Volker Lieder; +Cc: Linux NFS Mailing List

Hi. Comment about MOFED below.

On 08/28/2018 08:40 AM, Chuck Lever wrote:
>
>> On Aug 28, 2018, at 11:31 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
>>
>> Hi Chuck,
>>
>>> Am 28.08.2018 um 17:26 schrieb Chuck Lever <chucklever@gmail.com>:
>>>
>>> Hi Volker-
>>>
>>>
>>>> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> a short update from our site.
>>>>
>>>> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
>>>>
>>>> Is there a guide what hardware requirements a fast nfs server has?
>>>>
>>>> Or an information, how many nfs prozesses are needed for x nfs clients?
>>> The nfsd thread count depends on number of clients _and_ their workload.
>>> There isn't a hard and fast rule.
>>>
>>> The default thread count is probably too low for your workload. You can
>>> edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
>>> 64, and restart your NFS server.
>> I tried this, but then the load on the "small" server was to high to serve further requests, so that was the idea to grow this up.
> That rather suggests the disks are slow. A deeper performance
> analysis might help.
>
>
>>> With InfiniBand you also have the option of using NFS/RDMA. Mount with
>>> "proto=rdma,port=20049" to try it.
>> Yes, thats true, but in the mellanox driver set they disabled nfsordma in Version 3.4.
> Not quite sure what you mean by "mellanox driver". Do you
> mean MOFED? My impression of the stock CentOS 7.5 code is
> that it is close to upstream, and you shouldn't need to
> replace it except in some very special circumstances (high
> end database, eg).

Volker is right. Mellanox disables NFS/RDMA in MOFED because they don't
backport it the way I do for OFED; Mellanox forked MOFED off OFED a
long time ago. As Chuck says, there probably isn't much advantage to
the MOFED kernel bits over the distro ones. It is probably good to use
the subnet manager from MOFED if you can, as that is proprietary (closed
source) and has several improvements in it. Not sure about now, but a
few years ago we had to switch to MOFED on our large Pleiades cluster
because opensm (the open-source version) couldn't handle the scale. Today,
our primary reason for running MOFED is that we have a support contract
with Mellanox.

HTH

-jeff
>
>> It should work with centos driver, but we didnt tested it right now in newer setups.
>>
>> One more question, since other problems seem to be solved:
>>
>> What about this message?
>>
>> [Tue Aug 28 15:10:44 2018] NFSD: client 172.16.YY.XXX testing state ID with incorrect client ID
> Looks like an NFS bug. Someone else on the list should be able
> to comment.
>
>
>>>> Best regards,
>>>> Volker
>>>>
>>>>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
>>>>>
>>>>> Hi list,
>>>>>
>>>>> we have a setup with round about 15 centos 7.5 server.
>>>>>
>>>>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
>>>>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
>>>>>
>>>>> The server exports 4-6 mounts to each client.
>>>>>
>>>>> Since we added 3 further nodes to the setup, we recieve following messages:
>>>>>
>>>>> On nfs-server:
>>>>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
>>>>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
>>>>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
>>>>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
>>>>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
>>>>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
>>>>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
>>>>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
>>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
>>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
>>>>>
>>>>> on nfs-clients:
>>>>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
>>>>> [229903.523455] nfs: server 172.16.55.221 OK
>>>>> [229939.080276] nfs: server 172.16.55.221 OK
>>>>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
>>>>> [248874.777322] RPC: Could not send backchannel reply error: -105
>>>>> [249484.823793] RPC: Could not send backchannel reply error: -105
>>>>> [250382.497448] RPC: Could not send backchannel reply error: -105
>>>>> [250671.054112] RPC: Could not send backchannel reply error: -105
>>>>> [251284.622707] RPC: Could not send backchannel reply error: -105
>>>>>
>>>>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
>>>>>
>>>>> I googled all messages and tried different things without success.
>>>>> We are now going on to upgrade cpu power on nfs server.
>>>>>
>>>>> Do you also have any hints or points i can look for?
>>>>>
>>>>> Best regards,
>>>>> Volker
>>> --
>>> Chuck Lever
>>> chucklever@gmail.com
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28 15:40       ` Chuck Lever
  2018-08-28 17:00         ` Jeff Becker
@ 2018-08-28 19:10         ` Olga Kornievskaia
  2018-08-29  9:03           ` Volker Lieder
  1 sibling, 1 reply; 11+ messages in thread
From: Olga Kornievskaia @ 2018-08-28 19:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: v.lieder, linux-nfs

On Tue, Aug 28, 2018 at 11:41 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
>
> > On Aug 28, 2018, at 11:31 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
> >
> > Hi Chuck,
> >
> >> Am 28.08.2018 um 17:26 schrieb Chuck Lever <chucklever@gmail.com>:
> >>
> >> Hi Volker-
> >>
> >>
> >>> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
> >>>
> >>> Hi,
> >>>
> >>> a short update from our site.
> >>>
> >>> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
> >>>
> >>> Is there a guide what hardware requirements a fast nfs server has?
> >>>
> >>> Or an information, how many nfs prozesses are needed for x nfs clients?
> >>
> >> The nfsd thread count depends on number of clients _and_ their workload.
> >> There isn't a hard and fast rule.
> >>
> >> The default thread count is probably too low for your workload. You can
> >> edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
> >> 64, and restart your NFS server.
> >
> > I tried this, but then the load on the "small" server was to high to serve further requests, so that was the idea to grow this up.
>
> That rather suggests the disks are slow. A deeper performance
> analysis might help.
>
>
> >> With InfiniBand you also have the option of using NFS/RDMA. Mount with
> >> "proto=rdma,port=20049" to try it.
> >
> > Yes, thats true, but in the mellanox driver set they disabled nfsordma in Version 3.4.
>
> Not quite sure what you mean by "mellanox driver". Do you
> mean MOFED? My impression of the stock CentOS 7.5 code is
> that it is close to upstream, and you shouldn't need to
> replace it except in some very special circumstances (high
> end database, eg).
>
>
> > It should work with centos driver, but we didnt tested it right now in newer setups.
> >
> > One more question, since other problems seem to be solved:
> >
> > What about this message?
> >
> > [Tue Aug 28 15:10:44 2018] NFSD: client 172.16.YY.XXX testing state ID with incorrect client ID
>
> Looks like an NFS bug. Someone else on the list should be able
> to comment.

I ran into this problem while testing RHEL 7.5 NFSoRDMA (over
SoftRoCE). Here's the bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1518006

I was having a hard time reproducing it consistently enough to debug it.
Because it was really a non-error error (and it wasn't upstream), it
went on the back burner.

>
>
> >>> Best regards,
> >>> Volker
> >>>
> >>>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
> >>>>
> >>>> Hi list,
> >>>>
> >>>> we have a setup with round about 15 centos 7.5 server.
> >>>>
> >>>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
> >>>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
> >>>>
> >>>> The server exports 4-6 mounts to each client.
> >>>>
> >>>> Since we added 3 further nodes to the setup, we recieve following messages:
> >>>>
> >>>> On nfs-server:
> >>>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
> >>>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
> >>>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
> >>>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
> >>>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
> >>>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
> >>>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
> >>>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
> >>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
> >>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
> >>>>
> >>>> on nfs-clients:
> >>>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
> >>>> [229903.523455] nfs: server 172.16.55.221 OK
> >>>> [229939.080276] nfs: server 172.16.55.221 OK
> >>>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
> >>>> [248874.777322] RPC: Could not send backchannel reply error: -105
> >>>> [249484.823793] RPC: Could not send backchannel reply error: -105
> >>>> [250382.497448] RPC: Could not send backchannel reply error: -105
> >>>> [250671.054112] RPC: Could not send backchannel reply error: -105
> >>>> [251284.622707] RPC: Could not send backchannel reply error: -105
> >>>>
> >>>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
> >>>>
> >>>> I googled all messages and tried different things without success.
> >>>> We are now going on to upgrade cpu power on nfs server.
> >>>>
> >>>> Do you also have any hints or points i can look for?
> >>>>
> >>>> Best regards,
> >>>> Volker
> >>>
> >>
> >> --
> >> Chuck Lever
> >> chucklever@gmail.com
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-28 19:10         ` Olga Kornievskaia
@ 2018-08-29  9:03           ` Volker Lieder
  2018-08-29 14:01             ` Olga Kornievskaia
  2018-09-05 21:26             ` J. Bruce Fields
  0 siblings, 2 replies; 11+ messages in thread
From: Volker Lieder @ 2018-08-29  9:03 UTC (permalink / raw)
  To: linux-nfs

Hi Olga,

I don't have a Red Hat account.

If it's helpful, can you paste the result here?

Regards
Volker

> Am 28.08.2018 um 21:10 schrieb Olga Kornievskaia <aglo@umich.edu>:
> 
> On Tue, Aug 28, 2018 at 11:41 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Aug 28, 2018, at 11:31 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
>>> 
>>> Hi Chuck,
>>> 
>>>> Am 28.08.2018 um 17:26 schrieb Chuck Lever <chucklever@gmail.com>:
>>>> 
>>>> Hi Volker-
>>>> 
>>>> 
>>>>> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> a short update from our site.
>>>>> 
>>>>> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
>>>>> 
>>>>> Is there a guide what hardware requirements a fast nfs server has?
>>>>> 
>>>>> Or an information, how many nfs prozesses are needed for x nfs clients?
>>>> 
>>>> The nfsd thread count depends on number of clients _and_ their workload.
>>>> There isn't a hard and fast rule.
>>>> 
>>>> The default thread count is probably too low for your workload. You can
>>>> edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
>>>> 64, and restart your NFS server.
>>> 
>>> I tried this, but then the load on the "small" server was to high to serve further requests, so that was the idea to grow this up.
>> 
>> That rather suggests the disks are slow. A deeper performance
>> analysis might help.
>> 
>> 
>>>> With InfiniBand you also have the option of using NFS/RDMA. Mount with
>>>> "proto=rdma,port=20049" to try it.
>>> 
>>> Yes, thats true, but in the mellanox driver set they disabled nfsordma in Version 3.4.
>> 
>> Not quite sure what you mean by "mellanox driver". Do you
>> mean MOFED? My impression of the stock CentOS 7.5 code is
>> that it is close to upstream, and you shouldn't need to
>> replace it except in some very special circumstances (high
>> end database, eg).
>> 
>> 
>>> It should work with centos driver, but we didnt tested it right now in newer setups.
>>> 
>>> One more question, since other problems seem to be solved:
>>> 
>>> What about this message?
>>> 
>>> [Tue Aug 28 15:10:44 2018] NFSD: client 172.16.YY.XXX testing state ID with incorrect client ID
>> 
>> Looks like an NFS bug. Someone else on the list should be able
>> to comment.
> 
> I ran into this problem while testing RHEL7.5 NFSoRDMA (over
> SoftRoCE). Here's a bugzilla
> https://bugzilla.redhat.com/show_bug.cgi?id=1518006
> 
> I was having a hard time reproducing it consistently to debug it.
> Because it was really a non-error error (and it wasn't upstream), it
> went on a back burner.
> 
>> 
>> 
>>>>> Best regards,
>>>>> Volker
>>>>> 
>>>>>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
>>>>>> 
>>>>>> Hi list,
>>>>>> 
>>>>>> we have a setup with round about 15 centos 7.5 server.
>>>>>> 
>>>>>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
>>>>>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
>>>>>> 
>>>>>> The server exports 4-6 mounts to each client.
>>>>>> 
>>>>>> Since we added 3 further nodes to the setup, we recieve following messages:
>>>>>> 
>>>>>> On nfs-server:
>>>>>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
>>>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
>>>>>> 
>>>>>> on nfs-clients:
>>>>>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
>>>>>> [229903.523455] nfs: server 172.16.55.221 OK
>>>>>> [229939.080276] nfs: server 172.16.55.221 OK
>>>>>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
>>>>>> [248874.777322] RPC: Could not send backchannel reply error: -105
>>>>>> [249484.823793] RPC: Could not send backchannel reply error: -105
>>>>>> [250382.497448] RPC: Could not send backchannel reply error: -105
>>>>>> [250671.054112] RPC: Could not send backchannel reply error: -105
>>>>>> [251284.622707] RPC: Could not send backchannel reply error: -105
>>>>>> 
>>>>>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
>>>>>> 
>>>>>> I googled all messages and tried different things without success.
>>>>>> We are now going on to upgrade cpu power on nfs server.
>>>>>> 
>>>>>> Do you also have any hints or points i can look for?
>>>>>> 
>>>>>> Best regards,
>>>>>> Volker
>>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>>>> chucklever@gmail.com
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-29  9:03           ` Volker Lieder
@ 2018-08-29 14:01             ` Olga Kornievskaia
  2018-09-05 21:26             ` J. Bruce Fields
  1 sibling, 0 replies; 11+ messages in thread
From: Olga Kornievskaia @ 2018-08-29 14:01 UTC (permalink / raw)
  To: v.lieder; +Cc: linux-nfs

Hi Volker,

The issue was triggered by the following sequence of events. The NFSoRDMA
connection over SoftRoCE was experiencing problems. There was a gap of
3 minutes between the last operation (OPEN) in the network trace and
(because the RDMA connection was re-established) the BIND_CONN_TO_SESSION,
which got the BAD_SESSION error. The client's lease had expired during
those 3 minutes, which explains the BAD_SESSION error.

After the client notices that the session is bad, it recovers the
clientid and session and starts state recovery. When the client sends the
recovery OPEN, the server returns NO_GRACE, so the client switches from
reboot recovery to no-grace recovery, which includes testing the stateid
before sending the open. That's why the client sends the old stateid
(from the old client ID), gets that error, and the server logs that
message in /var/log/messages.

A network trace would be needed from your environment to tell whether this
is a similar situation.
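
If you want to capture one, a rough sketch (interface name and output file
are placeholders; note that a packet capture only sees the TCP side, not
RDMA traffic):

  tcpdump -i ib0 -s 0 -w /tmp/nfs-trace.pcap port 2049
  # reproduce the problem, stop the capture, then inspect it in wireshark
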
On Wed, Aug 29, 2018 at 5:03 AM Volker Lieder <v.lieder@uvensys.de> wrote:
>
> Hi Olga,
>
> i dont have a redhat account.
>
> Can you, if helpful, paste the result right here?
>
> Regards
> Volker
>
> > Am 28.08.2018 um 21:10 schrieb Olga Kornievskaia <aglo@umich.edu>:
> >
> > On Tue, Aug 28, 2018 at 11:41 AM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Aug 28, 2018, at 11:31 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
> >>>
> >>> Hi Chuck,
> >>>
> >>>> Am 28.08.2018 um 17:26 schrieb Chuck Lever <chucklever@gmail.com>:
> >>>>
> >>>> Hi Volker-
> >>>>
> >>>>
> >>>>> On Aug 28, 2018, at 8:37 AM, Volker Lieder <v.lieder@uvensys.de> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> a short update from our site.
> >>>>>
> >>>>> We resized CPU and RAM on the nfs server and the performance is good right now and the error messages are gone.
> >>>>>
> >>>>> Is there a guide what hardware requirements a fast nfs server has?
> >>>>>
> >>>>> Or an information, how many nfs prozesses are needed for x nfs clients?
> >>>>
> >>>> The nfsd thread count depends on number of clients _and_ their workload.
> >>>> There isn't a hard and fast rule.
> >>>>
> >>>> The default thread count is probably too low for your workload. You can
> >>>> edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to, say,
> >>>> 64, and restart your NFS server.
> >>>
> >>> I tried this, but then the load on the "small" server was to high to serve further requests, so that was the idea to grow this up.
> >>
> >> That rather suggests the disks are slow. A deeper performance
> >> analysis might help.
> >>
> >>
> >>>> With InfiniBand you also have the option of using NFS/RDMA. Mount with
> >>>> "proto=rdma,port=20049" to try it.
> >>>
> >>> Yes, thats true, but in the mellanox driver set they disabled nfsordma in Version 3.4.
> >>
> >> Not quite sure what you mean by "mellanox driver". Do you
> >> mean MOFED? My impression of the stock CentOS 7.5 code is
> >> that it is close to upstream, and you shouldn't need to
> >> replace it except in some very special circumstances (high
> >> end database, eg).
> >>
> >>
> >>> It should work with centos driver, but we didnt tested it right now in newer setups.
> >>>
> >>> One more question, since other problems seem to be solved:
> >>>
> >>> What about this message?
> >>>
> >>> [Tue Aug 28 15:10:44 2018] NFSD: client 172.16.YY.XXX testing state ID with incorrect client ID
> >>
> >> Looks like an NFS bug. Someone else on the list should be able
> >> to comment.
> >
> > I ran into this problem while testing RHEL7.5 NFSoRDMA (over
> > SoftRoCE). Here's a bugzilla
> > https://bugzilla.redhat.com/show_bug.cgi?id=1518006
> >
> > I was having a hard time reproducing it consistently to debug it.
> > Because it was really a non-error error (and it wasn't upstream), it
> > went on a back burner.
> >
> >>
> >>
> >>>>> Best regards,
> >>>>> Volker
> >>>>>
> >>>>>> Am 28.08.2018 um 09:45 schrieb Volker Lieder <v.lieder@uvensys.de>:
> >>>>>>
> >>>>>> Hi list,
> >>>>>>
> >>>>>> we have a setup with round about 15 centos 7.5 server.
> >>>>>>
> >>>>>> All are connected via infiniband 56Gbit and installed with new mellanox driver.
> >>>>>> One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.
> >>>>>>
> >>>>>> The server exports 4-6 mounts to each client.
> >>>>>>
> >>>>>> Since we added 3 further nodes to the setup, we recieve following messages:
> >>>>>>
> >>>>>> On nfs-server:
> >>>>>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
> >>>>>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
> >>>>>>
> >>>>>> on nfs-clients:
> >>>>>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
> >>>>>> [229903.523455] nfs: server 172.16.55.221 OK
> >>>>>> [229939.080276] nfs: server 172.16.55.221 OK
> >>>>>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
> >>>>>> [248874.777322] RPC: Could not send backchannel reply error: -105
> >>>>>> [249484.823793] RPC: Could not send backchannel reply error: -105
> >>>>>> [250382.497448] RPC: Could not send backchannel reply error: -105
> >>>>>> [250671.054112] RPC: Could not send backchannel reply error: -105
> >>>>>> [251284.622707] RPC: Could not send backchannel reply error: -105
> >>>>>>
> >>>>>> Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.
> >>>>>>
> >>>>>> I googled all messages and tried different things without success.
> >>>>>> We are now going on to upgrade cpu power on nfs server.
> >>>>>>
> >>>>>> Do you also have any hints or points i can look for?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Volker
> >>>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>>> chucklever@gmail.com
> >>
> >> --
> >> Chuck Lever
> >>
> >>
> >>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-08-29  9:03           ` Volker Lieder
  2018-08-29 14:01             ` Olga Kornievskaia
@ 2018-09-05 21:26             ` J. Bruce Fields
  2018-09-06  6:42               ` Volker Lieder
  1 sibling, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2018-09-05 21:26 UTC (permalink / raw)
  To: Volker Lieder; +Cc: linux-nfs

On Wed, Aug 29, 2018 at 11:03:22AM +0200, Volker Lieder wrote:
> i dont have a redhat account.

You should be able to see that bug now:

	https://bugzilla.redhat.com/show_bug.cgi?id=1518006

--b.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about nfs in infiniband environment
  2018-09-05 21:26             ` J. Bruce Fields
@ 2018-09-06  6:42               ` Volker Lieder
  0 siblings, 0 replies; 11+ messages in thread
From: Volker Lieder @ 2018-09-06  6:42 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Am 05.09.2018 um 23:26 schrieb J. Bruce Fields <bfields@fieldses.org>:
> 
> On Wed, Aug 29, 2018 at 11:03:22AM +0200, Volker Lieder wrote:
>> i dont have a redhat account.
> 
> You should be able to see that bug now:
> 
> 	https://bugzilla.redhat.com/show_bug.cgi?id=1518006
> 
> --b.

Hi Bruce,

thank you.

Best regards,
Volker

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-09-06 11:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-28  7:45 Question about nfs in infiniband environment Volker Lieder
2018-08-28 12:37 ` Volker Lieder
2018-08-28 15:26   ` Chuck Lever
2018-08-28 15:31     ` Volker Lieder
2018-08-28 15:40       ` Chuck Lever
2018-08-28 17:00         ` Jeff Becker
2018-08-28 19:10         ` Olga Kornievskaia
2018-08-29  9:03           ` Volker Lieder
2018-08-29 14:01             ` Olga Kornievskaia
2018-09-05 21:26             ` J. Bruce Fields
2018-09-06  6:42               ` Volker Lieder
