Question about nfs in infiniband environment

* Question about nfs in infiniband environment
@ 2018-08-28  7:45 Volker Lieder
  2018-08-28 12:37 ` Volker Lieder
  0 siblings, 1 reply; 11+ messages in thread
From: Volker Lieder @ 2018-08-28  7:45 UTC (permalink / raw)
  To: linux-nfs

Hi list,

we have a setup with round about 15 centos 7.5 server.

All are connected via infiniband 56Gbit and installed with new mellanox driver.
One server (4 Core, 8 threads, 16GB) is nfs server for a disk shelf with round about 500TB data.

The server exports 4-6 mounts to each client.

Since we added 3 further nodes to the setup, we recieve following messages:

On nfs-server:
[Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
[Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
[Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
[Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
[Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
[Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
[Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
[Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
[Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
[Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket

on nfs-clients:
[229903.273435] nfs: server 172.16.55.221 not responding, still trying
[229903.523455] nfs: server 172.16.55.221 OK
[229939.080276] nfs: server 172.16.55.221 OK
[236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
[248874.777322] RPC: Could not send backchannel reply error: -105
[249484.823793] RPC: Could not send backchannel reply error: -105
[250382.497448] RPC: Could not send backchannel reply error: -105
[250671.054112] RPC: Could not send backchannel reply error: -105
[251284.622707] RPC: Could not send backchannel reply error: -105

Also file requests or "df -h" ended sometimes in a stale nfs status whcih will be good after a minute.

I googled all messages and tried different things without success.
We are now going on to upgrade cpu power on nfs server. 

Do you also have any hints or points i can look for?

Best regards,
Volker

^ permalink raw reply	[flat|nested] 11+ messages in thread