From: scar <scar@drigon.com>
To: linux-nfs@vger.kernel.org
Subject: RDMA connection lost and not re-opened
Date: Thu, 3 May 2018 13:40:09 -0700
Message-ID: <pcfrv6$940$1@blaine.gmane.org>
We are using NFSoRDMA on our cluster, which runs CentOS 6.9 with
kernel 2.6.32-696.1.1.el6.x86_64. Two of our ten clients had to be
rebooted recently, apparently because an NFS connection was closed but
never reopened. For example, we commonly see these messages:
May 2 14:46:08 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 15:42:39 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 2 15:42:44 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 2 16:04:02 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 16:04:02 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 2 18:46:00 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 19:16:09 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 2 19:28:49 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 21:14:42 n006 kernel: rpcrdma: connection to 10.10.11.10:20049 closed (-103)
May 3 11:51:13 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 3 11:56:13 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 3 13:14:34 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
I asked about these messages previously, and I understand they are just
normal operation. You can see that the connection is usually reopened
immediately if the resource is still required, but the close at 21:14:42
was not followed by a re-open message, and this is about the time the
client hung and became unresponsive. I noticed similar messages on the
other machine that had to be rebooted:
May 2 15:46:52 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 16:08:39 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 2 19:14:23 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 21:14:38 n001 kernel: rpcrdma: connection to 10.10.11.10:20049 closed (-103)
May 3 11:54:58 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 3 11:59:59 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 3 12:50:57 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 3 12:55:58 n001 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
You can see that on each machine the connection to 10.10.11.249:2050 was
re-opened when I tried to log in today, May 3, but the connection to
10.10.11.10:20049 was not. Meanwhile, our other clients still have their
connection to 10.10.11.10:20049, and the server at 10.10.11.10 is
working fine.
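
For anyone who wants to check their own logs for the same pattern, here is
a minimal sketch in Python. It assumes each rpcrdma message sits on a
single line and matches one of the two shapes quoted above; the names
PATTERN, unreopened and SAMPLE are made up for illustration and are not
part of any tool. It only reports peers whose most recent logged event is
a close, so an ordinary idle close will also show up until the next
re-open is logged.

```python
import re

# Matches the two rpcrdma message shapes quoted in this thread.
# Assumption: one log record per line (unwrap mail-wrapped lines first).
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+rpcrdma:\s+"
    r"connection to (?P<peer>\S+)\s+(?P<rest>.*)$"
)


def unreopened(lines):
    """Return {(host, peer): timestamp} for peers whose last event was a close."""
    last_close = {}
    for line in lines:
        m = PATTERN.match(line.strip())
        if not m:
            continue  # not an rpcrdma connection message
        key = (m["host"], m["peer"])
        if m["rest"].startswith("closed"):
            last_close[key] = m["stamp"]   # remember the most recent close
        elif m["rest"].startswith("on "):
            last_close.pop(key, None)      # a later re-open clears it
    return last_close


# Excerpt of the n006 messages above, one record per line.
SAMPLE = """\
May 2 19:16:09 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
May 2 19:28:49 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 closed (-103)
May 2 21:14:42 n006 kernel: rpcrdma: connection to 10.10.11.10:20049 closed (-103)
May 3 11:51:13 n006 kernel: rpcrdma: connection to 10.10.11.249:2050 on mlx4_0, memreg 5 slots 32 ird 16
""".splitlines()

if __name__ == "__main__":
    for (host, peer), stamp in unreopened(SAMPLE).items():
        print(f"{host}: {peer} closed at {stamp}, no later re-open logged")
```

On the excerpt, only the 10.10.11.10:20049 connection on n006 is reported,
which matches what I see by eye.
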
Any idea why this happened, and how it could be resolved without
rebooting the affected machine and losing work?
Thanks