linux-rdma.vger.kernel.org archive mirror
* [Bug 214523] New: RDMA Mellanox RoCE drivers are unresponsive to ARP updates during a reconnect
@ 2021-09-24 15:34 bugzilla-daemon
  2021-09-26  8:02 ` Leon Romanovsky
  0 siblings, 1 reply; 10+ messages in thread
From: bugzilla-daemon @ 2021-09-24 15:34 UTC
  To: linux-rdma

https://bugzilla.kernel.org/show_bug.cgi?id=214523

            Bug ID: 214523
           Summary: RDMA Mellanox RoCE drivers are unresponsive to ARP
                    updates during a reconnect
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.14
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Infiniband/RDMA
          Assignee: drivers_infiniband-rdma@kernel-bugs.osdl.org
          Reporter: kolga@netapp.com
        Regression: No

A RoCE RDMA connection is established using the CMA protocol. During setup,
the code uses hard-coded timeout/retry values. These values control how a
Connect Request is retried when it goes unanswered. During the retry attempts,
ARP updates for the destination server are ignored. With the current timeout
values, the client spends 4+ minutes trying to connect to a server that has
not owned the IP address since the ARP update happened.

The ask is to make the timeout/retry values configurable via procfs or sysfs.
This would allow environments that use RoCE to reduce the timeouts to more
reasonable values and react to ARP updates faster. Other CMA users (e.g., IB
or others) can continue to use the existing values.
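
As a rough illustration of why smaller values would shorten the stall, the
sketch below models the worst-case connect time as "one full timeout per
attempt". It assumes the IB-style timeout encoding (4.096 microseconds times
2 to the power of the encoded value) and a simple retry loop; the function
names and the (exponent, retries) pairs are hypothetical and are not taken
from the kernel source. The real total also depends on factors not modeled
here.

```python
def ib_timeout_seconds(exponent: int) -> float:
    """Decode an IB-style timeout value: 4.096 us * 2**exponent."""
    return 4.096e-6 * (2 ** exponent)


def worst_case_connect_seconds(timeout_exponent: int, max_retries: int) -> float:
    """Simplified model: the initial request and each retry wait one full
    timeout before the connection attempt is abandoned."""
    attempts = 1 + max_retries
    return attempts * ib_timeout_seconds(timeout_exponent)


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how the total scales.
    for exponent, retries in [(20, 15), (18, 7), (16, 3)]:
        total = worst_case_connect_seconds(exponent, retries)
        print(f"exponent={exponent} retries={retries} -> {total:.2f} s")
```

Because the per-attempt timeout doubles with each step of the exponent and
the retry count multiplies it, a tunable for either value would change how
long a client keeps retrying a stale destination.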

The problem exists in all kernel versions, but this bug is filed against the
5.14 kernel.

The use case is (RoCE-based) NFSoRDMA where a server went down and another
server was brought up in its place. The RDMA layer adds a delay of 4+ minutes
before an RDMA connection can be re-established and I/O can resume, because it
cannot react to the ARP update.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


Thread overview: 10+ messages
2021-09-24 15:34 [Bug 214523] New: RDMA Mellanox RoCE drivers are unresponsive to ARP updates during a reconnect bugzilla-daemon
2021-09-26  8:02 ` Leon Romanovsky
2021-09-26 17:36   ` Chuck Lever III
2021-09-27 12:09     ` Leon Romanovsky
2021-09-27 12:24       ` Jason Gunthorpe
2021-09-27 12:55         ` Mark Zhang
2021-09-27 13:10           ` Jason Gunthorpe
2021-09-27 13:32             ` Haakon Bugge
2021-10-15  6:35               ` Mark Zhang
2021-09-27 16:14       ` Chuck Lever III
