From mboxrd@z Thu Jan 1 00:00:00 1970
From: Guillaume Nault
Date: Fri, 9 Apr 2021 23:11:07 +0200
Subject: [Cluster-devel] [PATCHv3 dlm/next 7/8] fs: dlm: add reliable connection if reconnect
In-Reply-To:
References: <20210326173337.44231-1-aahringo@redhat.com> <20210326173337.44231-8-aahringo@redhat.com> <20210402205351.GA24027@linux-2.home>
Message-ID: <20210409211107.GD30244@linux-2.home>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Apr 05, 2021 at 04:29:10PM -0400, Alexander Ahring Oder Aring wrote:
> Hi,
>
> On Mon, Apr 5, 2021 at 1:33 PM Alexander Ahring Oder Aring
> wrote:
> >
> > Hi,
> >
> > On Sat, Apr 3, 2021 at 11:34 AM Alexander Ahring Oder Aring
> > wrote:
> > >
> > ...
> > >
> > > > It seems to me that the only time DLM might need to retransmit data, is
> > > > when recovering from a connection failure. So why can't we just resend
> > > > unacknowledged data at reconnection time? That'd probably simplify the
> > > > code a lot (no need to maintain a retransmission timeout on TX, no need
> > > > to handle sequence numbers that are in the future on RX).
> > > >
> > >
> > > I can try to remove the timer, timeout and do the above approach to
> > > retransmit at reconnect. Then I test it again and I will report back
> > > to see if it works or why we have other problems.
> > >
> >
> > I have an implementation of this running and so far I don't see any problems.
>
> There is a problem but it's related to the behaviour how reconnections
> are triggered. The whole communication can be stuck because the send()
> triggers a reconnection if not connected anymore. Before, the timer
> was triggering some send() and this was triggering a reconnection in a
> periodic way. Therefore we never had any stuck situation where nobody
> was sending anything anymore. It's a rare case but I am currently
> running into it. However I think I need to change how the
> reconnections are triggered with some "forever periodic try" which
> should solve this issue.

Would it be sufficient to detect socket errors to avoid this problem?
For example by letting lowcomms_error_report() do the reconnection when
necessary?

>
> - Alex
>
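
To make the "resend unacknowledged data at reconnection time" idea quoted
above more concrete, here is a minimal user-space sketch. It is not fs/dlm
code and does not reflect the patch under review. Every name in it
(struct tx_queue, txq_send, txq_ack, txq_reconnect) is hypothetical, and it
assumes cumulative acknowledgements and ignores sequence number wraparound.
Only sequence numbers are tracked; payloads are left out to keep it short.

```c
/* Illustrative sketch only, not DLM code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXQ_MAX 16

struct tx_queue {
	uint32_t next_seq;          /* sequence number of the next new message */
	uint32_t unacked[TXQ_MAX];  /* sent but not yet acked, oldest first */
	size_t   n_unacked;
};

/* Record a newly sent message; fails with -1 if the queue is full. */
static int txq_send(struct tx_queue *q, uint32_t *seq)
{
	if (q->n_unacked == TXQ_MAX)
		return -1;
	*seq = q->next_seq++;
	q->unacked[q->n_unacked++] = *seq;
	return 0;
}

/* Cumulative ack (an assumption): drop every entry with seq <= acked. */
static void txq_ack(struct tx_queue *q, uint32_t acked)
{
	size_t drop = 0, i;

	while (drop < q->n_unacked && q->unacked[drop] <= acked)
		drop++;
	for (i = drop; i < q->n_unacked; i++)
		q->unacked[i - drop] = q->unacked[i];
	q->n_unacked -= drop;
}

/*
 * Called once the connection is re-established: copy out, in order, the
 * sequence numbers that must be sent again. No timer is involved; entries
 * stay queued until a later ack removes them.
 */
static size_t txq_reconnect(const struct tx_queue *q, uint32_t *out,
			    size_t out_len)
{
	size_t i, n = q->n_unacked < out_len ? q->n_unacked : out_len;

	for (i = 0; i < n; i++)
		out[i] = q->unacked[i];
	return n;
}
```

With three messages sent (seq 0, 1, 2) and an ack received for seq 0, a
reconnect would hand back seq 1 and 2 for retransmission. The sketch says
nothing about who triggers the reconnect, which is the open question above.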