From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Ahring Oder Aring <aahringo@redhat.com>
Date: Mon, 12 Apr 2021 11:30:25 -0400
Subject: [Cluster-devel] [PATCHv3 dlm/next 7/8] fs: dlm: add reliable
	connection if reconnect
In-Reply-To: <20210409204443.GC30244@linux-2.home>
References: <20210326173337.44231-1-aahringo@redhat.com>
	<20210326173337.44231-8-aahringo@redhat.com>
	<20210402205351.GA24027@linux-2.home>
	<CAK-6q+hnj94xQS+QceDF3GyDR78ns61-T1UVLs7o6kJsPzT=Fw@mail.gmail.com>
	<CAK-6q+giMt8HUg5jY0msrKGazUeRnGNqC6nNPqNa2Mca8NRCuQ@mail.gmail.com>
	<20210409204443.GC30244@linux-2.home>
Message-ID: <CAK-6q+gBDO2UO78ohssyLdqRsjvkcMYw9H6v2DvDJZL-VdhpZQ@mail.gmail.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

On Fri, Apr 9, 2021 at 4:44 PM Guillaume Nault <gnault@redhat.com> wrote:
>
> On Mon, Apr 05, 2021 at 01:33:48PM -0400, Alexander Ahring Oder Aring wrote:
> > Hi,
> >
> > On Sat, Apr 3, 2021 at 11:34 AM Alexander Ahring Oder Aring
> > <aahringo@redhat.com> wrote:
> > >
> > ...
> > >
> > > > It seems to me that the only time DLM might need to retransmit data, is
> > > > when recovering from a connection failure. So why can't we just resend
> > > > unacknowledged data at reconnection time? That'd probably simplify the
> > > > code a lot (no need to maintain a retransmission timeout on TX, no need
> > > > to handle sequence numbers that are in the future on RX).
> > > >
> > >
> > > I can try to remove the timer, timeout and do the above approach to
> > > retransmit at reconnect. Then I test it again and I will report back
> > > to see if it works or why we have other problems.
> > >
> >
> > I have an implementation of this running and so far I don't see any problems.
> >
> > > > Also, couldn't we set the DLM sequence numbers in
> > > > dlm_midcomms_commit_buffer_3_2() rather than using a callback function
> > > > in dlm_lowcomms_new_buffer()?
> > > >
> > ...
> > >
> > > Yes, I looked into TCP_REPAIR at first and I agree it can be used to
> > > solve this problem. However TCP_REPAIR can be used as a part of a more
> > > generic solution, there needs to be something "additional handling"
> > > done e.g. additional socket options to let the application layer save
> > > states before receiving errors. I am also concerned how it would work
> >
> > The code [0] is what I meant above. It will call
> > tcp_write_queue_purge(); before reporting the error over error
> > queue/callback. That need to be handled differently to allow dumping
> > the actual TCP state and restore at reconnect, at least that is what I
> > have in my mind.
>
> Thanks. That's not usable as is, indeed.
> Also, by retransmitting data from the previous send-queue, we risk
> resending messages that the peer already received (for example because
> the previous connection didn't receive the latest ACKs). I guess that
> receiving the same DLM messages twice is going to confuse the peer.
> So it looks like we'll need application level sequence numbers anyway.

I agree. With the new "retransmit all unacknowledged messages on reconnect"
method, the receiving side filters out messages it has already received,
because they carry sequence numbers. This case occurs a lot.
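
To make the receive-side filtering concrete, here is a minimal user-space
sketch. It is not the actual midcomms code; struct rx_state and rx_accept()
are hypothetical names, and it assumes the sender resends in order starting
with the oldest unacknowledged message, so a number ahead of the expected
one never shows up first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node receive state, not the real DLM structure. */
struct rx_state {
	uint32_t next_seq;	/* sequence number we expect next */
};

/*
 * Return true if the message should be delivered to the upper layer,
 * false if it should be dropped.
 *
 * Assumption: after a reconnect the peer resends everything that was
 * unacknowledged, in order. Anything that is not the expected number
 * is therefore treated as a duplicate of an already delivered message.
 */
static bool rx_accept(struct rx_state *rx, uint32_t seq)
{
	if (seq != rx->next_seq)
		return false;

	rx->next_seq++;		/* unsigned, so wraparound is well defined */
	return true;
}
```

With next_seq at 7, a resent message carrying 5 or 6 is dropped and the
message carrying 7 is delivered as usual.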

However, I think it is still possible to use TCP_REPAIR here. We would
need to restore the state of all three queues, rx and tx (write,
retransmit), and the sequence numbers. Window size is an optional
additional thing. On the application layer we need to be sure that we
don't drop anything if an error occurs, and that we start transmitting
it after the state has been restored. Of course, both endpoints need to
support it and be correctly configured.
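
For the application-layer part only, a rough sketch of "don't drop on
error, transmit after restore" could look like the following. It does not
touch TCP_REPAIR itself. All names (struct tx_state, tx_submit(),
tx_on_error(), tx_on_restored(), transport_send()) are made up for
illustration, and the fixed-size queue is just an assumption to keep the
example short.

```c
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 16	/* arbitrary capacity for this sketch */

/* Hypothetical application-level transmit state. */
struct tx_state {
	bool connected;
	const char *pending[PENDING_MAX];	/* held while disconnected */
	size_t npending;
	size_t nsent;	/* messages handed to the transport so far */
};

/* Stand-in for the real transmit path; only counts messages. */
static void transport_send(struct tx_state *tx, const char *msg)
{
	(void)msg;
	tx->nsent++;
}

/*
 * Send right away if connected, otherwise hold the message.
 * Returns false only when the holding queue is full.
 */
static bool tx_submit(struct tx_state *tx, const char *msg)
{
	if (tx->connected) {
		transport_send(tx, msg);
		return true;
	}
	if (tx->npending == PENDING_MAX)
		return false;
	tx->pending[tx->npending++] = msg;
	return true;
}

/* On a socket error, stop sending but keep everything queued. */
static void tx_on_error(struct tx_state *tx)
{
	tx->connected = false;
}

/* Once the connection state is restored, flush in original order. */
static void tx_on_restored(struct tx_state *tx)
{
	size_t i;

	tx->connected = true;
	for (i = 0; i < tx->npending; i++)
		transport_send(tx, tx->pending[i]);
	tx->npending = 0;
}
```

The point is only that the error path marks the connection as down and
keeps the messages, and the restore path is the single place where held
messages are sent again.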

- Alex