From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Abeni
Date: Thu, 11 Mar 2021 10:08:59 +0100
Subject: [Cluster-devel] [PATCHv2 dlm/next 0/8] fs: dlm: introduce dlm re-transmission layer
In-Reply-To: <20210310191745.80824-1-aahringo@redhat.com>
References: <20210310191745.80824-1-aahringo@redhat.com>
Message-ID: <7536fa5a3661675c583a448cf1bbc3f026bfea23.camel@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello,

Thank you for the new version.

On Wed, 2021-03-10 at 14:17 -0500, Alexander Aring wrote:
> this is the final patch-series to make dlm reliable when re-connection
> occurs. You can easily generate a couple of re-connections by running:
>
> tcpkill -9 -i $IFACE port 21064
>
> on your own to test these patches. At some time dlm will detect message
> drops and will re-transmit messages if necessary. It introduces a new dlm
> protocol behaviour and increases the dlm protocol version. I tested it
> with SCTP as well and tried to be backwards compatible with dlm protocol
> version 3.1. However I don't recommend at all to mix these versions
> in a setup since dlm version 3.2 fixes long-term issues.
>
> - Alex
>
> changes since v2:
> - make timer handling pending only if messages are on air, the sync
>   isn't quite correct there but doesn't need to be precise
> - use before() from tcp to check if seq is before other seq with
>   respect of overflows
> - change srcu handling to hold srcu in all places where nodes are
>   referencing - we should not get a disadvantage of holding that
>   lock. We should update also lowcomms regarding to that.
> - add some WARN_ON() to check that nothing in send/recv is going
>   anymore otherwise it's likely an issue.
> - add more future work regarding to fencing of nodes if over
>   cluster manager timeout/bad seq happens
> - add note about missing length size check of tail payload
>   (resource name length) regarding to the receive buffer
> - remove some include which isn't necessary in recoverd.c

I plan/hope to go through this iteration at the very end of this week
or early next week.

I just noticed that some emails from you targeting netdev landed in my
spam folder thanks to our corporate anti-spam filter, so I may have
lost some replies from you. If you already answered the following, I'm
sorry I lost that, but it's not my fault! Please kindly resend the
message ;)

The relevant questions were/are:

- Is there a git tree available with the whole series applied, to help
  with the review?
- DEFAULT_BUFFER_SIZE == LOWCOMMS_MAX_TX_BUFFER_LEN in current
  net-next, so it looks like a change below is actually a no-op ?!?
- Could you please add more info WRT the reference to unaligned memory
  access in the code comments? Which field[s] is[are] subject to that?

Thanks!

Paolo
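
For reference while reviewing the "use before() from tcp" item in the
quoted changelog: the wraparound-safe comparison can be sketched in
userspace as below. This is only an illustration, not code from the
series. The name seq_before() is hypothetical; it mirrors the
signed-difference trick used by the kernel's before() helper in
include/net/tcp.h, and assumes 32-bit sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the kernel's before() helper.
 * Returns true if seq1 comes before seq2 in modulo-2^32 sequence space.
 *
 * The unsigned subtraction wraps around, and reinterpreting the result
 * as a signed 32-bit value yields a negative number whenever seq1 is
 * less than 2^31 steps behind seq2. The unsigned-to-signed conversion
 * of an out-of-range value is implementation-defined in ISO C, but it
 * wraps as expected on the compilers the kernel supports.
 */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Convenience inverse: true if seq1 comes after seq2. */
static inline bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}
```

A plain "seq1 < seq2" would get the case around the wrap point wrong:
0xFFFFFFFF is "before" 0 in sequence space, which the signed difference
handles and the naive comparison does not.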