* [Cluster-devel] [PATCHv2 dlm/next 0/8] fs: dlm: introduce dlm re-transmission layer
@ 2021-03-10 19:17 Alexander Aring
From: Alexander Aring @ 2021-03-10 19:17 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

This is the final patch series to make dlm reliable when a reconnect
occurs. To test these patches, you can easily trigger a couple of
reconnects by running:

tcpkill -9 -i $IFACE port 21064

Eventually dlm detects the message drops and retransmits messages where
necessary. The series introduces new dlm protocol behaviour and bumps
the dlm protocol version. I also tested it with SCTP and tried to stay
backwards compatible with dlm protocol version 3.1. However, I do not
recommend mixing these versions in one setup, since dlm version 3.2
fixes long-standing issues.
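
As a rough illustration of the detect-and-retransmit idea described above,
here is a minimal userspace sketch in C. It is not the code from
fs/dlm/midcomms.c: the struct names, TX_WINDOW, the cumulative-ack scheme and
all function names are assumptions made for this example only. The sender
keeps every message until it is acknowledged, and the receiver only delivers
the sequence number it expects next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_WINDOW 8	/* hypothetical cap on unacknowledged messages */

/* Hypothetical per-peer send state, not the layout used by dlm. */
struct tx_state {
	uint32_t next_seq;		/* seq for the next new message */
	uint32_t unacked[TX_WINDOW];	/* seqs awaiting an ack, oldest first */
	size_t n_unacked;
};

/* Hypothetical per-peer receive state. */
struct rx_state {
	uint32_t next_expected;		/* the only seq we will deliver next */
};

/* Assign a seq to a new message and remember it until it is acked. */
static bool tx_send(struct tx_state *tx, uint32_t *seq_out)
{
	assert(tx && seq_out);
	if (tx->n_unacked == TX_WINDOW)
		return false;		/* window full, caller must wait */
	*seq_out = tx->next_seq++;
	tx->unacked[tx->n_unacked++] = *seq_out;
	return true;
}

/*
 * Cumulative ack (an assumption of this sketch): the peer has received
 * everything before ack_seq, so those messages can be forgotten.
 * Whatever stays in unacked[] is what a retransmit would send again.
 */
static void tx_ack(struct tx_state *tx, uint32_t ack_seq)
{
	size_t drop = 0;

	/* wraparound-safe "unacked[drop] is before ack_seq" */
	while (drop < tx->n_unacked &&
	       (int32_t)(tx->unacked[drop] - ack_seq) < 0)
		drop++;
	for (size_t i = drop; i < tx->n_unacked; i++)
		tx->unacked[i - drop] = tx->unacked[i];
	tx->n_unacked -= drop;
}

/* Deliver only the expected seq; duplicates and gaps are dropped. */
static bool rx_accept(struct rx_state *rx, uint32_t seq)
{
	assert(rx);
	if (seq != rx->next_expected)
		return false;		/* sender will retransmit later */
	rx->next_expected++;
	return true;
}
```

In this toy model a killed connection simply means that some sequence numbers
stay in unacked[] and are sent again after the reconnect, while the receiver
discards anything it has already delivered.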

- Alex

changes since v1:
 - keep the timer pending only while messages are in flight; the
   synchronization there isn't quite correct, but it doesn't need to
   be precise
 - use before() from tcp to check whether one seq is before another
   with respect to overflow
 - change srcu handling to hold srcu everywhere nodes are referenced;
   holding that lock should not put us at a disadvantage. We should
   also update lowcomms accordingly.
 - add some WARN_ON() checks that nothing is still going on in
   send/recv; otherwise it's likely an issue.
 - add more future work regarding fencing of nodes if the cluster
   manager timeout is exceeded or a bad seq happens
 - add a note about the missing length check of the tail payload
   (resource name length) against the receive buffer
 - remove an unnecessary include in recoverd.c
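
For readers unfamiliar with the before() item above: the kernel helper in
include/net/tcp.h compares two 32-bit sequence numbers by looking at the sign
of their difference, so the comparison keeps working when the counter wraps.
A minimal userspace sketch of the same idea follows. The names seq_before()
and seq_after() are made up for this example, and the cast relies on the
usual two's complement conversion that common compilers provide.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe "a is before b" for 32-bit sequence numbers.
 * The subtraction is done modulo 2^32; interpreting the result as a
 * signed value tells us on which side of b the value a lies, as long
 * as the two are less than 2^31 apart.
 */
static inline bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical convenience helper: a is after b iff b is before a. */
static inline bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}
```

A plain `a < b` would wrongly report that 5 comes before 0xfffffff0 after the
counter has wrapped; the signed-difference form treats 0xfffffff0 as the
earlier value.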

Thanks to Paolo Abeni for his review and recommendations.

changes since patch series split:
 - fixup "fs: dlm: make new buffer handling softirq ready" into
   "fs: dlm: add functionality to re-transmit a message"
 - change hooks to work with shutdown hook
 - use DEFAULT_BUFFER_SIZE for the max send/recv buffer, for
   backwards compatibility with dlm 3.1
 - remove "fs: dlm: add per node receive flush"; I don't think we
   will run into issues without it.
 - change comments regarding why DLM_FIN is needed.
 - move the midcomms_remove_member hook in the dlm protocol to where
   the ls->nodes_gone list is removed; as far as I can see, that is
   the last point at which the lockspace still has this node as a
   member in some data structures.
 - do a deep free of the recv/send queues when a midcomms_node is freed
 - add a DLM_NODE_FLAG_STOP_TX flag for the midcomms node to warn
   if a node tries to send at a point where the dlm application should
   not be starting transmissions anymore. That indicates a bug.
 - change some retransmit/timeout timings; I think we need to find
   out, e.g. by waiting for user reports, whether they cause
   problems.

Alexander Aring (8):
  fs: dlm: public header in out utility
  fs: dlm: add more midcomms hooks
  fs: dlm: make buffer handling per msg
  fs: dlm: add functionality to re-transmit a message
  fs: dlm: move out some hash functionality
  fs: dlm: add union in dlm header for lockspace id
  fs: dlm: add reliable connection if reconnect
  fs: dlm: don't allow half transmitted messages

 fs/dlm/config.c       |    3 +-
 fs/dlm/dlm_internal.h |   35 +-
 fs/dlm/lock.c         |   14 +-
 fs/dlm/lockspace.c    |   14 +-
 fs/dlm/lowcomms.c     |  197 +++++--
 fs/dlm/lowcomms.h     |   23 +-
 fs/dlm/member.c       |   12 +-
 fs/dlm/midcomms.c     | 1296 ++++++++++++++++++++++++++++++++++++++++-
 fs/dlm/midcomms.h     |   10 +
 fs/dlm/rcom.c         |   63 +-
 fs/dlm/util.c         |   10 +-
 fs/dlm/util.h         |    2 +
 12 files changed, 1566 insertions(+), 113 deletions(-)

-- 
2.26.2




Thread overview: 15+ messages
2021-03-10 19:17 [Cluster-devel] [PATCHv2 dlm/next 0/8] fs: dlm: introduce dlm re-transmission layer Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 1/8] fs: dlm: public header in out utility Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 2/8] fs: dlm: add more midcomms hooks Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 3/8] fs: dlm: make buffer handling per msg Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 4/8] fs: dlm: add functionality to re-transmit a message Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 5/8] fs: dlm: move out some hash functionality Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 6/8] fs: dlm: add union in dlm header for lockspace id Alexander Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 7/8] fs: dlm: add reliable connection if reconnect Alexander Aring
2021-03-16 17:33   ` Paolo Abeni
2021-03-22 23:06     ` Alexander Ahring Oder Aring
2021-03-10 19:17 ` [Cluster-devel] [PATCHv2 dlm/next 8/8] fs: dlm: don't allow half transmitted messages Alexander Aring
2021-03-11  9:08 ` [Cluster-devel] [PATCHv2 dlm/next 0/8] fs: dlm: introduce dlm re-transmission layer Paolo Abeni
2021-03-12 14:52   ` Alexander Ahring Oder Aring
2021-03-16 18:37     ` Paolo Abeni
2021-03-22 22:54       ` Alexander Ahring Oder Aring
