* [Cluster-devel] [PATCHv3 dlm/next 00/20] fs: dlm: introduce dlm re-transmission layer
@ 2021-01-04 21:00 Alexander Aring
From: Alexander Aring @ 2021-01-04 21:00 UTC
  To: cluster-devel.redhat.com

Hi,

this is the final patch series to make dlm reliable when a re-connection
occurs. You can easily generate a couple of re-connections by running:

tcpkill -9 -i $IFACE port 21064

on your own to test these patches. At some point dlm will detect message
drops and will re-transmit messages if necessary. This series introduces
new dlm protocol behaviour and increases the dlm protocol version. I tested
it with SCTP as well and tried to stay backwards compatible with dlm
protocol version 3.1. However, I do not recommend mixing these versions
in one setup, since dlm version 3.2 fixes long-standing issues.
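
To illustrate the general idea of detecting drops and re-transmitting, here
is a minimal user space sketch in C. It is not the code from this series: all
names (sketch_sender, sketch_receiver, SKETCH_WINDOW and the functions) are
hypothetical, and it assumes per-message sequence numbers, a cumulative ack
and strict in-order delivery, which may differ from what midcomms does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WINDOW 8 /* hypothetical cap on unacknowledged messages */

/* Sender state: sequence numbers that were sent but not yet acked. */
struct sketch_sender {
	uint32_t next_seq;               /* sequence number of the next new message */
	uint32_t unacked[SKETCH_WINDOW]; /* oldest first */
	size_t n_unacked;
};

/* Receiver state: the only sequence number it accepts next. */
struct sketch_receiver {
	uint32_t expected_seq;
};

/* Assign a sequence number and remember the message until it is acked. */
static bool sketch_send(struct sketch_sender *s, uint32_t *seq_out)
{
	if (s->n_unacked == SKETCH_WINDOW)
		return false; /* window full, caller has to wait */

	*seq_out = s->next_seq++;
	s->unacked[s->n_unacked++] = *seq_out;
	return true;
}

/* Cumulative ack: the peer expects ack_seq next, forget everything older. */
static void sketch_handle_ack(struct sketch_sender *s, uint32_t ack_seq)
{
	size_t keep = 0;

	for (size_t i = 0; i < s->n_unacked; i++) {
		/* signed difference keeps the comparison wrap-around safe */
		if ((int32_t)(s->unacked[i] - ack_seq) >= 0)
			s->unacked[keep++] = s->unacked[i];
	}
	s->n_unacked = keep;
}

/* Deliver only the expected message; anything else is dropped. */
static bool sketch_receive(struct sketch_receiver *r, uint32_t seq)
{
	if (seq != r->expected_seq)
		return false;

	r->expected_seq++;
	return true;
}
```

After a re-connection the sender in this sketch would simply send everything
still listed in unacked[] again; the receiver drops whatever it has already
seen because the sequence number does not match.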

- Alex

changes since v3:
 - pad dlm messages to an 8 byte boundary (more pads), because there
   are uint64_t fields and we should be prepared for future 8 byte fields.
   This makes them aligned to 4 and 2 bytes as well.
 - change unaligned memory access handling. I will not fix it yet. It
   seems nobody is using dlm on an architecture which cannot handle
   unaligned memory access at all (panics). However, I added a note that
   this is a known problem. There is a slight performance improvement
   (it depends on many things, e.g. whether another message gets allocated
   after a message with (len % 8) != 0 got allocated). However, I saw that
   such cases occur rarely (for now only with some user space messages).

   The receiving side is not the problem here; the sending side is, and
   we run into unaligned memory accesses in dlm message fields there
   as well. However, fixing the sending side will fix the receiving side,
   and more length checks can then be applied to drop invalid message
   lengths.
 - be sure to remove the node from the hash first in the close call

   I am a little bit worried that the timer is running at exactly the
   time of the midcomms/lowcomms close call and maybe begins to
   re-transmit messages. I thought about stopping/starting the timer, but
   I ended up removing the node from the hash first and making sure that
   no readers are left when calling lowcomms close. I think this should
   be fine because we "should" not receive any dlm messages from this
   node while close is running.

 - add patch "fs: dlm: add per node receive flush"

   As I was worried that the lowcomms close call flushes the receive
   work on a socket close after we already removed the node from the hash,
   I added functionality to flush the receive work right before we remove
   the node. With this functionality we make sure we don't receive any
   messages after we removed the node from the hash.
 - add patch "fs: dlm: remove obsolete code and comment"
 - add patch "fs: dlm: check for invalid namelen"
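
The 8 byte padding from the first item above can be sketched as a simple
round-up of the message length. This is only an illustration in user space C;
SKETCH_ALIGN and sketch_pad_len() are hypothetical names and not taken from
the patches.

```c
#include <stddef.h>

#define SKETCH_ALIGN 8u /* hypothetical name for the 8 byte boundary */

/*
 * Round len up to the next multiple of 8. Because 8 is a multiple of
 * 4 and 2, the padded length is aligned for those sizes as well.
 */
static size_t sketch_pad_len(size_t len)
{
	return (len + (SKETCH_ALIGN - 1)) & ~(size_t)(SKETCH_ALIGN - 1);
}
```

With this, a 13 byte payload would occupy 16 bytes, so a message allocated
directly after it starts on an 8 byte boundary again.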

changes since v2:
 - add patch "fs: dlm: set connected bit after accept"
 - add patch "fs: dlm: set subclass for othercon sock_mutex"
 - change title "fs: dlm: public utils header utils" to
   "fs: dlm: public header in out utility"
 - squash "fs: dlm: add check for minimum allocation length" into
   "fs: dlm: remove unaligned memory access handling"
 - make the midcomms timeout a little bit longer, because I sometimes
   saw that it was not enough (I hope that was the reason)
 - midcomms: fix version mismatch handling
 - remove DLM_ACK in invalid sequence handling
 - add additional length check in dlm_opts_check_msglen()
 - use optlen to skip DLM_OPTS header
 - add DLM_MSGLEN_IS_NOT_ALIGNED to check if msglen is properly
   aligned before parsing
 - change dlm_midcomms_close() to close first then cut queues,
   because lowcomms close may flush some messages which
   need to be dropped afterwards if seq doesn't fit.
 - remove newline in "fs: dlm: add more midcomms hooks"
 - many more changes which I have lost track of.
 - change defines handling for calculating max application buffer
   size vs max allocation size
 - run aspell on my commit msgs
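
The length and alignment checks mentioned in the list above could look
roughly like the following user space sketch. It is an assumption about the
shape of such a check, not the definition from the patches: the macro is
deliberately named SKETCH_MSGLEN_IS_NOT_ALIGNED instead of
DLM_MSGLEN_IS_NOT_ALIGNED, and SKETCH_HEADER_LEN, SKETCH_MAX_MSG_LEN and
sketch_msglen_ok() are hypothetical, with made-up values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real limits live in the dlm headers. */
#define SKETCH_HEADER_LEN  16u
#define SKETCH_MAX_MSG_LEN 4096u
#define SKETCH_MSGLEN_IS_NOT_ALIGNED(len) (((len) % 8) != 0)

/*
 * Validate the length field of a received message before any other
 * field is parsed. Returns true only if it is safe to look further.
 */
static bool sketch_msglen_ok(uint16_t msglen, size_t bytes_available)
{
	if (msglen < SKETCH_HEADER_LEN)
		return false; /* cannot even hold a header */
	if (msglen > SKETCH_MAX_MSG_LEN)
		return false; /* larger than any valid message */
	if (SKETCH_MSGLEN_IS_NOT_ALIGNED(msglen))
		return false; /* sender did not pad to 8 bytes */
	if (msglen > bytes_available)
		return false; /* message not fully received yet */
	return true;
}
```

Doing the cheap checks first means a malformed length is rejected before the
parser touches any other field of the message.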

Alexander Aring (20):
  fs: dlm: set connected bit after accept
  fs: dlm: set subclass for othercon sock_mutex
  fs: dlm: add errno handling to check callback
  fs: dlm: add check if dlm is currently running
  fs: dlm: change allocation limits
  fs: dlm: public header in out utility
  fs: dlm: use GFP_ZERO for page buffer
  fs: dlm: simplify writequeue handling
  fs: dlm: add more midcomms hooks
  fs: dlm: make buffer handling per msg
  fs: dlm: make new buffer handling softirq ready
  fs: dlm: add functionality to re-transmit a message
  fs: dlm: move out some hash functionality
  fs: dlm: remove unaligned memory access handling
  fs: dlm: add union in dlm header for lockspace id
  fs: dlm: add per node receive flush
  fs: dlm: add reliable connection if reconnect
  fs: dlm: don't allow half transmitted messages
  fs: dlm: remove obsolete code and comment
  fs: dlm: check for invalid namelen

 fs/dlm/config.c       |   60 ++-
 fs/dlm/dlm_internal.h |   41 +-
 fs/dlm/lock.c         |   16 +-
 fs/dlm/lockspace.c    |    5 +-
 fs/dlm/lowcomms.c     |  288 ++++++++---
 fs/dlm/lowcomms.h     |   27 +-
 fs/dlm/member.c       |   16 +
 fs/dlm/member.h       |    1 +
 fs/dlm/midcomms.c     | 1112 +++++++++++++++++++++++++++++++++++++++--
 fs/dlm/midcomms.h     |   10 +
 fs/dlm/rcom.c         |   61 ++-
 fs/dlm/recoverd.c     |    3 +
 fs/dlm/user.c         |    3 +
 fs/dlm/util.c         |   10 +-
 fs/dlm/util.h         |    2 +
 15 files changed, 1473 insertions(+), 182 deletions(-)

-- 
2.26.2


