* [Cluster-devel] [PATCH dlm-next 0/4] fs: dlm: receive handling changes
@ 2020-07-23 14:49 Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 1/4] fs: dlm: don't close socket on invalid message Alexander Aring
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Alexander Aring @ 2020-07-23 14:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

this patch series makes some changes to the receive handling in dlm.
My goal is to survive tcpkill [0], which is my main test case while
putting heavy load on dlm. The code is still in a state where dlm can
end up deadlocked while e.g. using gfs2. To generate a lot of dlm
traffic I use gfs2 and run make_panic [1].

I mainly just run make_panic on two nodes while running:

tcpkill -9 -i $IFACE port 21064

on both nodes. The first two patches change how we react to invalid
dlm messages (which do not occur in my test case). The third patch
increases, in my test scenario, the stability of the dlm connection
while tcpkill is running. The last patch ensures that the other end
agrees to close the socket when we want a clean tcp termination.

FUTURE WORK:

I still get deadlocks; as PATCH 3/4 notes, I think the TCP stack drops
some queued segments when it receives a tcp reset (while the
application layer believes the send was successful). Solving this
behaviour is part of my future work. I am also thinking about
extending the test case, e.g. to umount/mount gfs2 while my test case
is running, and to randomly drop/delay/reorder dlm tcp traffic by
using the introduced mark functionality and traffic control.

- Alex

[0] https://salsa.debian.org/pkg-security-team/dsniff
[1] https://fedorapeople.org/cgit/teigland/public_git/dct-stuff.git/tree/fs/make_panic


Alexander Aring (4):
  fs: dlm: don't close socket on invalid message
  fs: dlm: fix report error of invalid messages
  fs: dlm: change handling of reconnects
  fs: dlm: implement tcp graceful shutdown

 fs/dlm/lowcomms.c | 108 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 84 insertions(+), 24 deletions(-)

-- 
2.26.2




* [Cluster-devel] [PATCH dlm-next 1/4] fs: dlm: don't close socket on invalid message
  2020-07-23 14:49 [Cluster-devel] [PATCH dlm-next 0/4] fs: dlm: receive handling changes Alexander Aring
@ 2020-07-23 14:49 ` Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 2/4] fs: dlm: fix report error of invalid messages Alexander Aring
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Alexander Aring @ 2020-07-23 14:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

With this patch we no longer close the socket when an invalid dlm
message is received. The connection will probably reconnect anyway, so
not closing it reduces the number of possible failures. As we don't
have a better strategy for reacting to such a scenario, just keep the
connection going and ignore the message.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lowcomms.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3fa1b93dbbc7e..ec7ed228a9843 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -690,8 +690,6 @@ static int receive_from_sock(struct connection *con)
 			  page_address(con->rx_page), con->cb.base,
 			  con->cb.len, r);
 	}
-	if (ret < 0)
-		goto out_close;
 	cbuf_eat(&con->cb, ret);
 
 	if (cbuf_empty(&con->cb) && !call_again_soon) {
-- 
2.26.2




* [Cluster-devel] [PATCH dlm-next 2/4] fs: dlm: fix report error of invalid messages
  2020-07-23 14:49 [Cluster-devel] [PATCH dlm-next 0/4] fs: dlm: receive handling changes Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 1/4] fs: dlm: don't close socket on invalid message Alexander Aring
@ 2020-07-23 14:49 ` Alexander Aring
  2020-07-24 14:04   ` Alexander Ahring Oder Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 3/4] fs: dlm: change handling of reconnects Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 4/4] fs: dlm: implement tcp graceful shutdown Alexander Aring
  3 siblings, 1 reply; 6+ messages in thread
From: Alexander Aring @ 2020-07-23 14:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch fixes the error reporting for invalid messages: -EBADMSG is
never returned by dlm_process_incoming_buffer(), so we check for
negative return values instead.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lowcomms.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index ec7ed228a9843..19b50d69babef 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -685,9 +685,9 @@ static int receive_from_sock(struct connection *con)
 					  page_address(con->rx_page),
 					  con->cb.base, con->cb.len,
 					  PAGE_SIZE);
-	if (ret == -EBADMSG) {
-		log_print("lowcomms: addr=%p, base=%u, len=%u, read=%d",
-			  page_address(con->rx_page), con->cb.base,
+	if (ret < 0) {
+		log_print("lowcomms err %d: addr=%p, base=%u, len=%u, read=%d",
+			  ret, page_address(con->rx_page), con->cb.base,
 			  con->cb.len, r);
 	}
 	cbuf_eat(&con->cb, ret);
-- 
2.26.2




* [Cluster-devel] [PATCH dlm-next 3/4] fs: dlm: change handling of reconnects
  2020-07-23 14:49 [Cluster-devel] [PATCH dlm-next 0/4] fs: dlm: receive handling changes Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 1/4] fs: dlm: don't close socket on invalid message Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 2/4] fs: dlm: fix report error of invalid messages Alexander Aring
@ 2020-07-23 14:49 ` Alexander Aring
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 4/4] fs: dlm: implement tcp graceful shutdown Alexander Aring
  3 siblings, 0 replies; 6+ messages in thread
From: Alexander Aring @ 2020-07-23 14:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes the handling of reconnects. First, we now close
only the connection related to the communication failure. Second, if
we get a new connection for an already existing connection, we close
the existing connection and take the new one.

This patch significantly improves the stability of tcp connections
while running "tcpkill -9 -i $IFACE port 21064" and generating a lot
of dlm messages, e.g. on a gfs2 mount with [0]. My test setup shows
that a deadlock becomes "more" unlikely. Before this patch I could not
avoid a deadlock for even 5 seconds; after it, my observation is that
the connection usually survives for 5 seconds and more, but a deadlock
still occurs after a certain time. My guess is that there are still
"segments" inside the tcp write queue or retransmit queue which get
dropped when a tcp reset is received [1]. This is hard to reproduce
because the right message needs to be inside these queues, which can
even happen in the first 5 seconds with this patch.

[0] https://fedorapeople.org/cgit/teigland/public_git/dct-stuff.git/tree/fs/make_panic
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp_input.c?h=v5.8-rc6#n4122
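
To illustrate what "successful send" means here, a minimal userspace
sketch (not dlm code; fd is a hypothetical connected TCP socket,
partial-send handling simplified):

#include <sys/socket.h>
#include <string.h>

/* A successful send() only means the data reached the local TCP
 * write queue. If tcpkill injects a reset before those segments are
 * acknowledged, the queued data is silently dropped and the sender
 * never learns which bytes were lost.
 */
static int send_msg(int fd, const char *msg)
{
	size_t len = strlen(msg);

	if (send(fd, msg, len, 0) != (ssize_t)len)
		return -1;	/* error or partial send (simplified) */

	/* "success" here is no end-to-end delivery guarantee */
	return 0;
}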

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lowcomms.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 19b50d69babef..f5ac40db5e7c1 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -711,7 +711,7 @@ static int receive_from_sock(struct connection *con)
 out_close:
 	mutex_unlock(&con->sock_mutex);
 	if (ret != -EAGAIN) {
-		close_connection(con, true, true, false);
+		close_connection(con, false, true, false);
 		/* Reconnect when there is something to send */
 	}
 	/* Don't return success if we really got EOF */
@@ -802,21 +802,16 @@ static int accept_from_sock(struct connection *con)
 			INIT_WORK(&othercon->swork, process_send_sockets);
 			INIT_WORK(&othercon->rwork, process_recv_sockets);
 			set_bit(CF_IS_OTHERCON, &othercon->flags);
+		} else {
+			/* close other sock con if we have something new */
+			close_connection(othercon, false, true, false);
 		}
+
 		mutex_lock_nested(&othercon->sock_mutex, 2);
-		if (!othercon->sock) {
-			newcon->othercon = othercon;
-			add_sock(newsock, othercon);
-			addcon = othercon;
-			mutex_unlock(&othercon->sock_mutex);
-		}
-		else {
-			printk("Extra connection from node %d attempted\n", nodeid);
-			result = -EAGAIN;
-			mutex_unlock(&othercon->sock_mutex);
-			mutex_unlock(&newcon->sock_mutex);
-			goto accept_err;
-		}
+		newcon->othercon = othercon;
+		add_sock(newsock, othercon);
+		addcon = othercon;
+		mutex_unlock(&othercon->sock_mutex);
 	}
 	else {
 		newcon->rx_action = receive_from_sock;
@@ -1413,7 +1408,7 @@ static void send_to_sock(struct connection *con)
 
 send_error:
 	mutex_unlock(&con->sock_mutex);
-	close_connection(con, true, false, true);
+	close_connection(con, false, false, true);
 	/* Requeue the send work. When the work daemon runs again, it will try
 	   a new connection, then call this function again. */
 	queue_work(send_workqueue, &con->swork);
-- 
2.26.2




* [Cluster-devel] [PATCH dlm-next 4/4] fs: dlm: implement tcp graceful shutdown
  2020-07-23 14:49 [Cluster-devel] [PATCH dlm-next 0/4] fs: dlm: receive handling changes Alexander Aring
                   ` (2 preceding siblings ...)
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 3/4] fs: dlm: change handling of reconnects Alexander Aring
@ 2020-07-23 14:49 ` Alexander Aring
  3 siblings, 0 replies; 6+ messages in thread
From: Alexander Aring @ 2020-07-23 14:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

During code inspection I noticed there is no implementation of a
graceful shutdown for tcp. This patch introduces a graceful shutdown
for tcp connections. The shutdown is performed synchronously when
dlm_lowcomms_stop() is called to end all dlm communication. After the
shutdown is done, a lot of flush and close functionality is still
called; however, I don't see a problem with that.

The waitqueue used to synchronize the shutdown has a timeout of 10
seconds; if it expires, a forced close is executed.
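
For reference, the pattern being implemented here, as a minimal
userspace sketch (hypothetical fd, error handling omitted):

#include <sys/socket.h>
#include <unistd.h>

/* Half-close: SHUT_WR sends our FIN, then we drain the receive side
 * until the peer answers with its own FIN (read() returns 0), so both
 * ends agree the connection is finished before closing the fd. The
 * patch does the same in-kernel via kernel_sock_shutdown(SHUT_WR) and
 * waits on shutdown_wait until receive_from_sock() sees EOF.
 */
static void graceful_close(int fd)
{
	char buf[128];

	shutdown(fd, SHUT_WR);		/* we are done sending: FIN */
	while (read(fd, buf, sizeof(buf)) > 0)
		;			/* discard until peer's FIN */
	close(fd);
}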

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lowcomms.c | 77 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 72 insertions(+), 5 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index f5ac40db5e7c1..359d5e003f749 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -63,6 +63,7 @@
 
 /* Number of messages to send before rescheduling */
 #define MAX_SEND_MSG_COUNT 25
+#define DLM_SHUTDOWN_WAIT_TIMEOUT msecs_to_jiffies(10000)
 
 struct cbuf {
 	unsigned int base;
@@ -110,10 +111,12 @@ struct connection {
 #define CF_CLOSE 6
 #define CF_APP_LIMITED 7
 #define CF_CLOSING 8
+#define CF_SHUTDOWN 9
 	struct list_head writequeue;  /* List of outgoing writequeue_entries */
 	spinlock_t writequeue_lock;
 	int (*rx_action) (struct connection *);	/* What to do when active */
 	void (*connect_action) (struct connection *);	/* What to do to connect */
+	void (*shutdown_action)(struct connection *con); /* What to do to shutdown */
 	struct page *rx_page;
 	struct cbuf cb;
 	int retries;
@@ -122,6 +125,7 @@ struct connection {
 	struct connection *othercon;
 	struct work_struct rwork; /* Receive workqueue */
 	struct work_struct swork; /* Send workqueue */
+	wait_queue_head_t shutdown_wait; /* wait for graceful shutdown */
 };
 #define sock2con(x) ((struct connection *)(x)->sk_user_data)
 
@@ -218,6 +222,7 @@ static struct connection *__nodeid2con(int nodeid, gfp_t alloc)
 	spin_lock_init(&con->writequeue_lock);
 	INIT_WORK(&con->swork, process_send_sockets);
 	INIT_WORK(&con->rwork, process_recv_sockets);
+	init_waitqueue_head(&con->shutdown_wait);
 
 	/* Setup action pointers for child sockets */
 	if (con->nodeid) {
@@ -619,6 +624,54 @@ static void close_connection(struct connection *con, bool and_other,
 	clear_bit(CF_CLOSING, &con->flags);
 }
 
+static void shutdown_connection(struct connection *con)
+{
+	int ret;
+
+	if (cancel_work_sync(&con->swork)) {
+		log_print("canceled swork for node %d", con->nodeid);
+		clear_bit(CF_WRITE_PENDING, &con->flags);
+	}
+
+	mutex_lock(&con->sock_mutex);
+	/* nothing to shutdown */
+	if (!con->sock) {
+		mutex_unlock(&con->sock_mutex);
+		return;
+	}
+
+	set_bit(CF_SHUTDOWN, &con->flags);
+	ret = kernel_sock_shutdown(con->sock, SHUT_WR);
+	mutex_unlock(&con->sock_mutex);
+	if (ret) {
+		log_print("Connection %p failed to shutdown: %d will force close",
+			  con, ret);
+		goto force_close;
+	} else {
+		ret = wait_event_timeout(con->shutdown_wait,
+					 !test_bit(CF_SHUTDOWN, &con->flags),
+					 DLM_SHUTDOWN_WAIT_TIMEOUT);
+		if (ret == 0) {
+			log_print("Connection %p shutdown timed out, will force close",
+				  con);
+			goto force_close;
+		}
+	}
+
+	return;
+
+force_close:
+	clear_bit(CF_SHUTDOWN, &con->flags);
+	close_connection(con, false, true, true);
+}
+
+static void dlm_tcp_shutdown(struct connection *con)
+{
+	if (con->othercon)
+		shutdown_connection(con->othercon);
+	shutdown_connection(con);
+}
+
 /* Data received from remote end */
 static int receive_from_sock(struct connection *con)
 {
@@ -711,13 +764,18 @@ static int receive_from_sock(struct connection *con)
 out_close:
 	mutex_unlock(&con->sock_mutex);
 	if (ret != -EAGAIN) {
-		close_connection(con, false, true, false);
 		/* Reconnect when there is something to send */
+		close_connection(con, false, true, false);
+		if (ret == 0) {
+			log_print("connection %p got EOF from %d",
+				  con, con->nodeid);
+			/* handling for tcp shutdown */
+			clear_bit(CF_SHUTDOWN, &con->flags);
+			wake_up(&con->shutdown_wait);
+			/* signal to breaking receive worker */
+			ret = -1;
+		}
 	}
-	/* Don't return success if we really got EOF */
-	if (ret == 0)
-		ret = -EAGAIN;
-
 	return ret;
 }
 
@@ -801,6 +859,7 @@ static int accept_from_sock(struct connection *con)
 			spin_lock_init(&othercon->writequeue_lock);
 			INIT_WORK(&othercon->swork, process_send_sockets);
 			INIT_WORK(&othercon->rwork, process_recv_sockets);
+			init_waitqueue_head(&othercon->shutdown_wait);
 			set_bit(CF_IS_OTHERCON, &othercon->flags);
 		} else {
 			/* close other sock con if we have something new */
@@ -1045,6 +1104,7 @@ static void tcp_connect_to_sock(struct connection *con)
 
 	con->rx_action = receive_from_sock;
 	con->connect_action = tcp_connect_to_sock;
+	con->shutdown_action = dlm_tcp_shutdown;
 	add_sock(sock, con);
 
 	/* Bind to our cluster-known address connecting to avoid
@@ -1540,6 +1600,12 @@ static void stop_conn(struct connection *con)
 	_stop_conn(con, true);
 }
 
+static void shutdown_conn(struct connection *con)
+{
+	if (con->shutdown_action)
+		con->shutdown_action(con);
+}
+
 static void free_conn(struct connection *con)
 {
 	close_connection(con, true, true, true);
@@ -1591,6 +1657,7 @@ void dlm_lowcomms_stop(void)
 	mutex_lock(&connections_lock);
 	dlm_allow_conn = 0;
 	mutex_unlock(&connections_lock);
+	foreach_conn(shutdown_conn);
 	work_flush();
 	clean_writequeues();
 	foreach_conn(free_conn);
-- 
2.26.2




* [Cluster-devel] [PATCH dlm-next 2/4] fs: dlm: fix report error of invalid messages
  2020-07-23 14:49 ` [Cluster-devel] [PATCH dlm-next 2/4] fs: dlm: fix report error of invalid messages Alexander Aring
@ 2020-07-24 14:04   ` Alexander Ahring Oder Aring
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander Ahring Oder Aring @ 2020-07-24 14:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Thu, Jul 23, 2020 at 10:49 AM Alexander Aring <aahringo@redhat.com> wrote:
>
> This patch fixes the error reporting for invalid messages: -EBADMSG is
> never returned by dlm_process_incoming_buffer(), so we check for
> negative return values instead.
>
> Signed-off-by: Alexander Aring <aahringo@redhat.com>
> ---
>  fs/dlm/lowcomms.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index ec7ed228a9843..19b50d69babef 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -685,9 +685,9 @@ static int receive_from_sock(struct connection *con)
>                                           page_address(con->rx_page),
>                                           con->cb.base, con->cb.len,
>                                           PAGE_SIZE);
> -       if (ret == -EBADMSG) {
> -               log_print("lowcomms: addr=%p, base=%u, len=%u, read=%d",
> -                         page_address(con->rx_page), con->cb.base,
> +       if (ret < 0) {
> +               log_print("lowcomms err %d: addr=%p, base=%u, len=%u, read=%d",
> +                         ret, page_address(con->rx_page), con->cb.base,
>                           con->cb.len, r);
>         }
>         cbuf_eat(&con->cb, ret);

I found a problem here. The cbuf_eat() call should be in an else
branch of the condition above; in the ret < 0 case we should call
"cbuf_eat(&con->cb, r);" instead, I believe. See the sketch below.
I will send a v2.
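
Untested sketch of what I mean, using the same variables as in the
hunk above:

	if (ret < 0) {
		log_print("lowcomms err %d: addr=%p, base=%u, len=%u, read=%d",
			  ret, page_address(con->rx_page), con->cb.base,
			  con->cb.len, r);
		/* invalid data: consume the whole read (r bytes) so we
		 * don't reprocess it */
		cbuf_eat(&con->cb, r);
	} else {
		/* consume only what was actually processed */
		cbuf_eat(&con->cb, ret);
	}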

- Alex




