[Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect

* [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect
@ 2014-06-13  1:48 Junxiao Bi
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout Junxiao Bi
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Junxiao Bi @ 2014-06-13  1:48 UTC (permalink / raw)
  To: ocfs2-devel

Hi,

This patch series fixes a possible message-loss bug in ocfs2 that shows up
when the network goes bad. The bug can leave ocfs2 hung forever, even after
the network recovers.

Messages can be lost as follows. After the TCP connection between two nodes
is established, an idle timer is armed to check the connection periodically.
If no messages are received within the idle timeout, the timer fires, shuts
down the connection and tries to reconnect, so any messages still pending in
the TCP queues are lost. These messages may come from the DLM, which can then
hang, and that in turn can hang the whole ocfs2 cluster.

This is quite likely to happen when the network goes bad. Reconnecting is
useless, since it will fail while the network is still bad. Simply waiting
for the network to recover is a better idea: no messages are lost, and nodes
are still fenced as usual unless the cluster goes into a split-brain state.
For that case, the TCP user timeout is used to override the TCP retransmit
timeout. It expires after about 25 days; by then the user should have noticed
the problem through the provided log and fixed the network. If they have not,
ocfs2 falls back to the original reconnect behavior.

This is a resend of the patches, with no changes since last time. Please help
review.
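
As a quick check of the "about 25 days" figure: patch 2/3 defines
O2NET_TCP_USER_TIMEOUT as 0x7fffffff, and TCP_USER_TIMEOUT is expressed in
milliseconds. The sketch below only does that arithmetic; the macro and helper
names are hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric value as O2NET_TCP_USER_TIMEOUT in patch 2/3 (milliseconds). */
#define USER_TIMEOUT_MS 0x7fffffff

#define MS_PER_HOUR (60LL * 60 * 1000)
#define MS_PER_DAY  (24LL * MS_PER_HOUR)

/* Whole days contained in a millisecond count (hypothetical helper). */
static inline int64_t ms_to_whole_days(int64_t ms)
{
	assert(ms >= 0);
	return ms / MS_PER_DAY;
}

/* Whole hours left over after the whole days are removed. */
static inline int64_t ms_leftover_hours(int64_t ms)
{
	assert(ms >= 0);
	return (ms % MS_PER_DAY) / MS_PER_HOUR;
}
```

For USER_TIMEOUT_MS this works out to 24 whole days plus 20 whole hours,
which is where the rounded figure of 25 days comes from.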

Thanks,
Junxiao.

* [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout
  2014-06-13  1:48 [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
@ 2014-06-13  1:48 ` Junxiao Bi
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value Junxiao Bi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Junxiao Bi @ 2014-06-13  1:48 UTC (permalink / raw)
  To: ocfs2-devel

Some messages in the TCP queue may be lost if we shut down the connection
and reconnect on idle timeout. If packets are lost and the reconnect
succeeds, the ocfs2 cluster may hang.

To fix this, leave the connection in place and make the fence decision on
idle timeout. If the network recovers before the fence decision is made,
the connection survives without losing any messages.

This bug can be seen when the network goes bad. It may cause ocfs2 to hang
forever if some packets are lost. With this fix, ocfs2 recovers from the
hang once the network becomes good again.
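
The intended behavior can be modeled in userspace as a small state machine.
This is only a sketch: struct conn_model and both functions are hypothetical
stand-ins, and the comments name the kernel calls from the diff below that
each step loosely corresponds to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-node o2net state. */
struct conn_model {
	bool connected;       /* the TCP connection is still open */
	bool timeout_flagged; /* plays the role of nn_timeout */
	bool quorum_pending;  /* a fence decision has been scheduled */
	int idle_rearms;      /* how often the idle timer was re-armed */
};

/* Idle timer fired: keep the socket open, but start the fence decision. */
static inline void model_idle_timeout(struct conn_model *c)
{
	assert(c->connected);
	c->timeout_flagged = true;
	c->quorum_pending = true; /* like o2quo_conn_err() + nn_still_up work */
	c->idle_rearms++;         /* like o2net_sc_reset_idle_timer() */
}

/* Traffic arrived: cancel any fence decision that is still pending. */
static inline void model_message_received(struct conn_model *c)
{
	if (c->timeout_flagged) {
		c->quorum_pending = false; /* like o2quo_conn_up() + cancel */
		c->timeout_flagged = false;
	}
}
```

In this model the connected flag is never cleared by an idle timeout, which
is the point of the patch: pending messages stay queued on the same socket.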

Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/ocfs2/cluster/tcp.c |   25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index c6b90e6..76ef3d8 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1536,16 +1536,20 @@ static void o2net_idle_timer(unsigned long data)
 #endif
 
 	printk(KERN_NOTICE "o2net: Connection to " SC_NODEF_FMT " has been "
-	       "idle for %lu.%lu secs, shutting it down.\n", SC_NODEF_ARGS(sc),
-	       msecs / 1000, msecs % 1000);
+	       "idle for %lu.%lu secs.\n",
+	       SC_NODEF_ARGS(sc), msecs / 1000, msecs % 1000);
 
-	/*
-	 * Initialize the nn_timeout so that the next connection attempt
-	 * will continue in o2net_start_connect.
+	/* Idle timeout happened. Don't shut down the connection, but
+	 * make the fence decision. The connection may recover before
+	 * the decision is made.
 	 */
 	atomic_set(&nn->nn_timeout, 1);
+	o2quo_conn_err(o2net_num_from_nn(nn));
+	queue_delayed_work(o2net_wq, &nn->nn_still_up,
+			msecs_to_jiffies(O2NET_QUORUM_DELAY_MS));
+
+	o2net_sc_reset_idle_timer(sc);
 
-	o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
 }
 
 static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc)
@@ -1560,6 +1564,15 @@ static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc)
 
 static void o2net_sc_postpone_idle(struct o2net_sock_container *sc)
 {
+	struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
+
+	/* connection recovered from timeout, so clear the fence decision */
+	if (atomic_read(&nn->nn_timeout)) {
+		o2quo_conn_up(o2net_num_from_nn(nn));
+		cancel_delayed_work(&nn->nn_still_up);
+		atomic_set(&nn->nn_timeout, 0);
+	}
+
 	/* Only push out an existing timer */
 	if (timer_pending(&sc->sc_idle_timeout))
 		o2net_sc_reset_idle_timer(sc);
-- 
1.7.9.5

* [Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value
  2014-06-13  1:48 [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout Junxiao Bi
@ 2014-06-13  1:48 ` Junxiao Bi
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced Junxiao Bi
  2014-06-13  1:56 ` [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
  3 siblings, 0 replies; 6+ messages in thread
From: Junxiao Bi @ 2014-06-13  1:48 UTC (permalink / raw)
  To: ocfs2-devel

When the TCP retransmit timeout (15 minutes) expires, the connection is
closed, and pending messages may be lost. So we set the TCP user timeout
to its maximum value to override the retransmit timeout.
This is OK for ocfs2 because we have the disk heartbeat: if a peer crashes,
its disk heartbeat times out and it is evicted. If the disk heartbeat does
not time out and the connection stays idle for a long time, the cluster has
entered a split-brain state. Since fencing can't happen then, we had better
keep the connection and wait for the network to recover.
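
For readers more familiar with userspace sockets: the patch sets the option
from kernel code via kernel_setsockopt(), and the same option is available
through setsockopt(2) on Linux. The sketch below assumes a Linux system whose
<netinet/tcp.h> defines TCP_USER_TIMEOUT; the two helper names are
hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set TCP_USER_TIMEOUT (in milliseconds); returns 0 on success, -1 on error. */
static inline int set_user_timeout(int fd, unsigned int timeout_ms)
{
	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Read the current TCP_USER_TIMEOUT back; returns 0 on success, -1 on error. */
static inline int get_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	assert(fd >= 0);
	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}
```

A caller would create a TCP socket with socket(AF_INET, SOCK_STREAM, 0), call
set_user_timeout() before connecting or right after accept(), and treat a
non-zero return as a setup failure, as the patch does.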

Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/ocfs2/cluster/tcp.c |   20 ++++++++++++++++++++
 fs/ocfs2/cluster/tcp.h |    1 +
 2 files changed, 21 insertions(+)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 76ef3d8..eae58d8 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1480,6 +1480,14 @@ static int o2net_set_nodelay(struct socket *sock)
 	return ret;
 }
 
+static int o2net_set_usertimeout(struct socket *sock)
+{
+	int user_timeout = O2NET_TCP_USER_TIMEOUT;
+
+	return kernel_setsockopt(sock, SOL_TCP, TCP_USER_TIMEOUT,
+				(char *)&user_timeout, sizeof(user_timeout));
+}
+
 static void o2net_initialize_handshake(void)
 {
 	o2net_hand->o2hb_heartbeat_timeout_ms = cpu_to_be32(
@@ -1663,6 +1671,12 @@ static void o2net_start_connect(struct work_struct *work)
 		goto out;
 	}
 
+	ret = o2net_set_usertimeout(sock);
+	if (ret) {
+		mlog(ML_ERROR, "set TCP_USER_TIMEOUT failed with %d\n", ret);
+		goto out;
+	}
+
 	o2net_register_callbacks(sc->sc_sock->sk, sc);
 
 	spin_lock(&nn->nn_lock);
@@ -1842,6 +1856,12 @@ static int o2net_accept_one(struct socket *sock)
 		goto out;
 	}
 
+	ret = o2net_set_usertimeout(new_sock);
+	if (ret) {
+		mlog(ML_ERROR, "set TCP_USER_TIMEOUT failed with %d\n", ret);
+		goto out;
+	}
+
 	slen = sizeof(sin);
 	ret = new_sock->ops->getname(new_sock, (struct sockaddr *) &sin,
 				       &slen, 1);
diff --git a/fs/ocfs2/cluster/tcp.h b/fs/ocfs2/cluster/tcp.h
index 5bada2a..c571e84 100644
--- a/fs/ocfs2/cluster/tcp.h
+++ b/fs/ocfs2/cluster/tcp.h
@@ -63,6 +63,7 @@ typedef void (o2net_post_msg_handler_func)(int status, void *data,
 #define O2NET_KEEPALIVE_DELAY_MS_DEFAULT	2000
 #define O2NET_IDLE_TIMEOUT_MS_DEFAULT		30000
 
+#define O2NET_TCP_USER_TIMEOUT			0x7fffffff
 
 /* TODO: figure this out.... */
 static inline int o2net_link_down(int err, struct socket *sock)
-- 
1.7.9.5

* [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced
  2014-06-13  1:48 [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout Junxiao Bi
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value Junxiao Bi
@ 2014-06-13  1:48 ` Junxiao Bi
  2014-06-13  1:56 ` [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
  3 siblings, 0 replies; 6+ messages in thread
From: Junxiao Bi @ 2014-06-13  1:48 UTC (permalink / raw)
  To: ocfs2-devel

For debugging: the log now shows whether the fence decision was made
and, if the node was not fenced, why not.
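
To show what the resulting line looks like, here is a userspace sketch that
builds the same text with snprintf(). The function name and its parameters
are hypothetical; in the patch the values come from struct o2quo_state and
from locals of o2quo_make_decision(), and the line is emitted through mlog().
The trailing newline is left out here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h> /* only needed by callers that compare the result */

/* Format the "not fencing" line into buf; returns snprintf()'s result. */
static inline int format_not_fencing(char *buf, size_t len,
				     int heartbeating, int connected,
				     int lowest_hb, int lowest_reachable)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"not fencing this node, heartbeating: %d, "
			"connected: %d, lowest: %d (%sreachable)",
			heartbeating, connected, lowest_hb,
			lowest_reachable ? "" : "un");
}
```

The "%sreachable" trick prints either "reachable" or "unreachable" depending
on whether the lowest-numbered heartbeating node can be reached.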

Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/ocfs2/cluster/quorum.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/cluster/quorum.c b/fs/ocfs2/cluster/quorum.c
index 1ec141e..62e8ec6 100644
--- a/fs/ocfs2/cluster/quorum.c
+++ b/fs/ocfs2/cluster/quorum.c
@@ -160,9 +160,18 @@ static void o2quo_make_decision(struct work_struct *work)
 	}
 
 out:
-	spin_unlock(&qs->qs_lock);
-	if (fence)
+	if (fence) {
+		spin_unlock(&qs->qs_lock);
 		o2quo_fence_self();
+	} else {
+		mlog(ML_NOTICE, "not fencing this node, heartbeating: %d, "
+			"connected: %d, lowest: %d (%sreachable)\n",
+			qs->qs_heartbeating, qs->qs_connected, lowest_hb,
+			lowest_reachable ? "" : "un");
+		spin_unlock(&qs->qs_lock);
+
+	}
+
 }
 
 static void o2quo_set_hold(struct o2quo_state *qs, u8 node)
-- 
1.7.9.5

* [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect
  2014-06-13  1:48 [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
                   ` (2 preceding siblings ...)
  2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced Junxiao Bi
@ 2014-06-13  1:56 ` Junxiao Bi
  3 siblings, 0 replies; 6+ messages in thread
From: Junxiao Bi @ 2014-06-13  1:56 UTC (permalink / raw)
  To: ocfs2-devel

Not sure why git send-email dropped Joseph Qi from the Cc list.
Cc'ing him.

On 06/13/2014 09:48 AM, Junxiao Bi wrote:
>
> Hi,
>
> This patch serial is to fix a possible message lost bug in ocfs2 when
> network go bad. This bug will cause ocfs2 hung forever even network
> become good again.
> The messages may lost in this case. After the tcp connection is established
> between two nodes, an idle timer will be set to check its state periodically,
> if no messages are received during this time, idle timer will timeout, it will
> shutdown the connection and try to reconnect, so pending messages in tcp queues
> will be lost. This messages may be from dlm. Dlm may get hung in this case. This
> may cause the whole ocfs2 cluster hung. 
> This is very possible to happen when network state goes bad. Do the reconnect is
> useless, it will fail if network state is still bad. Just waiting there for
> network recovering may be a good idea, it will not lost messages and some node
> will be fenced until cluster goes into split-brain state, for this case, Tcp user
> timeout is used to override the tcp retransmit timeout. It will timeout after 25
> days, user should have notice this through the provided log and fix the network,
> if they don't, ocfs2 will fall back to original reconnect way.
> This is a resend of the patches, no changes since last time. Please help review.
>
> Thanks,
> Junxiao.
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

end of thread, other threads:[~2014-06-13  1:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-13  1:48 [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout Junxiao Bi
2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value Junxiao Bi
2014-06-13  1:48 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced Junxiao Bi
2014-06-13  1:56 ` [Ocfs2-devel] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
  -- strict thread matches above, loose matches on Subject: below --
2014-05-15  4:26 [Ocfs2-devel] [PATCH 0/3] " Junxiao Bi
2014-05-15  4:26 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced Junxiao Bi
