* [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful merging
@ 2016-05-20  8:45 Eric Ren
From: Eric Ren @ 2016-05-20  8:45 UTC
  To: cluster-devel.redhat.com

When three or more partitions merge, none of them may see enough clean
nodes. DLM would then be stuck forever, until an administrator manually
resets/restarts enough nodes to produce sufficient clean nodes.
Therefore, output explicit information for higher-level code (e.g. pcmk)
about the stateful merging state. Higher-level code can now use
`dlm_tool status -v` to read "stateful_merge_wait". If it equals "1", dlm
is waiting for manual intervention, and the higher-level code can choose
one of the nodes to fence. DLM will continue to work once "clean nodes >=
stateful merged nodes" becomes true.

David advised me to do the right thing;-) Thanks a lot!

Signed-off-by: Eric Ren <zren@suse.com>
---
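For illustration only, here is a minimal sketch of how higher-level code
might pick the new field out of the status text. It assumes the caller
already holds one line of that text in memory.
parse_stateful_merge_wait() is a hypothetical helper, not part of
dlm_controld or pacemaker; only the "stateful_merge_wait=%d" key format
comes from this patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not existing code): scan one line
 * of status text for "stateful_merge_wait=<int>". Returns the value, or
 * -1 if the key is absent or is not followed by an integer.
 */
static int parse_stateful_merge_wait(const char *line)
{
	static const char key[] = "stateful_merge_wait=";
	const char *p = strstr(line, key);
	int val;

	if (!p)
		return -1;
	/* sizeof(key) - 1 skips the key itself, without the trailing NUL */
	if (sscanf(p + sizeof(key) - 1, "%d", &val) != 1)
		return -1;
	return val;
}
```

A caller would treat a return value of 1 as "dlm is waiting for manual
intervention" and -1 as "field not reported", which is what an older
dlm_controld without this patch would produce.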
 dlm_controld/daemon_cpg.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
index 356e80d..0d55027 100644
--- a/dlm_controld/daemon_cpg.c
+++ b/dlm_controld/daemon_cpg.c
@@ -118,6 +118,7 @@ static int zombie_count;
 
 static int fence_result_pid;
 static unsigned int fence_result_try;
+static int stateful_merge_wait; /* cluster is stuck waiting for manual intervention */
 
 static void send_fence_result(int nodeid, int result, uint32_t flags, uint64_t walltime);
 static void send_fence_clear(int nodeid, int result, uint32_t flags, uint64_t walltime);
@@ -847,10 +848,14 @@ static void daemon_fence_work(void)
 
 		if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
 			kick_stateful_merge_members();
+		if ((clean_count < merge_count) && !part_count)
+			stateful_merge_wait = 1;
 
 		retry = 1;
 		goto out;
 	}
+	if (stateful_merge_wait)
+		stateful_merge_wait = 0;
 
 	/*
 	 * startup fencing
@@ -2382,7 +2387,8 @@ static int print_state_daemon(char *str)
 		 "fence_pid=%d "
 		 "fence_in_progress_unknown=%d "
 		 "zombie_count=%d "
-		 "monotime=%llu ",
+		 "monotime=%llu "
+		 "stateful_merge_wait=%d ",
 		 daemon_member_count,
 		 daemon_joined_count,
 		 daemon_remove_count,
@@ -2392,7 +2398,8 @@ static int print_state_daemon(char *str)
 		 daemon_fence_pid,
 		 fence_in_progress_unknown,
 		 zombie_count,
-		 (unsigned long long)monotime());
+		 (unsigned long long)monotime(),
+		 stateful_merge_wait);
 
 	return strlen(str) + 1;
 }
-- 
2.6.6




* [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful merging
  2016-05-17 12:10 [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful Eric Ren
@ 2016-05-17 12:10 ` Eric Ren
From: Eric Ren @ 2016-05-17 12:10 UTC
  To: cluster-devel.redhat.com

When three or more partitions merge, none of them may see enough clean
nodes. DLM would then be stuck forever, until an administrator manually
resets/restarts enough nodes to produce sufficient clean nodes.
Therefore, output explicit information for higher-level code (e.g. pcmk)
about the stateful merging state. Higher-level code can now use
`dlm_tool status -v` to read "stateful_merge_wait". If it equals "1", dlm
is waiting for manual intervention, and the higher-level code can choose
one of the nodes to fence. DLM will continue to work once "clean nodes >=
stateful merged nodes" becomes true.

David advised me to do the right thing;-) Thanks a lot!

Signed-off-by: Eric Ren <zren@suse.com>
---
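As a rough model of the flag handling in the daemon_fence_work() hunk
below, the set/clear rule can be written as a pure function. This is a
sketch under assumptions: next_stateful_merge_wait() is a hypothetical
name, and in_merge_branch stands in for the enclosing condition, which
the hunk does not show.

```c
#include <stdbool.h>

/*
 * Hypothetical model (not code from dlm_controld) of how the patch
 * updates stateful_merge_wait on one pass through daemon_fence_work().
 * in_merge_branch is an assumed stand-in for the enclosing "if" that
 * ends with "retry = 1; goto out;".
 */
static int next_stateful_merge_wait(int current, bool in_merge_branch,
				    int clean_count, int merge_count,
				    int part_count)
{
	if (in_merge_branch) {
		/* too few clean nodes and no partition pending: stuck */
		if (clean_count < merge_count && !part_count)
			return 1;
		/* otherwise the branch leaves the flag as it was */
		return current;
	}
	/* merge branch not taken: clear the flag */
	return 0;
}
```

Under this reading, the flag is cleared only on a pass that skips the
merge branch, which is where the patch places the reset.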
 dlm_controld/daemon_cpg.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
index 356e80d..8f6434f 100644
--- a/dlm_controld/daemon_cpg.c
+++ b/dlm_controld/daemon_cpg.c
@@ -118,6 +118,7 @@ static int zombie_count;
 
 static int fence_result_pid;
 static unsigned int fence_result_try;
+static int stateful_merge_wait; /* cluster is stuck waiting for manual intervention */
 
 static void send_fence_result(int nodeid, int result, uint32_t flags, uint64_t walltime);
 static void send_fence_clear(int nodeid, int result, uint32_t flags, uint64_t walltime);
@@ -847,10 +848,13 @@ static void daemon_fence_work(void)
 
 		if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
 			kick_stateful_merge_members();
+		if ((clean_count < merge_count) && !part_count)
+			stateful_merge_wait = 1;
 
 		retry = 1;
 		goto out;
 	}
+	stateful_merge_wait = 0; /* where should this line go? */
 
 	/*
 	 * startup fencing
@@ -2382,7 +2386,8 @@ static int print_state_daemon(char *str)
 		 "fence_pid=%d "
 		 "fence_in_progress_unknown=%d "
 		 "zombie_count=%d "
-		 "monotime=%llu ",
+		 "monotime=%llu "
+		 "stateful_merge_wait=%d ",
 		 daemon_member_count,
 		 daemon_joined_count,
 		 daemon_remove_count,
@@ -2392,7 +2397,8 @@ static int print_state_daemon(char *str)
 		 daemon_fence_pid,
 		 fence_in_progress_unknown,
 		 zombie_count,
-		 (unsigned long long)monotime());
+		 (unsigned long long)monotime(),
+		 stateful_merge_wait);
 
 	return strlen(str) + 1;
 }
-- 
2.6.6


