* [Ocfs2-devel] [PATCH] ocfs2: handle ocfs2 node down event more correctly
@ 2011-09-01 15:28 Jiaju Zhang
From: Jiaju Zhang @ 2011-09-01 15:28 UTC (permalink / raw)
  To: ocfs2-devel

When ocfs2 is used with the in-kernel fs/dlm and a user-space cluster
stack, osb->node_num == node_num in ocfs2_do_node_down() no longer
indicates a bug. ocfs2_controld may receive the node down information
first. In the normal case, dlm_controld receives the same node down
information soon afterwards, and then osb->node_num != node_num. In a
rare case, however, the node comes back up before dlm_controld receives
the node down information, so dlm_controld never sees the node down
event. This results in osb->node_num == node_num here. Since this case
can legitimately happen, it should not be treated as a bug. Returning
here without triggering the recovery thread should be the right way to
go. It also introduces no side effects when the o2cb stack is used.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
---
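For illustration only (not part of the patch): a minimal user-space
sketch of the control flow the description above proposes. The names
mock_super, has_cconn, recoveries_started and mock_do_node_down are
hypothetical stand-ins, not kernel code, and the counter only models
"the recovery thread would have been started". It shows the proposed
check and nothing more; it says nothing about whether the change is
sufficient on the user-space side.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct ocfs2_super; only what the check needs. */
struct mock_super {
	int node_num;            /* this node's own number */
	bool has_cconn;          /* models osb->cconn being non-NULL */
	int recoveries_started;  /* models kicking the recovery thread */
};

/* Returns true if recovery would have been started for node_num. */
bool mock_do_node_down(int node_num, struct mock_super *osb)
{
	/* Proposed behaviour: a down event for ourselves is ignored, not a BUG. */
	if (osb->node_num == node_num)
		return false;

	/* No cluster connection: not ready to recover anything yet. */
	if (!osb->has_cconn)
		return false;

	osb->recoveries_started++;
	return true;
}
```

With a mock_super whose node_num is 1 and has_cconn is true, a down
event for node 1 returns false and leaves the counter untouched, while
a down event for node 2 returns true and increments it.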
 fs/ocfs2/heartbeat.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/heartbeat.c b/fs/ocfs2/heartbeat.c
index d8208b2..632e855 100644
--- a/fs/ocfs2/heartbeat.c
+++ b/fs/ocfs2/heartbeat.c
@@ -64,10 +64,11 @@ void ocfs2_do_node_down(int node_num, void *data)
 {
 	struct ocfs2_super *osb = data;
 
-	BUG_ON(osb->node_num == node_num);
-
 	trace_ocfs2_do_node_down(node_num);
 
+	if (osb->node_num == node_num)
+		return;
+
 	if (!osb->cconn) {
 		/*
 		 * No cluster connection means we're not even ready to


* [Ocfs2-devel] [PATCH] ocfs2: handle ocfs2 node down event more correctly
@ 2011-09-02  8:57 ` Jiaju Zhang
From: Jiaju Zhang @ 2011-09-02  8:57 UTC (permalink / raw)
  To: ocfs2-devel

I just found out that this patch may not be correct, since it also needs
some changes in user-space. I'll look into the issue more closely to see
whether it can be resolved entirely in user-space.

So please ignore this patch, sorry for the noise ;)

Thanks,
Jiaju

On Thu, Sep 1, 2011 at 11:28 PM, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
> When ocfs2 is used with the in-kernel fs/dlm and a user-space cluster
> stack, osb->node_num == node_num in ocfs2_do_node_down() no longer
> indicates a bug. ocfs2_controld may receive the node down information
> first. In the normal case, dlm_controld receives the same node down
> information soon afterwards, and then osb->node_num != node_num. In a
> rare case, however, the node comes back up before dlm_controld receives
> the node down information, so dlm_controld never sees the node down
> event. This results in osb->node_num == node_num here. Since this case
> can legitimately happen, it should not be treated as a bug. Returning
> here without triggering the recovery thread should be the right way to
> go. It also introduces no side effects when the o2cb stack is used.
>
> Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
> ---
>  fs/ocfs2/heartbeat.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/heartbeat.c b/fs/ocfs2/heartbeat.c
> index d8208b2..632e855 100644
> --- a/fs/ocfs2/heartbeat.c
> +++ b/fs/ocfs2/heartbeat.c
> @@ -64,10 +64,11 @@ void ocfs2_do_node_down(int node_num, void *data)
>  {
>  	struct ocfs2_super *osb = data;
>
> -	BUG_ON(osb->node_num == node_num);
> -
>  	trace_ocfs2_do_node_down(node_num);
>
> +	if (osb->node_num == node_num)
> +		return;
> +
>  	if (!osb->cconn) {
>  		/*
>  		 * No cluster connection means we're not even ready to
>
