* [Ocfs2-devel] [patch 3/5] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler
@ 2016-07-28 21:06 akpm at linux-foundation.org
  2016-07-28 22:01 ` Mark Fasheh
  0 siblings, 1 reply; 2+ messages in thread
From: akpm at linux-foundation.org @ 2016-07-28 21:06 UTC (permalink / raw)
  To: ocfs2-devel

From: piaojun <piaojun@huawei.com>
Subject: ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler

We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
unexpectedly, as described below.  To fix the bug, we disable the BUG_ON
and purge the lockres in dlm_do_local_recovery_cleanup.

Node 1                               Node 2(master)
dlm_purge_lockres
                                     dlm_deref_lockres_handler

                                     DLM_LOCK_RES_SETREF_INPROG is set
                                     response DLM_DEREF_RESPONSE_INPROG

receive DLM_DEREF_RESPONSE_INPROG
stop purging in dlm_purge_lockres
and wait for DLM_DEREF_RESPONSE_DONE

                                     dispatch dlm_deref_lockres_worker
                                     response DLM_DEREF_RESPONSE_DONE

receive DLM_DEREF_RESPONSE_DONE and
prepare to purge lockres

                                     Node 2 goes down

find Node 2 down and do local
cleanup for Node 2:
dlm_do_local_recovery_cleanup
  -> clear DLM_LOCK_RES_DROPPING_REF

when purging lockres, BUG_ON happens
because DLM_LOCK_RES_DROPPING_REF is clear:
dlm_deref_lockres_done_handler
  ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

[akpm@linux-foundation.org: fix duplicated write to `ret']
Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
Link: http://lkml.kernel.org/r/57845055.9080702@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmmaster.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-disable-bug_on-when-dlm_lock_res_dropping_ref-is-cleared-before-dlm_deref_lockres_done_handler fs/ocfs2/dlm/dlmmaster.c
--- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-disable-bug_on-when-dlm_lock_res_dropping_ref-is-cleared-before-dlm_deref_lockres_done_handler
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -2416,7 +2416,17 @@ int dlm_deref_lockres_done_handler(struc
 	}
 
 	spin_lock(&res->spinlock);
-	BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
+	if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
+		spin_unlock(&res->spinlock);
+		spin_unlock(&dlm->spinlock);
+		mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
+			"but it is already derefed!\n", dlm->name,
+			res->lockname.len, res->lockname.name, node);
+		dlm_lockres_put(res);
+		ret = 0;
+		goto done;
+	}
+
 	if (!list_empty(&res->purge)) {
 		mlog(0, "%s: Removing res %.*s from purgelist\n",
 			dlm->name, res->lockname.len, res->lockname.name);
@@ -2456,7 +2466,6 @@ int dlm_deref_lockres_done_handler(struc
 	spin_unlock(&dlm->spinlock);
 
 	ret = 0;
-
 done:
 	dlm_put(dlm);
 	return ret;
_
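
The race in the changelog and the new check in the hunk above can be
modelled in user space.  The following is only a minimal sketch, not the
kernel code: the model_* names and the MODEL_DROPPING_REF value are
hypothetical stand-ins, locking and reference counting are left out, and
the only behaviour taken from the patch is that a DONE message arriving
after the flag was cleared is logged and answered with 0 instead of
hitting BUG_ON.

```c
#include <stdio.h>

/* Hypothetical stand-in for DLM_LOCK_RES_DROPPING_REF; the value is arbitrary. */
#define MODEL_DROPPING_REF 0x1u

/* Hypothetical, heavily reduced model of a lock resource. */
struct model_lockres {
	unsigned int state;
};

/* Node 1 starts purging: mark the lockres as dropping its ref on the master. */
void model_start_purge(struct model_lockres *res)
{
	res->state |= MODEL_DROPPING_REF;
}

/* Node 1 notices the master died: local recovery cleanup clears the flag. */
void model_recovery_cleanup(struct model_lockres *res)
{
	res->state &= ~MODEL_DROPPING_REF;
}

/*
 * Model of the patched DONE handler.  Returns 0 in both cases.
 * *did_purge is set to 1 only when the flag was still set, i.e. when
 * this call finished the purge itself.
 */
int model_deref_done(struct model_lockres *res, int *did_purge)
{
	if (!(res->state & MODEL_DROPPING_REF)) {
		/* Previously a BUG_ON; now tolerated and reported. */
		fprintf(stderr, "deref done received, but already derefed\n");
		*did_purge = 0;
		return 0;
	}

	res->state &= ~MODEL_DROPPING_REF;
	*did_purge = 1;
	return 0;
}
```

Calling model_start_purge(), then model_recovery_cleanup(), then
model_deref_done() replays the ordering from the changelog: the handler
returns 0 with *did_purge == 0.  Without the cleanup step in between it
returns 0 with *did_purge == 1, which corresponds to the normal path.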


* [Ocfs2-devel] [patch 3/5] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler
  2016-07-28 21:06 [Ocfs2-devel] [patch 3/5] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler akpm at linux-foundation.org
@ 2016-07-28 22:01 ` Mark Fasheh
  0 siblings, 0 replies; 2+ messages in thread
From: Mark Fasheh @ 2016-07-28 22:01 UTC (permalink / raw)
  To: ocfs2-devel

On Thu, Jul 28, 2016 at 02:06:02PM -0700, Andrew Morton wrote:
> From: piaojun <piaojun@huawei.com>
> Subject: ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler
> 
> We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
> unexpectedly, as described below.  To fix the bug, we disable the BUG_ON
> and purge the lockres in dlm_do_local_recovery_cleanup.
> 
> Node 1                               Node 2(master)
> dlm_purge_lockres
>                                      dlm_deref_lockres_handler
> 
>                                      DLM_LOCK_RES_SETREF_INPROG is set
>                                      response DLM_DEREF_RESPONSE_INPROG
> 
> receive DLM_DEREF_RESPONSE_INPROG
> stop purging in dlm_purge_lockres
> and wait for DLM_DEREF_RESPONSE_DONE
> 
>                                      dispatch dlm_deref_lockres_worker
>                                      response DLM_DEREF_RESPONSE_DONE
> 
> receive DLM_DEREF_RESPONSE_DONE and
> prepare to purge lockres
> 
>                                      Node 2 goes down
> 
> find Node2 down and do local
> clean up for Node2:
> dlm_do_local_recovery_cleanup
>   -> clear DLM_LOCK_RES_DROPPING_REF
> 
> when purging lockres, BUG_ON happens
> because DLM_LOCK_RES_DROPPING_REF is clear:
> dlm_deref_lockres_done_handler
>   ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

Thanks Piaojun,

Reviewed-by: Mark Fasheh <mfasheh@suse.de>
	--Mark

--
Mark Fasheh
