* [Ocfs2-devel] [PATCH] ocfs2/dlm: return DLM_CANCELGRANT if the lock is on granted list and the operation is canceled
@ 2018-12-03 12:20 wangjian
From: wangjian @ 2018-12-03 12:20 UTC
  To: ocfs2-devel

In dlm_move_lockres_to_recovery_list, if a lock is on the granted
queue and cancel_pending is set, the BUG_ON is triggered. I think
this BUG_ON is unnecessary, so this patch removes it. A scenario
that triggers it is given below.

Initially, Node 1 is the master and holds an NL lock, Node 2
holds a PR lock, and Node 3 also holds a PR lock.

Node 1          Node 2          Node 3
             wants an EX lock.

                             wants an EX lock.

Node 3 blocks
Node 2 from
getting EX, so
send Node 3
a BAST.

                             Receives BAST from
                             Node 1. The downconvert
                             thread begins to cancel
                             the PR to EX conversion.
                             In dlmunlock_common, the
                             downconvert thread has set
                             lock->cancel_pending
                             but has not yet entered
                             dlm_send_remote_unlock_request.

             Node 2 dies because
             its host is powered down.

During recovery,
clean up the locks
related to Node 2,
then complete
Node 3's PR to EX
request and send
Node 3 an AST.

                             Receives AST from Node 1,
                             changes lock level to EX,
                             moves lock to granted list.

Node 1 dies because
its host is powered down.

                             In dlm_move_lockres_to_recovery_list,
                             the lock is on the granted queue
                             with cancel_pending set: BUG_ON.

However, once this BUG_ON is removed, the process can hit a
second BUG in ocfs2_unlock_ast. The following scenario
triggers it:

Initially, Node 1 is the master and holds an NL lock, Node 2
holds a PR lock, and Node 3 also holds a PR lock.

Node 1          Node 2          Node 3
             wants an EX lock.

                             wants an EX lock.

Node 3 blocks
Node 2 from
getting EX, so
send Node 3
a BAST.

                             Receives BAST from
                             Node 1. The downconvert
                             thread begins to cancel
                             the PR to EX conversion.
                             In dlmunlock_common, the
                             downconvert thread has released
                             lock->spinlock and res->spinlock
                             but has not yet entered
                             dlm_send_remote_unlock_request.

             Node 2 dies because
             its host is powered down.

During recovery,
clean up the locks
related to Node 2,
then complete
Node 3's PR to EX
request and send
Node 3 an AST.

                             Receives AST from Node 1,
                             changes lock level to EX,
                             moves lock to granted list,
                             and sets lockres->l_unlock_action
                             to OCFS2_UNLOCK_INVALID
                             in ocfs2_locking_ast.

Node 1 dies because
its host is powered down.

                             Node 3 realizes that Node 1
                             is dead and removes Node 1
                             from domain_map. The
                             downconvert thread gets
                             DLM_NORMAL from
                             dlm_send_remote_unlock_request
                             and sets *call_ast to 1.
                             The downconvert thread then
                             hits the BUG in ocfs2_unlock_ast.
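The second crash comes from how the unlock AST dispatches on `l_unlock_action`. The sketch below is a hypothetical model (the `MODEL_` enum values and both functions are invented); it assumes that `ocfs2_unlock_ast` handles only a cancel-convert or a drop-lock action and reaches BUG() for anything else, including the `OCFS2_UNLOCK_INVALID` value written by `ocfs2_locking_ast`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lockres unlock actions. */
enum model_unlock_action {
	MODEL_UNLOCK_INVALID,        /* reset by the locking AST on grant */
	MODEL_UNLOCK_CANCEL_CONVERT, /* cancel of a pending conversion */
	MODEL_UNLOCK_DROP_LOCK,      /* full unlock */
};

/* Returns true when the modeled unlock AST would reach BUG(). */
static bool unlock_ast_would_bug(enum model_unlock_action action)
{
	switch (action) {
	case MODEL_UNLOCK_CANCEL_CONVERT:
	case MODEL_UNLOCK_DROP_LOCK:
		return false;
	default:
		return true;
	}
}

/* Replays the tail of the second scenario on Node 3. */
static bool second_scenario_would_bug(void)
{
	/* The downconvert thread started cancelling the PR to EX convert. */
	enum model_unlock_action action = MODEL_UNLOCK_CANCEL_CONVERT;

	/* The AST from Node 1 grants EX; the locking AST resets the action. */
	action = MODEL_UNLOCK_INVALID;

	/* Node 1 is gone, so the unlock request returns DLM_NORMAL and
	 * *call_ast is set to 1: the unlock AST will run. */
	bool call_ast = true;

	return call_ast && unlock_ast_would_bug(action);
}
```

The model makes the ordering problem visible: by the time the unlock AST runs, the locking AST has already overwritten the cancel-convert action.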

To avoid the second BUG, dlmunlock_common should return
DLM_CANCELGRANT if the lock is on the granted list and the
operation is a cancel.
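The added check in dlmunlock_common can be modeled as a status fix-up. This is a hypothetical sketch, not the fs/ocfs2/dlm code: the `MODEL_` status values, the `MODEL_LKM_CANCEL` bit and `deliver_unlock_ast()` are invented, and the sketch assumes that a DLM_CANCELGRANT result keeps the filesystem's unlock AST from running, which is what the fix depends on to avoid the second BUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the DLM status codes involved. */
enum model_status { MODEL_DLM_NORMAL, MODEL_DLM_CANCELGRANT };

/* Hypothetical flag bit for a cancel request. */
#define MODEL_LKM_CANCEL 0x1

/* Mirrors the added check: after re-taking the spinlocks, a cancel
 * whose lock is already on the granted list is reported as
 * CANCELGRANT, regardless of what the (possibly dead) master said. */
static enum model_status fixup_status(enum model_status status, int flags,
				      bool lock_on_granted)
{
	if ((flags & MODEL_LKM_CANCEL) && lock_on_granted)
		status = MODEL_DLM_CANCELGRANT;
	return status;
}

/* Assumption: the filesystem's unlock AST is delivered only for a
 * status other than CANCELGRANT. */
static bool deliver_unlock_ast(enum model_status status)
{
	return status != MODEL_DLM_CANCELGRANT;
}
```

Only a cancel whose lock is already granted is rewritten; ordinary unlocks and cancels of locks still on the converting list keep the status that was passed in.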

Signed-off-by: Jian Wang <wangjian161@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
---
  fs/ocfs2/dlm/dlmrecovery.c | 1 -
  fs/ocfs2/dlm/dlmunlock.c   | 5 +++++
  2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 802636d..7489652 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2134,7 +2134,6 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
  				 * if this had completed successfully
  				 * before sending this lock state to the
  				 * new master */
-				BUG_ON(i != DLM_CONVERTING_LIST);
  				mlog(0, "node died with cancel pending "
  				     "on %.*s. move back to granted list.\n",
  				     res->lockname.len, res->lockname.name);
diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index 63d701c..505bb6c 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -183,6 +183,11 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
  							flags, owner);
  		spin_lock(&res->spinlock);
  		spin_lock(&lock->spinlock);
+
+		if ((flags & LKM_CANCEL) &&
+				dlm_lock_on_list(&res->granted, lock))
+			status = DLM_CANCELGRANT;
+
  		/* if the master told us the lock was already granted,
  		 * let the ast handle all of these actions */
  		if (status == DLM_CANCELGRANT) {
-- 
1.8.3.1

