All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set
@ 2017-09-12  9:01 tsutomu.owa
  2017-09-12 16:56 ` David Teigland
  0 siblings, 1 reply; 3+ messages in thread
From: tsutomu.owa @ 2017-09-12  9:01 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock
was detected, the caller of can_be_granted() was unknown.
We change the behavior of can_be_granted() and change it to detect
conversion deadlock regardless of whether the DLM_LKF_NODLCKWT flag
is set or not. And depending on whether the DLM_LKF_NODLCKWT flag
is set or not, we change the behavior at the caller of can_be_granted().

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
---
 fs/dlm/lock.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..edee26a 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2466,14 +2466,12 @@ static int can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now,
 		if (lkb->lkb_exflags & DLM_LKF_CONVDEADLK) {
 			lkb->lkb_grmode = DLM_LOCK_NL;
 			lkb->lkb_sbflags |= DLM_SBF_DEMOTED;
-		} else if (!(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) {
-			if (err)
-				*err = -EDEADLK;
-			else {
-				log_print("can_be_granted deadlock %x now %d",
-					  lkb->lkb_id, now);
-				dlm_dump_rsb(r);
-			}
+		} else if (err) {
+			*err = -EDEADLK;
+		} else {
+			log_print("can_be_granted deadlock %x now %d",
+				  lkb->lkb_id, now);
+			dlm_dump_rsb(r);
 		}
 		goto out;
 	}
@@ -2519,6 +2517,7 @@ static int grant_pending_convert(struct dlm_rsb *r, int high, int *cw,
 	int recover = rsb_flag(r, RSB_RECOVER_GRANT);
 	int hi, demoted, quit, grant_restart, demote_restart;
 	int deadlk;
+	int exflags;
 
 	quit = 0;
  restart:
@@ -2549,6 +2548,17 @@ static int grant_pending_convert(struct dlm_rsb *r, int high, int *cw,
 			log_print("WARN: pending deadlock %x node %d %s",
 				  lkb->lkb_id, lkb->lkb_nodeid, r->res_name);
 			dlm_dump_rsb(r);
+			/*
+			 * If DLM_LKB_NODLKWT flag is set and conversion
+			 * deadlock is detected, we request blocking AST and
+			 * down (or cancel) conversion.
+			 */
+			if (lkb->lkb_exflags & DLM_LKF_NODLCKWT &&
+			    lkb->lkb_highbast < lkb->lkb_rqmode) {
+				queue_bast(r, lkb, lkb->lkb_rqmode);
+				lkb->lkb_highbast = lkb->lkb_rqmode;
+			}
+
 			continue;
 		}
 
@@ -3124,7 +3134,7 @@ static int do_convert(struct dlm_rsb *r, struct dlm_lkb *lkb)
 	   deadlock, so we leave it on the granted queue and return EDEADLK in
 	   the ast for the convert. */
 
-	if (deadlk) {
+	if (deadlk && !(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) {
 		/* it's left on the granted queue */
 		revert_lock(r, lkb);
 		queue_cast(r, lkb, -EDEADLK);
-- 
2.7.4







^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set
  2017-09-12  9:01 [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set tsutomu.owa
@ 2017-09-12 16:56 ` David Teigland
  2017-09-14  9:31   ` tsutomu.owa
  0 siblings, 1 reply; 3+ messages in thread
From: David Teigland @ 2017-09-12 16:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Sep 12, 2017 at 09:01:31AM +0000, tsutomu.owa at toshiba.co.jp wrote:
> When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock
> was detected, the caller of can_be_granted() was unknown.
> We change the behavior of can_be_granted() and change it to detect
> conversion deadlock regardless of whether the DLM_LKF_NODLCKWT flag
> is set or not. And depending on whether the DLM_LKF_NODLCKWT flag
> is set or not, we change the behavior at the caller of can_be_granted().

Thanks for sorting this out.  Could we expand this with a description of
the externally visible effects from the change?

> @@ -2549,6 +2548,17 @@ static int grant_pending_convert(struct dlm_rsb *r, int high, int *cw,
>  			log_print("WARN: pending deadlock %x node %d %s",
>  				  lkb->lkb_id, lkb->lkb_nodeid, r->res_name);
>  			dlm_dump_rsb(r);
> +			/*
> +			 * If DLM_LKB_NODLKWT flag is set and conversion
> +			 * deadlock is detected, we request blocking AST and
> +			 * down (or cancel) conversion.
> +			 */
> +			if (lkb->lkb_exflags & DLM_LKF_NODLCKWT &&
> +			    lkb->lkb_highbast < lkb->lkb_rqmode) {
> +				queue_bast(r, lkb, lkb->lkb_rqmode);
> +				lkb->lkb_highbast = lkb->lkb_rqmode;
> +			}

Is this the main change in behavior you are looking for?  It seems to make
sense.  Do you find that this case is occurring during normal operation?
I did not previously expect that this code path would be used, but if it
is, then we should change the FIXME comment to explain how and when it
happens, and the warning and dump_rsb should be suppresssed.

> +
>  			continue;
>  		}
>  
> @@ -3124,7 +3134,7 @@ static int do_convert(struct dlm_rsb *r, struct dlm_lkb *lkb)
>  	   deadlock, so we leave it on the granted queue and return EDEADLK in
>  	   the ast for the convert. */
>  
> -	if (deadlk) {
> +	if (deadlk && !(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) {
>  		/* it's left on the granted queue */
>  		revert_lock(r, lkb);
>  		queue_cast(r, lkb, -EDEADLK);
> -- 
> 2.7.4



^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set
  2017-09-12 16:56 ` David Teigland
@ 2017-09-14  9:31   ` tsutomu.owa
  0 siblings, 0 replies; 3+ messages in thread
From: tsutomu.owa @ 2017-09-14  9:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

> Thanks for sorting this out.  Could we expand this with a description of
> the externally visible effects from the change?

This fix has no effect except when using DLM_LKF_NODLCKWT flag.
We are using fsdlm with ocfs2. Currently, DLM_LKF_NODLCKWT flag is used
only by ocfs2. ocfs2 does not expect a cancel operation by detecting
conversion deadlock when calling dlm_lock(). ocfs2 is implemented to
perform a cancel operation by requesting BASTs (callback).

> Is this the main change in behavior you are looking for?  It seems to make
> sense.  Do you find that this case is occurring during normal operation?

Conversion deadlock was infrequently seen when multiple nodes mount/umount
ocfs2 concurrently. It seems that it happens mainly after the master node
exits from lock space and the recovery process has run on another node.
However, since it depends on the operation timing for each node whether
it occurs or not, it will also occur in another cases.

> I did not previously expect that this code path would be used, but if it
> is, then we should change the FIXME comment to explain how and when it
> happens, and the warning and dump_rsb should be suppresssed.

How about with the corrections below?
(Let us leave the fix of FIXME to you.)

thanks,
-- owa
p.s. Could we re-send try#3 patch not to add extra/unused "int exflags" in grant_pending_convert()?

--- fs/dlm/lock.c.orig  2017-09-13 14:29:37.604795992 +0900
+++ fs/dlm/lock.c       2017-09-13 14:41:26.013722764 +0900
@@ -2544,18 +2544,21 @@
 		}
 
 		if (deadlk) {
-			log_print("WARN: pending deadlock %x node %d %s",
-				  lkb->lkb_id, lkb->lkb_nodeid, r->res_name);
-			dlm_dump_rsb(r);
 			/*
 			 * If DLM_LKB_NODLKWT flag is set and conversion
 			 * deadlock is detected, we request blocking AST and
 			 * down (or cancel) conversion.
 			 */
-			if (lkb->lkb_exflags & DLM_LKF_NODLCKWT &&
-			    lkb->lkb_highbast < lkb->lkb_rqmode) {
-				queue_bast(r, lkb, lkb->lkb_rqmode);
-				lkb->lkb_highbast = lkb->lkb_rqmode;
+			if (lkb->lkb_exflags & DLM_LKF_NODLCKWT) {
+				if (lkb->lkb_highbast < lkb->lkb_rqmode) {
+					queue_bast(r, lkb, lkb->lkb_rqmode);
+					lkb->lkb_highbast = lkb->lkb_rqmode;
+				}
+			} else {
+				log_print("WARN: pending deadlock %x node %d %s",
+					  lkb->lkb_id, lkb->lkb_nodeid,
+					  r->res_name);
+				dlm_dump_rsb(r);
 			}
 			continue;
 		}
---

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com] 
Sent: Wednesday, September 13, 2017 1:56 AM
To: owa tsutomu(?? ? ??? ?????????????)
Cc: cluster-devel at redhat.com; miyauchi tadashi(?? ?? ???? ?????????)
Subject: Re: [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set

On Tue, Sep 12, 2017 at 09:01:31AM +0000, tsutomu.owa at toshiba.co.jp wrote:
> When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock
> was detected, the caller of can_be_granted() was unknown.
> We change the behavior of can_be_granted() and change it to detect
> conversion deadlock regardless of whether the DLM_LKF_NODLCKWT flag
> is set or not. And depending on whether the DLM_LKF_NODLCKWT flag
> is set or not, we change the behavior at the caller of can_be_granted().

Thanks for sorting this out.  Could we expand this with a description of
the externally visible effects from the change?

> @@ -2549,6 +2548,17 @@ static int grant_pending_convert(struct dlm_rsb *r, int high, int *cw,
>  			log_print("WARN: pending deadlock %x node %d %s",
>  				  lkb->lkb_id, lkb->lkb_nodeid, r->res_name);
>  			dlm_dump_rsb(r);
> +			/*
> +			 * If DLM_LKB_NODLKWT flag is set and conversion
> +			 * deadlock is detected, we request blocking AST and
> +			 * down (or cancel) conversion.
> +			 */
> +			if (lkb->lkb_exflags & DLM_LKF_NODLCKWT &&
> +			    lkb->lkb_highbast < lkb->lkb_rqmode) {
> +				queue_bast(r, lkb, lkb->lkb_rqmode);
> +				lkb->lkb_highbast = lkb->lkb_rqmode;
> +			}

Is this the main change in behavior you are looking for?  It seems to make
sense.  Do you find that this case is occurring during normal operation?
I did not previously expect that this code path would be used, but if it
is, then we should change the FIXME comment to explain how and when it
happens, and the warning and dump_rsb should be suppresssed.

> +
>  			continue;
>  		}
>  
> @@ -3124,7 +3134,7 @@ static int do_convert(struct dlm_rsb *r, struct dlm_lkb *lkb)
>  	   deadlock, so we leave it on the granted queue and return EDEADLK in
>  	   the ast for the convert. */
>  
> -	if (deadlk) {
> +	if (deadlk && !(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) {
>  		/* it's left on the granted queue */
>  		revert_lock(r, lkb);
>  		queue_cast(r, lkb, -EDEADLK);
> -- 
> 2.7.4





^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-09-14  9:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-12  9:01 [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set tsutomu.owa
2017-09-12 16:56 ` David Teigland
2017-09-14  9:31   ` tsutomu.owa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.