From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH AUTOSEL 4.9 17/17] ocfs2: ocfs2_downconvert_lock failure results in deadlock
Date: Thu,  9 Sep 2021 20:23:38 -0400
Message-ID: <20210910002338.176677-17-sashal@kernel.org>
In-Reply-To: <20210910002338.176677-1-sashal@kernel.org>

From: Gang He <ghe@suse.com>

[ Upstream commit 9673e0050c39b0534d0e2ca431223f52089f4959 ]

Normally, ocfs2_downconvert_lock() downconverts the dlm lock to the
expected level to satisfy dlm bast requests from other nodes.
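For readers unfamiliar with basts: a blocking AST tells the holder that
another node wants a mode that conflicts with the current grant, and the
holder answers by downconverting. The sketch below is a hypothetical,
self-contained model of how the target level could be chosen using only the
NL, PR and EX modes; the toy_* names are invented for illustration and are
not dlmglue.c code.

```c
#include <assert.h>

/* Hypothetical subset of DLM modes, ordered from weakest to strongest. */
enum toy_level { TOY_NL, TOY_PR, TOY_EX };

/* Highest level this node may keep while another node waits for `blocking`. */
static enum toy_level toy_highest_compat(enum toy_level blocking)
{
	if (blocking == TOY_EX)
		return TOY_NL;	/* EX is compatible only with NL */
	if (blocking == TOY_PR)
		return TOY_PR;	/* readers can share with readers */
	return TOY_EX;		/* a waiter asking for NL blocks nothing */
}

/* Level the holder should downconvert to in response to a bast. */
static enum toy_level toy_downconvert_target(enum toy_level held,
					     enum toy_level blocking)
{
	enum toy_level compat = toy_highest_compat(blocking);

	assert(held <= TOY_EX);
	return held > compat ? compat : held;	/* never upconvert here */
}
```

So an EX holder drops to PR when a remote reader blocks, and to NL when a
remote writer blocks.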

There is one rare exception: when a dlm lock conversion is being
canceled, ocfs2_downconvert_lock() returns -EBUSY.  Note that
ocfs2_cancel_convert() is asynchronous in the fsdlm implementation.
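A minimal model of why the caller sees -EBUSY rather than blocking: the
cancel only records that it is in flight and returns at once, and any
downconvert attempted before the completion arrives is refused. This assumes
a single cancel_pending flag can stand in for the real fsdlm state tracking;
struct toy_lockres and the toy_* functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified lock resource; not struct ocfs2_lock_res. */
struct toy_lockres {
	int level;		/* currently granted level */
	bool cancel_pending;	/* cancel issued, completion not yet seen */
};

/* Issue a cancel and return immediately, as an asynchronous DLM would;
 * completion is reported later through toy_cancel_done(). */
static void toy_cancel_convert(struct toy_lockres *res)
{
	res->cancel_pending = true;
}

/* Completion callback for the cancel. */
static void toy_cancel_done(struct toy_lockres *res)
{
	res->cancel_pending = false;
}

/* Downconvert request: refused with -EBUSY while a cancel is in flight. */
static int toy_downconvert(struct toy_lockres *res, int new_level)
{
	assert(new_level <= res->level);
	if (res->cancel_pending)
		return -EBUSY;
	res->level = new_level;
	return 0;
}
```

The same request succeeds once toy_cancel_done() has run, which is why
retrying later, rather than giving up, is enough.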

If we do not requeue this lockres entry, the ocfs2 downconvert thread
never handles this dlm bast request again.  The other nodes then cannot
obtain the dlm lock, and a process on the current node blocks when it
tries to acquire the same dlm lock again.
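The failure mode and the fix can be sketched as a toy downconvert thread
working through a FIFO of blocked lock resources. Passing
requeue_on_busy = true plays the role of the patch setting ctl->requeue;
with false, an entry that hit -EBUSY is simply dropped and its bast stays
pending forever, which is the hang described above. All names are
hypothetical, and the patch's 20 ms msleep() back-off is left out of the
model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QLEN 8

/* Hypothetical lock resource: a pending bast and an in-flight cancel. */
struct toy_res {
	bool bast_pending;	/* another node is waiting on us */
	bool cancel_pending;	/* async cancel not yet completed */
};

/* Fixed-size FIFO standing in for the downconvert thread's blocked list. */
struct toy_queue {
	struct toy_res *items[TOY_QLEN];
	int head, count;
};

static void toy_enqueue(struct toy_queue *q, struct toy_res *res)
{
	assert(q->count < TOY_QLEN);
	q->items[(q->head + q->count) % TOY_QLEN] = res;
	q->count++;
}

static struct toy_res *toy_dequeue(struct toy_queue *q)
{
	struct toy_res *res;

	if (q->count == 0)
		return NULL;
	res = q->items[q->head];
	q->head = (q->head + 1) % TOY_QLEN;
	q->count--;
	return res;
}

/* Try to satisfy the bast; -EBUSY while the cancel is still in flight. */
static int toy_unblock(struct toy_res *res)
{
	if (res->cancel_pending)
		return -EBUSY;
	res->bast_pending = false;	/* downconverted: remote node proceeds */
	return 0;
}

/* One pass over the entries queued at the start of the pass. Without
 * requeue_on_busy, an entry that returned -EBUSY is lost for good. */
static void toy_dc_pass(struct toy_queue *q, bool requeue_on_busy)
{
	int n = q->count;

	while (n-- > 0) {
		struct toy_res *res = toy_dequeue(q);

		if (toy_unblock(res) == -EBUSY && requeue_on_busy)
			toy_enqueue(q, res);
	}
}
```

In the requeue case a later pass, run after the cancel completes, clears
bast_pending; in the drop case no later pass ever sees the entry.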

Link: https://lkml.kernel.org/r/20210830044621.12544-1-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ocfs2/dlmglue.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 2c3e975126b3..2e6aee19e536 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -32,6 +32,7 @@
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 #include <linux/time.h>
+#include <linux/delay.h>
 #include <linux/quotaops.h>
 
 #define MLOG_MASK_PREFIX ML_DLM_GLUE
@@ -3677,6 +3678,17 @@ static int ocfs2_unblock_lock(struct ocfs2_super *osb,
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 	ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb,
 				     gen);
+	/* The dlm lock convert is being cancelled in background,
+	 * ocfs2_cancel_convert() is asynchronous in fs/dlm,
+	 * requeue it, try again later.
+	 */
+	if (ret == -EBUSY) {
+		ctl->requeue = 1;
+		mlog(ML_BASTS, "lockres %s, ReQ: Downconvert busy\n",
+		     lockres->l_name);
+		ret = 0;
+		msleep(20);
+	}
 
 leave:
 	if (ret)
-- 
2.30.2


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
