From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH AUTOSEL 4.9 17/17] ocfs2: ocfs2_downconvert_lock failure results in deadlock
Date: Thu, 9 Sep 2021 20:23:38 -0400
Message-ID: <20210910002338.176677-17-sashal@kernel.org>
In-Reply-To: <20210910002338.176677-1-sashal@kernel.org>

From: Gang He <ghe@suse.com>

[ Upstream commit 9673e0050c39b0534d0e2ca431223f52089f4959 ]

Usually, the ocfs2_downconvert_lock() function downconverts the dlm lock
to the expected level to satisfy dlm bast requests from the other nodes.

But there is a rare situation. When a dlm lock conversion is being
canceled, ocfs2_downconvert_lock() returns -EBUSY. Note that
ocfs2_cancel_convert() is asynchronous in the fsdlm implementation. If
we do not requeue this lockres entry, the ocfs2 downconvert thread no
longer handles this dlm lock bast request. The other nodes will then
never get the dlm lock, and the current node's process will block when
it tries to acquire this dlm lock again.

Link: https://lkml.kernel.org/r/20210830044621.12544-1-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ocfs2/dlmglue.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 2c3e975126b3..2e6aee19e536 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -32,6 +32,7 @@
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 #include <linux/time.h>
+#include <linux/delay.h>
 #include <linux/quotaops.h>
 
 #define MLOG_MASK_PREFIX ML_DLM_GLUE
@@ -3677,6 +3678,17 @@ static int ocfs2_unblock_lock(struct ocfs2_super *osb,
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 	ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb,
 				     gen);
+	/* The dlm lock convert is being cancelled in background,
+	 * ocfs2_cancel_convert() is asynchronous in fs/dlm,
+	 * requeue it, try again later.
+	 */
+	if (ret == -EBUSY) {
+		ctl->requeue = 1;
+		mlog(ML_BASTS, "lockres %s, ReQ: Downconvert busy\n",
+		     lockres->l_name);
+		ret = 0;
+		msleep(20);
+	}
 
 leave:
 	if (ret)
-- 
2.30.2

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel