From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Wengang Wang <wen.gang.wang@oracle.com>, ocfs2-devel@oss.oracle.com
Cc: GHe@suse.com
Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock between setattr and dio_end_io_write
Date: Thu, 1 Apr 2021 11:11:09 +0800 [thread overview]
Message-ID: <e557eb35-e0ee-2043-efd0-60b3168bfe35@linux.alibaba.com> (raw)
In-Reply-To: <20210331203654.3911-1-wen.gang.wang@oracle.com>
On 4/1/21 4:36 AM, Wengang Wang wrote:
> The following deadlock is detected:
>
> truncate -> setattr path is waiting for pending direct IO to be done (
> inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
>
> PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9"
> #0 [ffffc9000bcf3c08] __schedule at ffffffff818667cc
> #1 [ffffc9000bcf3ca0] schedule at ffffffff81866de6
> #2 [ffffc9000bcf3cb8] inode_dio_wait at ffffffff812a2d04
> #3 [ffffc9000bcf3d28] ocfs2_setattr at ffffffffc05f322e [ocfs2]
> #4 [ffffc9000bcf3e18] notify_change at ffffffff812a5a09
> #5 [ffffc9000bcf3e60] do_truncate at ffffffff812808f5
> #6 [ffffc9000bcf3ed8] do_sys_ftruncate.constprop.18 at ffffffff81280cf2
> #7 [ffffc9000bcf3f18] sys_ftruncate at ffffffff81280d8e
> #8 [ffffc9000bcf3f28] do_syscall_64 at ffffffff81003949
> #9 [ffffc9000bcf3f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
>
> dio completion path is going to complete one direct IO (decrement
> inode->i_dio_count), but before that it hang at locking inode->i_rwsem.
>
> #0 [ffffc90009b47b40] __schedule+700 at ffffffff818667cc
> #1 [ffffc90009b47bd8] schedule+54 at ffffffff81866de6
> #2 [ffffc90009b47bf0] rwsem_down_write_failed+536 at ffffffff8186aa28
> #3 [ffffc90009b47c88] call_rwsem_down_write_failed+23 at ffffffff8185a1b7
> #4 [ffffc90009b47cd0] down_write+45 at ffffffff81869c9d
> #5 [ffffc90009b47ce8] ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
> #6 [ffffc90009b47dd8] ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
> #7 [ffffc90009b47e18] dio_complete+140 at ffffffff812c873c
> #8 [ffffc90009b47e50] dio_aio_complete_work+25 at ffffffff812c89f9
> #9 [ffffc90009b47e60] process_one_work+361 at ffffffff810b1889
> #10 [ffffc90009b47ea8] worker_thread+77 at ffffffff810b233d
> #11 [ffffc90009b47f08] kthread+261 at ffffffff810b7fd5
> #12 [ffffc90009b47f50] ret_from_fork+62 at ffffffff81a0035e
>
> Thus above forms ABBA deadlock. The same deadlock was mentioned in
> upstream commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300. well, it seems
> that that commit just removed cluster lock (the victim of above dead
> lock) from the ABBA deadlock party.
>
> End-user visible effects:
> Process hang in truncate -> ocfs2_setattr path and other processes hang at
> ocfs2_dio_end_io_write path.
>
> This is to fix the deadlock its self. It removes inode_lock() call from dio
> completion path to remove the deadlock and add ip_alloc_sem lock in setattr
> path to synchronize the inode modifications.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> fs/ocfs2/aops.c | 11 +----------
> fs/ocfs2/file.c | 11 ++++++++++-
> 2 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 3bfb4147895a..ad20403b383f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2295,7 +2295,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> struct ocfs2_alloc_context *meta_ac = NULL;
> handle_t *handle = NULL;
> loff_t end = offset + bytes;
> - int ret = 0, credits = 0, locked = 0;
> + int ret = 0, credits = 0;
>
> ocfs2_init_dealloc_ctxt(&dealloc);
>
> @@ -2306,13 +2306,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> !dwc->dw_orphaned)
> goto out;
>
> - /* ocfs2_file_write_iter will get i_mutex, so we need not lock if we
> - * are in that context. */
> - if (dwc->dw_writer_pid != task_pid_nr(current)) {
> - inode_lock(inode);
> - locked = 1;
> - }
> -
> ret = ocfs2_inode_lock(inode, &di_bh, 1);
> if (ret < 0) {
> mlog_errno(ret);
> @@ -2393,8 +2386,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> if (meta_ac)
> ocfs2_free_alloc_context(meta_ac);
> ocfs2_run_deallocs(osb, &dealloc);
> - if (locked)
> - inode_unlock(inode);
> ocfs2_dio_free_write_ctx(inode, dwc);
>
> return ret;
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 85979e2214b3..9feaa0d2425a 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1123,7 +1123,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> handle_t *handle = NULL;
> struct dquot *transfer_to[MAXQUOTAS] = { };
> int qtype;
> - int had_lock;
> + int had_lock, had_alloc_lock = 0;
> struct ocfs2_lock_holder oh;
>
> trace_ocfs2_setattr(inode, dentry,
> @@ -1244,6 +1244,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> goto bail_unlock;
> }
> }
> + down_write(&OCFS2_I(inode)->ip_alloc_sem);
> + had_alloc_lock = 1;
> +
> handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
> 2 * ocfs2_quota_trans_credits(sb));
> if (IS_ERR(handle)) {
> @@ -1255,6 +1258,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> if (status < 0)
> goto bail_commit;
> } else {
> + down_write(&OCFS2_I(inode)->ip_alloc_sem);
> + had_alloc_lock = 1;
> +
It duplicates the code both in 'if' and 'else', so move it out please.
Also, I don't like one more flag here to indicate whether we take
ip_alloc_sem or not.
How about do it in the following way?
...
down_write(&OCFS2_I(inode)->ip_alloc_sem);
if () {
...
if (err)
goto bail_alloc_sem;
...
} else {
...
}
...
bail_alloc_sem:
up_write(&OCFS2_I(inode)->ip_alloc_sem);
Thanks,
Joseph
> handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
> if (IS_ERR(handle)) {
> status = PTR_ERR(handle);
> @@ -1273,6 +1279,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> bail_commit:
> ocfs2_commit_trans(osb, handle);
> bail_unlock:
> + if (had_alloc_lock)
> + up_write(&OCFS2_I(inode)->ip_alloc_sem);
> +
> if (status && inode_locked) {
> ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
> inode_locked = 0;
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
next prev parent reply other threads:[~2021-04-01 3:11 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-31 20:36 [Ocfs2-devel] [PATCH] ocfs2: fix deadlock between setattr and dio_end_io_write Wengang Wang
2021-04-01 3:11 ` Joseph Qi [this message]
2021-04-01 16:27 ` Wengang Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e557eb35-e0ee-2043-efd0-60b3168bfe35@linux.alibaba.com \
--to=joseph.qi@linux.alibaba.com \
--cc=GHe@suse.com \
--cc=ocfs2-devel@oss.oracle.com \
--cc=wen.gang.wang@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).