ocfs2-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Wengang Wang <wen.gang.wang@oracle.com>
To: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: ocfs2-devel ML <ocfs2-devel@oss.oracle.com>,
	"GHe@suse.com" <GHe@suse.com>
Subject: Re: [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write
Date: Mon, 5 Apr 2021 16:42:06 +0000	[thread overview]
Message-ID: <91ECDE2E-8A1B-4A91-B425-E8157081C083@oracle.com> (raw)
In-Reply-To: <994870d6-5b0e-3a22-6180-9beb501a59b0@linux.alibaba.com>



> On Apr 4, 2021, at 4:25 AM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
> 
> 
> 
> On 4/3/21 1:13 AM, Wengang Wang wrote:
>> The following deadlock is detected:
>> 
>> truncate -> setattr path is waiting for pending direct IO to be done (
>> inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
>> 
>> PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
>> #0 [ffffc9000bcf3c08] __schedule at ffffffff818667cc
>> #1 [ffffc9000bcf3ca0] schedule at ffffffff81866de6
>> #2 [ffffc9000bcf3cb8] inode_dio_wait at ffffffff812a2d04
>> #3 [ffffc9000bcf3d28] ocfs2_setattr at ffffffffc05f322e [ocfs2]
>> #4 [ffffc9000bcf3e18] notify_change at ffffffff812a5a09
>> #5 [ffffc9000bcf3e60] do_truncate at ffffffff812808f5
>> #6 [ffffc9000bcf3ed8] do_sys_ftruncate.constprop.18 at ffffffff81280cf2
>> #7 [ffffc9000bcf3f18] sys_ftruncate at ffffffff81280d8e
>> #8 [ffffc9000bcf3f28] do_syscall_64 at ffffffff81003949
>> #9 [ffffc9000bcf3f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
>> 
>> dio completion path is going to complete one direct IO (decrement
>> inode->i_dio_count), but before that it hang at locking inode->i_rwsem.
>> 
>> #0 [ffffc90009b47b40] __schedule+700 at ffffffff818667cc
>> #1 [ffffc90009b47bd8] schedule+54 at ffffffff81866de6
>> #2 [ffffc90009b47bf0] rwsem_down_write_failed+536 at ffffffff8186aa28
>> #3 [ffffc90009b47c88] call_rwsem_down_write_failed+23 at ffffffff8185a1b7
>> #4 [ffffc90009b47cd0] down_write+45 at ffffffff81869c9d
>> #5 [ffffc90009b47ce8] ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
>> #6 [ffffc90009b47dd8] ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
>> #7 [ffffc90009b47e18] dio_complete+140 at ffffffff812c873c
>> #8 [ffffc90009b47e50] dio_aio_complete_work+25 at ffffffff812c89f9
>> #9 [ffffc90009b47e60] process_one_work+361 at ffffffff810b1889
>> #10 [ffffc90009b47ea8] worker_thread+77 at ffffffff810b233d
>> #11 [ffffc90009b47f08] kthread+261 at ffffffff810b7fd5
>> #12 [ffffc90009b47f50] ret_from_fork+62 at ffffffff81a0035e
>> 
>> Thus above forms ABBA deadlock. The same deadlock was mentioned in
>> upstream commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300. well, it seems
>> that that commit just removed cluster lock (the victim of above dead
>> lock) from the ABBA deadlock party.
>> 
>> End-user visible effects:
>> Process hang in truncate -> ocfs2_setattr path and other processes hang at
>> ocfs2_dio_end_io_write path.
>> 
>> This is to fix the deadlock its self. It removes inode_lock() call from dio
>> completion path to remove the deadlock and add ip_alloc_sem lock in setattr
>> path to synchronize the inode modifications.
>> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>> v1 -> v2: remove the "had_alloc_lock" as suggested.
>> ---
>> fs/ocfs2/aops.c | 11 +----------
>> fs/ocfs2/file.c |  8 ++++++--
>> 2 files changed, 7 insertions(+), 12 deletions(-)
>> 
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index 3bfb4147895a..ad20403b383f 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2295,7 +2295,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>> 	struct ocfs2_alloc_context *meta_ac = NULL;
>> 	handle_t *handle = NULL;
>> 	loff_t end = offset + bytes;
>> -	int ret = 0, credits = 0, locked = 0;
>> +	int ret = 0, credits = 0;
>> 
>> 	ocfs2_init_dealloc_ctxt(&dealloc);
>> 
>> @@ -2306,13 +2306,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>> 	    !dwc->dw_orphaned)
>> 		goto out;
>> 
>> -	/* ocfs2_file_write_iter will get i_mutex, so we need not lock if we
>> -	 * are in that context. */
>> -	if (dwc->dw_writer_pid != task_pid_nr(current)) {
>> -		inode_lock(inode);
>> -		locked = 1;
>> -	}
>> -
>> 	ret = ocfs2_inode_lock(inode, &di_bh, 1);
>> 	if (ret < 0) {
>> 		mlog_errno(ret);
>> @@ -2393,8 +2386,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>> 	if (meta_ac)
>> 		ocfs2_free_alloc_context(meta_ac);
>> 	ocfs2_run_deallocs(osb, &dealloc);
>> -	if (locked)
>> -		inode_unlock(inode);
>> 	ocfs2_dio_free_write_ctx(inode, dwc);
>> 
>> 	return ret;
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index 85979e2214b3..8880071ee4ee 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -1244,22 +1244,24 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>> 				goto bail_unlock;
>> 			}
>> 		}
>> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
>> 		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
>> 					   2 * ocfs2_quota_trans_credits(sb));
>> 		if (IS_ERR(handle)) {
>> 			status = PTR_ERR(handle);
>> 			mlog_errno(status);
>> -			goto bail_unlock;
>> +			goto bail_unlock_alloc;
>> 		}
>> 		status = __dquot_transfer(inode, transfer_to);
>> 		if (status < 0)
>> 			goto bail_commit;
>> 	} else {
>> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
> 
> Why not do down_write() just before if clause?

Yes, down_write() just before if clause works too. I just wanted lock as less/short as possible.

> Anyway, it also looks fine to me. So with or without that change,
> 
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Thanks Joseph for review!

Wengang
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

      reply	other threads:[~2021-04-05 16:42 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02 17:13 [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write Wengang Wang
2021-04-04 11:25 ` Joseph Qi
2021-04-05 16:42   ` Wengang Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91ECDE2E-8A1B-4A91-B425-E8157081C083@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=GHe@suse.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).