ocfs2-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Wengang Wang <wen.gang.wang@oracle.com>, ocfs2-devel@oss.oracle.com
Cc: GHe@suse.com
Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock between setattr and dio_end_io_write
Date: Thu, 1 Apr 2021 11:11:09 +0800	[thread overview]
Message-ID: <e557eb35-e0ee-2043-efd0-60b3168bfe35@linux.alibaba.com> (raw)
In-Reply-To: <20210331203654.3911-1-wen.gang.wang@oracle.com>



On 4/1/21 4:36 AM, Wengang Wang wrote:
> The following deadlock is detected:
> 
> truncate -> setattr path is waiting for pending direct IO to be done (
> inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
> 
> PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
> #0 [ffffc9000bcf3c08] __schedule at ffffffff818667cc
> #1 [ffffc9000bcf3ca0] schedule at ffffffff81866de6
> #2 [ffffc9000bcf3cb8] inode_dio_wait at ffffffff812a2d04
> #3 [ffffc9000bcf3d28] ocfs2_setattr at ffffffffc05f322e [ocfs2]
> #4 [ffffc9000bcf3e18] notify_change at ffffffff812a5a09
> #5 [ffffc9000bcf3e60] do_truncate at ffffffff812808f5
> #6 [ffffc9000bcf3ed8] do_sys_ftruncate.constprop.18 at ffffffff81280cf2
> #7 [ffffc9000bcf3f18] sys_ftruncate at ffffffff81280d8e
> #8 [ffffc9000bcf3f28] do_syscall_64 at ffffffff81003949
> #9 [ffffc9000bcf3f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
> 
> dio completion path is going to complete one direct IO (decrement
> inode->i_dio_count), but before that it hang at locking inode->i_rwsem.
> 
>  #0 [ffffc90009b47b40] __schedule+700 at ffffffff818667cc
>  #1 [ffffc90009b47bd8] schedule+54 at ffffffff81866de6
>  #2 [ffffc90009b47bf0] rwsem_down_write_failed+536 at ffffffff8186aa28
>  #3 [ffffc90009b47c88] call_rwsem_down_write_failed+23 at ffffffff8185a1b7
>  #4 [ffffc90009b47cd0] down_write+45 at ffffffff81869c9d
>  #5 [ffffc90009b47ce8] ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
>  #6 [ffffc90009b47dd8] ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
>  #7 [ffffc90009b47e18] dio_complete+140 at ffffffff812c873c
>  #8 [ffffc90009b47e50] dio_aio_complete_work+25 at ffffffff812c89f9
>  #9 [ffffc90009b47e60] process_one_work+361 at ffffffff810b1889
> #10 [ffffc90009b47ea8] worker_thread+77 at ffffffff810b233d
> #11 [ffffc90009b47f08] kthread+261 at ffffffff810b7fd5
> #12 [ffffc90009b47f50] ret_from_fork+62 at ffffffff81a0035e
> 
> Thus above forms ABBA deadlock. The same deadlock was mentioned in
> upstream commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300. well, it seems
> that that commit just removed cluster lock (the victim of above dead
> lock) from the ABBA deadlock party.
> 
> End-user visible effects:
> Process hang in truncate -> ocfs2_setattr path and other processes hang at
> ocfs2_dio_end_io_write path.
> 
> This is to fix the deadlock its self. It removes inode_lock() call from dio
> completion path to remove the deadlock and add ip_alloc_sem lock in setattr
> path to synchronize the inode modifications.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  fs/ocfs2/aops.c | 11 +----------
>  fs/ocfs2/file.c | 11 ++++++++++-
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 3bfb4147895a..ad20403b383f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2295,7 +2295,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	struct ocfs2_alloc_context *meta_ac = NULL;
>  	handle_t *handle = NULL;
>  	loff_t end = offset + bytes;
> -	int ret = 0, credits = 0, locked = 0;
> +	int ret = 0, credits = 0;
>  
>  	ocfs2_init_dealloc_ctxt(&dealloc);
>  
> @@ -2306,13 +2306,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	    !dwc->dw_orphaned)
>  		goto out;
>  
> -	/* ocfs2_file_write_iter will get i_mutex, so we need not lock if we
> -	 * are in that context. */
> -	if (dwc->dw_writer_pid != task_pid_nr(current)) {
> -		inode_lock(inode);
> -		locked = 1;
> -	}
> -
>  	ret = ocfs2_inode_lock(inode, &di_bh, 1);
>  	if (ret < 0) {
>  		mlog_errno(ret);
> @@ -2393,8 +2386,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	if (meta_ac)
>  		ocfs2_free_alloc_context(meta_ac);
>  	ocfs2_run_deallocs(osb, &dealloc);
> -	if (locked)
> -		inode_unlock(inode);
>  	ocfs2_dio_free_write_ctx(inode, dwc);
>  
>  	return ret;
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 85979e2214b3..9feaa0d2425a 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1123,7 +1123,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  	handle_t *handle = NULL;
>  	struct dquot *transfer_to[MAXQUOTAS] = { };
>  	int qtype;
> -	int had_lock;
> +	int had_lock, had_alloc_lock = 0;
>  	struct ocfs2_lock_holder oh;
>  
>  	trace_ocfs2_setattr(inode, dentry,
> @@ -1244,6 +1244,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  				goto bail_unlock;
>  			}
>  		}
> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
> +		had_alloc_lock = 1;
> +
>  		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
>  					   2 * ocfs2_quota_trans_credits(sb));
>  		if (IS_ERR(handle)) {
> @@ -1255,6 +1258,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  		if (status < 0)
>  			goto bail_commit;
>  	} else {
> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
> +		had_alloc_lock = 1;
> +

It duplicates the code both in 'if' and 'else', so move it out please.
Also, I don't like one more flag here to indicate whether we take
ip_alloc_sem or not.

How about do it in the following way?

...
down_write(&OCFS2_I(inode)->ip_alloc_sem);
if () {
	...
	if (err)
		goto bail_alloc_sem;
	...
} else {
	...
}

...

bail_alloc_sem:
	up_write(&OCFS2_I(inode)->ip_alloc_sem);

Thanks,
Joseph

>  		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
>  		if (IS_ERR(handle)) {
>  			status = PTR_ERR(handle);
> @@ -1273,6 +1279,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  bail_commit:
>  	ocfs2_commit_trans(osb, handle);
>  bail_unlock:
> +	if (had_alloc_lock)
> +		up_write(&OCFS2_I(inode)->ip_alloc_sem);
> +
>  	if (status && inode_locked) {
>  		ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
>  		inode_locked = 0;
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

  reply	other threads:[~2021-04-01  3:11 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-31 20:36 [Ocfs2-devel] [PATCH] ocfs2: fix deadlock between setattr and dio_end_io_write Wengang Wang
2021-04-01  3:11 ` Joseph Qi [this message]
2021-04-01 16:27   ` Wengang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e557eb35-e0ee-2043-efd0-60b3168bfe35@linux.alibaba.com \
    --to=joseph.qi@linux.alibaba.com \
    --cc=GHe@suse.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=wen.gang.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).