ocfs2-devel.oss.oracle.com archive mirror
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write
@ 2021-04-02 17:13 Wengang Wang
  2021-04-04 11:25 ` Joseph Qi
  0 siblings, 1 reply; 3+ messages in thread
From: Wengang Wang @ 2021-04-02 17:13 UTC (permalink / raw)
  To: ocfs2-devel; +Cc: GHe

The following deadlock is detected:

truncate -> setattr path is waiting for pending direct IO to be done (
inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
#0 [ffffc9000bcf3c08] __schedule at ffffffff818667cc
#1 [ffffc9000bcf3ca0] schedule at ffffffff81866de6
#2 [ffffc9000bcf3cb8] inode_dio_wait at ffffffff812a2d04
#3 [ffffc9000bcf3d28] ocfs2_setattr at ffffffffc05f322e [ocfs2]
#4 [ffffc9000bcf3e18] notify_change at ffffffff812a5a09
#5 [ffffc9000bcf3e60] do_truncate at ffffffff812808f5
#6 [ffffc9000bcf3ed8] do_sys_ftruncate.constprop.18 at ffffffff81280cf2
#7 [ffffc9000bcf3f18] sys_ftruncate at ffffffff81280d8e
#8 [ffffc9000bcf3f28] do_syscall_64 at ffffffff81003949
#9 [ffffc9000bcf3f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hang at locking inode->i_rwsem.

 #0 [ffffc90009b47b40] __schedule+700 at ffffffff818667cc
 #1 [ffffc90009b47bd8] schedule+54 at ffffffff81866de6
 #2 [ffffc90009b47bf0] rwsem_down_write_failed+536 at ffffffff8186aa28
 #3 [ffffc90009b47c88] call_rwsem_down_write_failed+23 at ffffffff8185a1b7
 #4 [ffffc90009b47cd0] down_write+45 at ffffffff81869c9d
 #5 [ffffc90009b47ce8] ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
 #6 [ffffc90009b47dd8] ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
 #7 [ffffc90009b47e18] dio_complete+140 at ffffffff812c873c
 #8 [ffffc90009b47e50] dio_aio_complete_work+25 at ffffffff812c89f9
 #9 [ffffc90009b47e60] process_one_work+361 at ffffffff810b1889
#10 [ffffc90009b47ea8] worker_thread+77 at ffffffff810b233d
#11 [ffffc90009b47f08] kthread+261 at ffffffff810b7fd5
#12 [ffffc90009b47f50] ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock. The same deadlock was mentioned in
upstream commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300. well, it seems
that that commit just removed cluster lock (the victim of above dead
lock) from the ABBA deadlock party.

End-user visible effects:
Process hang in truncate -> ocfs2_setattr path and other processes hang at
ocfs2_dio_end_io_write path.

This is to fix the deadlock its self. It removes inode_lock() call from dio
completion path to remove the deadlock and add ip_alloc_sem lock in setattr
path to synchronize the inode modifications.

Cc: stable@vger.kernel.org
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
v1 -> v2: remove the "had_alloc_lock" as suggested.
---
 fs/ocfs2/aops.c | 11 +----------
 fs/ocfs2/file.c |  8 ++++++--
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 3bfb4147895a..ad20403b383f 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2295,7 +2295,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	struct ocfs2_alloc_context *meta_ac = NULL;
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
-	int ret = 0, credits = 0, locked = 0;
+	int ret = 0, credits = 0;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2306,13 +2306,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	    !dwc->dw_orphaned)
 		goto out;
 
-	/* ocfs2_file_write_iter will get i_mutex, so we need not lock if we
-	 * are in that context. */
-	if (dwc->dw_writer_pid != task_pid_nr(current)) {
-		inode_lock(inode);
-		locked = 1;
-	}
-
 	ret = ocfs2_inode_lock(inode, &di_bh, 1);
 	if (ret < 0) {
 		mlog_errno(ret);
@@ -2393,8 +2386,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	if (meta_ac)
 		ocfs2_free_alloc_context(meta_ac);
 	ocfs2_run_deallocs(osb, &dealloc);
-	if (locked)
-		inode_unlock(inode);
 	ocfs2_dio_free_write_ctx(inode, dwc);
 
 	return ret;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 85979e2214b3..8880071ee4ee 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1244,22 +1244,24 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 				goto bail_unlock;
 			}
 		}
+		down_write(&OCFS2_I(inode)->ip_alloc_sem);
 		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
 					   2 * ocfs2_quota_trans_credits(sb));
 		if (IS_ERR(handle)) {
 			status = PTR_ERR(handle);
 			mlog_errno(status);
-			goto bail_unlock;
+			goto bail_unlock_alloc;
 		}
 		status = __dquot_transfer(inode, transfer_to);
 		if (status < 0)
 			goto bail_commit;
 	} else {
+		down_write(&OCFS2_I(inode)->ip_alloc_sem);
 		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
 		if (IS_ERR(handle)) {
 			status = PTR_ERR(handle);
 			mlog_errno(status);
-			goto bail_unlock;
+			goto bail_unlock_alloc;
 		}
 	}
 
@@ -1272,6 +1274,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 
 bail_commit:
 	ocfs2_commit_trans(osb, handle);
+bail_unlock_alloc:
+	up_write(&OCFS2_I(inode)->ip_alloc_sem);
 bail_unlock:
 	if (status && inode_locked) {
 		ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
-- 
2.21.0 (Apple Git-122.2)


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write
  2021-04-02 17:13 [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write Wengang Wang
@ 2021-04-04 11:25 ` Joseph Qi
  2021-04-05 16:42   ` Wengang Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Joseph Qi @ 2021-04-04 11:25 UTC (permalink / raw)
  To: Wengang Wang, ocfs2-devel; +Cc: GHe



On 4/3/21 1:13 AM, Wengang Wang wrote:
> The following deadlock is detected:
> 
> truncate -> setattr path is waiting for pending direct IO to be done (
> inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
> 
> PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
> #0 [ffffc9000bcf3c08] __schedule at ffffffff818667cc
> #1 [ffffc9000bcf3ca0] schedule at ffffffff81866de6
> #2 [ffffc9000bcf3cb8] inode_dio_wait at ffffffff812a2d04
> #3 [ffffc9000bcf3d28] ocfs2_setattr at ffffffffc05f322e [ocfs2]
> #4 [ffffc9000bcf3e18] notify_change at ffffffff812a5a09
> #5 [ffffc9000bcf3e60] do_truncate at ffffffff812808f5
> #6 [ffffc9000bcf3ed8] do_sys_ftruncate.constprop.18 at ffffffff81280cf2
> #7 [ffffc9000bcf3f18] sys_ftruncate at ffffffff81280d8e
> #8 [ffffc9000bcf3f28] do_syscall_64 at ffffffff81003949
> #9 [ffffc9000bcf3f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
> 
> dio completion path is going to complete one direct IO (decrement
> inode->i_dio_count), but before that it hang at locking inode->i_rwsem.
> 
>  #0 [ffffc90009b47b40] __schedule+700 at ffffffff818667cc
>  #1 [ffffc90009b47bd8] schedule+54 at ffffffff81866de6
>  #2 [ffffc90009b47bf0] rwsem_down_write_failed+536 at ffffffff8186aa28
>  #3 [ffffc90009b47c88] call_rwsem_down_write_failed+23 at ffffffff8185a1b7
>  #4 [ffffc90009b47cd0] down_write+45 at ffffffff81869c9d
>  #5 [ffffc90009b47ce8] ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
>  #6 [ffffc90009b47dd8] ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
>  #7 [ffffc90009b47e18] dio_complete+140 at ffffffff812c873c
>  #8 [ffffc90009b47e50] dio_aio_complete_work+25 at ffffffff812c89f9
>  #9 [ffffc90009b47e60] process_one_work+361 at ffffffff810b1889
> #10 [ffffc90009b47ea8] worker_thread+77 at ffffffff810b233d
> #11 [ffffc90009b47f08] kthread+261 at ffffffff810b7fd5
> #12 [ffffc90009b47f50] ret_from_fork+62 at ffffffff81a0035e
> 
> Thus above forms ABBA deadlock. The same deadlock was mentioned in
> upstream commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300. well, it seems
> that that commit just removed cluster lock (the victim of above dead
> lock) from the ABBA deadlock party.
> 
> End-user visible effects:
> Process hang in truncate -> ocfs2_setattr path and other processes hang at
> ocfs2_dio_end_io_write path.
> 
> This is to fix the deadlock its self. It removes inode_lock() call from dio
> completion path to remove the deadlock and add ip_alloc_sem lock in setattr
> path to synchronize the inode modifications.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> v1 -> v2: remove the "had_alloc_lock" as suggested.
> ---
>  fs/ocfs2/aops.c | 11 +----------
>  fs/ocfs2/file.c |  8 ++++++--
>  2 files changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 3bfb4147895a..ad20403b383f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2295,7 +2295,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	struct ocfs2_alloc_context *meta_ac = NULL;
>  	handle_t *handle = NULL;
>  	loff_t end = offset + bytes;
> -	int ret = 0, credits = 0, locked = 0;
> +	int ret = 0, credits = 0;
>  
>  	ocfs2_init_dealloc_ctxt(&dealloc);
>  
> @@ -2306,13 +2306,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	    !dwc->dw_orphaned)
>  		goto out;
>  
> -	/* ocfs2_file_write_iter will get i_mutex, so we need not lock if we
> -	 * are in that context. */
> -	if (dwc->dw_writer_pid != task_pid_nr(current)) {
> -		inode_lock(inode);
> -		locked = 1;
> -	}
> -
>  	ret = ocfs2_inode_lock(inode, &di_bh, 1);
>  	if (ret < 0) {
>  		mlog_errno(ret);
> @@ -2393,8 +2386,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	if (meta_ac)
>  		ocfs2_free_alloc_context(meta_ac);
>  	ocfs2_run_deallocs(osb, &dealloc);
> -	if (locked)
> -		inode_unlock(inode);
>  	ocfs2_dio_free_write_ctx(inode, dwc);
>  
>  	return ret;
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 85979e2214b3..8880071ee4ee 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1244,22 +1244,24 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  				goto bail_unlock;
>  			}
>  		}
> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
>  		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
>  					   2 * ocfs2_quota_trans_credits(sb));
>  		if (IS_ERR(handle)) {
>  			status = PTR_ERR(handle);
>  			mlog_errno(status);
> -			goto bail_unlock;
> +			goto bail_unlock_alloc;
>  		}
>  		status = __dquot_transfer(inode, transfer_to);
>  		if (status < 0)
>  			goto bail_commit;
>  	} else {
> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);

Why not do down_write() just before if clause?
Anyway, it also looks fine to me. So with or without that change,

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write
  2021-04-04 11:25 ` Joseph Qi
@ 2021-04-05 16:42   ` Wengang Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Wengang Wang @ 2021-04-05 16:42 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel ML, GHe



> On Apr 4, 2021, at 4:25 AM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
> 
> 
> 
> On 4/3/21 1:13 AM, Wengang Wang wrote:
>> The following deadlock is detected:
>> 
>> truncate -> setattr path is waiting for pending direct IO to be done (
>> inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
>> 
>> PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
>> #0 [ffffc9000bcf3c08] __schedule at ffffffff818667cc
>> #1 [ffffc9000bcf3ca0] schedule at ffffffff81866de6
>> #2 [ffffc9000bcf3cb8] inode_dio_wait at ffffffff812a2d04
>> #3 [ffffc9000bcf3d28] ocfs2_setattr at ffffffffc05f322e [ocfs2]
>> #4 [ffffc9000bcf3e18] notify_change at ffffffff812a5a09
>> #5 [ffffc9000bcf3e60] do_truncate at ffffffff812808f5
>> #6 [ffffc9000bcf3ed8] do_sys_ftruncate.constprop.18 at ffffffff81280cf2
>> #7 [ffffc9000bcf3f18] sys_ftruncate at ffffffff81280d8e
>> #8 [ffffc9000bcf3f28] do_syscall_64 at ffffffff81003949
>> #9 [ffffc9000bcf3f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
>> 
>> dio completion path is going to complete one direct IO (decrement
>> inode->i_dio_count), but before that it hang at locking inode->i_rwsem.
>> 
>> #0 [ffffc90009b47b40] __schedule+700 at ffffffff818667cc
>> #1 [ffffc90009b47bd8] schedule+54 at ffffffff81866de6
>> #2 [ffffc90009b47bf0] rwsem_down_write_failed+536 at ffffffff8186aa28
>> #3 [ffffc90009b47c88] call_rwsem_down_write_failed+23 at ffffffff8185a1b7
>> #4 [ffffc90009b47cd0] down_write+45 at ffffffff81869c9d
>> #5 [ffffc90009b47ce8] ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
>> #6 [ffffc90009b47dd8] ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
>> #7 [ffffc90009b47e18] dio_complete+140 at ffffffff812c873c
>> #8 [ffffc90009b47e50] dio_aio_complete_work+25 at ffffffff812c89f9
>> #9 [ffffc90009b47e60] process_one_work+361 at ffffffff810b1889
>> #10 [ffffc90009b47ea8] worker_thread+77 at ffffffff810b233d
>> #11 [ffffc90009b47f08] kthread+261 at ffffffff810b7fd5
>> #12 [ffffc90009b47f50] ret_from_fork+62 at ffffffff81a0035e
>> 
>> Thus above forms ABBA deadlock. The same deadlock was mentioned in
>> upstream commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300. well, it seems
>> that that commit just removed cluster lock (the victim of above dead
>> lock) from the ABBA deadlock party.
>> 
>> End-user visible effects:
>> Process hang in truncate -> ocfs2_setattr path and other processes hang at
>> ocfs2_dio_end_io_write path.
>> 
>> This is to fix the deadlock its self. It removes inode_lock() call from dio
>> completion path to remove the deadlock and add ip_alloc_sem lock in setattr
>> path to synchronize the inode modifications.
>> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>> v1 -> v2: remove the "had_alloc_lock" as suggested.
>> ---
>> fs/ocfs2/aops.c | 11 +----------
>> fs/ocfs2/file.c |  8 ++++++--
>> 2 files changed, 7 insertions(+), 12 deletions(-)
>> 
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index 3bfb4147895a..ad20403b383f 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2295,7 +2295,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>> 	struct ocfs2_alloc_context *meta_ac = NULL;
>> 	handle_t *handle = NULL;
>> 	loff_t end = offset + bytes;
>> -	int ret = 0, credits = 0, locked = 0;
>> +	int ret = 0, credits = 0;
>> 
>> 	ocfs2_init_dealloc_ctxt(&dealloc);
>> 
>> @@ -2306,13 +2306,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>> 	    !dwc->dw_orphaned)
>> 		goto out;
>> 
>> -	/* ocfs2_file_write_iter will get i_mutex, so we need not lock if we
>> -	 * are in that context. */
>> -	if (dwc->dw_writer_pid != task_pid_nr(current)) {
>> -		inode_lock(inode);
>> -		locked = 1;
>> -	}
>> -
>> 	ret = ocfs2_inode_lock(inode, &di_bh, 1);
>> 	if (ret < 0) {
>> 		mlog_errno(ret);
>> @@ -2393,8 +2386,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>> 	if (meta_ac)
>> 		ocfs2_free_alloc_context(meta_ac);
>> 	ocfs2_run_deallocs(osb, &dealloc);
>> -	if (locked)
>> -		inode_unlock(inode);
>> 	ocfs2_dio_free_write_ctx(inode, dwc);
>> 
>> 	return ret;
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index 85979e2214b3..8880071ee4ee 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -1244,22 +1244,24 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>> 				goto bail_unlock;
>> 			}
>> 		}
>> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
>> 		handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS +
>> 					   2 * ocfs2_quota_trans_credits(sb));
>> 		if (IS_ERR(handle)) {
>> 			status = PTR_ERR(handle);
>> 			mlog_errno(status);
>> -			goto bail_unlock;
>> +			goto bail_unlock_alloc;
>> 		}
>> 		status = __dquot_transfer(inode, transfer_to);
>> 		if (status < 0)
>> 			goto bail_commit;
>> 	} else {
>> +		down_write(&OCFS2_I(inode)->ip_alloc_sem);
> 
> Why not do down_write() just before if clause?

Yes, down_write() just before if clause works too. I just wanted lock as less/short as possible.

> Anyway, it also looks fine to me. So with or without that change,
> 
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Thanks Joseph for review!

Wengang
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-04-05 16:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-02 17:13 [Ocfs2-devel] [PATCH V2] ocfs2: fix deadlock between setattr and dio_end_io_write Wengang Wang
2021-04-04 11:25 ` Joseph Qi
2021-04-05 16:42   ` Wengang Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).