All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix the deadlock between the transaction attach and commit
@ 2013-02-08  6:55 Miao Xie
  2013-02-11 20:35 ` Josef Bacik
  2013-02-12 14:39 ` Josef Bacik
  0 siblings, 2 replies; 5+ messages in thread
From: Miao Xie @ 2013-02-08  6:55 UTC (permalink / raw)
  To: Linux Btrfs

Here is the whole story:
	Trans_Attach_Task		Trans_Commit_Task
					btrfs_commit_transaction()
					 |->wait writers to be 1
	btrfs_attach_transaction()	 |
	btrfs_commit_transaction()	 |
	 |				 |->set trans_no_join to 1
	 |				 |  (close join transaction)
	 |->btrfs_run_ordered_operations |
	    (Those ordered operations	 |
	     are added when releasing	 |
	     file)			 |
	     |->btrfs_join_transaction() |
		|->wait_commit()	 |
					 |->wait writers to be 1

Then these two tasks waited for each other.

As we know, btrfs_attach_transaction() is used to catch the current
transaction, and commit it, so if someone has committed the transaction,
it is unnecessary to join it and commit it, wait is the best choice
for it. In this way, we can fix the above problem.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f154946..7be9d5e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -285,6 +285,14 @@ static int may_wait_transaction(struct btrfs_root *root, int type)
 	if (type == TRANS_USERSPACE)
 		return 1;
 
+	/*
+	 * If we are ATTACH, it means we just want to catch the current
+	 * transaction and commit it. So if someone is committing the
+	 * current transaction now, it is very glad to wait it.
+	 */
+	if (type == TRANS_ATTACH)
+		return 1;
+
 	if (type == TRANS_START &&
 	    !atomic_read(&root->fs_info->open_ioctl_trans))
 		return 1;
-- 
1.6.5.2

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix the deadlock between the transaction attach and commit
  2013-02-08  6:55 [PATCH] Btrfs: fix the deadlock between the transaction attach and commit Miao Xie
@ 2013-02-11 20:35 ` Josef Bacik
  2013-02-12 12:56   ` David Sterba
  2013-02-12 14:39 ` Josef Bacik
  1 sibling, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2013-02-11 20:35 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

On Thu, Feb 07, 2013 at 11:55:51PM -0700, Miao Xie wrote:
> Here is the whole story:
> 	Trans_Attach_Task		Trans_Commit_Task
> 					btrfs_commit_transaction()
> 					 |->wait writers to be 1
> 	btrfs_attach_transaction()	 |
> 	btrfs_commit_transaction()	 |
> 	 |				 |->set trans_no_join to 1
> 	 |				 |  (close join transaction)
> 	 |->btrfs_run_ordered_operations |
> 	    (Those ordered operations	 |
> 	     are added when releasing	 |
> 	     file)			 |
> 	     |->btrfs_join_transaction() |
> 		|->wait_commit()	 |
> 					 |->wait writers to be 1
> 
> Then these two tasks waited for each other.
> 
> As we know, btrfs_attach_transaction() is used to catch the current
> transaction, and commit it, so if someone has committed the transaction,
> it is unnecessary to join it and commit it, wait is the best choice
> for it. In this way, we can fix the above problem.
> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

This caused another problem

[ 8050.503904] btrfs-transacti D 0000000000000000     0  5546      2 0x00000080
[ 8050.503913]  ffff88037bfb9d18 0000000000000046 ffff88037bfb9cb8 ffffffff810c6d4d
[ 8050.503924]  ffff88037c4d8000 ffff88037bfb9fd8 ffff88037bfb9fd8 ffff88037bfb9fd8
[ 8050.503933]  ffff88042f17a000 ffff88037c4d8000 ffff88042c33b000 ffff88037ba0bdb8
[ 8050.503943] Call Trace:
[ 8050.503953]  [<ffffffff810c6d4d>] ? trace_hardirqs_on+0xd/0x10
[ 8050.503962]  [<ffffffff816507c9>] schedule+0x29/0x70
[ 8050.504002]  [<ffffffffa084eb75>] wait_current_trans+0xb5/0x110 [btrfs]
[ 8050.504011]  [<ffffffff810891f0>] ? __init_waitqueue_head+0x60/0x60
[ 8050.504047]  [<ffffffffa08503c0>] start_transaction+0x160/0x4e0 [btrfs]
[ 8050.504082]  [<ffffffffa0850757>] btrfs_attach_transaction+0x17/0x20 [btrfs]
[ 8050.504114]  [<ffffffffa084857a>] transaction_kthread+0x15a/0x240 [btrfs]
[ 8050.504147]  [<ffffffffa0848420>] ? btrfs_destroy_delayed_refs+0x330/0x330 [btrfs]
[ 8050.504155]  [<ffffffff8108883a>] kthread+0xea/0xf0
[ 8050.504166]  [<ffffffff81088750>] ? flush_kthread_worker+0x150/0x150
[ 8050.504175]  [<ffffffff8165a06c>] ret_from_fork+0x7c/0xb0
[ 8050.504183]  [<ffffffff81088750>] ? flush_kthread_worker+0x150/0x150
[ 8050.504189] sync            D 0000000000000000     0  5572   5342 0x00000080
[ 8050.504198]  ffff88037c235dd8 0000000000000046 ffff88037c235d78 ffffffff810c6d4d
[ 8050.504207]  ffff88037ca8a000 ffff88037c235fd8 ffff88037c235fd8 ffff88037c235fd8
[ 8050.504217]  ffff88042f184000 ffff88037ca8a000 ffff88042c33b000 ffff88037ba0bdb8
[ 8050.504227] Call Trace:
[ 8050.504236]  [<ffffffff810c6d4d>] ? trace_hardirqs_on+0xd/0x10
[ 8050.504245]  [<ffffffff816507c9>] schedule+0x29/0x70
[ 8050.504278]  [<ffffffffa084eb75>] wait_current_trans+0xb5/0x110 [btrfs]
[ 8050.504287]  [<ffffffff810891f0>] ? __init_waitqueue_head+0x60/0x60
[ 8050.504322]  [<ffffffffa08503c0>] start_transaction+0x160/0x4e0 [btrfs]
[ 8050.504360]  [<ffffffffa0866d94>] ? btrfs_wait_ordered_extents+0x174/0x230 [btrfs]
[ 8050.504395]  [<ffffffffa0850757>] btrfs_attach_transaction+0x17/0x20 [btrfs]
[ 8050.504420]  [<ffffffffa0820133>] btrfs_sync_fs+0x53/0x130 [btrfs]
[ 8050.504430]  [<ffffffff811cac30>] ? __sync_filesystem+0x60/0x60
[ 8050.504438]  [<ffffffff811cac30>] ? __sync_filesystem+0x60/0x60
[ 8050.504447]  [<ffffffff811cac50>] sync_fs_one_sb+0x20/0x30
[ 8050.504455]  [<ffffffff8119e0c1>] iterate_supers+0xf1/0x100
[ 8050.504463]  [<ffffffff811cad25>] sys_sync+0x55/0x90
[ 8050.504472]  [<ffffffff8165a119>] system_call_fastpath+0x16/0x1b

So we're getting stuck in the

if (may_wait_transaction())
	wait_current_trans();

thing.  If we set blocked in __btrfs_end_transaction we'll just sit there
forever because nobody can actually commit the transaction.  Probably need to
change this to

if (type == TRANS_ATTACH && trans->in_commit)

or something like that.  Me and kdave reproduced by running 274 in a loop, it
happpened pretty quick.  I'd fix it myself but I have to leave my house for
people to come look at it.  If you haven't fixed this by tomorrow I'll fix it
up.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix the deadlock between the transaction attach and commit
  2013-02-11 20:35 ` Josef Bacik
@ 2013-02-12 12:56   ` David Sterba
  2013-02-18 10:36     ` Miao Xie
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2013-02-12 12:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Miao Xie, Linux Btrfs

On Mon, Feb 11, 2013 at 03:35:37PM -0500, Josef Bacik wrote:
> or something like that.  Me and kdave reproduced by running 274 in a loop, it
> happpened pretty quick.  I'd fix it myself but I have to leave my house for
> people to come look at it.  If you haven't fixed this by tomorrow I'll fix it
> up.  Thanks,

I found 224 stuck with this

[ 3840.176147] INFO: task btrfs-transacti:3692 blocked for more than 480 seconds.
[ 3840.176157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3840.176162] btrfs-transacti D 0000000000000000     0  3692      2 0x00000000
[ 3840.176172]  ffff880103a3bd98 0000000000000046 0000000000000000 ffff880103a3a010
[ 3840.176183]  ffff880103a3a000 ffff880103a3a010 ffff880103a3a000 ffff880103a3a000
[ 3840.176190]  ffff880103a3a000 ffff880103a3bfd8 ffff880102814800 ffffffff81a14420
[ 3840.176200] Call Trace:
[ 3840.176218]  [<ffffffff8108a0c9>] ? enqueue_entity+0x229/0xa40
[ 3840.176230]  [<ffffffff815efc14>] schedule+0x24/0x70
[ 3840.176296]  [<ffffffffa00cd0f5>] wait_current_trans+0xc5/0x110 [btrfs]
[ 3840.176306]  [<ffffffff8106d890>] ? wake_up_bit+0x40/0x40
[ 3840.176361]  [<ffffffffa00cfa48>] start_transaction+0x228/0x4c0 [btrfs]
[ 3840.176400]  [<ffffffffa00cfcf2>] btrfs_attach_transaction+0x12/0x20 [btrfs]
[ 3840.176434]  [<ffffffffa00c95c7>] transaction_kthread+0x167/0x220 [btrfs]
[ 3840.176476]  [<ffffffffa00c9460>] ? __btree_submit_bio_start+0x10/0x10 [btrfs]
[ 3840.176494]  [<ffffffff8106d086>] kthread+0xc6/0xd0
[ 3840.176506]  [<ffffffff8106cfc0>] ? kthread_freezable_should_stop+0x70/0x70
[ 3840.176514]  [<ffffffff815f82bc>] ret_from_fork+0x7c/0xb0
[ 3840.176522]  [<ffffffff8106cfc0>] ? kthread_freezable_should_stop+0x70/0x70
[ 3840.176528] INFO: task rm:15763 blocked for more than 480 seconds.
[ 3840.176531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3840.176534] rm              D 0000000000000002     0 15763   3517 0x00000000
[ 3840.176544]  ffff8801071b7c88 0000000000000086 00000002000280da ffff8801071b6010
[ 3840.176551]  ffff8801071b6000 ffff8801071b6010 ffff8801071b6000 ffff8801071b6000
[ 3840.176558]  ffff8801071b6000 ffff8801071b7fd8 ffff8800be5762c0 ffff88012af063c0
[ 3840.176567] Call Trace:
[ 3840.176578]  [<ffffffff8111f942>] ? __alloc_pages_nodemask+0x162/0x250
[ 3840.176608]  [<ffffffffa00be468>] ? reserve_metadata_bytes+0x378/0x490 [btrfs]
[ 3840.176622]  [<ffffffff815efc14>] schedule+0x24/0x70
[ 3840.176664]  [<ffffffffa00cd0f5>] wait_current_trans+0xc5/0x110 [btrfs]
[ 3840.176671]  [<ffffffff8106d890>] ? wake_up_bit+0x40/0x40
[ 3840.176721]  [<ffffffffa00cfa48>] start_transaction+0x228/0x4c0 [btrfs]
[ 3840.176769]  [<ffffffffa00cffd3>] btrfs_start_transaction+0x13/0x20 [btrfs]
[ 3840.176812]  [<ffffffffa00d192e>] __unlink_start_trans+0x7e/0x490 [btrfs]
[ 3840.176832]  [<ffffffff811891d7>] ? __d_lookup+0xa7/0x170
[ 3840.176839]  [<ffffffff811432f1>] ? handle_pte_fault+0x91/0x200
[ 3840.176849]  [<ffffffff81291fc9>] ? security_inode_permission+0x19/0x20
[ 3840.176893]  [<ffffffffa00daca6>] btrfs_unlink+0x36/0xd0 [btrfs]
[ 3840.176899]  [<ffffffff8117f7f4>] vfs_unlink+0xb4/0x110
[ 3840.176905]  [<ffffffff81183f83>] do_unlinkat+0x233/0x240
[ 3840.176912]  [<ffffffff811841ad>] sys_unlinkat+0x1d/0x40
[ 3840.176918]  [<ffffffff815f8369>] system_call_fastpath+0x16/0x1b

mounted with noatime,space_cache

david

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix the deadlock between the transaction attach and commit
  2013-02-08  6:55 [PATCH] Btrfs: fix the deadlock between the transaction attach and commit Miao Xie
  2013-02-11 20:35 ` Josef Bacik
@ 2013-02-12 14:39 ` Josef Bacik
  1 sibling, 0 replies; 5+ messages in thread
From: Josef Bacik @ 2013-02-12 14:39 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

On Thu, Feb 07, 2013 at 11:55:51PM -0700, Miao Xie wrote:
> Here is the whole story:
> 	Trans_Attach_Task		Trans_Commit_Task
> 					btrfs_commit_transaction()
> 					 |->wait writers to be 1
> 	btrfs_attach_transaction()	 |
> 	btrfs_commit_transaction()	 |
> 	 |				 |->set trans_no_join to 1
> 	 |				 |  (close join transaction)
> 	 |->btrfs_run_ordered_operations |
> 	    (Those ordered operations	 |
> 	     are added when releasing	 |
> 	     file)			 |
> 	     |->btrfs_join_transaction() |
> 		|->wait_commit()	 |
> 					 |->wait writers to be 1

I'm just dropping this patch, the way you describe this deadlock can't happen
since the second btrfs_join_transaction() would see theres already a
transaction in current->journal_info and use that and not do wait_commit().  If
you observed a deadlock like this then look at it again, there is something else
going wrong.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix the deadlock between the transaction attach and commit
  2013-02-12 12:56   ` David Sterba
@ 2013-02-18 10:36     ` Miao Xie
  0 siblings, 0 replies; 5+ messages in thread
From: Miao Xie @ 2013-02-18 10:36 UTC (permalink / raw)
  To: dsterba; +Cc: Josef Bacik, Linux Btrfs

(Sorry for the late reply, I was on my vacation of the Spring Festival last week.)

On Tue, 12 Feb 2013 13:56:32 +0100, David Sterba wrote:
> On Mon, Feb 11, 2013 at 03:35:37PM -0500, Josef Bacik wrote:
>> or something like that.  Me and kdave reproduced by running 274 in a loop, it
>> happpened pretty quick.  I'd fix it myself but I have to leave my house for
>> people to come look at it.  If you haven't fixed this by tomorrow I'll fix it
>> up.  Thanks,
> 
> I found 224 stuck with this
> 
[SNIP]
> 
> mounted with noatime,space_cache

Thanks for your test. My test skipped the 274th case because it always fails,
and all the other cases passed, so I didn't hit this problem.

Anyways, very sorry for my stupid patch.
(I have reviewed Josef's fix patch, and commented on it, please see the reply
of that patch)

Thanks
Miao

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-02-18 10:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-08  6:55 [PATCH] Btrfs: fix the deadlock between the transaction attach and commit Miao Xie
2013-02-11 20:35 ` Josef Bacik
2013-02-12 12:56   ` David Sterba
2013-02-18 10:36     ` Miao Xie
2013-02-12 14:39 ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.