All of lore.kernel.org
 help / color / mirror / Atom feed
* please review snapshot corruption path with delayed metadata insertion
@ 2011-06-17 21:12 Chris Mason
  2011-06-19  4:34 ` Tsutomu Itoh
  2011-06-30  8:03 ` Miao Xie
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Mason @ 2011-06-17 21:12 UTC (permalink / raw)
  To: linux-btrfs, Miao Xie, Tsutomu Itoh

Hi everyone,

I think I tracked down the oops we were seeing Tsutomu Itoh's balance
test.  The delayed metadata insertion code was allowing delayed updates
to queue up and be process after the snapshot was created.

I've fixed this up by moving the delayed metadata run down into the
snapshot creation code, please take a look.  If nobody objects I'll have
this in the pull I send to Linus this weekend.


commit e999376f094162aa425ae749aa1df95ab928d010
Author: Chris Mason <chris.mason@oracle.com>
Date:   Fri Jun 17 16:14:09 2011 -0400

    Btrfs: avoid delayed metadata items during commits
    
    Snapshot creation has two phases.  One is the initial snapshot setup,
    and the second is done during commit, while nobody is allowed to modify
    the root we are snapshotting.
    
    The delayed metadata insertion code can break that rule, it does a
    delayed inode update on the inode of the parent of the snapshot,
    and delayed directory item insertion.
    
    This makes sure to run the pending delayed operations before we
    record the snapshot root, which avoids corruptions.
    
    Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index fc515b7..f1cbd02 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1237,6 +1237,13 @@ again:
 	return 0;
 }
 
+void btrfs_assert_delayed_root_empty(struct btrfs_root *root)
+{
+	struct btrfs_delayed_root *delayed_root;
+	delayed_root = btrfs_get_delayed_root(root);
+	WARN_ON(btrfs_first_delayed_node(delayed_root));
+}
+
 void btrfs_balance_delayed_items(struct btrfs_root *root)
 {
 	struct btrfs_delayed_root *delayed_root;
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index cb79b67..d1a6a29 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -137,4 +137,8 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
 /* for init */
 int __init btrfs_delayed_inode_init(void);
 void btrfs_delayed_inode_exit(void);
+
+/* for debugging */
+void btrfs_assert_delayed_root_empty(struct btrfs_root *root);
+
 #endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c073d85..51dcec8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -957,6 +957,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_update_inode(trans, parent_root, parent_inode);
 	BUG_ON(ret);
 
+	/*
+	 * pull in the delayed directory update
+	 * and the delayed inode item
+	 * otherwise we corrupt the FS during
+	 * snapshot
+	 */
+	ret = btrfs_run_delayed_items(trans, root);
+	BUG_ON(ret);
+
 	record_root_in_trans(trans, root);
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
 	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
@@ -1018,14 +1027,6 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
 	int ret;
 
 	list_for_each_entry(pending, head, list) {
-		/*
-		 * We must deal with the delayed items before creating
-		 * snapshots, or we will create a snapthot with inconsistent
-		 * information.
-		*/
-		ret = btrfs_run_delayed_items(trans, fs_info->fs_root);
-		BUG_ON(ret);
-
 		ret = create_pending_snapshot(trans, fs_info, pending);
 		BUG_ON(ret);
 	}
@@ -1319,15 +1320,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	 */
 	mutex_lock(&root->fs_info->reloc_mutex);
 
-	ret = create_pending_snapshots(trans, root->fs_info);
+	ret = btrfs_run_delayed_items(trans, root);
 	BUG_ON(ret);
 
-	ret = btrfs_run_delayed_items(trans, root);
+	ret = create_pending_snapshots(trans, root->fs_info);
 	BUG_ON(ret);
 
 	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
 	BUG_ON(ret);
 
+	/*
+	 * make sure none of the code above managed to slip in a
+	 * delayed item
+	 */
+	btrfs_assert_delayed_root_empty(root);
+
 	WARN_ON(cur_trans != trans->transaction);
 
 	btrfs_scrub_pause(root);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-17 21:12 please review snapshot corruption path with delayed metadata insertion Chris Mason
@ 2011-06-19  4:34 ` Tsutomu Itoh
  2011-06-19 23:41   ` Tsutomu Itoh
  2011-06-30  8:03 ` Miao Xie
  1 sibling, 1 reply; 16+ messages in thread
From: Tsutomu Itoh @ 2011-06-19  4:34 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Miao Xie

Hi, Chris,

(2011/06/18 6:12), Chris Mason wrote:
> Hi everyone,
>
> I think I tracked down the oops we were seeing Tsutomu Itoh's balance
> test.  The delayed metadata insertion code was allowing delayed updates
> to queue up and be process after the snapshot was created.
>
> I've fixed this up by moving the delayed metadata run down into the
> snapshot creation code, please take a look.  If nobody objects I'll have
> this in the pull I send to Linus this weekend.

I ran my balance test script about 12 hours, any problem didn't occur.

Thanks,
Tsutomu

>
> commit e999376f094162aa425ae749aa1df95ab928d010
> Author: Chris Mason<chris.mason@oracle.com>
> Date:   Fri Jun 17 16:14:09 2011 -0400
>
>      Btrfs: avoid delayed metadata items during commits
>
>      Snapshot creation has two phases.  One is the initial snapshot setup,
>      and the second is done during commit, while nobody is allowed to modify
>      the root we are snapshotting.
>
>      The delayed metadata insertion code can break that rule, it does a
>      delayed inode update on the inode of the parent of the snapshot,
>      and delayed directory item insertion.
>
>      This makes sure to run the pending delayed operations before we
>      record the snapshot root, which avoids corruptions.
>
>      Signed-off-by: Chris Mason<chris.mason@oracle.com>
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index fc515b7..f1cbd02 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1237,6 +1237,13 @@ again:
>   	return 0;
>   }
>
> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root)
> +{
> +	struct btrfs_delayed_root *delayed_root;
> +	delayed_root = btrfs_get_delayed_root(root);
> +	WARN_ON(btrfs_first_delayed_node(delayed_root));
> +}
> +
>   void btrfs_balance_delayed_items(struct btrfs_root *root)
>   {
>   	struct btrfs_delayed_root *delayed_root;
> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
> index cb79b67..d1a6a29 100644
> --- a/fs/btrfs/delayed-inode.h
> +++ b/fs/btrfs/delayed-inode.h
> @@ -137,4 +137,8 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
>   /* for init */
>   int __init btrfs_delayed_inode_init(void);
>   void btrfs_delayed_inode_exit(void);
> +
> +/* for debugging */
> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root);
> +
>   #endif
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index c073d85..51dcec8 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -957,6 +957,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>   	ret = btrfs_update_inode(trans, parent_root, parent_inode);
>   	BUG_ON(ret);
>
> +	/*
> +	 * pull in the delayed directory update
> +	 * and the delayed inode item
> +	 * otherwise we corrupt the FS during
> +	 * snapshot
> +	 */
> +	ret = btrfs_run_delayed_items(trans, root);
> +	BUG_ON(ret);
> +
>   	record_root_in_trans(trans, root);
>   	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>   	memcpy(new_root_item,&root->root_item, sizeof(*new_root_item));
> @@ -1018,14 +1027,6 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
>   	int ret;
>
>   	list_for_each_entry(pending, head, list) {
> -		/*
> -		 * We must deal with the delayed items before creating
> -		 * snapshots, or we will create a snapthot with inconsistent
> -		 * information.
> -		*/
> -		ret = btrfs_run_delayed_items(trans, fs_info->fs_root);
> -		BUG_ON(ret);
> -
>   		ret = create_pending_snapshot(trans, fs_info, pending);
>   		BUG_ON(ret);
>   	}
> @@ -1319,15 +1320,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>   	 */
>   	mutex_lock(&root->fs_info->reloc_mutex);
>
> -	ret = create_pending_snapshots(trans, root->fs_info);
> +	ret = btrfs_run_delayed_items(trans, root);
>   	BUG_ON(ret);
>
> -	ret = btrfs_run_delayed_items(trans, root);
> +	ret = create_pending_snapshots(trans, root->fs_info);
>   	BUG_ON(ret);
>
>   	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
>   	BUG_ON(ret);
>
> +	/*
> +	 * make sure none of the code above managed to slip in a
> +	 * delayed item
> +	 */
> +	btrfs_assert_delayed_root_empty(root);
> +
>   	WARN_ON(cur_trans != trans->transaction);
>
>   	btrfs_scrub_pause(root);
>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-19  4:34 ` Tsutomu Itoh
@ 2011-06-19 23:41   ` Tsutomu Itoh
  2011-06-21  0:24     ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Tsutomu Itoh @ 2011-06-19 23:41 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Miao Xie

(2011/06/19 13:34), Tsutomu Itoh wrote:
> Hi, Chris,
> 
> (2011/06/18 6:12), Chris Mason wrote:
>> Hi everyone,
>>
>> I think I tracked down the oops we were seeing Tsutomu Itoh's balance
>> test.  The delayed metadata insertion code was allowing delayed updates
>> to queue up and be process after the snapshot was created.
>>
>> I've fixed this up by moving the delayed metadata run down into the
>> snapshot creation code, please take a look.  If nobody objects I'll have
>> this in the pull I send to Linus this weekend.
> 

> I ran my balance test script about 12 hours, any problem didn't occur.

But, following panics still occur if inode_cache is specified.

[75164.963860] btrfs: relocating block group 18551406592 flags 18
[75165.565282] btrfs: relocating block group 18282971136 flags 20
[75165.883577] ------------[ cut here ]------------
[75165.883779] kernel BUG at fs/btrfs/relocation.c:2502!
[75165.883992] invalid opcode: 0000 [#1] SMP
[75165.884002] CPU 0
[75165.884002] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
[75165.884002]
[75165.884002] Pid: 18615, comm: btrfs Not tainted 3.0.0-rc3test #1 FUJITSU-SV      PRIMERGY            /D2399
[75165.884002] RIP: 0010:[<ffffffffa02dede3>]  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
[75165.884002] RSP: 0018:ffff8801026f7a08  EFLAGS: 00010202
[75165.884002] RAX: 0000000000000001 RBX: ffff8801949e7c00 RCX: 0000000000000002
[75165.884002] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
[75165.884002] RBP: ffff8801026f7ad8 R08: 00000000000006c7 R09: ffff8801026f78f0
[75165.884002] R10: ffff88009f9d7800 R11: 0000000442149000 R12: ffff880037c99180
[75165.884002] R13: ffff880152695f30 R14: ffff88009f9d4800 R15: ffff88013cfdbf78
[75165.884002] FS:  00007f490c045740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
[75165.884002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75165.884002] CR2: 00000033cfea6a60 CR3: 0000000100d47000 CR4: 00000000000006f0
[75165.884002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75165.884002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75165.884002] Process btrfs (pid: 18615, threadinfo ffff8801026f6000, task ffff880102060000)
[75165.884002] Stack:
[75165.884002]  ffff88000516d000 ffff88009f9d7d80 00000001026f7a98 ffff880037c99180
[75165.884002]  ffff880037c991c0 000000019f9d7820 ffff88011bf09ba0 ffff88009f9d7800
[75165.884002]  ffff8801026f7ad8 ffff88011bf094c0 ffff88013cfdbf78 00ff88009f9d7800
[75165.884002] Call Trace:
[75165.884002]  [<ffffffffa02926ec>] ? block_rsv_add_bytes+0x24/0x4d [btrfs]
[75165.884002]  [<ffffffffa02e137a>] relocate_tree_blocks+0x2fb/0x4a9 [btrfs]
[75165.884002]  [<ffffffffa02de227>] ? add_tree_block+0xff/0x121 [btrfs]
[75165.884002]  [<ffffffffa02e1b13>] relocate_block_group+0x214/0x4dc [btrfs]
[75165.884002]  [<ffffffffa02e1f26>] btrfs_relocate_block_group+0x14b/0x285 [btrfs]
[75165.884002]  [<ffffffffa02c7183>] ? btrfs_relocate_chunk+0x4a4/0x50d [btrfs]
[75165.884002]  [<ffffffffa02c6d48>] btrfs_relocate_chunk+0x69/0x50d [btrfs]
[75165.884002]  [<ffffffffa028b3f5>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
[75165.884002]  [<ffffffffa02c0d0f>] ? read_extent_buffer+0xba/0xf6 [btrfs]
[75165.884002]  [<ffffffffa02c561a>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
[75165.884002]  [<ffffffffa02c77be>] btrfs_balance+0x1ca/0x219 [btrfs]
[75165.884002]  [<ffffffffa02cfcbd>] btrfs_ioctl+0x922/0xc19 [btrfs]
[75165.884002]  [<ffffffff810e88e4>] ? handle_mm_fault+0x233/0x24a
[75165.884002]  [<ffffffff813a6be5>] ? do_page_fault+0x340/0x3b2
[75165.884002]  [<ffffffff8111d828>] do_vfs_ioctl+0x474/0x4c3
[75165.884002]  [<ffffffff810ffe41>] ? virt_to_head_page+0xe/0x31
[75165.884002]  [<ffffffff811010e8>] ? kmem_cache_free+0x20/0xae
[75165.884002]  [<ffffffff8111d8cd>] sys_ioctl+0x56/0x79
[75165.884002]  [<ffffffff813aa302>] system_call_fastpath+0x16/0x1b
[75165.884002] Code: 00 00 48 8b 95 60 ff ff ff 45 31 c0 41 b9 01 00 00 00 4c 89 e9 4c 89 f6 4c 89 ff e8 52 15 fb ff 83 f8 00 0f 8c e6 02 00 00 74 04 <0f> 0b eb fe 48 8b 53 68 0f b6 43 70 48 85 d2 75 14 49 8b 54 c5
[75165.884002] RIP  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
[75165.884002]  RSP <ffff8801026f7a08>

(gdb) l *do_relocation+0x163
0x58e07 is in do_relocation (fs/btrfs/relocation.c:2502).
2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
2498                            if (ret < 0) {
2499                                    err = ret;
2500                                    break;
2501                            }
2502                            BUG_ON(ret > 0);
2503
2504                            if (!upper->eb) {
2505                                    upper->eb = path->nodes[upper->level];
2506                                    path->nodes[upper->level] = NULL;

 
> Thanks,
> Tsutomu
> 
>>
>> commit e999376f094162aa425ae749aa1df95ab928d010
>> Author: Chris Mason<chris.mason@oracle.com>
>> Date:   Fri Jun 17 16:14:09 2011 -0400
>>
>>      Btrfs: avoid delayed metadata items during commits
>>
>>      Snapshot creation has two phases.  One is the initial snapshot setup,
>>      and the second is done during commit, while nobody is allowed to modify
>>      the root we are snapshotting.
>>
>>      The delayed metadata insertion code can break that rule, it does a
>>      delayed inode update on the inode of the parent of the snapshot,
>>      and delayed directory item insertion.
>>
>>      This makes sure to run the pending delayed operations before we
>>      record the snapshot root, which avoids corruptions.
>>
>>      Signed-off-by: Chris Mason<chris.mason@oracle.com>
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index fc515b7..f1cbd02 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1237,6 +1237,13 @@ again:
>>       return 0;
>>   }
>>
>> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root)
>> +{
>> +    struct btrfs_delayed_root *delayed_root;
>> +    delayed_root = btrfs_get_delayed_root(root);
>> +    WARN_ON(btrfs_first_delayed_node(delayed_root));
>> +}
>> +
>>   void btrfs_balance_delayed_items(struct btrfs_root *root)
>>   {
>>       struct btrfs_delayed_root *delayed_root;
>> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
>> index cb79b67..d1a6a29 100644
>> --- a/fs/btrfs/delayed-inode.h
>> +++ b/fs/btrfs/delayed-inode.h
>> @@ -137,4 +137,8 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
>>   /* for init */
>>   int __init btrfs_delayed_inode_init(void);
>>   void btrfs_delayed_inode_exit(void);
>> +
>> +/* for debugging */
>> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root);
>> +
>>   #endif
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index c073d85..51dcec8 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -957,6 +957,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>>       ret = btrfs_update_inode(trans, parent_root, parent_inode);
>>       BUG_ON(ret);
>>
>> +    /*
>> +     * pull in the delayed directory update
>> +     * and the delayed inode item
>> +     * otherwise we corrupt the FS during
>> +     * snapshot
>> +     */
>> +    ret = btrfs_run_delayed_items(trans, root);
>> +    BUG_ON(ret);
>> +
>>       record_root_in_trans(trans, root);
>>       btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>>       memcpy(new_root_item,&root->root_item, sizeof(*new_root_item));
>> @@ -1018,14 +1027,6 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
>>       int ret;
>>
>>       list_for_each_entry(pending, head, list) {
>> -        /*
>> -         * We must deal with the delayed items before creating
>> -         * snapshots, or we will create a snapthot with inconsistent
>> -         * information.
>> -        */
>> -        ret = btrfs_run_delayed_items(trans, fs_info->fs_root);
>> -        BUG_ON(ret);
>> -
>>           ret = create_pending_snapshot(trans, fs_info, pending);
>>           BUG_ON(ret);
>>       }
>> @@ -1319,15 +1320,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>        */
>>       mutex_lock(&root->fs_info->reloc_mutex);
>>
>> -    ret = create_pending_snapshots(trans, root->fs_info);
>> +    ret = btrfs_run_delayed_items(trans, root);
>>       BUG_ON(ret);
>>
>> -    ret = btrfs_run_delayed_items(trans, root);
>> +    ret = create_pending_snapshots(trans, root->fs_info);
>>       BUG_ON(ret);
>>
>>       ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
>>       BUG_ON(ret);
>>
>> +    /*
>> +     * make sure none of the code above managed to slip in a
>> +     * delayed item
>> +     */
>> +    btrfs_assert_delayed_root_empty(root);
>> +
>>       WARN_ON(cur_trans != trans->transaction);
>>
>>       btrfs_scrub_pause(root);
>>
>>
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-19 23:41   ` Tsutomu Itoh
@ 2011-06-21  0:24     ` David Sterba
  2011-06-21  0:40       ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2011-06-21  0:24 UTC (permalink / raw)
  To: Tsutomu Itoh; +Cc: Chris Mason, linux-btrfs, Miao Xie

On Mon, Jun 20, 2011 at 08:41:39AM +0900, Tsutomu Itoh wrote:
> (2011/06/19 13:34), Tsutomu Itoh wrote:
> >> I've fixed this up by moving the delayed metadata run down into the
> >> snapshot creation code, please take a look.  If nobody objects I'll have
> >> this in the pull I send to Linus this weekend.
> > 
> 
> > I ran my balance test script about 12 hours, any problem didn't occur.
> 
> But, following panics still occur if inode_cache is specified.
> 
> [75164.963860] btrfs: relocating block group 18551406592 flags 18
> [75165.565282] btrfs: relocating block group 18282971136 flags 20
> [75165.883577] ------------[ cut here ]------------
> [75165.883779] kernel BUG at fs/btrfs/relocation.c:2502!
> [75165.883992] invalid opcode: 0000 [#1] SMP
> [75165.884002] CPU 0
> [75165.884002] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
> [75165.884002]
> [75165.884002] Pid: 18615, comm: btrfs Not tainted 3.0.0-rc3test #1 FUJITSU-SV      PRIMERGY            /D2399
> [75165.884002] RIP: 0010:[<ffffffffa02dede3>]  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
> [75165.884002] RSP: 0018:ffff8801026f7a08  EFLAGS: 00010202
> [75165.884002] RAX: 0000000000000001 RBX: ffff8801949e7c00 RCX: 0000000000000002
> [75165.884002] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
> [75165.884002] RBP: ffff8801026f7ad8 R08: 00000000000006c7 R09: ffff8801026f78f0
> [75165.884002] R10: ffff88009f9d7800 R11: 0000000442149000 R12: ffff880037c99180
> [75165.884002] R13: ffff880152695f30 R14: ffff88009f9d4800 R15: ffff88013cfdbf78
> [75165.884002] FS:  00007f490c045740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
> [75165.884002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75165.884002] CR2: 00000033cfea6a60 CR3: 0000000100d47000 CR4: 00000000000006f0
> [75165.884002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75165.884002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75165.884002] Process btrfs (pid: 18615, threadinfo ffff8801026f6000, task ffff880102060000)
> [75165.884002] Stack:
> [75165.884002]  ffff88000516d000 ffff88009f9d7d80 00000001026f7a98 ffff880037c99180
> [75165.884002]  ffff880037c991c0 000000019f9d7820 ffff88011bf09ba0 ffff88009f9d7800
> [75165.884002]  ffff8801026f7ad8 ffff88011bf094c0 ffff88013cfdbf78 00ff88009f9d7800
> [75165.884002] Call Trace:
> [75165.884002]  [<ffffffffa02926ec>] ? block_rsv_add_bytes+0x24/0x4d [btrfs]
> [75165.884002]  [<ffffffffa02e137a>] relocate_tree_blocks+0x2fb/0x4a9 [btrfs]
> [75165.884002]  [<ffffffffa02de227>] ? add_tree_block+0xff/0x121 [btrfs]
> [75165.884002]  [<ffffffffa02e1b13>] relocate_block_group+0x214/0x4dc [btrfs]
> [75165.884002]  [<ffffffffa02e1f26>] btrfs_relocate_block_group+0x14b/0x285 [btrfs]
> [75165.884002]  [<ffffffffa02c7183>] ? btrfs_relocate_chunk+0x4a4/0x50d [btrfs]
> [75165.884002]  [<ffffffffa02c6d48>] btrfs_relocate_chunk+0x69/0x50d [btrfs]
> [75165.884002]  [<ffffffffa028b3f5>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
> [75165.884002]  [<ffffffffa02c0d0f>] ? read_extent_buffer+0xba/0xf6 [btrfs]
> [75165.884002]  [<ffffffffa02c561a>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
> [75165.884002]  [<ffffffffa02c77be>] btrfs_balance+0x1ca/0x219 [btrfs]
> [75165.884002]  [<ffffffffa02cfcbd>] btrfs_ioctl+0x922/0xc19 [btrfs]
> [75165.884002]  [<ffffffff810e88e4>] ? handle_mm_fault+0x233/0x24a
> [75165.884002]  [<ffffffff813a6be5>] ? do_page_fault+0x340/0x3b2
> [75165.884002]  [<ffffffff8111d828>] do_vfs_ioctl+0x474/0x4c3
> [75165.884002]  [<ffffffff810ffe41>] ? virt_to_head_page+0xe/0x31
> [75165.884002]  [<ffffffff811010e8>] ? kmem_cache_free+0x20/0xae
> [75165.884002]  [<ffffffff8111d8cd>] sys_ioctl+0x56/0x79
> [75165.884002]  [<ffffffff813aa302>] system_call_fastpath+0x16/0x1b
> [75165.884002] Code: 00 00 48 8b 95 60 ff ff ff 45 31 c0 41 b9 01 00 00 00 4c 89 e9 4c 89 f6 4c 89 ff e8 52 15 fb ff 83 f8 00 0f 8c e6 02 00 00 74 04 <0f> 0b eb fe 48 8b 53 68 0f b6 43 70 48 85 d2 75 14 49 8b 54 c5
> [75165.884002] RIP  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
> [75165.884002]  RSP <ffff8801026f7a08>
> 
> (gdb) l *do_relocation+0x163
> 0x58e07 is in do_relocation (fs/btrfs/relocation.c:2502).
> 2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
> 2498                            if (ret < 0) {
> 2499                                    err = ret;
> 2500                                    break;
> 2501                            }
> 2502                            BUG_ON(ret > 0);

this translates btrfs_search_slot() result to

1600  * If the key isn't found, the path points to the slot where it should
1601  * be inserted, and 1 is returned.  If there are other errors during the
1602  * search a negative error number is returned.

ie. the key is not in the tree. If you can reproduce it reliably, add
some printks about the key and path's ->search_commit_root,
->skip_locking, and ->leave_spinning which are accessed within
btrfs_search_slot().


david

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-21  0:24     ` David Sterba
@ 2011-06-21  0:40       ` Chris Mason
  2011-06-21  1:15         ` Tsutomu Itoh
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2011-06-21  0:40 UTC (permalink / raw)
  To: David Sterba; +Cc: Tsutomu Itoh, linux-btrfs, Miao Xie

Excerpts from David Sterba's message of 2011-06-20 20:24:35 -0400:
> On Mon, Jun 20, 2011 at 08:41:39AM +0900, Tsutomu Itoh wrote:
> > (2011/06/19 13:34), Tsutomu Itoh wrote:
> > >> I've fixed this up by moving the delayed metadata run down into the
> > >> snapshot creation code, please take a look.  If nobody objects I'll have
> > >> this in the pull I send to Linus this weekend.
> > > 
> > 
> > > I ran my balance test script about 12 hours, any problem didn't occur.
> > 
> > But, following panics still occur if inode_cache is specified.
> > 
> > [75164.963860] btrfs: relocating block group 18551406592 flags 18
> > [75165.565282] btrfs: relocating block group 18282971136 flags 20
> > [75165.883577] ------------[ cut here ]------------
> > [75165.883779] kernel BUG at fs/btrfs/relocation.c:2502!
> > [75165.883992] invalid opcode: 0000 [#1] SMP
> > [75165.884002] CPU 0
> > [75165.884002] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
> > [75165.884002]
> > [75165.884002] Pid: 18615, comm: btrfs Not tainted 3.0.0-rc3test #1 FUJITSU-SV      PRIMERGY            /D2399
> > [75165.884002] RIP: 0010:[<ffffffffa02dede3>]  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
> > [75165.884002] RSP: 0018:ffff8801026f7a08  EFLAGS: 00010202
> > [75165.884002] RAX: 0000000000000001 RBX: ffff8801949e7c00 RCX: 0000000000000002
> > [75165.884002] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
> > [75165.884002] RBP: ffff8801026f7ad8 R08: 00000000000006c7 R09: ffff8801026f78f0
> > [75165.884002] R10: ffff88009f9d7800 R11: 0000000442149000 R12: ffff880037c99180
> > [75165.884002] R13: ffff880152695f30 R14: ffff88009f9d4800 R15: ffff88013cfdbf78
> > [75165.884002] FS:  00007f490c045740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
> > [75165.884002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [75165.884002] CR2: 00000033cfea6a60 CR3: 0000000100d47000 CR4: 00000000000006f0
> > [75165.884002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [75165.884002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [75165.884002] Process btrfs (pid: 18615, threadinfo ffff8801026f6000, task ffff880102060000)
> > [75165.884002] Stack:
> > [75165.884002]  ffff88000516d000 ffff88009f9d7d80 00000001026f7a98 ffff880037c99180
> > [75165.884002]  ffff880037c991c0 000000019f9d7820 ffff88011bf09ba0 ffff88009f9d7800
> > [75165.884002]  ffff8801026f7ad8 ffff88011bf094c0 ffff88013cfdbf78 00ff88009f9d7800
> > [75165.884002] Call Trace:
> > [75165.884002]  [<ffffffffa02926ec>] ? block_rsv_add_bytes+0x24/0x4d [btrfs]
> > [75165.884002]  [<ffffffffa02e137a>] relocate_tree_blocks+0x2fb/0x4a9 [btrfs]
> > [75165.884002]  [<ffffffffa02de227>] ? add_tree_block+0xff/0x121 [btrfs]
> > [75165.884002]  [<ffffffffa02e1b13>] relocate_block_group+0x214/0x4dc [btrfs]
> > [75165.884002]  [<ffffffffa02e1f26>] btrfs_relocate_block_group+0x14b/0x285 [btrfs]
> > [75165.884002]  [<ffffffffa02c7183>] ? btrfs_relocate_chunk+0x4a4/0x50d [btrfs]
> > [75165.884002]  [<ffffffffa02c6d48>] btrfs_relocate_chunk+0x69/0x50d [btrfs]
> > [75165.884002]  [<ffffffffa028b3f5>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
> > [75165.884002]  [<ffffffffa02c0d0f>] ? read_extent_buffer+0xba/0xf6 [btrfs]
> > [75165.884002]  [<ffffffffa02c561a>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
> > [75165.884002]  [<ffffffffa02c77be>] btrfs_balance+0x1ca/0x219 [btrfs]
> > [75165.884002]  [<ffffffffa02cfcbd>] btrfs_ioctl+0x922/0xc19 [btrfs]
> > [75165.884002]  [<ffffffff810e88e4>] ? handle_mm_fault+0x233/0x24a
> > [75165.884002]  [<ffffffff813a6be5>] ? do_page_fault+0x340/0x3b2
> > [75165.884002]  [<ffffffff8111d828>] do_vfs_ioctl+0x474/0x4c3
> > [75165.884002]  [<ffffffff810ffe41>] ? virt_to_head_page+0xe/0x31
> > [75165.884002]  [<ffffffff811010e8>] ? kmem_cache_free+0x20/0xae
> > [75165.884002]  [<ffffffff8111d8cd>] sys_ioctl+0x56/0x79
> > [75165.884002]  [<ffffffff813aa302>] system_call_fastpath+0x16/0x1b
> > [75165.884002] Code: 00 00 48 8b 95 60 ff ff ff 45 31 c0 41 b9 01 00 00 00 4c 89 e9 4c 89 f6 4c 89 ff e8 52 15 fb ff 83 f8 00 0f 8c e6 02 00 00 74 04 <0f> 0b eb fe 48 8b 53 68 0f b6 43 70 48 85 d2 75 14 49 8b 54 c5
> > [75165.884002] RIP  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
> > [75165.884002]  RSP <ffff8801026f7a08>
> > 
> > (gdb) l *do_relocation+0x163
> > 0x58e07 is in do_relocation (fs/btrfs/relocation.c:2502).
> > 2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
> > 2498                            if (ret < 0) {
> > 2499                                    err = ret;
> > 2500                                    break;
> > 2501                            }
> > 2502                            BUG_ON(ret > 0);
> 
> this translates btrfs_search_slot() result to
> 
> 1600  * If the key isn't found, the path points to the slot where it should
> 1601  * be inserted, and 1 is returned.  If there are other errors during the
> 1602  * search a negative error number is returned.
> 
> ie. the key is not in the tree. If you can reproduce it reliably, add
> some printks about the key and path's ->search_commit_root,
> ->skip_locking, and ->leave_spinning which are accessed within
> btrfs_search_slot().

With the inode cache on, this is the expected result right now.  It is
changing the tree after the snapshot is taken, which isn't allowed.  I'm
hoping to nail it down this week.

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-21  0:40       ` Chris Mason
@ 2011-06-21  1:15         ` Tsutomu Itoh
  2011-06-23  8:52           ` Miao Xie
  2011-06-30  6:32           ` Miao Xie
  0 siblings, 2 replies; 16+ messages in thread
From: Tsutomu Itoh @ 2011-06-21  1:15 UTC (permalink / raw)
  To: Chris Mason; +Cc: David Sterba, linux-btrfs, Miao Xie

(2011/06/21 9:40), Chris Mason wrote:
> Excerpts from David Sterba's message of 2011-06-20 20:24:35 -0400:
>> On Mon, Jun 20, 2011 at 08:41:39AM +0900, Tsutomu Itoh wrote:
>>> (2011/06/19 13:34), Tsutomu Itoh wrote:
>>>>> I've fixed this up by moving the delayed metadata run down into the
>>>>> snapshot creation code, please take a look.  If nobody objects I'll have
>>>>> this in the pull I send to Linus this weekend.
>>>>
>>>
>>>> I ran my balance test script about 12 hours, any problem didn't occur.

Bad news.

I changed my test environment to 'btrfs-unstable + for-linus', I encountered
following panic without inode_cache. (in about 4 hours after test begins)

btrfs: relocating block group 49161437184 flags 9
btrfs: found 57523 extents
------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:4303!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]

Pid: 1329, comm: btrfs Not tainted 2.6.39btrfs-test2+ #1 FUJITSU-SV      PRIMERGY            /D2399
RIP: 0010:[<ffffffffa037c1cc>]  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
RSP: 0018:ffff880009665738  EFLAGS: 00010246
RAX: ffff8801925a4000 RBX: ffff88011ee9e000 RCX: ffff8801566889a8
RDX: ffff8800a3601728 RSI: ffff880143fa3000 RDI: ffff88014124d4b0
RBP: ffff880009665798 R08: 6db6db6db6db6db7 R09: 0000160000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff880143fa3000
R13: ffff8800a3601728 R14: ffff88014124d4b0 R15: ffff880194a87ba8
FS:  00007f7f5fb40740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000033cfea6d80 CR3: 000000003fd47000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 1329, threadinfo ffff880009664000, task ffff880011c73520)
Stack:
 ffff8800a3601728 ffff88014124d4b0 ffff880009665798 ffffffffa03143fd
 0000000000000000 0000000000000001 ffff880009665798 ffff880143fa3000
 ffff8801566889a8 ffff8800a3601728 ffff88014124d4b0 ffff880194a87ba8
Call Trace:
 [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
 [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
 [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
 [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
 [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
 [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
 [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
 [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
 [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
 [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
 [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
 [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
 [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
 [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
Code: 8b 76 10 e8 c7 65 eb e0 4c 8b 45 b0 41 80 48 71 20 48 8b 4d b8 8b 45 c0 e9 52 ff ff ff 48 83 be 0f 01 00 00 f7 0f 85 22 fe ff ff <0f> 0b eb fe 49 3b 50 20 0f 84 02 ff ff ff 0f 0b 0f 1f 40 00 eb
RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
 RSP <ffff880009665738>

(gdb) l *btrfs_reloc_cow_block+0x22c
0x6f1fc is in btrfs_reloc_cow_block (fs/btrfs/relocation.c:4302).
4297
4298            rc = root->fs_info->reloc_ctl;
4299            if (!rc)
4300                    return;
4301
4302            BUG_ON(rc->stage == UPDATE_DATA_PTRS &&
4303                   root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID);
4304
4305            level = btrfs_header_level(buf);
4306            if (btrfs_header_generation(buf) <=

>>>
>>> But, following panics still occur if inode_cache is specified.
>>>
>>> [75164.963860] btrfs: relocating block group 18551406592 flags 18
>>> [75165.565282] btrfs: relocating block group 18282971136 flags 20
>>> [75165.883577] ------------[ cut here ]------------
>>> [75165.883779] kernel BUG at fs/btrfs/relocation.c:2502!
>>> [75165.883992] invalid opcode: 0000 [#1] SMP
>>> [75165.884002] CPU 0
>>> [75165.884002] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
>>> [75165.884002]
>>> [75165.884002] Pid: 18615, comm: btrfs Not tainted 3.0.0-rc3test #1 FUJITSU-SV      PRIMERGY            /D2399
>>> [75165.884002] RIP: 0010:[<ffffffffa02dede3>]  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
>>> [75165.884002] RSP: 0018:ffff8801026f7a08  EFLAGS: 00010202
>>> [75165.884002] RAX: 0000000000000001 RBX: ffff8801949e7c00 RCX: 0000000000000002
>>> [75165.884002] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
>>> [75165.884002] RBP: ffff8801026f7ad8 R08: 00000000000006c7 R09: ffff8801026f78f0
>>> [75165.884002] R10: ffff88009f9d7800 R11: 0000000442149000 R12: ffff880037c99180
>>> [75165.884002] R13: ffff880152695f30 R14: ffff88009f9d4800 R15: ffff88013cfdbf78
>>> [75165.884002] FS:  00007f490c045740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
>>> [75165.884002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [75165.884002] CR2: 00000033cfea6a60 CR3: 0000000100d47000 CR4: 00000000000006f0
>>> [75165.884002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [75165.884002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [75165.884002] Process btrfs (pid: 18615, threadinfo ffff8801026f6000, task ffff880102060000)
>>> [75165.884002] Stack:
>>> [75165.884002]  ffff88000516d000 ffff88009f9d7d80 00000001026f7a98 ffff880037c99180
>>> [75165.884002]  ffff880037c991c0 000000019f9d7820 ffff88011bf09ba0 ffff88009f9d7800
>>> [75165.884002]  ffff8801026f7ad8 ffff88011bf094c0 ffff88013cfdbf78 00ff88009f9d7800
>>> [75165.884002] Call Trace:
>>> [75165.884002]  [<ffffffffa02926ec>] ? block_rsv_add_bytes+0x24/0x4d [btrfs]
>>> [75165.884002]  [<ffffffffa02e137a>] relocate_tree_blocks+0x2fb/0x4a9 [btrfs]
>>> [75165.884002]  [<ffffffffa02de227>] ? add_tree_block+0xff/0x121 [btrfs]
>>> [75165.884002]  [<ffffffffa02e1b13>] relocate_block_group+0x214/0x4dc [btrfs]
>>> [75165.884002]  [<ffffffffa02e1f26>] btrfs_relocate_block_group+0x14b/0x285 [btrfs]
>>> [75165.884002]  [<ffffffffa02c7183>] ? btrfs_relocate_chunk+0x4a4/0x50d [btrfs]
>>> [75165.884002]  [<ffffffffa02c6d48>] btrfs_relocate_chunk+0x69/0x50d [btrfs]
>>> [75165.884002]  [<ffffffffa028b3f5>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
>>> [75165.884002]  [<ffffffffa02c0d0f>] ? read_extent_buffer+0xba/0xf6 [btrfs]
>>> [75165.884002]  [<ffffffffa02c561a>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
>>> [75165.884002]  [<ffffffffa02c77be>] btrfs_balance+0x1ca/0x219 [btrfs]
>>> [75165.884002]  [<ffffffffa02cfcbd>] btrfs_ioctl+0x922/0xc19 [btrfs]
>>> [75165.884002]  [<ffffffff810e88e4>] ? handle_mm_fault+0x233/0x24a
>>> [75165.884002]  [<ffffffff813a6be5>] ? do_page_fault+0x340/0x3b2
>>> [75165.884002]  [<ffffffff8111d828>] do_vfs_ioctl+0x474/0x4c3
>>> [75165.884002]  [<ffffffff810ffe41>] ? virt_to_head_page+0xe/0x31
>>> [75165.884002]  [<ffffffff811010e8>] ? kmem_cache_free+0x20/0xae
>>> [75165.884002]  [<ffffffff8111d8cd>] sys_ioctl+0x56/0x79
>>> [75165.884002]  [<ffffffff813aa302>] system_call_fastpath+0x16/0x1b
>>> [75165.884002] Code: 00 00 48 8b 95 60 ff ff ff 45 31 c0 41 b9 01 00 00 00 4c 89 e9 4c 89 f6 4c 89 ff e8 52 15 fb ff 83 f8 00 0f 8c e6 02 00 00 74 04 <0f> 0b eb fe 48 8b 53 68 0f b6 43 70 48 85 d2 75 14 49 8b 54 c5
>>> [75165.884002] RIP  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
>>> [75165.884002]  RSP <ffff8801026f7a08>
>>>
>>> (gdb) l *do_relocation+0x163
>>> 0x58e07 is in do_relocation (fs/btrfs/relocation.c:2502).
>>> 2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
>>> 2498                            if (ret < 0) {
>>> 2499                                    err = ret;
>>> 2500                                    break;
>>> 2501                            }
>>> 2502                            BUG_ON(ret > 0);
>>
>> this translates btrfs_search_slot() result to
>>
>> 1600  * If the key isn't found, the path points to the slot where it should
>> 1601  * be inserted, and 1 is returned.  If there are other errors during the
>> 1602  * search a negative error number is returned.
>>
>> ie. the key is not in the tree. If you can reproduce it reliably, add
>> some printks about the key and path's ->search_commit_root,
>> ->skip_locking, and ->leave_spinning which are accessed within
>> btrfs_search_slot().
> 
> With the inode cache on, this is the expected result right now.  It is
> changing the tree after the snapshot is taken, which isn't allowed.  I'm
> hoping to nail it down this week.
> 
> -chris
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-21  1:15         ` Tsutomu Itoh
@ 2011-06-23  8:52           ` Miao Xie
  2011-06-30  6:32           ` Miao Xie
  1 sibling, 0 replies; 16+ messages in thread
From: Miao Xie @ 2011-06-23  8:52 UTC (permalink / raw)
  To: Tsutomu Itoh; +Cc: Chris Mason, David Sterba, linux-btrfs

On Tue, 21 Jun 2011 10:15:30 +0900, Tsutomu Itoh wrote:
[SNIP]
> Bad news.
> 
> I changed my test environment to 'btrfs-unstable + for-linus', I encountered
> following panic without inode_cache. (in about 4 hours after test begins)
> 
> btrfs: relocating block group 49161437184 flags 9
> btrfs: found 57523 extents
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/relocation.c:4303!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/kernel/mm/ksm/run
> CPU 0
> Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
> 
> Pid: 1329, comm: btrfs Not tainted 2.6.39btrfs-test2+ #1 FUJITSU-SV      PRIMERGY            /D2399
> RIP: 0010:[<ffffffffa037c1cc>]  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
> RSP: 0018:ffff880009665738  EFLAGS: 00010246
> RAX: ffff8801925a4000 RBX: ffff88011ee9e000 RCX: ffff8801566889a8
> RDX: ffff8800a3601728 RSI: ffff880143fa3000 RDI: ffff88014124d4b0
> RBP: ffff880009665798 R08: 6db6db6db6db6db7 R09: 0000160000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: ffff880143fa3000
> R13: ffff8800a3601728 R14: ffff88014124d4b0 R15: ffff880194a87ba8
> FS:  00007f7f5fb40740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000033cfea6d80 CR3: 000000003fd47000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process btrfs (pid: 1329, threadinfo ffff880009664000, task ffff880011c73520)
> Stack:
>  ffff8800a3601728 ffff88014124d4b0 ffff880009665798 ffffffffa03143fd
>  0000000000000000 0000000000000001 ffff880009665798 ffff880143fa3000
>  ffff8801566889a8 ffff8800a3601728 ffff88014124d4b0 ffff880194a87ba8
> Call Trace:
>  [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
>  [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
>  [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
>  [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
>  [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
>  [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
>  [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
>  [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
>  [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
>  [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
>  [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
>  [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
>  [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
>  [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
>  [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
>  [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
>  [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
>  [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
>  [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
>  [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
>  [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
>  [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
>  [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
>  [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
>  [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
>  [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
>  [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
> Code: 8b 76 10 e8 c7 65 eb e0 4c 8b 45 b0 41 80 48 71 20 48 8b 4d b8 8b 45 c0 e9 52 ff ff ff 48 83 be 0f 01 00 00 f7 0f 85 22 fe ff ff <0f> 0b eb fe 49 3b 50 20 0f 84 02 ff ff ff 0f 0b 0f 1f 40 00 eb
> RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
>  RSP <ffff880009665738>
> 
> (gdb) l *btrfs_reloc_cow_block+0x22c
> 0x6f1fc is in btrfs_reloc_cow_block (fs/btrfs/relocation.c:4302).
> 4297
> 4298            rc = root->fs_info->reloc_ctl;
> 4299            if (!rc)
> 4300                    return;
> 4301
> 4302            BUG_ON(rc->stage == UPDATE_DATA_PTRS &&
> 4303                   root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID);
> 4304
> 4305            level = btrfs_header_level(buf);
> 4306            if (btrfs_header_generation(buf) <=

It seems the data relocation inode was delayed to be updated.
I will deal with it next week.

Thanks
Miao

> 
>>>>
>>>> But, following panics still occur if inode_cache is specified.
>>>>
>>>> [75164.963860] btrfs: relocating block group 18551406592 flags 18
>>>> [75165.565282] btrfs: relocating block group 18282971136 flags 20
>>>> [75165.883577] ------------[ cut here ]------------
>>>> [75165.883779] kernel BUG at fs/btrfs/relocation.c:2502!
>>>> [75165.883992] invalid opcode: 0000 [#1] SMP
>>>> [75165.884002] CPU 0
>>>> [75165.884002] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
>>>> [75165.884002]
>>>> [75165.884002] Pid: 18615, comm: btrfs Not tainted 3.0.0-rc3test #1 FUJITSU-SV      PRIMERGY            /D2399
>>>> [75165.884002] RIP: 0010:[<ffffffffa02dede3>]  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
>>>> [75165.884002] RSP: 0018:ffff8801026f7a08  EFLAGS: 00010202
>>>> [75165.884002] RAX: 0000000000000001 RBX: ffff8801949e7c00 RCX: 0000000000000002
>>>> [75165.884002] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
>>>> [75165.884002] RBP: ffff8801026f7ad8 R08: 00000000000006c7 R09: ffff8801026f78f0
>>>> [75165.884002] R10: ffff88009f9d7800 R11: 0000000442149000 R12: ffff880037c99180
>>>> [75165.884002] R13: ffff880152695f30 R14: ffff88009f9d4800 R15: ffff88013cfdbf78
>>>> [75165.884002] FS:  00007f490c045740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
>>>> [75165.884002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [75165.884002] CR2: 00000033cfea6a60 CR3: 0000000100d47000 CR4: 00000000000006f0
>>>> [75165.884002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [75165.884002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> [75165.884002] Process btrfs (pid: 18615, threadinfo ffff8801026f6000, task ffff880102060000)
>>>> [75165.884002] Stack:
>>>> [75165.884002]  ffff88000516d000 ffff88009f9d7d80 00000001026f7a98 ffff880037c99180
>>>> [75165.884002]  ffff880037c991c0 000000019f9d7820 ffff88011bf09ba0 ffff88009f9d7800
>>>> [75165.884002]  ffff8801026f7ad8 ffff88011bf094c0 ffff88013cfdbf78 00ff88009f9d7800
>>>> [75165.884002] Call Trace:
>>>> [75165.884002]  [<ffffffffa02926ec>] ? block_rsv_add_bytes+0x24/0x4d [btrfs]
>>>> [75165.884002]  [<ffffffffa02e137a>] relocate_tree_blocks+0x2fb/0x4a9 [btrfs]
>>>> [75165.884002]  [<ffffffffa02de227>] ? add_tree_block+0xff/0x121 [btrfs]
>>>> [75165.884002]  [<ffffffffa02e1b13>] relocate_block_group+0x214/0x4dc [btrfs]
>>>> [75165.884002]  [<ffffffffa02e1f26>] btrfs_relocate_block_group+0x14b/0x285 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c7183>] ? btrfs_relocate_chunk+0x4a4/0x50d [btrfs]
>>>> [75165.884002]  [<ffffffffa02c6d48>] btrfs_relocate_chunk+0x69/0x50d [btrfs]
>>>> [75165.884002]  [<ffffffffa028b3f5>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c0d0f>] ? read_extent_buffer+0xba/0xf6 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c561a>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c77be>] btrfs_balance+0x1ca/0x219 [btrfs]
>>>> [75165.884002]  [<ffffffffa02cfcbd>] btrfs_ioctl+0x922/0xc19 [btrfs]
>>>> [75165.884002]  [<ffffffff810e88e4>] ? handle_mm_fault+0x233/0x24a
>>>> [75165.884002]  [<ffffffff813a6be5>] ? do_page_fault+0x340/0x3b2
>>>> [75165.884002]  [<ffffffff8111d828>] do_vfs_ioctl+0x474/0x4c3
>>>> [75165.884002]  [<ffffffff810ffe41>] ? virt_to_head_page+0xe/0x31
>>>> [75165.884002]  [<ffffffff811010e8>] ? kmem_cache_free+0x20/0xae
>>>> [75165.884002]  [<ffffffff8111d8cd>] sys_ioctl+0x56/0x79
>>>> [75165.884002]  [<ffffffff813aa302>] system_call_fastpath+0x16/0x1b
>>>> [75165.884002] Code: 00 00 48 8b 95 60 ff ff ff 45 31 c0 41 b9 01 00 00 00 4c 89 e9 4c 89 f6 4c 89 ff e8 52 15 fb ff 83 f8 00 0f 8c e6 02 00 00 74 04 <0f> 0b eb fe 48 8b 53 68 0f b6 43 70 48 85 d2 75 14 49 8b 54 c5
>>>> [75165.884002] RIP  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
>>>> [75165.884002]  RSP <ffff8801026f7a08>
>>>>
>>>> (gdb) l *do_relocation+0x163
>>>> 0x58e07 is in do_relocation (fs/btrfs/relocation.c:2502).
>>>> 2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
>>>> 2498                            if (ret < 0) {
>>>> 2499                                    err = ret;
>>>> 2500                                    break;
>>>> 2501                            }
>>>> 2502                            BUG_ON(ret > 0);
>>>
>>> this translates btrfs_search_slot() result to
>>>
>>> 1600  * If the key isn't found, the path points to the slot where it should
>>> 1601  * be inserted, and 1 is returned.  If there are other errors during the
>>> 1602  * search a negative error number is returned.
>>>
>>> ie. the key is not in the tree. If you can reproduce it reliably, add
>>> some printks about the key and path's ->search_commit_root,
>>> ->skip_locking, and ->leave_spinning which are accessed within
>>> btrfs_search_slot().
>>
>> With the inode cache on, this is the expected result right now.  It is
>> changing the tree after the snapshot is taken, which isn't allowed.  I'm
>> hoping to nail it down this week.
>>
>> -chris
>>
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-21  1:15         ` Tsutomu Itoh
  2011-06-23  8:52           ` Miao Xie
@ 2011-06-30  6:32           ` Miao Xie
  2011-06-30  6:52             ` Tsutomu Itoh
  2011-07-01  8:11             ` Tsutomu Itoh
  1 sibling, 2 replies; 16+ messages in thread
From: Miao Xie @ 2011-06-30  6:32 UTC (permalink / raw)
  To: Tsutomu Itoh; +Cc: Chris Mason, David Sterba, linux-btrfs

Hi, Itoh-san

Could you test the following patch to check whether it can fix the bug or not?
I have tested it on my x86_64 machine by your test script for two days, it worked well.

Thanks
Miao

Subject: [PATCH] btrfs: fix oops when doing space balance

When doing space balance, the following oops happened.

------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:4303!
[SNIP]
Call Trace:
 [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
 [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
 [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
 [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
 [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
 [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
 [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
 [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
 [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
 [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
 [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
 [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
 [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
 [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]

It is because the data relocation inode delayed to be updated, and it triggered
BUG_ON() when doing space balance. So we fix it by doing real-time update.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/inode.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 447612d..13b2c04 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2678,12 +2678,15 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
 	int ret;
 
 	/*
-	 * If root is tree root, it means this inode is used to
-	 * store free space information. And these inodes are updated
-	 * when committing the transaction, so they needn't delaye to
-	 * be updated, or deadlock will occured.
+	 * If the inode is a free space inode. they needn't delayed to be
+	 * updated, because these inodes are updated when committing the
+	 * transaction, or deadlock will occured.
+	 *
+	 * Besides that, the data relocation inode is also needn't delayed
+	 * to be updated.
 	 */
-	if (!is_free_space_inode(root, inode)) {
+	if (!is_free_space_inode(root, inode)
+	    && root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) {
 		ret = btrfs_delayed_update_inode(trans, root, inode);
 		if (!ret)
 			btrfs_set_inode_last_trans(trans, inode);
-- 
1.7.4


On Tue, 21 Jun 2011 10:15:30 +0900, Tsutomu Itoh wrote:
> I changed my test environment to 'btrfs-unstable + for-linus', I encountered
> following panic without inode_cache. (in about 4 hours after test begins)
> 
> btrfs: relocating block group 49161437184 flags 9
> btrfs: found 57523 extents
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/relocation.c:4303!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/kernel/mm/ksm/run
> CPU 0
> Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
> 
> Pid: 1329, comm: btrfs Not tainted 2.6.39btrfs-test2+ #1 FUJITSU-SV      PRIMERGY            /D2399
> RIP: 0010:[<ffffffffa037c1cc>]  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
> RSP: 0018:ffff880009665738  EFLAGS: 00010246
> RAX: ffff8801925a4000 RBX: ffff88011ee9e000 RCX: ffff8801566889a8
> RDX: ffff8800a3601728 RSI: ffff880143fa3000 RDI: ffff88014124d4b0
> RBP: ffff880009665798 R08: 6db6db6db6db6db7 R09: 0000160000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: ffff880143fa3000
> R13: ffff8800a3601728 R14: ffff88014124d4b0 R15: ffff880194a87ba8
> FS:  00007f7f5fb40740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000033cfea6d80 CR3: 000000003fd47000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process btrfs (pid: 1329, threadinfo ffff880009664000, task ffff880011c73520)
> Stack:
>  ffff8800a3601728 ffff88014124d4b0 ffff880009665798 ffffffffa03143fd
>  0000000000000000 0000000000000001 ffff880009665798 ffff880143fa3000
>  ffff8801566889a8 ffff8800a3601728 ffff88014124d4b0 ffff880194a87ba8
> Call Trace:
>  [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
>  [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
>  [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
>  [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
>  [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
>  [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
>  [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
>  [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
>  [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
>  [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
>  [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
>  [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
>  [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
>  [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
>  [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
>  [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
>  [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
>  [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
>  [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
>  [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
>  [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
>  [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
>  [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
>  [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
>  [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
>  [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
>  [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
> Code: 8b 76 10 e8 c7 65 eb e0 4c 8b 45 b0 41 80 48 71 20 48 8b 4d b8 8b 45 c0 e9 52 ff ff ff 48 83 be 0f 01 00 00 f7 0f 85 22 fe ff ff <0f> 0b eb fe 49 3b 50 20 0f 84 02 ff ff ff 0f 0b 0f 1f 40 00 eb
> RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
>  RSP <ffff880009665738>
> 
> (gdb) l *btrfs_reloc_cow_block+0x22c
> 0x6f1fc is in btrfs_reloc_cow_block (fs/btrfs/relocation.c:4302).
> 4297
> 4298            rc = root->fs_info->reloc_ctl;
> 4299            if (!rc)
> 4300                    return;
> 4301
> 4302            BUG_ON(rc->stage == UPDATE_DATA_PTRS &&
> 4303                   root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID);
> 4304
> 4305            level = btrfs_header_level(buf);
> 4306            if (btrfs_header_generation(buf) <=



>>>>
>>>> But, following panics still occur if inode_cache is specified.
>>>>
>>>> [75164.963860] btrfs: relocating block group 18551406592 flags 18
>>>> [75165.565282] btrfs: relocating block group 18282971136 flags 20
>>>> [75165.883577] ------------[ cut here ]------------
>>>> [75165.883779] kernel BUG at fs/btrfs/relocation.c:2502!
>>>> [75165.883992] invalid opcode: 0000 [#1] SMP
>>>> [75165.884002] CPU 0
>>>> [75165.884002] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
>>>> [75165.884002]
>>>> [75165.884002] Pid: 18615, comm: btrfs Not tainted 3.0.0-rc3test #1 FUJITSU-SV      PRIMERGY            /D2399
>>>> [75165.884002] RIP: 0010:[<ffffffffa02dede3>]  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
>>>> [75165.884002] RSP: 0018:ffff8801026f7a08  EFLAGS: 00010202
>>>> [75165.884002] RAX: 0000000000000001 RBX: ffff8801949e7c00 RCX: 0000000000000002
>>>> [75165.884002] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
>>>> [75165.884002] RBP: ffff8801026f7ad8 R08: 00000000000006c7 R09: ffff8801026f78f0
>>>> [75165.884002] R10: ffff88009f9d7800 R11: 0000000442149000 R12: ffff880037c99180
>>>> [75165.884002] R13: ffff880152695f30 R14: ffff88009f9d4800 R15: ffff88013cfdbf78
>>>> [75165.884002] FS:  00007f490c045740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
>>>> [75165.884002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [75165.884002] CR2: 00000033cfea6a60 CR3: 0000000100d47000 CR4: 00000000000006f0
>>>> [75165.884002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [75165.884002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> [75165.884002] Process btrfs (pid: 18615, threadinfo ffff8801026f6000, task ffff880102060000)
>>>> [75165.884002] Stack:
>>>> [75165.884002]  ffff88000516d000 ffff88009f9d7d80 00000001026f7a98 ffff880037c99180
>>>> [75165.884002]  ffff880037c991c0 000000019f9d7820 ffff88011bf09ba0 ffff88009f9d7800
>>>> [75165.884002]  ffff8801026f7ad8 ffff88011bf094c0 ffff88013cfdbf78 00ff88009f9d7800
>>>> [75165.884002] Call Trace:
>>>> [75165.884002]  [<ffffffffa02926ec>] ? block_rsv_add_bytes+0x24/0x4d [btrfs]
>>>> [75165.884002]  [<ffffffffa02e137a>] relocate_tree_blocks+0x2fb/0x4a9 [btrfs]
>>>> [75165.884002]  [<ffffffffa02de227>] ? add_tree_block+0xff/0x121 [btrfs]
>>>> [75165.884002]  [<ffffffffa02e1b13>] relocate_block_group+0x214/0x4dc [btrfs]
>>>> [75165.884002]  [<ffffffffa02e1f26>] btrfs_relocate_block_group+0x14b/0x285 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c7183>] ? btrfs_relocate_chunk+0x4a4/0x50d [btrfs]
>>>> [75165.884002]  [<ffffffffa02c6d48>] btrfs_relocate_chunk+0x69/0x50d [btrfs]
>>>> [75165.884002]  [<ffffffffa028b3f5>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c0d0f>] ? read_extent_buffer+0xba/0xf6 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c561a>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
>>>> [75165.884002]  [<ffffffffa02c77be>] btrfs_balance+0x1ca/0x219 [btrfs]
>>>> [75165.884002]  [<ffffffffa02cfcbd>] btrfs_ioctl+0x922/0xc19 [btrfs]
>>>> [75165.884002]  [<ffffffff810e88e4>] ? handle_mm_fault+0x233/0x24a
>>>> [75165.884002]  [<ffffffff813a6be5>] ? do_page_fault+0x340/0x3b2
>>>> [75165.884002]  [<ffffffff8111d828>] do_vfs_ioctl+0x474/0x4c3
>>>> [75165.884002]  [<ffffffff810ffe41>] ? virt_to_head_page+0xe/0x31
>>>> [75165.884002]  [<ffffffff811010e8>] ? kmem_cache_free+0x20/0xae
>>>> [75165.884002]  [<ffffffff8111d8cd>] sys_ioctl+0x56/0x79
>>>> [75165.884002]  [<ffffffff813aa302>] system_call_fastpath+0x16/0x1b
>>>> [75165.884002] Code: 00 00 48 8b 95 60 ff ff ff 45 31 c0 41 b9 01 00 00 00 4c 89 e9 4c 89 f6 4c 89 ff e8 52 15 fb ff 83 f8 00 0f 8c e6 02 00 00 74 04 <0f> 0b eb fe 48 8b 53 68 0f b6 43 70 48 85 d2 75 14 49 8b 54 c5
>>>> [75165.884002] RIP  [<ffffffffa02dede3>] do_relocation+0x163/0x44b [btrfs]
>>>> [75165.884002]  RSP <ffff8801026f7a08>
>>>>
>>>> (gdb) l *do_relocation+0x163
>>>> 0x58e07 is in do_relocation (fs/btrfs/relocation.c:2502).
>>>> 2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
>>>> 2498                            if (ret < 0) {
>>>> 2499                                    err = ret;
>>>> 2500                                    break;
>>>> 2501                            }
>>>> 2502                            BUG_ON(ret > 0);
>>>
>>> this translates btrfs_search_slot() result to
>>>
>>> 1600  * If the key isn't found, the path points to the slot where it should
>>> 1601  * be inserted, and 1 is returned.  If there are other errors during the
>>> 1602  * search a negative error number is returned.
>>>
>>> ie. the key is not in the tree. If you can reproduce it reliably, add
>>> some printks about the key and path's ->search_commit_root,
>>> ->skip_locking, and ->leave_spinning which are accessed within
>>> btrfs_search_slot().
>>
>> With the inode cache on, this is the expected result right now.  It is
>> changing the tree after the snapshot is taken, which isn't allowed.  I'm
>> hoping to nail it down this week.
>>
>> -chris
>>
> 
> 


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-30  6:32           ` Miao Xie
@ 2011-06-30  6:52             ` Tsutomu Itoh
  2011-07-01  8:11             ` Tsutomu Itoh
  1 sibling, 0 replies; 16+ messages in thread
From: Tsutomu Itoh @ 2011-06-30  6:52 UTC (permalink / raw)
  To: miaox; +Cc: Chris Mason, David Sterba, linux-btrfs

(2011/06/30 15:32), Miao Xie wrote:
> Hi, Itoh-san
> 
> Could you test the following patch to check whether it can fix the bug or not?

Sure.
After running my test script by about a day, I will report on the result.

Thanks,
Tsutomu

> I have tested it on my x86_64 machine by your test script for two days, it worked well.
> 
> Thanks
> Miao
> 
> Subject: [PATCH] btrfs: fix oops when doing space balance
> 
> When doing space balance, the following oops happened.
> 
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/relocation.c:4303!
> [SNIP]
> Call Trace:
>  [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
>  [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
>  [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
>  [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
>  [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
>  [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
>  [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
>  [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
>  [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
>  [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
>  [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
>  [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
>  [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
>  [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
>  [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
>  [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
>  [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
>  [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
>  [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
>  [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
>  [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
>  [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
>  [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
>  [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
>  [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
>  [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
>  [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
> [SNIP]
> RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
> 
> It is because the data relocation inode delayed to be updated, and it triggered
> BUG_ON() when doing space balance. So we fix it by doing real-time update.
> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>  fs/btrfs/inode.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 447612d..13b2c04 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2678,12 +2678,15 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
>  	int ret;
>  
>  	/*
> -	 * If root is tree root, it means this inode is used to
> -	 * store free space information. And these inodes are updated
> -	 * when committing the transaction, so they needn't delaye to
> -	 * be updated, or deadlock will occured.
> +	 * If the inode is a free space inode. they needn't delayed to be
> +	 * updated, because these inodes are updated when committing the
> +	 * transaction, or deadlock will occured.
> +	 *
> +	 * Besides that, the data relocation inode is also needn't delayed
> +	 * to be updated.
>  	 */
> -	if (!is_free_space_inode(root, inode)) {
> +	if (!is_free_space_inode(root, inode)
> +	    && root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) {
>  		ret = btrfs_delayed_update_inode(trans, root, inode);
>  		if (!ret)
>  			btrfs_set_inode_last_trans(trans, inode);



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-17 21:12 please review snapshot corruption path with delayed metadata insertion Chris Mason
  2011-06-19  4:34 ` Tsutomu Itoh
@ 2011-06-30  8:03 ` Miao Xie
  2011-06-30  8:51   ` Miao Xie
  1 sibling, 1 reply; 16+ messages in thread
From: Miao Xie @ 2011-06-30  8:03 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Tsutomu Itoh

Hi, Chris

I think the snapshot should be the image of the fs tree before it was created,
so the metadata of the snapshot should not exist in the its tree. But now, we
found the directory item and directory name index is in both the snapshot tree
and the fs tree.

Besides that, it also makes the users feel strange. For example:
# mkfs.btrfs /dev/sda1
# mount /dev/sda1 /mnt
# mkdir /mnt/1
# cd /mnt/1
# btrfs subvolume snapshot /mnt snap0
# ll /mnt/1
total 0
drwxr-xr-x 1 root root 10 Jun 30 15:01 1
		       ^^^
# ll /mnt/1/snap0/
total 0
drwxr-xr-x 1 root root 10 Jun 30 15:01 1
		       ^^^
			It is also 10, but...
# ll /mnt/1/snap0/1
total 0
[None]

There is nothing in the directory 1 in snap0, but btrfs told the length of
this directory is 10, it is strange.

So I think we should insert directory item and directory name index and update
the parent inode as the last step of snapshot creation.

Thanks
Miao

On Fri, 17 Jun 2011 17:12:28 -0400, Chris Mason wrote:
> commit e999376f094162aa425ae749aa1df95ab928d010
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Fri Jun 17 16:14:09 2011 -0400
> 
>     Btrfs: avoid delayed metadata items during commits
>     
>     Snapshot creation has two phases.  One is the initial snapshot setup,
>     and the second is done during commit, while nobody is allowed to modify
>     the root we are snapshotting.
>     
>     The delayed metadata insertion code can break that rule, it does a
>     delayed inode update on the inode of the parent of the snapshot,
>     and delayed directory item insertion.
>     
>     This makes sure to run the pending delayed operations before we
>     record the snapshot root, which avoids corruptions.
>     
>     Signed-off-by: Chris Mason <chris.mason@oracle.com>
> 
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index fc515b7..f1cbd02 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1237,6 +1237,13 @@ again:
>  	return 0;
>  }
>  
> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root)
> +{
> +	struct btrfs_delayed_root *delayed_root;
> +	delayed_root = btrfs_get_delayed_root(root);
> +	WARN_ON(btrfs_first_delayed_node(delayed_root));
> +}
> +
>  void btrfs_balance_delayed_items(struct btrfs_root *root)
>  {
>  	struct btrfs_delayed_root *delayed_root;
> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
> index cb79b67..d1a6a29 100644
> --- a/fs/btrfs/delayed-inode.h
> +++ b/fs/btrfs/delayed-inode.h
> @@ -137,4 +137,8 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
>  /* for init */
>  int __init btrfs_delayed_inode_init(void);
>  void btrfs_delayed_inode_exit(void);
> +
> +/* for debugging */
> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root);
> +
>  #endif
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index c073d85..51dcec8 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -957,6 +957,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	ret = btrfs_update_inode(trans, parent_root, parent_inode);
>  	BUG_ON(ret);
>  
> +	/*
> +	 * pull in the delayed directory update
> +	 * and the delayed inode item
> +	 * otherwise we corrupt the FS during
> +	 * snapshot
> +	 */
> +	ret = btrfs_run_delayed_items(trans, root);
> +	BUG_ON(ret);
> +
>  	record_root_in_trans(trans, root);
>  	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>  	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
> @@ -1018,14 +1027,6 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
>  	int ret;
>  
>  	list_for_each_entry(pending, head, list) {
> -		/*
> -		 * We must deal with the delayed items before creating
> -		 * snapshots, or we will create a snapthot with inconsistent
> -		 * information.
> -		*/
> -		ret = btrfs_run_delayed_items(trans, fs_info->fs_root);
> -		BUG_ON(ret);
> -
>  		ret = create_pending_snapshot(trans, fs_info, pending);
>  		BUG_ON(ret);
>  	}
> @@ -1319,15 +1320,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>  	 */
>  	mutex_lock(&root->fs_info->reloc_mutex);
>  
> -	ret = create_pending_snapshots(trans, root->fs_info);
> +	ret = btrfs_run_delayed_items(trans, root);
>  	BUG_ON(ret);
>  
> -	ret = btrfs_run_delayed_items(trans, root);
> +	ret = create_pending_snapshots(trans, root->fs_info);
>  	BUG_ON(ret);
>  
>  	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
>  	BUG_ON(ret);
>  
> +	/*
> +	 * make sure none of the code above managed to slip in a
> +	 * delayed item
> +	 */
> +	btrfs_assert_delayed_root_empty(root);
> +
>  	WARN_ON(cur_trans != trans->transaction);
>  
>  	btrfs_scrub_pause(root);
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-30  8:03 ` Miao Xie
@ 2011-06-30  8:51   ` Miao Xie
  0 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2011-06-30  8:51 UTC (permalink / raw)
  To: miaox; +Cc: Chris Mason, linux-btrfs, Tsutomu Itoh

On thu, 30 Jun 2011 16:03:21 +0800, Miao Xie wrote:
> Hi, Chris
> 
> I think the snapshot should be the image of the fs tree before it was created,
> so the metadata of the snapshot should not exist in the its tree. But now, we
> found the directory item and directory name index is in both the snapshot tree
> and the fs tree.
> 
> Besides that, it also makes the users feel strange. For example:
> # mkfs.btrfs /dev/sda1
> # mount /dev/sda1 /mnt
> # mkdir /mnt/1
> # cd /mnt/1
> # btrfs subvolume snapshot /mnt snap0
> # ll /mnt/1
> total 0
> drwxr-xr-x 1 root root 10 Jun 30 15:01 1
> 		       ^^^
> # ll /mnt/1/snap0/
> total 0
> drwxr-xr-x 1 root root 10 Jun 30 15:01 1
> 		       ^^^
> 			It is also 10, but...
> # ll /mnt/1/snap0/1
> total 0
> [None]

If we do

# touch /mnt/1/snap0/1/snap0/a

WARN_ON_ONCE(in d_set_d_op(), fs/dcache.c:1345) will be triggered.

If we insert directory item and directory name index and update the parent inode
as the last step, this warning will not happen.

Thanks
Miao

> There is nothing in the directory 1 in snap0, but btrfs told the length of
> this directory is 10, it is strange.
> 
> So I think we should insert directory item and directory name index and update
> the parent inode as the last step of snapshot creation.
> 
> Thanks
> Miao
> 
> On Fri, 17 Jun 2011 17:12:28 -0400, Chris Mason wrote:
>> commit e999376f094162aa425ae749aa1df95ab928d010
>> Author: Chris Mason <chris.mason@oracle.com>
>> Date:   Fri Jun 17 16:14:09 2011 -0400
>>
>>     Btrfs: avoid delayed metadata items during commits
>>     
>>     Snapshot creation has two phases.  One is the initial snapshot setup,
>>     and the second is done during commit, while nobody is allowed to modify
>>     the root we are snapshotting.
>>     
>>     The delayed metadata insertion code can break that rule, it does a
>>     delayed inode update on the inode of the parent of the snapshot,
>>     and delayed directory item insertion.
>>     
>>     This makes sure to run the pending delayed operations before we
>>     record the snapshot root, which avoids corruptions.
>>     
>>     Signed-off-by: Chris Mason <chris.mason@oracle.com>
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index fc515b7..f1cbd02 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1237,6 +1237,13 @@ again:
>>  	return 0;
>>  }
>>  
>> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root)
>> +{
>> +	struct btrfs_delayed_root *delayed_root;
>> +	delayed_root = btrfs_get_delayed_root(root);
>> +	WARN_ON(btrfs_first_delayed_node(delayed_root));
>> +}
>> +
>>  void btrfs_balance_delayed_items(struct btrfs_root *root)
>>  {
>>  	struct btrfs_delayed_root *delayed_root;
>> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
>> index cb79b67..d1a6a29 100644
>> --- a/fs/btrfs/delayed-inode.h
>> +++ b/fs/btrfs/delayed-inode.h
>> @@ -137,4 +137,8 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
>>  /* for init */
>>  int __init btrfs_delayed_inode_init(void);
>>  void btrfs_delayed_inode_exit(void);
>> +
>> +/* for debugging */
>> +void btrfs_assert_delayed_root_empty(struct btrfs_root *root);
>> +
>>  #endif
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index c073d85..51dcec8 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -957,6 +957,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>>  	ret = btrfs_update_inode(trans, parent_root, parent_inode);
>>  	BUG_ON(ret);
>>  
>> +	/*
>> +	 * pull in the delayed directory update
>> +	 * and the delayed inode item
>> +	 * otherwise we corrupt the FS during
>> +	 * snapshot
>> +	 */
>> +	ret = btrfs_run_delayed_items(trans, root);
>> +	BUG_ON(ret);
>> +
>>  	record_root_in_trans(trans, root);
>>  	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>>  	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
>> @@ -1018,14 +1027,6 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
>>  	int ret;
>>  
>>  	list_for_each_entry(pending, head, list) {
>> -		/*
>> -		 * We must deal with the delayed items before creating
>> -		 * snapshots, or we will create a snapthot with inconsistent
>> -		 * information.
>> -		*/
>> -		ret = btrfs_run_delayed_items(trans, fs_info->fs_root);
>> -		BUG_ON(ret);
>> -
>>  		ret = create_pending_snapshot(trans, fs_info, pending);
>>  		BUG_ON(ret);
>>  	}
>> @@ -1319,15 +1320,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>  	 */
>>  	mutex_lock(&root->fs_info->reloc_mutex);
>>  
>> -	ret = create_pending_snapshots(trans, root->fs_info);
>> +	ret = btrfs_run_delayed_items(trans, root);
>>  	BUG_ON(ret);
>>  
>> -	ret = btrfs_run_delayed_items(trans, root);
>> +	ret = create_pending_snapshots(trans, root->fs_info);
>>  	BUG_ON(ret);
>>  
>>  	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
>>  	BUG_ON(ret);
>>  
>> +	/*
>> +	 * make sure none of the code above managed to slip in a
>> +	 * delayed item
>> +	 */
>> +	btrfs_assert_delayed_root_empty(root);
>> +
>>  	WARN_ON(cur_trans != trans->transaction);
>>  
>>  	btrfs_scrub_pause(root);
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-06-30  6:32           ` Miao Xie
  2011-06-30  6:52             ` Tsutomu Itoh
@ 2011-07-01  8:11             ` Tsutomu Itoh
  2011-07-07 20:26               ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Tsutomu Itoh @ 2011-07-01  8:11 UTC (permalink / raw)
  To: miaox; +Cc: Chris Mason, David Sterba, linux-btrfs

Hi, Miao,

(2011/06/30 15:32), Miao Xie wrote:
> Hi, Itoh-san
> 
> Could you test the following patch to check whether it can fix the bug or not?
> I have tested it on my x86_64 machine by your test script for two days, it worked well.

I ran my test script about a day, I was not able to reproduce this BUG.

Thanks,
Tsutomu

> 
> Thanks
> Miao
> 
> Subject: [PATCH] btrfs: fix oops when doing space balance
> 
> When doing space balance, the following oops happened.
> 
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/relocation.c:4303!
> [SNIP]
> Call Trace:
>  [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
>  [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
>  [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
>  [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
>  [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
>  [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
>  [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
>  [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
>  [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
>  [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
>  [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
>  [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
>  [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
>  [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
>  [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
>  [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
>  [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
>  [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
>  [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
>  [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
>  [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
>  [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
>  [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
>  [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
>  [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
>  [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
>  [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
>  [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
> [SNIP]
> RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
> 
> It is because the data relocation inode delayed to be updated, and it triggered
> BUG_ON() when doing space balance. So we fix it by doing real-time update.
> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>  fs/btrfs/inode.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 447612d..13b2c04 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2678,12 +2678,15 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
>  	int ret;
>  
>  	/*
> -	 * If root is tree root, it means this inode is used to
> -	 * store free space information. And these inodes are updated
> -	 * when committing the transaction, so they needn't delaye to
> -	 * be updated, or deadlock will occured.
> +	 * If the inode is a free space inode. they needn't delayed to be
> +	 * updated, because these inodes are updated when committing the
> +	 * transaction, or deadlock will occured.
> +	 *
> +	 * Besides that, the data relocation inode is also needn't delayed
> +	 * to be updated.
>  	 */
> -	if (!is_free_space_inode(root, inode)) {
> +	if (!is_free_space_inode(root, inode)
> +	    && root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) {
>  		ret = btrfs_delayed_update_inode(trans, root, inode);
>  		if (!ret)
>  			btrfs_set_inode_last_trans(trans, inode);



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-07-01  8:11             ` Tsutomu Itoh
@ 2011-07-07 20:26               ` Chris Mason
  2011-07-07 23:51                 ` Tsutomu Itoh
  2011-07-08  4:50                 ` Tsutomu Itoh
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Mason @ 2011-07-07 20:26 UTC (permalink / raw)
  To: Tsutomu Itoh; +Cc: miaox, David Sterba, linux-btrfs

Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
> Hi, Miao,
> 
> (2011/06/30 15:32), Miao Xie wrote:
> > Hi, Itoh-san
> > 
> > Could you test the following patch to check whether it can fix the bug or not?
> > I have tested it on my x86_64 machine by your test script for two days, it worked well.
> 
> I ran my test script about a day, I was not able to reproduce this BUG.

Can you please try this patch with the inode_cache option (in addition
to Miao's code).

commit d0243d46f7a1e4cd57c74fa14556be65b454687d
Author: Chris Mason <chris.mason@oracle.com>
Date:   Thu Jul 7 15:53:12 2011 -0400

    Btrfs: write out free inode cache before taking snapshots
    
    The btrfs snapshotting code requires that once a root has been
    snapshotted, we don't change it during a commit
    
    But the free inode cache was changing the roots when it root the cache,
    which lead to corruptions.
    
    This fixes things by making sure we write the cache while we are taking
    the snapshot, and that we don't write it again later.
    
    Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index bf0d615..d594cf7 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
 	info->bytes = bytes;
 
 	spin_lock(&ctl->tree_lock);
+	ctl->dirty = 1;
 
 	if (try_merge_free_space(ctl, info, true))
 		goto link;
@@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
 	int ret = 0;
 
 	spin_lock(&ctl->tree_lock);
+	ctl->dirty = 1;
 
 again:
 	info = tree_search_offset(ctl, offset, 0, 0);
@@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
 		if (entry->bytes == 0)
 			free_bitmap(ctl, entry);
 	}
+	ctl->dirty = 1;
 out:
 	spin_unlock(&ctl->tree_lock);
 
@@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 		printk(KERN_ERR "btrfs: failed to write free ino cache "
 		       "for root %llu\n", root->root_key.objectid);
 
+	/* we write out at transaction commit time, there's no racing. */
+	if (ret == 0)
+		ctl->dirty = 0;
+
 	iput(inode);
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 8f2613f..1e92c93 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
 	int free_extents;
 	int total_bitmaps;
 	int unit;
+	/*
+	 * record if we've changed since written.  This can turn
+	 * into a bit field if we need more flags
+	 */
+	unsigned long dirty;
 	u64 start;
 	struct btrfs_free_space_op *op;
 	void *private;
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index b4087e0..e7c1493 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
 	ctl->start = 0;
 	ctl->private = NULL;
 	ctl->op = &free_ino_op;
+	ctl->dirty = 1;
 
 	/*
 	 * Initially we allow to use 16K of ram to cache chunks of
@@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
 		return 0;
 
+	if (!ctl->dirty)
+		return 0;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -485,6 +489,24 @@ out:
 	return ret;
 }
 
+/*
+ * this tries to save the cache, but if it fails for any reason we clear
+ * the dirty flag so that it won't be saved again during this commit.
+ *
+ * This is used by the snapshotting code to make sure we don't corrupt the
+ * FS by saving the inode cache after the snapshot is taken.
+ */
+int btrfs_force_save_ino_cache(struct btrfs_root *root,
+			       struct btrfs_trans_handle *trans)
+{
+	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
+	int ret;
+	ret = btrfs_save_ino_cache(root, trans);
+
+	ctl->dirty = 0;
+	return ret;
+}
+
 static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
 {
 	struct btrfs_path *path;
diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
index ddb347b..2be060e 100644
--- a/fs/btrfs/inode-map.h
+++ b/fs/btrfs/inode-map.h
@@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
 int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid);
 int btrfs_save_ino_cache(struct btrfs_root *root,
 			 struct btrfs_trans_handle *trans);
-
+int btrfs_force_save_ino_cache(struct btrfs_root *root,
+			       struct btrfs_trans_handle *trans);
 int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);
 
 #endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 51dcec8..e34827c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -966,6 +966,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_run_delayed_items(trans, root);
 	BUG_ON(ret);
 
+	/*
+	 * there are a few transient reasons the free inode cache writeback can fail.
+	 * and if it does, we'll try again later in the commit.  This forces the
+	 * clean bit, tossing the cache.  Tossing the cache isn't really good, but
+	 * if we try to write it again later in the commit we'll corrupt things.
+	 */
+	btrfs_force_save_ino_cache(parent_root, trans);
+
+
 	record_root_in_trans(trans, root);
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
 	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-07-07 20:26               ` Chris Mason
@ 2011-07-07 23:51                 ` Tsutomu Itoh
  2011-07-08  1:59                   ` Chris Mason
  2011-07-08  4:50                 ` Tsutomu Itoh
  1 sibling, 1 reply; 16+ messages in thread
From: Tsutomu Itoh @ 2011-07-07 23:51 UTC (permalink / raw)
  To: Chris Mason; +Cc: miaox, David Sterba, linux-btrfs

Hi, Chris,

(2011/07/08 5:26), Chris Mason wrote:
> Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
>> Hi, Miao,
>>
>> (2011/06/30 15:32), Miao Xie wrote:
>>> Hi, Itoh-san
>>>
>>> Could you test the following patch to check whether it can fix the bug or not?
>>> I have tested it on my x86_64 machine by your test script for two days, it worked well.
>>
>> I ran my test script about a day, I was not able to reproduce this BUG.
> 
> Can you please try this patch with the inode_cache option (in addition
> to Miao's code).

In my clarification.

I do only have to apply this patch to 'btrfs-unstable + (current)for-linus'?
or, other patches also necessary?

Thanks,
Tsutomu

> 
> commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Thu Jul 7 15:53:12 2011 -0400
> 
>     Btrfs: write out free inode cache before taking snapshots
>     
>     The btrfs snapshotting code requires that once a root has been
>     snapshotted, we don't change it during a commit
>     
>     But the free inode cache was changing the roots when it root the cache,
>     which lead to corruptions.
>     
>     This fixes things by making sure we write the cache while we are taking
>     the snapshot, and that we don't write it again later.
>     
>     Signed-off-by: Chris Mason <chris.mason@oracle.com>
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index bf0d615..d594cf7 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
>  	info->bytes = bytes;
>  
>  	spin_lock(&ctl->tree_lock);
> +	ctl->dirty = 1;
>  
>  	if (try_merge_free_space(ctl, info, true))
>  		goto link;
> @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
>  	int ret = 0;
>  
>  	spin_lock(&ctl->tree_lock);
> +	ctl->dirty = 1;
>  
>  again:
>  	info = tree_search_offset(ctl, offset, 0, 0);
> @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
>  		if (entry->bytes == 0)
>  			free_bitmap(ctl, entry);
>  	}
> +	ctl->dirty = 1;
>  out:
>  	spin_unlock(&ctl->tree_lock);
>  
> @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>  		printk(KERN_ERR "btrfs: failed to write free ino cache "
>  		       "for root %llu\n", root->root_key.objectid);
>  
> +	/* we write out at transaction commit time, there's no racing. */
> +	if (ret == 0)
> +		ctl->dirty = 0;
> +
>  	iput(inode);
>  	return ret;
>  }
> diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> index 8f2613f..1e92c93 100644
> --- a/fs/btrfs/free-space-cache.h
> +++ b/fs/btrfs/free-space-cache.h
> @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
>  	int free_extents;
>  	int total_bitmaps;
>  	int unit;
> +	/*
> +	 * record if we've changed since written.  This can turn
> +	 * into a bit field if we need more flags
> +	 */
> +	unsigned long dirty;
>  	u64 start;
>  	struct btrfs_free_space_op *op;
>  	void *private;
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index b4087e0..e7c1493 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
>  	ctl->start = 0;
>  	ctl->private = NULL;
>  	ctl->op = &free_ino_op;
> +	ctl->dirty = 1;
>  
>  	/*
>  	 * Initially we allow to use 16K of ram to cache chunks of
> @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
>  		return 0;
>  
> +	if (!ctl->dirty)
> +		return 0;
> +
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		return -ENOMEM;
> @@ -485,6 +489,24 @@ out:
>  	return ret;
>  }
>  
> +/*
> + * this tries to save the cache, but if it fails for any reason we clear
> + * the dirty flag so that it won't be saved again during this commit.
> + *
> + * This is used by the snapshotting code to make sure we don't corrupt the
> + * FS by saving the inode cache after the snapshot is taken.
> + */
> +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> +			       struct btrfs_trans_handle *trans)
> +{
> +	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> +	int ret;
> +	ret = btrfs_save_ino_cache(root, trans);
> +
> +	ctl->dirty = 0;
> +	return ret;
> +}
> +
>  static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
>  {
>  	struct btrfs_path *path;
> diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
> index ddb347b..2be060e 100644
> --- a/fs/btrfs/inode-map.h
> +++ b/fs/btrfs/inode-map.h
> @@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
>  int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid);
>  int btrfs_save_ino_cache(struct btrfs_root *root,
>  			 struct btrfs_trans_handle *trans);
> -
> +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> +			       struct btrfs_trans_handle *trans);
>  int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);
>  
>  #endif
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 51dcec8..e34827c 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -966,6 +966,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	ret = btrfs_run_delayed_items(trans, root);
>  	BUG_ON(ret);
>  
> +	/*
> +	 * there are a few transient reasons the free inode cache writeback can fail.
> +	 * and if it does, we'll try again later in the commit.  This forces the
> +	 * clean bit, tossing the cache.  Tossing the cache isn't really good, but
> +	 * if we try to write it again later in the commit we'll corrupt things.
> +	 */
> +	btrfs_force_save_ino_cache(parent_root, trans);
> +
> +
>  	record_root_in_trans(trans, root);
>  	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>  	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-07-07 23:51                 ` Tsutomu Itoh
@ 2011-07-08  1:59                   ` Chris Mason
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Mason @ 2011-07-08  1:59 UTC (permalink / raw)
  To: Tsutomu Itoh; +Cc: miaox, David Sterba, linux-btrfs

Excerpts from Tsutomu Itoh's message of 2011-07-07 19:51:09 -0400:
> Hi, Chris,
> 
> (2011/07/08 5:26), Chris Mason wrote:
> > Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
> >> Hi, Miao,
> >>
> >> (2011/06/30 15:32), Miao Xie wrote:
> >>> Hi, Itoh-san
> >>>
> >>> Could you test the following patch to check whether it can fix the bug or not?
> >>> I have tested it on my x86_64 machine by your test script for two days, it worked well.
> >>
> >> I ran my test script about a day, I was not able to reproduce this BUG.
> > 
> > Can you please try this patch with the inode_cache option (in addition
> > to Miao's code).
> 
> In my clarification.
> 
> I do only have to apply this patch to 'btrfs-unstable + (current)for-linus'?
> or, other patches also necessary?
> 

Hi, sorry that I wasn't clear.  You can apply it to the current
for-linus branch, which has Miao's fix to keep from doing delayed
metadata updates on the relocation inode.

-chris

> Thanks,
> Tsutomu
> 
> > 
> > commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> > Author: Chris Mason <chris.mason@oracle.com>
> > Date:   Thu Jul 7 15:53:12 2011 -0400
> > 
> >     Btrfs: write out free inode cache before taking snapshots
> >     
> >     The btrfs snapshotting code requires that once a root has been
> >     snapshotted, we don't change it during a commit
> >     
> >     But the free inode cache was changing the roots when it root the cache,
> >     which lead to corruptions.
> >     
> >     This fixes things by making sure we write the cache while we are taking
> >     the snapshot, and that we don't write it again later.
> >     
> >     Signed-off-by: Chris Mason <chris.mason@oracle.com>
> > 
> > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> > index bf0d615..d594cf7 100644
> > --- a/fs/btrfs/free-space-cache.c
> > +++ b/fs/btrfs/free-space-cache.c
> > @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
> >      info->bytes = bytes;
> >  
> >      spin_lock(&ctl->tree_lock);
> > +    ctl->dirty = 1;
> >  
> >      if (try_merge_free_space(ctl, info, true))
> >          goto link;
> > @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
> >      int ret = 0;
> >  
> >      spin_lock(&ctl->tree_lock);
> > +    ctl->dirty = 1;
> >  
> >  again:
> >      info = tree_search_offset(ctl, offset, 0, 0);
> > @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
> >          if (entry->bytes == 0)
> >              free_bitmap(ctl, entry);
> >      }
> > +    ctl->dirty = 1;
> >  out:
> >      spin_unlock(&ctl->tree_lock);
> >  
> > @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
> >          printk(KERN_ERR "btrfs: failed to write free ino cache "
> >                 "for root %llu\n", root->root_key.objectid);
> >  
> > +    /* we write out at transaction commit time, there's no racing. */
> > +    if (ret == 0)
> > +        ctl->dirty = 0;
> > +
> >      iput(inode);
> >      return ret;
> >  }
> > diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> > index 8f2613f..1e92c93 100644
> > --- a/fs/btrfs/free-space-cache.h
> > +++ b/fs/btrfs/free-space-cache.h
> > @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
> >      int free_extents;
> >      int total_bitmaps;
> >      int unit;
> > +    /*
> > +     * record if we've changed since written.  This can turn
> > +     * into a bit field if we need more flags
> > +     */
> > +    unsigned long dirty;
> >      u64 start;
> >      struct btrfs_free_space_op *op;
> >      void *private;
> > diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> > index b4087e0..e7c1493 100644
> > --- a/fs/btrfs/inode-map.c
> > +++ b/fs/btrfs/inode-map.c
> > @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
> >      ctl->start = 0;
> >      ctl->private = NULL;
> >      ctl->op = &free_ino_op;
> > +    ctl->dirty = 1;
> >  
> >      /*
> >       * Initially we allow to use 16K of ram to cache chunks of
> > @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
> >      if (!btrfs_test_opt(root, INODE_MAP_CACHE))
> >          return 0;
> >  
> > +    if (!ctl->dirty)
> > +        return 0;
> > +
> >      path = btrfs_alloc_path();
> >      if (!path)
> >          return -ENOMEM;
> > @@ -485,6 +489,24 @@ out:
> >      return ret;
> >  }
> >  
> > +/*
> > + * this tries to save the cache, but if it fails for any reason we clear
> > + * the dirty flag so that it won't be saved again during this commit.
> > + *
> > + * This is used by the snapshotting code to make sure we don't corrupt the
> > + * FS by saving the inode cache after the snapshot is taken.
> > + */
> > +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> > +                   struct btrfs_trans_handle *trans)
> > +{
> > +    struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> > +    int ret;
> > +    ret = btrfs_save_ino_cache(root, trans);
> > +
> > +    ctl->dirty = 0;
> > +    return ret;
> > +}
> > +
> >  static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
> >  {
> >      struct btrfs_path *path;
> > diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
> > index ddb347b..2be060e 100644
> > --- a/fs/btrfs/inode-map.h
> > +++ b/fs/btrfs/inode-map.h
> > @@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
> >  int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid);
> >  int btrfs_save_ino_cache(struct btrfs_root *root,
> >               struct btrfs_trans_handle *trans);
> > -
> > +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> > +                   struct btrfs_trans_handle *trans);
> >  int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);
> >  
> >  #endif
> > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> > index 51dcec8..e34827c 100644
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -966,6 +966,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
> >      ret = btrfs_run_delayed_items(trans, root);
> >      BUG_ON(ret);
> >  
> > +    /*
> > +     * there are a few transient reasons the free inode cache writeback can fail.
> > +     * and if it does, we'll try again later in the commit.  This forces the
> > +     * clean bit, tossing the cache.  Tossing the cache isn't really good, but
> > +     * if we try to write it again later in the commit we'll corrupt things.
> > +     */
> > +    btrfs_force_save_ino_cache(parent_root, trans);
> > +
> > +
> >      record_root_in_trans(trans, root);
> >      btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
> >      memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: please review snapshot corruption path with delayed metadata insertion
  2011-07-07 20:26               ` Chris Mason
  2011-07-07 23:51                 ` Tsutomu Itoh
@ 2011-07-08  4:50                 ` Tsutomu Itoh
  1 sibling, 0 replies; 16+ messages in thread
From: Tsutomu Itoh @ 2011-07-08  4:50 UTC (permalink / raw)
  To: Chris Mason; +Cc: miaox, David Sterba, linux-btrfs

Hi, Chris,

(2011/07/08 5:26), Chris Mason wrote:
> Excerpts from Tsutomu Itoh's message of 2011-07-01 04:11:28 -0400:
>> Hi, Miao,
>>
>> (2011/06/30 15:32), Miao Xie wrote:
>>> Hi, Itoh-san
>>>
>>> Could you test the following patch to check whether it can fix the bug or not?
>>> I have tested it on my x86_64 machine by your test script for two days, it worked well.
>>
>> I ran my test script about a day, I was not able to reproduce this BUG.
> 
> Can you please try this patch with the inode_cache option (in addition
> to Miao's code).

Unfortunately, I encountered following panic.

=============================================================================

btrfs: relocating block group 17746100224 flags 20
btrfs: relocating block group 12377391104 flags 9
btrfs: found 4181 extents
------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:2502!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: btrfs zlib_deflate crc32c libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]

Pid: 26214, comm: btrfs Not tainted 2.6.39btrfs-test5+ #2 FUJITSU-SV      PRIMERGY            /D2399
RIP: 0010:[<ffffffffa04a98f2>]  [<ffffffffa04a98f2>] do_relocation+0x562/0x590 [btrfs]
RSP: 0018:ffff8801622519a8  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff8800d2754140 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880000000000 RDI: 0000000000000000
RBP: ffff880162251a78 R08: 0000000000000000 R09: 00000000000002e9
R10: 0000000000000000 R11: 0000000000000026 R12: ffff880161f2fb40
R13: ffff8800cd81eac0 R14: ffff880080038000 R15: 0000000000000000
FS:  00007f4081d05740(0000) GS:ffff88019fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000033cfea6a60 CR3: 000000015d345000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 26214, threadinfo ffff880162250000, task ffff880161c3eab0)
Stack:
 ffff880191f006d0 ffff8800cd81eac0 ffff880191f005b0 ffff8800cd81eb00
 ffff88016225b000 ffff8800777779e0 0000000162251a48 ffff880162250000
 ffff880162251a78 ffff880193a26930 0000000100251a78 ffff880193a26930
Call Trace:
 [<ffffffffa044abeb>] ? block_rsv_add_bytes+0x2b/0x70 [btrfs]
 [<ffffffffa04ab1ab>] relocate_tree_blocks+0x62b/0x6e0 [btrfs]
 [<ffffffffa0484928>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa04ac3d3>] ? add_data_references+0x263/0x280 [btrfs]
 [<ffffffffa04ac662>] relocate_block_group+0x272/0x620 [btrfs]
 [<ffffffffa04acbc3>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [<ffffffffa0496000>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [<ffffffffa048b4eb>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [<ffffffffa04400ed>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [<ffffffffa0484928>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa0448f51>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [<ffffffffa0484928>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa048c6fa>] btrfs_balance+0x21a/0x2a0 [btrfs]
 [<ffffffff8115dc41>] ? path_openat+0x101/0x3d0
 [<ffffffffa04959d8>] btrfs_ioctl+0x798/0xd20 [btrfs]
 [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
Code: 0f 0b 0f 1f 80 00 00 00 00 eb f7 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 <0f> 0b eb fe 48 83 7a 68 00 0f 1f 44 00 00 0f 84 d2 fa ff ff 0f
RIP  [<ffffffffa04a98f2>] do_relocation+0x562/0x590 [btrfs]
 RSP <ffff8801622519a8>

(gdb) l *do_relocation+0x562
0x6f922 is in do_relocation (fs/btrfs/relocation.c:2502).
2497                            ret = btrfs_search_slot(trans, root, key, path, 0, 1);
2498                            if (ret < 0) {
2499                                    err = ret;
2500                                    break;
2501                            }
2502                            BUG_ON(ret > 0);
2503
2504                            if (!upper->eb) {
2505                                    upper->eb = path->nodes[upper->level];
2506                                    path->nodes[upper->level] = NULL;
(gdb)

> 
> commit d0243d46f7a1e4cd57c74fa14556be65b454687d
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Thu Jul 7 15:53:12 2011 -0400
> 
>     Btrfs: write out free inode cache before taking snapshots
>     
>     The btrfs snapshotting code requires that once a root has been
>     snapshotted, we don't change it during a commit
>     
>     But the free inode cache was changing the roots when it root the cache,
>     which lead to corruptions.
>     
>     This fixes things by making sure we write the cache while we are taking
>     the snapshot, and that we don't write it again later.
>     
>     Signed-off-by: Chris Mason <chris.mason@oracle.com>
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index bf0d615..d594cf7 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1651,6 +1651,7 @@ int __btrfs_add_free_space(struct btrfs_free_space_ctl *ctl,
>  	info->bytes = bytes;
>  
>  	spin_lock(&ctl->tree_lock);
> +	ctl->dirty = 1;
>  
>  	if (try_merge_free_space(ctl, info, true))
>  		goto link;
> @@ -1691,6 +1692,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
>  	int ret = 0;
>  
>  	spin_lock(&ctl->tree_lock);
> +	ctl->dirty = 1;
>  
>  again:
>  	info = tree_search_offset(ctl, offset, 0, 0);
> @@ -2589,6 +2591,7 @@ u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root)
>  		if (entry->bytes == 0)
>  			free_bitmap(ctl, entry);
>  	}
> +	ctl->dirty = 1;
>  out:
>  	spin_unlock(&ctl->tree_lock);
>  
> @@ -2688,6 +2691,10 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>  		printk(KERN_ERR "btrfs: failed to write free ino cache "
>  		       "for root %llu\n", root->root_key.objectid);
>  
> +	/* we write out at transaction commit time, there's no racing. */
> +	if (ret == 0)
> +		ctl->dirty = 0;
> +
>  	iput(inode);
>  	return ret;
>  }
> diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> index 8f2613f..1e92c93 100644
> --- a/fs/btrfs/free-space-cache.h
> +++ b/fs/btrfs/free-space-cache.h
> @@ -35,6 +35,11 @@ struct btrfs_free_space_ctl {
>  	int free_extents;
>  	int total_bitmaps;
>  	int unit;
> +	/*
> +	 * record if we've changed since written.  This can turn
> +	 * into a bit field if we need more flags
> +	 */
> +	unsigned long dirty;
>  	u64 start;
>  	struct btrfs_free_space_op *op;
>  	void *private;
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index b4087e0..e7c1493 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -376,6 +376,7 @@ void btrfs_init_free_ino_ctl(struct btrfs_root *root)
>  	ctl->start = 0;
>  	ctl->private = NULL;
>  	ctl->op = &free_ino_op;
> +	ctl->dirty = 1;
>  
>  	/*
>  	 * Initially we allow to use 16K of ram to cache chunks of
> @@ -417,6 +418,9 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
>  		return 0;
>  
> +	if (!ctl->dirty)
> +		return 0;
> +
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		return -ENOMEM;
> @@ -485,6 +489,24 @@ out:
>  	return ret;
>  }
>  
> +/*
> + * this tries to save the cache, but if it fails for any reason we clear
> + * the dirty flag so that it won't be saved again during this commit.
> + *
> + * This is used by the snapshotting code to make sure we don't corrupt the
> + * FS by saving the inode cache after the snapshot is taken.
> + */
> +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> +			       struct btrfs_trans_handle *trans)
> +{
> +	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
> +	int ret;
> +	ret = btrfs_save_ino_cache(root, trans);
> +
> +	ctl->dirty = 0;
> +	return ret;
> +}
> +
>  static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
>  {
>  	struct btrfs_path *path;
> diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
> index ddb347b..2be060e 100644
> --- a/fs/btrfs/inode-map.h
> +++ b/fs/btrfs/inode-map.h
> @@ -7,7 +7,8 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid);
>  int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid);
>  int btrfs_save_ino_cache(struct btrfs_root *root,
>  			 struct btrfs_trans_handle *trans);
> -
> +int btrfs_force_save_ino_cache(struct btrfs_root *root,
> +			       struct btrfs_trans_handle *trans);
>  int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);
>  
>  #endif
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 51dcec8..e34827c 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -966,6 +966,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	ret = btrfs_run_delayed_items(trans, root);
>  	BUG_ON(ret);
>  
> +	/*
> +	 * there are a few transient reasons the free inode cache writeback can fail.
> +	 * and if it does, we'll try again later in the commit.  This forces the
> +	 * clean bit, tossing the cache.  Tossing the cache isn't really good, but
> +	 * if we try to write it again later in the commit we'll corrupt things.
> +	 */
> +	btrfs_force_save_ino_cache(parent_root, trans);
> +
> +
>  	record_root_in_trans(trans, root);
>  	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>  	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-07-08  4:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-17 21:12 please review snapshot corruption path with delayed metadata insertion Chris Mason
2011-06-19  4:34 ` Tsutomu Itoh
2011-06-19 23:41   ` Tsutomu Itoh
2011-06-21  0:24     ` David Sterba
2011-06-21  0:40       ` Chris Mason
2011-06-21  1:15         ` Tsutomu Itoh
2011-06-23  8:52           ` Miao Xie
2011-06-30  6:32           ` Miao Xie
2011-06-30  6:52             ` Tsutomu Itoh
2011-07-01  8:11             ` Tsutomu Itoh
2011-07-07 20:26               ` Chris Mason
2011-07-07 23:51                 ` Tsutomu Itoh
2011-07-08  1:59                   ` Chris Mason
2011-07-08  4:50                 ` Tsutomu Itoh
2011-06-30  8:03 ` Miao Xie
2011-06-30  8:51   ` Miao Xie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.