All of lore.kernel.org
 help / color / mirror / Atom feed
* Btrfs moved my root tree off the end of the device
@ 2011-09-07 23:33 Justin Gottula
  2011-09-08  5:20 ` C Anthony Risinger
  2011-09-08  5:51 ` Arne Jansen
  0 siblings, 2 replies; 4+ messages in thread
From: Justin Gottula @ 2011-09-07 23:33 UTC (permalink / raw)
  To: linux-btrfs

Hi,


I recently created a Btrfs volume on top of a software (mdadm) raid5 array
(since Btrfs currently lacks raid5 support at the FS level). On this 640 GB
volume, I stored a ~400 GB tar file. After a couple weeks of use, I used 'btrfs
defragment' on this file in an effort to (a) defrag and (b) compress the file. I
made sure I was using the latest version of the userspace utilities (Btrfs
v0.19-35-g1b444cd-dirty) as well as kernel 3.0.

Now, this may or may not have had something to do with the drives being at 55
celsius, which I only discovered later, but after I had left this operation to
work overnight, I came back to some scary messages in the kernel log.
Immediately after the operation started (as far as I can tell), the following
messages showed up in the kernel log:


[17055.793912] btrfs: relocating block group 636489826304 flags 1

[17112.566998] attempt to access beyond end of device
[17112.567003] md127p1: rw=145, want=1248172032, limit=1248171999
[17112.567004] attempt to access beyond end of device
[17112.567006] md127p1: rw=145, want=1248172280, limit=1248171999
[17112.567008] attempt to access beyond end of device
[17112.567009] md127p1: rw=145, want=1248172416, limit=1248171999
[17112.567011] attempt to access beyond end of device
[17112.567012] md127p1: rw=145, want=1248172664, limit=1248171999
[17112.567014] attempt to access beyond end of device
[17112.567015] md127p1: rw=145, want=1248172912, limit=1248171999
[17112.567016] attempt to access beyond end of device
[17112.567018] md127p1: rw=145, want=1248172928, limit=1248171999
(thousands more of the above in rapid succession)


and occasionally a few of these:


[17157.916746] btrfs csum failed ino 258 off 8192 csum 2566472073 private
1679206033
[17157.916758] btrfs csum failed ino 258 off 12288 csum 2566472073 private
248979876
[17157.916771] btrfs csum failed ino 258 off 16384 csum 2566472073 private
3790022839


Then, later, another burst of the same device access warnings followed by this:


[20063.971837] ------------[ cut here ]------------
[20063.972050] kernel BUG at fs/btrfs/ctree.c:300!
[20063.972238] invalid opcode: 0000 [#1] PREEMPT SMP 
[20063.972666] CPU 0 
[20063.972669] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
[20063.974670] 
[20063.974670] Pid: 7425, comm: btrfs-endio-wri Not tainted 3.0-ARCH #1 System
manufacturer System Product Name/M4N68T-M-V2
[20063.974670] RIP: 0010:[<ffffffffa01a1df1>]  [<ffffffffa01a1df1>]
update_ref_for_cow+0x331/0x340 [btrfs]
[20063.974670] RSP: 0018:ffff8800236798b0  EFLAGS: 00010282
[20063.974670] RAX: 00000000fffffffb RBX: ffff88002539f000 RCX: 00000001bc22b580
[20063.974670] RDX: 00000001bc22b540 RSI: 000060ffc24024f0 RDI: ffffea0000cb7930
[20063.974670] RBP: ffff880023679900 R08: ffffffffa01a121a R09: 0000000000000000
[20063.974670] R10: 00000000fffffffb R11: 0000000000000001 R12: ffff880020a72a00
[20063.974670] R13: ffff880037c7f000 R14: ffff88002c24fc00 R15: ffff88002367997c
[20063.974670] FS:  00007f6c7ea81740(0000) GS:ffff88003d800000(0000)
knlGS:0000000000000000
[20063.974670] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[20063.974670] CR2: 00007fe40de62000 CR3: 0000000025978000 CR4: 00000000000006f0
[20063.974670] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20063.974670] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[20063.974670] Process btrfs-endio-wri (pid: 7425, threadinfo ffff880023678000,
task ffff88003a43f300)
[20063.974670] Stack:
[20063.974670]  0000000000001000 ffff880023678000 ffff880023679fd8
ffff880025cb8010
[20063.974670]  ffff880023679910 ffff88002539f000 ffff88002c24fc00
ffff880020a72a00
[20063.974670]  ffff880037c7f000 ffff880003955900 ffff8800236799b0
ffffffffa01a2276
[20063.974670] Call Trace:
[20063.974670]  [<ffffffffa01a2276>] __btrfs_cow_block+0x476/0x890 [btrfs]
[20063.974670]  [<ffffffffa01a27a8>] btrfs_cow_block+0x118/0x360 [btrfs]
[20063.974670]  [<ffffffffa01a7e0e>] btrfs_search_slot+0x1de/0x900 [btrfs]
[20063.974670]  [<ffffffffa01bac58>] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
[20063.974670]  [<ffffffffa01d54c4>] btrfs_drop_extents+0x104/0xa10 [btrfs]
[20063.974670]  [<ffffffffa01cadf3>]
insert_reserved_file_extent.constprop.47+0x73/0x250 [btrfs]
[20063.974670]  [<ffffffffa01c6007>] ? start_transaction+0x97/0x2b0 [btrfs]
[20063.974670]  [<ffffffffa01cbf9a>] btrfs_finish_ordered_io+0x2da/0x320 [btrfs]
[20063.974670]  [<ffffffffa01cc02d>] btrfs_writepage_end_io_hook+0x4d/0xf0
[btrfs]
[20063.974670]  [<ffffffffa01e5c9f>] end_bio_extent_writepage+0x12f/0x170
[btrfs]
[20063.974670]  [<ffffffff811894fd>] bio_endio+0x1d/0x40
[20063.974670]  [<ffffffffa01becf6>] end_workqueue_fn+0x56/0x140 [btrfs]
[20063.974670]  [<ffffffffa01f2c15>] worker_loop+0x165/0x520 [btrfs]
[20063.974670]  [<ffffffffa01f2ab0>] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs]
[20063.974670]  [<ffffffff8107ee9c>] kthread+0x8c/0xa0
[20063.974670]  [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10
[20063.974670]  [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190
[20063.974670]  [<ffffffff813f5b60>] ? gs_change+0x13/0x13
[20063.974670] Code: 48 8b 45 b8 e9 1c fe ff ff f6 45 c9 01 0f 85 1c fe ff ff 0f
0b 0f 0b 48 89 45 b8 e8 ba f6 24 e1 48 8b 45 b8 e9 89 fd ff ff 0f 0b <0f> 0b 0f
0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
[20063.974670] RIP  [<ffffffffa01a1df1>] update_ref_for_cow+0x331/0x340 [btrfs]
[20063.974670]  RSP <ffff8800236798b0>
[20063.974670] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder
(output 0)
[20064.021164] ---[ end trace 5d092910be378bb8 ]---
[20098.068178] attempt to access beyond end of device
[20098.068495] md127p1: rw=0, want=1250270168, limit=1248171999
[20098.069269] attempt to access beyond end of device
[20098.070180] md127p1: rw=0, want=1250270168, limit=1248171999
[20098.071105] attempt to access beyond end of device
[20098.072280] md127p1: rw=0, want=1250532312, limit=1248171999
[20098.072973] BUG: unable to handle kernel NULL pointer dereference at
0000000000000030
[20098.073881] IP: [<ffffffffa01b8027>] btrfs_print_leaf+0x37/0x880 [btrfs]
[20098.074824] PGD 39989067 PUD 39aed067 PMD 0 
[20098.075361] Oops: 0000 [#2] PREEMPT SMP 
[20098.075361] CPU 0 
[20098.075361] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
[20098.075361] 
[20098.075361] Pid: 7416, comm: btrfs-transacti Tainted: G      D     3.0-ARCH
#1 System manufacturer System Product Name/M4N68T-M-V2
[20098.075361] RIP: 0010:[<ffffffffa01b8027>]  [<ffffffffa01b8027>]
btrfs_print_leaf+0x37/0x880 [btrfs]
[20098.075361] RSP: 0018:ffff880009437af0  EFLAGS: 00010286
[20098.075361] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: ffff880009437fd8
[20098.075361] RDX: ffff880009436000 RSI: 0000000000000000 RDI: ffff88002539f800
[20098.075361] RBP: ffff880009437b80 R08: 0000000000000001 R09: 0000000000000000
[20098.075361] R10: 00000000fffffffb R11: 0000000000000001 R12: 0000000000000101
[20098.075361] R13: 00000094db37b000 R14: 00000000fffffffb R15: 0000000000001000
[20098.075361] FS:  00007fdcd1611700(0000) GS:ffff88003d800000(0000)
knlGS:0000000000000000
[20098.075361] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[20098.075361] CR2: 0000000000000030 CR3: 0000000039abd000 CR4: 00000000000006f0
[20098.075361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20098.075361] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[20098.075361] Process btrfs-transacti (pid: 7416, threadinfo ffff880009436000,
task ffff88003ba2dd70)
[20098.075361] Stack:
[20098.075361]  0000000000002000 0000000000001000 ffff880009437c28
00000094db37b000
[20098.075361]  0000000000000000 ffffffff81142ad3 ffff88002539f800
0000000000000090
[20098.075361]  0000000000000090 000080503a22a120 a800000094db37b0
0000000000001000
[20098.075361] Call Trace:
[20098.075361]  [<ffffffff81142ad3>] ? kmem_cache_alloc+0x133/0x160
[20098.075361]  [<ffffffffa01b02df>] __btrfs_free_extent+0x61f/0x6d0 [btrfs]
[20098.075361]  [<ffffffffa01b2b6d>] ? run_clustered_refs+0x11d/0x830 [btrfs]
[20098.075361]  [<ffffffffa01b2dd9>] run_clustered_refs+0x389/0x830 [btrfs]
[20098.075361]  [<ffffffff813f0b6e>] ? schedule+0x34e/0x9f0
[20098.075361]  [<ffffffffa0205400>] ? btrfs_find_ref_cluster+0x20/0x190 [btrfs]
[20098.075361]  [<ffffffffa01b3348>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs]
[20098.075361]  [<ffffffff813f27ee>] ? mutex_unlock+0xe/0x10
[20098.075361]  [<ffffffffa01c5381>] btrfs_commit_transaction+0x81/0x8a0 [btrfs]
[20098.075361]  [<ffffffff8107f7f0>] ? abort_exclusive_wait+0xb0/0xb0
[20098.075361]  [<ffffffffa01be3ed>] transaction_kthread+0x26d/0x290 [btrfs]
[20098.075361]  [<ffffffffa01be180>] ? btrfs_congested_fn+0xd0/0xd0 [btrfs]
[20098.075361]  [<ffffffff8107ee9c>] kthread+0x8c/0xa0
[20098.075361]  [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10
[20098.075361]  [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190
[20098.075361]  [<ffffffff813f5b60>] ? gs_change+0x13/0x13
[20098.075361] Code: 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 66
66 66 66 90 65 48 8b 0c 25 48 cd 00 00 48 8d 91 28 e0 ff ff 48 89 f3 
[20098.075361] RIP  [<ffffffffa01b8027>] btrfs_print_leaf+0x37/0x880 [btrfs]
[20098.075361]  RSP <ffff880009437af0>
[20098.075361] CR2: 0000000000000030
[20098.121409] ---[ end trace 5d092910be378bb9 ]---


And some time later, this:


[36853.419291] ------------[ cut here ]------------
[36853.419852] kernel BUG at fs/btrfs/extent-tree.c:5510!
[36853.419852] invalid opcode: 0000 [#3] PREEMPT SMP 
[36853.419852] CPU 0 
[36853.419852] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
[36853.419852] 
[36853.419852] Pid: 7413, comm: btrfs-delayed-m Tainted: G      D     3.0-ARCH
#1 System manufacturer System Product Name/M4N68T-M-V2
[36853.419852] RIP: 0010:[<ffffffffa01b3263>]  [<ffffffffa01b3263>]
run_clustered_refs+0x813/0x830 [btrfs]
[36853.419852] RSP: 0018:ffff880025c13c10  EFLAGS: 00010282
[36853.419852] RAX: 00000000fffffffb RBX: ffff880025dfc2a0 RCX: ffff88002a2ceb80
[36853.419852] RDX: 0000000000000008 RSI: 00000094db37a000 RDI: 0000000000000000
[36853.419852] RBP: ffff880025c13cf0 R08: 0000000000000001 R09: 0000000000000000
[36853.419852] R10: ffff88002539f800 R11: ffff880037c7f090 R12: ffff88002869c3c0
[36853.419852] R13: ffff88001c5b7200 R14: ffff88000a66a1e0 R15: 0000000000000000
[36853.419852] FS:  00007fd708110700(0000) GS:ffff88003d800000(0000)
knlGS:0000000000000000
[36853.419852] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[36853.419852] CR2: 00000000017bd048 CR3: 000000001e1cc000 CR4: 00000000000006f0
[36853.419852] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[36853.419852] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[36853.419852] Process btrfs-delayed-m (pid: 7413, threadinfo ffff880025c12000,
task ffff88003a7bebd0)
[36853.419852] Stack:
[36853.419852]  ffff880025398800 0000000000000000 ffff880025c13c60
ffffffffa01a45f1
[36853.419852]  ffff880025c13c40 000000033a22a5a0 0000000000000000
0000000000000002
[36853.419852]  ffff880026b69d00 ffff880025c13d28 0000000000000000
ffff880025c13d30
[36853.419852] Call Trace:
[36853.419852]  [<ffffffffa01a45f1>] ? bin_search+0x81/0x130 [btrfs]
[36853.419852]  [<ffffffffa01b3348>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs]
[36853.419852]  [<ffffffffa01c5c53>] __btrfs_end_transaction+0x83/0x320 [btrfs]
[36853.419852]  [<ffffffffa01c5f08>] btrfs_end_transaction_dmeta+0x18/0x20
[btrfs]
[36853.419852]  [<ffffffffa020f8b8>]
btrfs_async_run_delayed_node_done+0xd8/0x1a0 [btrfs]
[36853.419852]  [<ffffffffa01f2c15>] worker_loop+0x165/0x520 [btrfs]
[36853.419852]  [<ffffffffa01f2ab0>] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs]
[36853.419852]  [<ffffffff8107ee9c>] kthread+0x8c/0xa0
[36853.419852]  [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10
[36853.419852]  [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190
[36853.419852]  [<ffffffff813f5b60>] ? gs_change+0x13/0x13
[36853.419852] Code: e0 e9 12 f9 ff ff 0f 0b 80 fa b2 0f 84 bc f9 ff ff 0f 0b be
95 00 00 00 48 c7 c7 37 a0 21 a0 e8 d4 95 ea e0 e9 04 f9 ff ff 0f 0b <0f> 0b 0f
0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 66 66 66 66 2e 
[36853.419852] RIP  [<ffffffffa01b3263>] run_clustered_refs+0x813/0x830 [btrfs]
[36853.419852]  RSP <ffff880025c13c10>
[36853.419852] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder
(output 0)
[36853.473157] ---[ end trace 5d092910be378bba ]---


On the next boot, the volume wouldn't mount:


[    7.010353] btrfs: use zlib compression
[    7.040893] attempt to access beyond end of device
[    7.040896] md126p1: rw=0, want=1250270312, limit=1248171999
[    7.040902] attempt to access beyond end of device
[    7.040904] md126p1: rw=0, want=1250270312, limit=1248171999
[    7.040908] attempt to access beyond end of device
[    7.040910] md126p1: rw=0, want=1250532456, limit=1248171999
[    7.040914] btrfs: failed to read tree root on md126p1
[    7.046842] btrfs: open_ctree failed


At this point, I took a disk image and dived in, and in doing so discovered that
somehow, there were CHUNK_ITEMs in the chunk tree that referred to physical
address ranges that were entirely outside of the device (matching up to the
ranges showing up in the kernel log over and over). Evidently, the filesystem
driver thought it should move the root tree onto a chunk that existed at a
nonexistant offset in the device. I checked the superblocks and verified that
the total_bytes fields matched up correctly to the actual device size, which
leaves me wondering how those chunks ever got there.

I hope this report proves useful for finding the bug responsible for this. If
complete logs, details about versions used or steps taken, diagnostic output
from system commands, or particular pieces of the disk image would be helpful
for debugging this problem, I can provide them.


Thanks,

Justin


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Btrfs moved my root tree off the end of the device
  2011-09-07 23:33 Btrfs moved my root tree off the end of the device Justin Gottula
@ 2011-09-08  5:20 ` C Anthony Risinger
  2011-09-08  5:51 ` Arne Jansen
  1 sibling, 0 replies; 4+ messages in thread
From: C Anthony Risinger @ 2011-09-08  5:20 UTC (permalink / raw)
  To: Justin Gottula; +Cc: linux-btrfs

On Wed, Sep 7, 2011 at 6:33 PM, Justin Gottula <justin@jgottula.com> wrote:
>
> Hi,
>
> I recently created a Btrfs volume on top of a software (mdadm) raid5 array
> (since Btrfs currently lacks raid5 support at the FS level). On this 640 GB
> volume, I stored a ~400 GB tar file. After a couple weeks of use, I used 'btrfs
> defragment' on this file in an effort to (a) defrag and (b) compress the file. I
> made sure I was using the latest version of the userspace utilities (Btrfs
> v0.19-35-g1b444cd-dirty) as well as kernel 3.0.

did you use the integration branch, ie:

http://git.darksatanic.net/repo/btrfs-progs-unstable.git

... this has the latest code for the time being.  looks like
`integration-20110805` is the most recent head.

> <snip>
>
> At this point, I took a disk image and dived in, and in doing so discovered that
> somehow, there were CHUNK_ITEMs in the chunk tree that referred to physical
> address ranges that were entirely outside of the device (matching up to the
> ranges showing up in the kernel log over and over). Evidently, the filesystem
> driver thought it should move the root tree onto a chunk that existed at a
> nonexistant offset in the device. I checked the superblocks and verified that
> the total_bytes fields matched up correctly to the actual device size, which
> leaves me wondering how those chunks ever got there.

i could be way of base here, but your report reminded me of:

[thread] http://www.spinics.net/lists/linux-btrfs/index.html#12121

-----------------------------------------------
extent data backref root 5 objectid 258 offset 18446744073709543424 count 1
extent data backref root 5 objectid 257 offset 0 count 1

> So I think we have to live with this defect, just fix relocation for
> the negative offset case ?

I prefer fixing relocation.
-----------------------------------------------

... which, if i understood correctly, surfaced some issues with
relocation that could cause the offset to be grossly inaccurate (eg.
off the device completely?)

could of course be completely unrelated :-)

-- 

C Anthony

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Btrfs moved my root tree off the end of the device
  2011-09-07 23:33 Btrfs moved my root tree off the end of the device Justin Gottula
  2011-09-08  5:20 ` C Anthony Risinger
@ 2011-09-08  5:51 ` Arne Jansen
  2011-09-08  7:29   ` Justin Gottula
  1 sibling, 1 reply; 4+ messages in thread
From: Arne Jansen @ 2011-09-08  5:51 UTC (permalink / raw)
  To: Justin Gottula; +Cc: linux-btrfs

On 08.09.2011 01:33, Justin Gottula wrote:
> Hi,
> 
> 
> I recently created a Btrfs volume on top of a software (mdadm) raid5 array
> (since Btrfs currently lacks raid5 support at the FS level). On this 640 GB
> volume, I stored a ~400 GB tar file. After a couple weeks of use, I used 'btrfs
> defragment' on this file in an effort to (a) defrag and (b) compress the file. I
> made sure I was using the latest version of the userspace utilities (Btrfs
> v0.19-35-g1b444cd-dirty) as well as kernel 3.0.

I think Liu Bo posted a fix for this a while ago:

[PATCH] Btrfs: fix a bug of balance on full multi-disk partitions

-Arne

> Now, this may or may not have had something to do with the drives being at 55
> celsius, which I only discovered later, but after I had left this operation to
> work overnight, I came back to some scary messages in the kernel log.
> Immediately after the operation started (as far as I can tell), the following
> messages showed up in the kernel log:
> 
> 
> [17055.793912] btrfs: relocating block group 636489826304 flags 1
> 
> [17112.566998] attempt to access beyond end of device
> [17112.567003] md127p1: rw=145, want=1248172032, limit=1248171999
> [17112.567004] attempt to access beyond end of device
> [17112.567006] md127p1: rw=145, want=1248172280, limit=1248171999
> [17112.567008] attempt to access beyond end of device
> [17112.567009] md127p1: rw=145, want=1248172416, limit=1248171999
> [17112.567011] attempt to access beyond end of device
> [17112.567012] md127p1: rw=145, want=1248172664, limit=1248171999
> [17112.567014] attempt to access beyond end of device
> [17112.567015] md127p1: rw=145, want=1248172912, limit=1248171999
> [17112.567016] attempt to access beyond end of device
> [17112.567018] md127p1: rw=145, want=1248172928, limit=1248171999
> (thousands more of the above in rapid succession)
> 
> 
> and occasionally a few of these:
> 
> 
> [17157.916746] btrfs csum failed ino 258 off 8192 csum 2566472073 private
> 1679206033
> [17157.916758] btrfs csum failed ino 258 off 12288 csum 2566472073 private
> 248979876
> [17157.916771] btrfs csum failed ino 258 off 16384 csum 2566472073 private
> 3790022839
> 
> 
> Then, later, another burst of the same device access warnings followed by this:
> 
> 
> [20063.971837] ------------[ cut here ]------------
> [20063.972050] kernel BUG at fs/btrfs/ctree.c:300!
> [20063.972238] invalid opcode: 0000 [#1] PREEMPT SMP 
> [20063.972666] CPU 0 
> [20063.972669] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
> drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
> parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
> edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
> snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
> raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
> raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
> ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
> [20063.974670] 
> [20063.974670] Pid: 7425, comm: btrfs-endio-wri Not tainted 3.0-ARCH #1 System
> manufacturer System Product Name/M4N68T-M-V2
> [20063.974670] RIP: 0010:[<ffffffffa01a1df1>]  [<ffffffffa01a1df1>]
> update_ref_for_cow+0x331/0x340 [btrfs]
> [20063.974670] RSP: 0018:ffff8800236798b0  EFLAGS: 00010282
> [20063.974670] RAX: 00000000fffffffb RBX: ffff88002539f000 RCX: 00000001bc22b580
> [20063.974670] RDX: 00000001bc22b540 RSI: 000060ffc24024f0 RDI: ffffea0000cb7930
> [20063.974670] RBP: ffff880023679900 R08: ffffffffa01a121a R09: 0000000000000000
> [20063.974670] R10: 00000000fffffffb R11: 0000000000000001 R12: ffff880020a72a00
> [20063.974670] R13: ffff880037c7f000 R14: ffff88002c24fc00 R15: ffff88002367997c
> [20063.974670] FS:  00007f6c7ea81740(0000) GS:ffff88003d800000(0000)
> knlGS:0000000000000000
> [20063.974670] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [20063.974670] CR2: 00007fe40de62000 CR3: 0000000025978000 CR4: 00000000000006f0
> [20063.974670] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [20063.974670] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [20063.974670] Process btrfs-endio-wri (pid: 7425, threadinfo ffff880023678000,
> task ffff88003a43f300)
> [20063.974670] Stack:
> [20063.974670]  0000000000001000 ffff880023678000 ffff880023679fd8
> ffff880025cb8010
> [20063.974670]  ffff880023679910 ffff88002539f000 ffff88002c24fc00
> ffff880020a72a00
> [20063.974670]  ffff880037c7f000 ffff880003955900 ffff8800236799b0
> ffffffffa01a2276
> [20063.974670] Call Trace:
> [20063.974670]  [<ffffffffa01a2276>] __btrfs_cow_block+0x476/0x890 [btrfs]
> [20063.974670]  [<ffffffffa01a27a8>] btrfs_cow_block+0x118/0x360 [btrfs]
> [20063.974670]  [<ffffffffa01a7e0e>] btrfs_search_slot+0x1de/0x900 [btrfs]
> [20063.974670]  [<ffffffffa01bac58>] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
> [20063.974670]  [<ffffffffa01d54c4>] btrfs_drop_extents+0x104/0xa10 [btrfs]
> [20063.974670]  [<ffffffffa01cadf3>]
> insert_reserved_file_extent.constprop.47+0x73/0x250 [btrfs]
> [20063.974670]  [<ffffffffa01c6007>] ? start_transaction+0x97/0x2b0 [btrfs]
> [20063.974670]  [<ffffffffa01cbf9a>] btrfs_finish_ordered_io+0x2da/0x320 [btrfs]
> [20063.974670]  [<ffffffffa01cc02d>] btrfs_writepage_end_io_hook+0x4d/0xf0
> [btrfs]
> [20063.974670]  [<ffffffffa01e5c9f>] end_bio_extent_writepage+0x12f/0x170
> [btrfs]
> [20063.974670]  [<ffffffff811894fd>] bio_endio+0x1d/0x40
> [20063.974670]  [<ffffffffa01becf6>] end_workqueue_fn+0x56/0x140 [btrfs]
> [20063.974670]  [<ffffffffa01f2c15>] worker_loop+0x165/0x520 [btrfs]
> [20063.974670]  [<ffffffffa01f2ab0>] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs]
> [20063.974670]  [<ffffffff8107ee9c>] kthread+0x8c/0xa0
> [20063.974670]  [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10
> [20063.974670]  [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190
> [20063.974670]  [<ffffffff813f5b60>] ? gs_change+0x13/0x13
> [20063.974670] Code: 48 8b 45 b8 e9 1c fe ff ff f6 45 c9 01 0f 85 1c fe ff ff 0f
> 0b 0f 0b 48 89 45 b8 e8 ba f6 24 e1 48 8b 45 b8 e9 89 fd ff ff 0f 0b <0f> 0b 0f
> 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
> [20063.974670] RIP  [<ffffffffa01a1df1>] update_ref_for_cow+0x331/0x340 [btrfs]
> [20063.974670]  RSP <ffff8800236798b0>
> [20063.974670] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder
> (output 0)
> [20064.021164] ---[ end trace 5d092910be378bb8 ]---
> [20098.068178] attempt to access beyond end of device
> [20098.068495] md127p1: rw=0, want=1250270168, limit=1248171999
> [20098.069269] attempt to access beyond end of device
> [20098.070180] md127p1: rw=0, want=1250270168, limit=1248171999
> [20098.071105] attempt to access beyond end of device
> [20098.072280] md127p1: rw=0, want=1250532312, limit=1248171999
> [20098.072973] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000030
> [20098.073881] IP: [<ffffffffa01b8027>] btrfs_print_leaf+0x37/0x880 [btrfs]
> [20098.074824] PGD 39989067 PUD 39aed067 PMD 0 
> [20098.075361] Oops: 0000 [#2] PREEMPT SMP 
> [20098.075361] CPU 0 
> [20098.075361] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
> drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
> parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
> edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
> snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
> raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
> raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
> ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
> [20098.075361] 
> [20098.075361] Pid: 7416, comm: btrfs-transacti Tainted: G      D     3.0-ARCH
> #1 System manufacturer System Product Name/M4N68T-M-V2
> [20098.075361] RIP: 0010:[<ffffffffa01b8027>]  [<ffffffffa01b8027>]
> btrfs_print_leaf+0x37/0x880 [btrfs]
> [20098.075361] RSP: 0018:ffff880009437af0  EFLAGS: 00010286
> [20098.075361] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: ffff880009437fd8
> [20098.075361] RDX: ffff880009436000 RSI: 0000000000000000 RDI: ffff88002539f800
> [20098.075361] RBP: ffff880009437b80 R08: 0000000000000001 R09: 0000000000000000
> [20098.075361] R10: 00000000fffffffb R11: 0000000000000001 R12: 0000000000000101
> [20098.075361] R13: 00000094db37b000 R14: 00000000fffffffb R15: 0000000000001000
> [20098.075361] FS:  00007fdcd1611700(0000) GS:ffff88003d800000(0000)
> knlGS:0000000000000000
> [20098.075361] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [20098.075361] CR2: 0000000000000030 CR3: 0000000039abd000 CR4: 00000000000006f0
> [20098.075361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [20098.075361] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [20098.075361] Process btrfs-transacti (pid: 7416, threadinfo ffff880009436000,
> task ffff88003ba2dd70)
> [20098.075361] Stack:
> [20098.075361]  0000000000002000 0000000000001000 ffff880009437c28
> 00000094db37b000
> [20098.075361]  0000000000000000 ffffffff81142ad3 ffff88002539f800
> 0000000000000090
> [20098.075361]  0000000000000090 000080503a22a120 a800000094db37b0
> 0000000000001000
> [20098.075361] Call Trace:
> [20098.075361]  [<ffffffff81142ad3>] ? kmem_cache_alloc+0x133/0x160
> [20098.075361]  [<ffffffffa01b02df>] __btrfs_free_extent+0x61f/0x6d0 [btrfs]
> [20098.075361]  [<ffffffffa01b2b6d>] ? run_clustered_refs+0x11d/0x830 [btrfs]
> [20098.075361]  [<ffffffffa01b2dd9>] run_clustered_refs+0x389/0x830 [btrfs]
> [20098.075361]  [<ffffffff813f0b6e>] ? schedule+0x34e/0x9f0
> [20098.075361]  [<ffffffffa0205400>] ? btrfs_find_ref_cluster+0x20/0x190 [btrfs]
> [20098.075361]  [<ffffffffa01b3348>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs]
> [20098.075361]  [<ffffffff813f27ee>] ? mutex_unlock+0xe/0x10
> [20098.075361]  [<ffffffffa01c5381>] btrfs_commit_transaction+0x81/0x8a0 [btrfs]
> [20098.075361]  [<ffffffff8107f7f0>] ? abort_exclusive_wait+0xb0/0xb0
> [20098.075361]  [<ffffffffa01be3ed>] transaction_kthread+0x26d/0x290 [btrfs]
> [20098.075361]  [<ffffffffa01be180>] ? btrfs_congested_fn+0xd0/0xd0 [btrfs]
> [20098.075361]  [<ffffffff8107ee9c>] kthread+0x8c/0xa0
> [20098.075361]  [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10
> [20098.075361]  [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190
> [20098.075361]  [<ffffffff813f5b60>] ? gs_change+0x13/0x13
> [20098.075361] Code: 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 66
> 66 66 66 90 65 48 8b 0c 25 48 cd 00 00 48 8d 91 28 e0 ff ff 48 89 f3 
> [20098.075361] RIP  [<ffffffffa01b8027>] btrfs_print_leaf+0x37/0x880 [btrfs]
> [20098.075361]  RSP <ffff880009437af0>
> [20098.075361] CR2: 0000000000000030
> [20098.121409] ---[ end trace 5d092910be378bb9 ]---
> 
> 
> And some time later, this:
> 
> 
> [36853.419291] ------------[ cut here ]------------
> [36853.419852] kernel BUG at fs/btrfs/extent-tree.c:5510!
> [36853.419852] invalid opcode: 0000 [#3] PREEMPT SMP 
> [36853.419852] CPU 0 
> [36853.419852] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
> drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
> parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
> edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
> snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
> raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
> raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
> ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
> [36853.419852] 
> [36853.419852] Pid: 7413, comm: btrfs-delayed-m Tainted: G      D     3.0-ARCH
> #1 System manufacturer System Product Name/M4N68T-M-V2
> [36853.419852] RIP: 0010:[<ffffffffa01b3263>]  [<ffffffffa01b3263>]
> run_clustered_refs+0x813/0x830 [btrfs]
> [36853.419852] RSP: 0018:ffff880025c13c10  EFLAGS: 00010282
> [36853.419852] RAX: 00000000fffffffb RBX: ffff880025dfc2a0 RCX: ffff88002a2ceb80
> [36853.419852] RDX: 0000000000000008 RSI: 00000094db37a000 RDI: 0000000000000000
> [36853.419852] RBP: ffff880025c13cf0 R08: 0000000000000001 R09: 0000000000000000
> [36853.419852] R10: ffff88002539f800 R11: ffff880037c7f090 R12: ffff88002869c3c0
> [36853.419852] R13: ffff88001c5b7200 R14: ffff88000a66a1e0 R15: 0000000000000000
> [36853.419852] FS:  00007fd708110700(0000) GS:ffff88003d800000(0000)
> knlGS:0000000000000000
> [36853.419852] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [36853.419852] CR2: 00000000017bd048 CR3: 000000001e1cc000 CR4: 00000000000006f0
> [36853.419852] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [36853.419852] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [36853.419852] Process btrfs-delayed-m (pid: 7413, threadinfo ffff880025c12000,
> task ffff88003a7bebd0)
> [36853.419852] Stack:
> [36853.419852]  ffff880025398800 0000000000000000 ffff880025c13c60
> ffffffffa01a45f1
> [36853.419852]  ffff880025c13c40 000000033a22a5a0 0000000000000000
> 0000000000000002
> [36853.419852]  ffff880026b69d00 ffff880025c13d28 0000000000000000
> ffff880025c13d30
> [36853.419852] Call Trace:
> [36853.419852]  [<ffffffffa01a45f1>] ? bin_search+0x81/0x130 [btrfs]
> [36853.419852]  [<ffffffffa01b3348>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs]
> [36853.419852]  [<ffffffffa01c5c53>] __btrfs_end_transaction+0x83/0x320 [btrfs]
> [36853.419852]  [<ffffffffa01c5f08>] btrfs_end_transaction_dmeta+0x18/0x20
> [btrfs]
> [36853.419852]  [<ffffffffa020f8b8>]
> btrfs_async_run_delayed_node_done+0xd8/0x1a0 [btrfs]
> [36853.419852]  [<ffffffffa01f2c15>] worker_loop+0x165/0x520 [btrfs]
> [36853.419852]  [<ffffffffa01f2ab0>] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs]
> [36853.419852]  [<ffffffff8107ee9c>] kthread+0x8c/0xa0
> [36853.419852]  [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10
> [36853.419852]  [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190
> [36853.419852]  [<ffffffff813f5b60>] ? gs_change+0x13/0x13
> [36853.419852] Code: e0 e9 12 f9 ff ff 0f 0b 80 fa b2 0f 84 bc f9 ff ff 0f 0b be
> 95 00 00 00 48 c7 c7 37 a0 21 a0 e8 d4 95 ea e0 e9 04 f9 ff ff 0f 0b <0f> 0b 0f
> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 66 66 66 66 2e 
> [36853.419852] RIP  [<ffffffffa01b3263>] run_clustered_refs+0x813/0x830 [btrfs]
> [36853.419852]  RSP <ffff880025c13c10>
> [36853.419852] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder
> (output 0)
> [36853.473157] ---[ end trace 5d092910be378bba ]---
> 
> 
> On the next boot, the volume wouldn't mount:
> 
> 
> [    7.010353] btrfs: use zlib compression
> [    7.040893] attempt to access beyond end of device
> [    7.040896] md126p1: rw=0, want=1250270312, limit=1248171999
> [    7.040902] attempt to access beyond end of device
> [    7.040904] md126p1: rw=0, want=1250270312, limit=1248171999
> [    7.040908] attempt to access beyond end of device
> [    7.040910] md126p1: rw=0, want=1250532456, limit=1248171999
> [    7.040914] btrfs: failed to read tree root on md126p1
> [    7.046842] btrfs: open_ctree failed
> 
> 
> At this point, I took a disk image and dived in, and in doing so discovered that
> somehow, there were CHUNK_ITEMs in the chunk tree that referred to physical
> address ranges that were entirely outside of the device (matching up to the
> ranges showing up in the kernel log over and over). Evidently, the filesystem
> driver thought it should move the root tree onto a chunk that existed at a
> nonexistant offset in the device. I checked the superblocks and verified that
> the total_bytes fields matched up correctly to the actual device size, which
> leaves me wondering how those chunks ever got there.
> 
> I hope this report proves useful for finding the bug responsible for this. If
> complete logs, details about versions used or steps taken, diagnostic output
> from system commands, or particular pieces of the disk image would be helpful
> for debugging this problem, I can provide them.
> 
> 
> Thanks,
> 
> Justin
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Btrfs moved my root tree off the end of the device
  2011-09-08  5:51 ` Arne Jansen
@ 2011-09-08  7:29   ` Justin Gottula
  0 siblings, 0 replies; 4+ messages in thread
From: Justin Gottula @ 2011-09-08  7:29 UTC (permalink / raw)
  To: Arne Jansen; +Cc: linux-btrfs

On Wed, Sep 7, 2011 at 10:51 PM, Arne Jansen <sensille@gmx.net> wrote:
> I think Liu Bo posted a fix for this a while ago:
>
> [PATCH] Btrfs: fix a bug of balance on full multi-disk partitions
>
> -Arne

The behavior that this patch describes looks pretty much exactly like
what I was seeing, although in my situation the reason for the
relocation was due to defragging rather than multi-device balancing
(not sure if that ultimately makes any difference based on what he's
saying is responsible for the problem). I may be able to find a way to
reliably reproduce the buggy behavior, and if so, I'll try the
bleeding edge git kernel (which appears to have this patch by now) and
see if the problem goes away.


Justin

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-09-08  7:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-07 23:33 Btrfs moved my root tree off the end of the device Justin Gottula
2011-09-08  5:20 ` C Anthony Risinger
2011-09-08  5:51 ` Arne Jansen
2011-09-08  7:29   ` Justin Gottula

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.