* Nasty corruption on large array, ideas welcome
@ 2019-01-08 19:33 Thiago Ramon
  2019-01-09  0:05 ` Qu Wenruo

From: Thiago Ramon @ 2019-01-08 19:33 UTC
To: linux-btrfs

I have a pretty complicated setup here, so first a general description:
8 HDs: 4x5TB, 2x4TB, 2x8TB

Each disk is an LVM PV containing a bcache backing device, which in turn
holds one of the btrfs member devices. All the drives were in writeback
mode against a single SSD bcache cache partition (a terrible setup, I
know, but without the caching the system was too slow to use).

I had all my data, metadata and system blocks on RAID1, but I'm running
out of space, and recent kernels have been improving RAID5/6 support, so
I finally decided to migrate to RAID6, starting with the metadata.

The conversion was running well (I already expected it to be slow, so no
problem there), but I had to spend some days away from the machine. Due
to an air conditioning failure, the room temperature went very high and
one of the disks decided to die (apparently only temporarily). bcache
could no longer write to that backing device, so it ejected all the
drives and left them to cope by themselves. I caught the trouble some 12
hours later, still away, and shut down everything accessing the disks
until I could be physically present to handle the issue.

After I got back and brought the temperature down to acceptable levels,
I checked the failed drive, which seems to work fine after being
re-inserted, but it is of course out of date with the rest of the array.
The remaining drives apparently picked up some corruption as well when
they were ejected from the cache, and I'm now getting errors I haven't
been able to resolve. I've gone through the steps that have helped me
with complicated crashes on this system before, but this time they
weren't enough, and I'll need advice from people who know the btrfs
internals better than I do to get this back running.
I have around 20TB of data on these drives, so copying everything out is
the last resort; I'd rather let most of it die than buy a few more disks
to hold all of that.

Now on to the errors. I've tried both with the "failed" drive present
(which gives me additional transid errors) and without it.

Trying to mount with it gives me:

[Jan 7 20:18] BTRFS info (device bcache0): enabling auto defrag
[ +0.000010] BTRFS info (device bcache0): disk space caching is enabled
[ +0.671411] BTRFS error (device bcache0): parent transid verify failed on 77292724051968 wanted 1499510 found 1499467
[ +0.005950] BTRFS critical (device bcache0): corrupt leaf: root=2 block=77292724051968 slot=2, bad key order, prev (39029522223104 168 212992) current (39029521915904 168 16384)
[ +0.000378] BTRFS error (device bcache0): failed to read block groups: -5
[ +0.022884] BTRFS error (device bcache0): open_ctree failed

Trying without the disk (and -o degraded) gives me:

[Jan 8 12:51] BTRFS info (device bcache1): enabling auto defrag
[ +0.000002] BTRFS info (device bcache1): allowing degraded mounts
[ +0.000002] BTRFS warning (device bcache1): 'recovery' is deprecated, use 'usebackuproot' instead
[ +0.000000] BTRFS info (device bcache1): trying to use backup root at mount time
[ +0.000002] BTRFS info (device bcache1): disabling disk space caching
[ +0.000001] BTRFS info (device bcache1): force clearing of disk cache
[ +0.001334] BTRFS warning (device bcache1): devid 2 uuid 27f87964-1b9a-466c-ac18-b47c0d2faa1c is missing
[ +1.049591] BTRFS critical (device bcache1): corrupt leaf: root=2 block=77291982323712 slot=0, unexpected item end, have 685883288 expect 3995
[ +0.000739] BTRFS error (device bcache1): failed to read block groups: -5
[ +0.017842] BTRFS error (device bcache1): open_ctree failed

btrfs check output (without the drive):

warning, device 2 is missing
checksum verify failed on 77088164081664 found 715B4470 wanted 580444F6
checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
bytenr mismatch, want=77088164081664, have=274663271295232
Couldn't read chunk tree
ERROR: cannot open file system

I've already tried super-recover, zero-log and chunk-recover without any
results, and check with --repair fails the same way as without it.

So, any ideas? I'll be happy to run experiments and grab more logs if
anyone wants more details. And thanks for any suggestions.
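For readers following along, the attempts described above map onto
standard btrfs-progs subcommands. A minimal sketch of that ladder,
roughly in order of invasiveness (the device path is an example, and the
`run` wrapper is mine: it only prints each command unless REALLY_RUN=1
is set, since zero-log and chunk-recover modify the filesystem):

```shell
#!/bin/sh
# DEV is a placeholder for the actual btrfs member device.
DEV="${DEV:-/dev/bcache0}"

# Dry-run wrapper: print the command; execute only when REALLY_RUN=1.
run() { echo "+ $*"; [ "${REALLY_RUN:-0}" = 1 ] && "$@"; return 0; }

run btrfs check --readonly "$DEV"         # diagnose only, never writes
run btrfs rescue super-recover -v "$DEV"  # repair bad superblock copies
run btrfs rescue zero-log "$DEV"          # clear a corrupted log tree
run btrfs rescue chunk-recover "$DEV"     # rebuild chunk tree (slow)
```

Running the read-only check first and keeping its output is the usual
advice before letting any of the rescue subcommands write.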
* Re: Nasty corruption on large array, ideas welcome
  2019-01-09  0:05 ` Qu Wenruo

From: Qu Wenruo @ 2019-01-09 0:05 UTC
To: Thiago Ramon, linux-btrfs

On 2019/1/9 上午3:33, Thiago Ramon wrote:
> I have a pretty complicated setup here, so first a general description:
> 8 HDs: 4x5TB, 2x4TB, 2x8TB
> [...]
> After I got back and got the temperature down to acceptable levels,
> I've checked the failed drive, which seems to be working well after
> getting re-inserted, but it's of course out of date with the rest of
> the drives. But apparently the rest got some corruption as well when
> they got ejected from the cache, and I'm getting some errors I haven't
> been able to handle.
> Trying to mount with it gives me:
> [Jan 7 20:18] BTRFS info (device bcache0): enabling auto defrag
> [ +0.000010] BTRFS info (device bcache0): disk space caching is enabled
> [ +0.671411] BTRFS error (device bcache0): parent transid verify
> failed on 77292724051968 wanted 1499510 found 1499467
> [ +0.005950] BTRFS critical (device bcache0): corrupt leaf: root=2
> block=77292724051968 slot=2, bad key order, prev (39029522223104 168
> 212992) current (39029521915904 168 16384)

Heavily corrupted extent tree.

And there is a very good experimental patch for you:
https://patchwork.kernel.org/patch/10738583/

Then go mount with the "skip_bg,ro" mount options.

Please note this can only help you to salvage data (it's a kernel
version of btrfs restore).

AFAIK the corruption may affect fs trees too, so be aware of corrupted
data.
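For anyone reproducing this, applying the out-of-tree patch and doing
the salvage mount looks roughly like the sketch below. Everything here
is an assumption rather than part of the thread: the `/mbox/` URL
suffix, the mount point, and the fact that skip_bg requires building a
patched kernel (it was never merged upstream in this form). The `run`
wrapper only prints commands unless REALLY_RUN=1:

```shell
#!/bin/sh
# Dry-run sketch; set REALLY_RUN=1 to actually execute each step.
run() { echo "+ $*"; [ "${REALLY_RUN:-0}" = 1 ] && "$@"; return 0; }

run cd linux                              # your kernel source tree
run curl -o skip_bg.mbox https://patchwork.kernel.org/patch/10738583/mbox/
run git am skip_bg.mbox                   # apply, then rebuild and boot

# After booting the patched kernel:
run mkdir -p /mnt/salvage
run mount -o ro,skip_bg /dev/bcache0 /mnt/salvage
```

The read-only flag matters: with block groups skipped the space
accounting is unusable, so the kernel cannot safely write.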
Thanks,
Qu
* Re: Nasty corruption on large array, ideas welcome
  2019-01-10 15:50   ` Thiago Ramon

From: Thiago Ramon @ 2019-01-10 15:50 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jan 8, 2019 at 10:05 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> Heavily corrupted extent tree.
>
> And there is a very good experimental patch for you:
> https://patchwork.kernel.org/patch/10738583/
>
> Then go mount with "skip_bg,ro" mount option.
> [...]
>
> Thanks,
> Qu

Thanks for pointing me to that patch; I've tried it and the FS mounted
without issues. I've managed to take a snapshot of the folder structure
and haven't noticed anything important missing. Is there some way to get
a list of anything that might have been corrupted, or will I just find
out as I try to access the file contents?
Also, is there any hope of recovering the trees in place, or should I
just abandon this filesystem and start with a new volume?

It also occurred to me that I might be able to run a scrub now that the
filesystem is mounted. Is that even possible in a situation like this,
and, more importantly, is it sane?

And finally, thanks again for the patch,
Thiago Ramon
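One low-tech answer to the "list of what might be corrupted" question
above, under the assumption that data checksums survived: btrfs returns
EIO when a data-checksum verification fails on read, so a full read pass
over the salvage mount flags the damaged files. The function name and
paths below are mine, not from the thread, and the sketch assumes file
names contain no newlines:

```shell
#!/bin/sh
# Walk a (read-only) salvage mount and print every file whose contents
# cannot be read end-to-end; a btrfs checksum mismatch surfaces as EIO.
# Caveat: the plain `find | while read` pipeline assumes no newlines in
# file names.
list_unreadable() {
    find "$1" -type f | while IFS= read -r f; do
        cat -- "$f" >/dev/null 2>&1 || printf '%s\n' "$f"
    done
}

# Example usage (mount point is an assumption):
# list_unreadable /mnt/salvage > corrupted-files.txt
```

This only catches data corruption guarded by checksums; files whose
metadata was lost entirely won't appear in the walk at all.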
* Re: Nasty corruption on large array, ideas welcome
  2019-01-22 16:41     ` Thiago Ramon

From: Thiago Ramon @ 2019-01-22 16:41 UTC
To: Qu Wenruo; +Cc: linux-btrfs

Back again with pretty much the same problem, but now without a
reasonable cause. I bought a couple of new 8TB disks, recovered
everything I needed from the previously damaged FS onto a new btrfs on
those 2 drives (single-copy mode), double-checked that everything was
fine, then wipefs'd the old disks, added the ones that hadn't shown any
issues to the new array, and rebalanced to RAID6. Everything ran fine
through the weekend and I was about 50% done when today:

[ +7.733525] BTRFS info (device bcache0): relocating block group 8358036766720 flags data
[Jan22 09:20] BTRFS warning (device bcache0): bcache0 checksum verify failed on 31288448499712 wanted A3746F78 found 44D6AEB0 level 1
[ +0.460086] BTRFS info (device bcache0): read error corrected: ino 0 off 31288448499712 (dev /dev/bcache4 sector 7401171296)
[ +0.000199] BTRFS info (device bcache0): read error corrected: ino 0 off 31288448503808 (dev /dev/bcache4 sector 7401171304)
[ +0.000181] BTRFS info (device bcache0): read error corrected: ino 0 off 31288448507904 (dev /dev/bcache4 sector 7401171312)
[ +0.000158] BTRFS info (device bcache0): read error corrected: ino 0 off 31288448512000 (dev /dev/bcache4 sector 7401171320)
[Jan22 09:21] BTRFS info (device bcache0): found 2050 extents
[ +8.055456] BTRFS info (device bcache0): found 2050 extents
[Jan22 09:22] BTRFS info (device bcache0): found 2050 extents
[ +0.846627] BTRFS info (device bcache0): relocating block group 8356963024896 flags data
[Jan22 09:23] BTRFS info (device bcache0): found 2052 extents
[ +6.983072] BTRFS info (device bcache0): found 2052 extents
[ +0.844419] BTRFS info (device bcache0): relocating block group 8355889283072 flags data
[ +33.906101] BTRFS info (device bcache0): found 2058 extents
[ +4.664570] BTRFS info (device bcache0): found 2058 extents
[Jan22 09:24] BTRFS info (device bcache0): relocating block group 8354815541248 flags data
[Jan22 09:25] BTRFS info (device bcache0): found 2057 extents
[ +17.650586] BTRFS error (device bcache0): parent transid verify failed on 31288448466944 wanted 135681 found 135575
[ +0.088917] BTRFS error (device bcache0): parent transid verify failed on 31288448466944 wanted 135681 found 135575
[ +0.001381] BTRFS error (device bcache0): parent transid verify failed on 31288448466944 wanted 135681 found 135575
[ +0.003555] BTRFS error (device bcache0): parent transid verify failed on 31288448466944 wanted 135681 found 135575
[ +0.005478] BTRFS error (device bcache0): parent transid verify failed on 31288448466944 wanted 135681 found 135575
[ +0.003953] BTRFS error (device bcache0): parent transid verify failed on 31288448466944 wanted 135681 found 135575
[ +0.000917] BTRFS: error (device bcache0) in btrfs_run_delayed_refs:3013: errno=-5 IO failure
[ +0.000017] BTRFS: error (device bcache0) in btrfs_drop_snapshot:9463: errno=-5 IO failure
[ +0.000895] BTRFS info (device bcache0): forced readonly
[ +0.000902] BTRFS: error (device bcache0) in merge_reloc_roots:2429: errno=-5 IO failure
[ +0.000387] BTRFS info (device bcache0): balance: ended with status: -30

I couldn't check anything even in read-only mode (scrub or btrfs check),
and when I unmounted the array I got a few kernel stack traces:

[Jan22 13:58] WARNING: CPU: 3 PID: 9711 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_groups+0x395/0x3b0 [btrfs]
[ +0.000032] CPU: 3 PID: 9711 Comm: umount Not tainted 4.20.0-042000-generic #201812232030
[ +0.000001] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H61M-DS2H, BIOS F6 12/14/2012
[ +0.000014] RIP: 0010:btrfs_free_block_groups+0x395/0x3b0 [btrfs]
[ +0.000002] Code: 01 00 00 00 0f 84 a0 fe ff ff 0f 0b 48 83 bb d0 01 00 00 00 0f 84 9e fe ff ff 0f 0b 48 83 bb 08 02 00 00 00 0f 84 9c fe ff ff <0f> 0b 48 83 bb 00 02 00 00 00 0f 84 9a fe ff ff 0f 0b e9 93 fe ff
[ +0.000001] RSP: 0018:ffffa3c1c2997d88 EFLAGS: 00010206
[ +0.000001] RAX: 0000000020000000 RBX: ffff924aae380000 RCX: 0000000000000000
[ +0.000001] RDX: ffffffffe0000000 RSI: ffff924b85970600 RDI: ffff924b85970600
[ +0.000001] RBP: ffffa3c1c2997db8 R08: 0000000020000000 R09: ffff924b859706a8
[ +0.000000] R10: 0000000000000002 R11: ffff924b973a1c04 R12: ffff924aae380080
[ +0.000001] R13: ffff924b8dfe8400 R14: ffff924aae380090 R15: 0000000000000000
[ +0.000002] FS:  00007f1bd1076080(0000) GS:ffff924b97380000(0000) knlGS:0000000000000000
[ +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000000] CR2: 0000562d2eb13c10 CR3: 0000000156910006 CR4: 00000000001606e0
[ +0.000001] Call Trace:
[ +0.000018]  close_ctree+0x143/0x2e0 [btrfs]
[ +0.000012]  btrfs_put_super+0x15/0x20 [btrfs]
[ +0.000004]  generic_shutdown_super+0x72/0x110
[ +0.000001]  kill_anon_super+0x18/0x30
[ +0.000012]  btrfs_kill_super+0x16/0xa0 [btrfs]
[ +0.000002]  deactivate_locked_super+0x3a/0x80
[ +0.000001]  deactivate_super+0x51/0x60
[ +0.000003]  cleanup_mnt+0x3f/0x80
[ +0.000001]  __cleanup_mnt+0x12/0x20
[ +0.000002]  task_work_run+0x9d/0xc0
[ +0.000002]  exit_to_usermode_loop+0xf2/0x100
[ +0.000002]  do_syscall_64+0xda/0x110
[ +0.000003]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000001] RIP: 0033:0x7f1bd14bae27
[ +0.000001] Code: 90 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 90 0c 00 f7 d8 64 89 01 48
[ +0.000001] RSP: 002b:00007ffdb15a75a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ +0.000002] RAX: 0000000000000000 RBX: 000055df329eda40 RCX: 00007f1bd14bae27
[ +0.000000] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055df329edc20
[ +0.000001] RBP: 0000000000000000 R08: 000055df329eea70 R09: 00000000ffffffff
[ +0.000001] R10: 000000000000000b R11: 0000000000000246 R12: 000055df329edc20
[ +0.000001] R13: 00007f1bd15e18c4 R14: 0000000000000000 R15: 00007ffdb15a7818

Now I'm back in a very similar situation as before; btrfs check gets me:

Opening filesystem to check...
checksum verify failed on 24707469082624 found 451E87BF wanted A1FD3A09
checksum verify failed on 24707469082624 found 2C2AEBE0 wanted D6652D6A
checksum verify failed on 24707469082624 found 2C2AEBE0 wanted D6652D6A
bad tree block 24707469082624, bytenr mismatch, want=24707469082624, have=231524568072192
Couldn't read tree root
ERROR: cannot open file system

I could do it all again, but first, what can be wrong here? This array
was working for some 4 years until it went bad a few weeks ago, and now
the FS got badly corrupted again without any warning. Any suggestions?
Bad RAM, a SAS controller going bad, some weirdly behaving disk? I need
to figure out what is failing before I try another recovery.

Thanks for any help,
Thiago Ramon
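A triage checklist for the suspects named above can be sketched as
follows. The tools are standard (smartmontools, util-linux, memtester),
but the device paths and sizes are examples, and the `run` wrapper only
prints each command unless REALLY_RUN=1 is set:

```shell
#!/bin/sh
# Narrow down the failing layer before the next recovery attempt.
run() { echo "+ $*"; [ "${REALLY_RUN:-0}" = 1 ] && "$@"; return 0; }

# Per-disk health: look for reallocated/pending sectors and UDMA CRC
# errors (CRC errors point at cables/controller rather than the disk).
for d in /dev/sd[a-h]; do
    run smartctl -a "$d"
done

# Controller or link problems show up as ata resets and I/O errors.
run dmesg --level=err,warn

# Userspace RAM test: 2 GiB, 3 passes (a full memtest86+ boot is more
# thorough but needs downtime).
run memtester 2048M 3
```

None of these are conclusive on their own; the point is to collect
evidence from each layer so process-of-elimination has something to work
with.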
* Re: Nasty corruption on large array, ideas welcome
  2019-01-22 20:42       ` Chris Murphy

From: Chris Murphy @ 2019-01-22 20:42 UTC
To: Thiago Ramon; +Cc: Qu Wenruo, Btrfs BTRFS

On Tue, Jan 22, 2019 at 9:41 AM Thiago Ramon <thiagoramon@gmail.com> wrote:
>
> Back again with pretty much the same problem, but now without a
> reasonable cause:
> [...]
> [Jan22 09:20] BTRFS warning (device bcache0): bcache0 checksum verify
> failed on 31288448499712 wanted A3746F78 found 44D6AEB0 level 1
> [ +0.460086] BTRFS info (device bcache0): read error corrected: ino 0
> off 31288448499712 (dev /dev/bcache4 sector 7401171296)
> [...]

This is corruption being detected and corrected on the listed sectors.
As this is a bcache device, the sector numbers are virtual, so it's hard
to tell whether the corruption is coming from bcache itself, the cache
device, or the backing device.
> [...]
> [Jan22 09:25] BTRFS info (device bcache0): found 2057 extents
> [ +17.650586] BTRFS error (device bcache0): parent transid verify
> failed on 31288448466944 wanted 135681 found 135575

Over 100 generations have passed, and yet it's only finding stale data
at the desired btrfs byte nr (in btrfs linear space), so it might be
extent tree corruption again.

It's not possible from the available information to do anything but
speculate about how that much data is being lost or somehow being
overwritten.
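The "over 100 generations" reading comes straight from the quoted
message: the difference between the wanted and found transids is how
many committed generations behind the stale block is.

```shell
# Generation gap from the quoted transid error (wanted vs. found):
wanted=135681
found=135575
echo $(( wanted - found ))   # -> 106 generations behind
```

A gap of one or two could be a single lost write; a gap of 106 means a
whole region of metadata stopped being updated long before the error was
noticed.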
> [ +0.088917] BTRFS error (device bcache0): parent transid verify
> failed on 31288448466944 wanted 135681 found 135575
> [...]
> [ +0.000917] BTRFS: error (device bcache0) in
> btrfs_run_delayed_refs:3013: errno=-5 IO failure
> [ +0.000017] BTRFS: error (device bcache0) in
> btrfs_drop_snapshot:9463: errno=-5 IO failure

And -5 (EIO) is not a btrfs error either; it's btrfs reporting an I/O
error from the underlying block device, whether real or virtual.

> [ +0.000895] BTRFS info (device bcache0): forced readonly
> [ +0.000902] BTRFS: error (device bcache0) in merge_reloc_roots:2429:
> errno=-5 IO failure
> [ +0.000387] BTRFS info (device bcache0): balance: ended with status: -30
>
> Couldn't check anything even in RO mode scrub or btrfs check, when I
> unmounted the array I got a few kernel stack traces:
> [Jan22 13:58] WARNING: CPU: 3 PID: 9711 at fs/btrfs/extent-tree.c:5986
> btrfs_free_block_groups+0x395/0x3b0 [btrfs]
> [...]
>
> Now I'm back in a very similar situation as before, btrfs check gets me:
> Opening filesystem to check...
> checksum verify failed on 24707469082624 found 451E87BF wanted A1FD3A09
> [...]
> Couldn't read tree root
> ERROR: cannot open file system
>
> I could do it all again, but first, what can be wrong here? This array
> was working for some 4 years until it went bad a few weeks ago, and
> now the FS got badly corrupted again without any warnings. Any
> suggestions? Bad RAM, SAS controller going bad, some weirdly behaving
> disk? I need to figure out what can be failing before I try another
> recovery.

I think it's specifically storage-stack related. I think you'd see more
varied and weirder problems if it were memory corruption, but that's
speculation on my part.

I'd honestly simplify the layout and not use bcache at all: put btrfs
directly on the whole drives (although dm-crypt in between is reasonably
simple if needed or desired). It's better for troubleshooting to keep
the storage stack as simple as possible. Without more debugging
information from all the layers, it's hard to tell which layer to blame
without just using the big stick called process of elimination.

Maybe Qu has some ideas based on the call trace, though - I can't parse
it.
--
Chris Murphy
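For concreteness, the "btrfs directly on the whole drives" suggestion
would look something like the sketch below. The device names and mount
point are examples, the raid1 metadata profile is a common hedge against
parity-RAID metadata rather than anything from the thread, and the `run`
wrapper only prints commands unless REALLY_RUN=1 (mkfs.btrfs -f is
destructive when actually run):

```shell
#!/bin/sh
# Simplified stack: no LVM, no bcache -- btrfs straight on the disks.
run() { echo "+ $*"; [ "${REALLY_RUN:-0}" = 1 ] && "$@"; return 0; }

# DESTRUCTIVE when actually run: -f overwrites existing signatures.
run mkfs.btrfs -f -d raid6 -m raid1 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

run mkdir -p /mnt/array
run mount /dev/sda /mnt/array   # any member device names the same fs
```

With only one block layer, an error reported at a given sector maps
directly to a physical disk, which is exactly the troubleshooting
property Chris is after.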
* Re: Nasty corruption on large array, ideas welcome
  2019-01-22 23:28         ` Thiago Ramon

From: Thiago Ramon @ 2019-01-22 23:28 UTC
To: Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS

On Tue, Jan 22, 2019 at 6:43 PM Chris Murphy <lists@colorremedies.com> wrote:
> [...]
> This is corruption being detected and corrected on those listed
> sectors. As this is a bcache device, it's a virtual sector so it's
> hard to tell if it's coming from bcache itself, or the cache device,
> or the backing device.
> I was using bcache in writeback mode with my old FS, but I've learned THAT lesson the hard way. This one was just using writearound; unless bcache REALLY screwed it up, I find it hard to believe that it's the source of the corruption. There were no read or write errors from bcache since the time the new array went up, and each bcache* device is just a thin layer over a whole raw disk now. > > > [Jan22 09:21] BTRFS info (device bcache0): found 2050 extents > > [ +8.055456] BTRFS info (device bcache0): found 2050 extents > > [Jan22 09:22] BTRFS info (device bcache0): found 2050 extents > > [ +0.846627] BTRFS info (device bcache0): relocating block group > > 8356963024896 flags data > > [Jan22 09:23] BTRFS info (device bcache0): found 2052 extents > > [ +6.983072] BTRFS info (device bcache0): found 2052 extents > > [ +0.844419] BTRFS info (device bcache0): relocating block group > > 8355889283072 flags data > > [ +33.906101] BTRFS info (device bcache0): found 2058 extents > > [ +4.664570] BTRFS info (device bcache0): found 2058 extents > > [Jan22 09:24] BTRFS info (device bcache0): relocating block group > > 8354815541248 flags data > > [Jan22 09:25] BTRFS info (device bcache0): found 2057 extents > > [ +17.650586] BTRFS error (device bcache0): parent transid verify > > failed on 31288448466944 wanted 135681 found 135575 > > > Over 100 generations have passed, and yet it's only finding stale data > on the desired btrfs byte nr (in btrfs linear space) so it might be > extent tree corruption again. > > It's not possible from the available information to do anything but > speculate how that much data is being lost or somehow being > overwritten. 
> > > > [ +0.088917] BTRFS error (device bcache0): parent transid verify > > failed on 31288448466944 wanted 135681 found 135575 > > [ +0.001381] BTRFS error (device bcache0): parent transid verify > > failed on 31288448466944 wanted 135681 found 135575 > > [ +0.003555] BTRFS error (device bcache0): parent transid verify > > failed on 31288448466944 wanted 135681 found 135575 > > [ +0.005478] BTRFS error (device bcache0): parent transid verify > > failed on 31288448466944 wanted 135681 found 135575 > > [ +0.003953] BTRFS error (device bcache0): parent transid verify > > failed on 31288448466944 wanted 135681 found 135575 > > [ +0.000917] BTRFS: error (device bcache0) in > > btrfs_run_delayed_refs:3013: errno=-5 IO failure > > [ +0.000017] BTRFS: error (device bcache0) in > > btrfs_drop_snapshot:9463: errno=-5 IO failure > > And -5 I/O error is not a Btrfs error either, it's the detection of an > IO error from the underlying block device, whether real or virtual. > Couldn't figure the source of the -5 either, no kernel logs from anything but BTRFS complaining about it. After I umounted the array, it didn't show up anymore, and I was able to remount the array with the skip_bg patch. > > > > [ +0.000895] BTRFS info (device bcache0): forced readonly > > [ +0.000902] BTRFS: error (device bcache0) in merge_reloc_roots:2429: > > errno=-5 IO failure > > [ +0.000387] BTRFS info (device bcache0): balance: ended with status: -30 > > > > Couldn't check anything even in RO mode scrub or btrfs check, when I > > unmounted the array I got a few kernel stack traces: > > [Jan22 13:58] WARNING: CPU: 3 PID: 9711 at fs/btrfs/extent-tree.c:5986 > > btrfs_free_block_groups+0x395/0x3b0 [btrfs] > > [ +0.000032] CPU: 3 PID: 9711 Comm: umount Not tainted > > 4.20.0-042000-generic #201812232030 > > [ +0.000001] Hardware name: Gigabyte Technology Co., Ltd. 
To be > > filled by O.E.M./H61M-DS2H, BIOS F6 12/14/2012 > > [ +0.000014] RIP: 0010:btrfs_free_block_groups+0x395/0x3b0 [btrfs] > > [ +0.000002] Code: 01 00 00 00 0f 84 a0 fe ff ff 0f 0b 48 83 bb d0 01 > > 00 00 00 0f 84 9e fe ff ff 0f 0b 48 83 bb 08 0$ > > 00 00 00 0f 84 9c fe ff ff <0f> 0b 48 83 bb 00 02 00 00 00 0f 84 9a > > fe ff ff 0f 0b e9 93 fe ff > > [ +0.000001] RSP: 0018:ffffa3c1c2997d88 EFLAGS: 00010206 > > [ +0.000001] RAX: 0000000020000000 RBX: ffff924aae380000 RCX: > > 0000000000000000 > > [ +0.000001] RDX: ffffffffe0000000 RSI: ffff924b85970600 RDI: > > ffff924b85970600 > > [ +0.000001] RBP: ffffa3c1c2997db8 R08: 0000000020000000 R09: > > ffff924b859706a8 > > [ +0.000000] R10: 0000000000000002 R11: ffff924b973a1c04 R12: > > ffff924aae380080 > > [ +0.000001] R13: ffff924b8dfe8400 R14: ffff924aae380090 R15: > > 0000000000000000 > > [ +0.000002] FS: 00007f1bd1076080(0000) GS:ffff924b97380000(0000) > > knlGS:0000000000000000 > > [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ +0.000000] CR2: 0000562d2eb13c10 CR3: 0000000156910006 CR4: > > 00000000001606e0 > > [ +0.000001] Call Trace: > > [ +0.000018] close_ctree+0x143/0x2e0 [btrfs] > > [ +0.000012] btrfs_put_super+0x15/0x20 [btrfs] > > [ +0.000004] generic_shutdown_super+0x72/0x110 > > [ +0.000001] kill_anon_super+0x18/0x30 > > [ +0.000012] btrfs_kill_super+0x16/0xa0 [btrfs] > > [ +0.000002] deactivate_locked_super+0x3a/0x80 > > [ +0.000001] deactivate_super+0x51/0x60 > > [ +0.000003] cleanup_mnt+0x3f/0x80 > > [ +0.000001] __cleanup_mnt+0x12/0x20 > > [ +0.000002] task_work_run+0x9d/0xc0 > > [ +0.000002] exit_to_usermode_loop+0xf2/0x100 > > [ +0.000002] do_syscall_64+0xda/0x110 > > [ +0.000003] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [ +0.000001] RIP: 0033:0x7f1bd14bae27 > > [ +0.000001] Code: 90 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 > > 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 > > 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 > > 90 0c 00 f7 
d8 64 89 01 48 > > [ +0.000001] RSP: 002b:00007ffdb15a75a8 EFLAGS: 00000246 ORIG_RAX: > > 00000000000000a6 > > [ +0.000002] RAX: 0000000000000000 RBX: 000055df329eda40 RCX: > > 00007f1bd14bae27 > > [ +0.000000] RDX: 0000000000000001 RSI: 0000000000000000 RDI: > > 000055df329edc20 > > [ +0.000001] RBP: 0000000000000000 R08: 000055df329eea70 R09: > > 00000000ffffffff > > [ +0.000001] R10: 000000000000000b R11: 0000000000000246 R12: > > 000055df329edc20 > > [ +0.000001] R13: 00007f1bd15e18c4 R14: 0000000000000000 R15: > > 00007ffdb15a7818 > > > > Now I'm back in a very similar situation as before, btrfs check gets me: > > Opening filesystem to check... > > checksum verify failed on 24707469082624 found 451E87BF wanted > > A1FD3A09 > > checksum verify failed on 24707469082624 found 2C2AEBE0 wanted > > D6652D6A > > checksum verify failed on 24707469082624 found 2C2AEBE0 wanted > > D6652D6A > > bad tree block 24707469082624, bytenr mismatch, want=24707469082624, > > have=231524568072192 > > Couldn't read tree root > > ERROR: cannot open file system > > > > I could do it all again, but first, what can be wrong here? This array > > was working for some 4 years until it went bad a few weeks ago, and > > now the FS got badly corrupted again without any warnings. Any > > suggestions? Bad RAM, SAS controller going bad, some weirdly behaving > > disk? I need to figure out what can be failing before I try another > > recovery. > > I think it's specifically storage stack related. I think you'd have > more varied and weird problems if it were memory corruption, but > that's speculation on my part. I've done a quick memory test with stressapptest and it was fine, so if it's the memory it's something very localized. > > I'd honestly simplify the layout and not use bcache at all, only use > Btrfs directly on the whole drives, although I think it's reasonably > simple to use dmcrypt if needed/desired. 
But it's still better for > troubleshooting to make the storage stack as simple as possible. > Without more debugging information from all the layers, it's hard to > tell which layer to blame without just using the big stick called > process of elimination. > > Maybe Qu has some ideas based on the call trace though - I can't parse it. > > -- > Chris Murphy
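[Editorial note: the "checksum verify failed on ... wanted A3746F78 found 44D6AEB0" lines above are btrfs's metadata checksum disagreeing with the block contents; kernels of this era used CRC-32C (Castagnoli) for tree blocks. The sketch below is an illustrative stand-alone reimplementation only — btrfs checksums the block after the on-disk csum field with its own seeding, which is not reproduced here.]

```python
# Illustrative CRC-32C (Castagnoli) implementation, the checksum family
# behind the "found"/"wanted" values in the btrfs logs above. Sketch
# only; the exact byte range and seeding btrfs uses are not modeled.

def _make_crc32c_table(poly: int = 0x82F63B78) -> list:
    # 0x82F63B78 is the bit-reflected Castagnoli polynomial.
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table

_CRC32C_TABLE = _make_crc32c_table()

def crc32c(data: bytes, crc: int = 0) -> int:
    # Reflected, init 0xFFFFFFFF, xor-out 0xFFFFFFFF (standard CRC-32C).
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Standard check value: crc32c(b"123456789") == 0xE3069283
```

A mismatch between the stored and recomputed value is what produces a "checksum verify failed" line; it only says the block bytes changed after the checksum was written, not which layer changed them.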
* Re: Nasty corruption on large array, ideas welcome 2019-01-22 23:28 ` Thiago Ramon @ 2019-01-23 17:30 ` Thiago Ramon 2019-01-25 2:36 ` Anand Jain 0 siblings, 1 reply; 9+ messages in thread From: Thiago Ramon @ 2019-01-23 17:30 UTC (permalink / raw) To: Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS Back again with more info. I've done a dump-tree on my array, and from the 71GB of tree data, I've only got the following errors: parent transid verify failed on 31288448466944 wanted 135681 found 135575 parent transid verify failed on 31288448466944 wanted 135681 found 135575 checksum verify failed on 31288448466944 found 9D027AC2 wanted 79E1C774 checksum verify failed on 31288448466944 found 9A1BA23D wanted CFE05F82 checksum verify failed on 31288448466944 found 9A1BA23D wanted CFE05F82 bad tree block 31288448466944, bytenr mismatch, want=31288448466944, have=198834197030656 failed to read 31288448466944 in tree 2 parent transid verify failed on 31288448483328 wanted 135681 found 135575 parent transid verify failed on 31288448483328 wanted 135681 found 135575 checksum verify failed on 31288448483328 found D1816F8A wanted 3562D23C checksum verify failed on 31288448483328 found 51B89701 wanted EECC5745 checksum verify failed on 31288448483328 found 51B89701 wanted EECC5745 bad tree block 31288448483328, bytenr mismatch, want=31288448483328, have=198834197025536 failed to read 31288448483328 in tree 2 parent transid verify failed on 17412886069248 wanted 31316 found 31040 parent transid verify failed on 17412886069248 wanted 31316 found 31040 checksum verify failed on 17412886069248 found 5A0CE056 wanted BEEF5DE0 checksum verify failed on 17412886069248 found 52BE5880 wanted A8702E6B checksum verify failed on 17412886069248 found 5A0CE056 wanted BEEF5DE0 bad tree block 17412886069248, bytenr mismatch, want=17412886069248, have=0 failed to read 17412886069248 in tree 7 If I'm reading this correctly, it's just a couple bad nodes in the extent tree and one in the checksum tree 
(assuming they aren't hiding lost nodes behind them). Would it be possible to manually fix those nodes, and if so, how should I proceed? (I have experience in data-recovery on other filesystems, but my knowledge of BTRFS isn't deep enough yet to allow me to even guess where I should start poking, so any pointers are welcome) And thanks again for all help so far On Tue, Jan 22, 2019 at 9:28 PM Thiago Ramon <thiagoramon@gmail.com> wrote: > > On Tue, Jan 22, 2019 at 6:43 PM Chris Murphy <lists@colorremedies.com> wrote: > > > > On Tue, Jan 22, 2019 at 9:41 AM Thiago Ramon <thiagoramon@gmail.com> wrote: > > > > > > Back again with pretty much the same problem, but now without a > > > reasonable cause: > > > I've bought a couple new 8TB disks, recovered everything I needed from > > > my previously damaged FS to a new BTRFS on those 2 drives (single copy > > > mode), double-checked if everything was fine, then wipefs'd the old > > > disks and added the ones that didn't have any issues previously to the > > > new array and rebalanced to RAID6. 
> > > Everything was running fine through the weekend and I was about 50% > > > done when today: > > > [ +7.733525] BTRFS info (device bcache0): relocating block group > > > 8358036766720 flags data > > > [Jan22 09:20] BTRFS warning (device bcache0): bcache0 checksum verify > > > failed on 31288448499712 wanted A3746F78 found 44D6AEB0 level 1 > > > [ +0.460086] BTRFS info (device bcache0): read error corrected: ino 0 > > > off 31288448499712 (dev /dev/bcache4 sector 7401171296) > > > [ +0.000199] BTRFS info (device bcache0): read error corrected: ino 0 > > > off 31288448503808 (dev /dev/bcache4 sector 7401171304) > > > [ +0.000181] BTRFS info (device bcache0): read error corrected: ino 0 > > > off 31288448507904 (dev /dev/bcache4 sector 7401171312) > > > [ +0.000158] BTRFS info (device bcache0): read error corrected: ino 0 > > > off 31288448512000 (dev /dev/bcache4 sector 7401171320) > > > > This is corruption being detected and corrected on those listed > > sectors. As this is a bcache device, it's a virtual sector so it's > > hard to tell if it's coming from bcache itself, or the cache device, > > or the backing device. > > > I was using bcache in writeback mode with my old FS, but I've learned > THAT lesson the hard way. This one was just using writearound, unless > bcache REALLY screwed it up I find it hard that it's the source of the > corruption. There were no read or write errors from bcache since the > time the new array went up, and each bcache* device is just a thin > layer over a whole raw disk now. 
> > > > > > [Jan22 09:21] BTRFS info (device bcache0): found 2050 extents > > > [ +8.055456] BTRFS info (device bcache0): found 2050 extents > > > [Jan22 09:22] BTRFS info (device bcache0): found 2050 extents > > > [ +0.846627] BTRFS info (device bcache0): relocating block group > > > 8356963024896 flags data > > > [Jan22 09:23] BTRFS info (device bcache0): found 2052 extents > > > [ +6.983072] BTRFS info (device bcache0): found 2052 extents > > > [ +0.844419] BTRFS info (device bcache0): relocating block group > > > 8355889283072 flags data > > > [ +33.906101] BTRFS info (device bcache0): found 2058 extents > > > [ +4.664570] BTRFS info (device bcache0): found 2058 extents > > > [Jan22 09:24] BTRFS info (device bcache0): relocating block group > > > 8354815541248 flags data > > > [Jan22 09:25] BTRFS info (device bcache0): found 2057 extents > > > [ +17.650586] BTRFS error (device bcache0): parent transid verify > > > failed on 31288448466944 wanted 135681 found 135575 > > > > > > Over 100 generations have passed, and yet it's only finding stale data > > on the desired btrfs byte nr (in btrfs linear space) so it might be > > extent tree corruption again. > > > > It's not possible from the available information to do anything but > > speculate how that much data is being lost or somehow being > > overwritten. 
> > > > > > > [ +0.088917] BTRFS error (device bcache0): parent transid verify > > > failed on 31288448466944 wanted 135681 found 135575 > > > [ +0.001381] BTRFS error (device bcache0): parent transid verify > > > failed on 31288448466944 wanted 135681 found 135575 > > > [ +0.003555] BTRFS error (device bcache0): parent transid verify > > > failed on 31288448466944 wanted 135681 found 135575 > > > [ +0.005478] BTRFS error (device bcache0): parent transid verify > > > failed on 31288448466944 wanted 135681 found 135575 > > > [ +0.003953] BTRFS error (device bcache0): parent transid verify > > > failed on 31288448466944 wanted 135681 found 135575 > > > [ +0.000917] BTRFS: error (device bcache0) in > > > btrfs_run_delayed_refs:3013: errno=-5 IO failure > > > [ +0.000017] BTRFS: error (device bcache0) in > > > btrfs_drop_snapshot:9463: errno=-5 IO failure > > > > And -5 I/O error is not a Btrfs error either, it's the detection of an > > IO error from the underlying block device, whether real or virtual. > > > Couldn't figure the source of the -5 either, no kernel logs from > anything byt BTRFS complaining about it. After I umounted the array, > it didn't shown up anymore, and I was able to remount the array with > the skip_bg patch. > > > > > > > > [ +0.000895] BTRFS info (device bcache0): forced readonly > > > [ +0.000902] BTRFS: error (device bcache0) in merge_reloc_roots:2429: > > > errno=-5 IO failure > > > [ +0.000387] BTRFS info (device bcache0): balance: ended with status: -30 > > > > > > Couldn't check anything even in RO mode scrub or btrfs check, when I > > > unmounted the array I got a few kernel stack traces: > > > [Jan22 13:58] WARNING: CPU: 3 PID: 9711 at fs/btrfs/extent-tree.c:5986 > > > btrfs_free_block_groups+0x395/0x3b0 [btrfs] > > > [ +0.000032] CPU: 3 PID: 9711 Comm: umount Not tainted > > > 4.20.0-042000-generic #201812232030 > > > [ +0.000001] Hardware name: Gigabyte Technology Co., Ltd. 
To be > > > filled by O.E.M./H61M-DS2H, BIOS F6 12/14/2012 > > > [ +0.000014] RIP: 0010:btrfs_free_block_groups+0x395/0x3b0 [btrfs] > > > [ +0.000002] Code: 01 00 00 00 0f 84 a0 fe ff ff 0f 0b 48 83 bb d0 01 > > > 00 00 00 0f 84 9e fe ff ff 0f 0b 48 83 bb 08 0$ > > > 00 00 00 0f 84 9c fe ff ff <0f> 0b 48 83 bb 00 02 00 00 00 0f 84 9a > > > fe ff ff 0f 0b e9 93 fe ff > > > [ +0.000001] RSP: 0018:ffffa3c1c2997d88 EFLAGS: 00010206 > > > [ +0.000001] RAX: 0000000020000000 RBX: ffff924aae380000 RCX: > > > 0000000000000000 > > > [ +0.000001] RDX: ffffffffe0000000 RSI: ffff924b85970600 RDI: > > > ffff924b85970600 > > > [ +0.000001] RBP: ffffa3c1c2997db8 R08: 0000000020000000 R09: > > > ffff924b859706a8 > > > [ +0.000000] R10: 0000000000000002 R11: ffff924b973a1c04 R12: > > > ffff924aae380080 > > > [ +0.000001] R13: ffff924b8dfe8400 R14: ffff924aae380090 R15: > > > 0000000000000000 > > > [ +0.000002] FS: 00007f1bd1076080(0000) GS:ffff924b97380000(0000) > > > knlGS:0000000000000000 > > > [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ +0.000000] CR2: 0000562d2eb13c10 CR3: 0000000156910006 CR4: > > > 00000000001606e0 > > > [ +0.000001] Call Trace: > > > [ +0.000018] close_ctree+0x143/0x2e0 [btrfs] > > > [ +0.000012] btrfs_put_super+0x15/0x20 [btrfs] > > > [ +0.000004] generic_shutdown_super+0x72/0x110 > > > [ +0.000001] kill_anon_super+0x18/0x30 > > > [ +0.000012] btrfs_kill_super+0x16/0xa0 [btrfs] > > > [ +0.000002] deactivate_locked_super+0x3a/0x80 > > > [ +0.000001] deactivate_super+0x51/0x60 > > > [ +0.000003] cleanup_mnt+0x3f/0x80 > > > [ +0.000001] __cleanup_mnt+0x12/0x20 > > > [ +0.000002] task_work_run+0x9d/0xc0 > > > [ +0.000002] exit_to_usermode_loop+0xf2/0x100 > > > [ +0.000002] do_syscall_64+0xda/0x110 > > > [ +0.000003] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > [ +0.000001] RIP: 0033:0x7f1bd14bae27 > > > [ +0.000001] Code: 90 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 > > > 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 > > > 00 
00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 > > > 90 0c 00 f7 d8 64 89 01 48 > > > [ +0.000001] RSP: 002b:00007ffdb15a75a8 EFLAGS: 00000246 ORIG_RAX: > > > 00000000000000a6 > > > [ +0.000002] RAX: 0000000000000000 RBX: 000055df329eda40 RCX: > > > 00007f1bd14bae27 > > > [ +0.000000] RDX: 0000000000000001 RSI: 0000000000000000 RDI: > > > 000055df329edc20 > > > [ +0.000001] RBP: 0000000000000000 R08: 000055df329eea70 R09: > > > 00000000ffffffff > > > [ +0.000001] R10: 000000000000000b R11: 0000000000000246 R12: > > > 000055df329edc20 > > > [ +0.000001] R13: 00007f1bd15e18c4 R14: 0000000000000000 R15: > > > 00007ffdb15a7818 > > > > > > Now I'm back in a very similar situation as before, btrfs check gets me: > > > Opening filesystem to check... > > > checksum verify failed on 24707469082624 found 451E87BF wanted > > > A1FD3A09 > > > checksum verify failed on 24707469082624 found 2C2AEBE0 wanted > > > D6652D6A > > > checksum verify failed on 24707469082624 found 2C2AEBE0 wanted > > > D6652D6A > > > bad tree block 24707469082624, bytenr mismatch, want=24707469082624, > > > have=231524568072192 > > > Couldn't read tree root > > > ERROR: cannot open file system > > > > > > I could do it all again, but first, what can be wrong here? This array > > > was working for some 4 years until it went bad a few weeks ago, and > > > now the FS got badly corrupted again without any warnings. Any > > > suggestions? Bad RAM, SAS controller going bad, some weirdly behaving > > > disk? I need to figure out what can be failing before I try another > > > recovery. > > > > I think it's specifically storage stack related. I think you'd have > > more varied and weird problems if it were memory corruption, but > > that's speculation on my part. > > I've done a quick memory test with stressapptest and it was fine, so > if it's the memory it's something very localized. 
> > > > I'd honestly simplify the layout and not use bcache at all, only use > > Btrfs directly on the whole drives, although I think it's reasonably > > simple to use dmcrypt if needed/desired. But it's still better for > > troubleshooting to make the storage stack as simple as possible. > > Without more debugging information from all the layers, it's hard to > > tell which layer to blame without just using the big stick called > > process of elimination. > > > > Maybe Qu has some ideas based on the call trace though - I can't parse it. > > > > -- > > Chris Murphy
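[Editorial note: a "parent transid verify failed on X wanted W found F" line means the parent node's pointer expects the child block to carry generation (transid) W, but the copy read back from disk carries the older generation F — i.e. a committed write to that block never reached stable storage. A toy model of the check, with illustrative names that are not the kernel's:]

```python
# Toy model of btrfs' parent-transid check. Each parent pointer records
# the generation it expects its child block to have; reading back an
# older generation means writes to that block were lost in the stack.
from dataclasses import dataclass

@dataclass
class TreeBlock:
    bytenr: int       # logical byte number of the block
    generation: int   # transid when the block was last written

def verify_parent_transid(child: TreeBlock, wanted_gen: int) -> bool:
    if child.generation == wanted_gen:
        return True
    print(f"parent transid verify failed on {child.bytenr} "
          f"wanted {wanted_gen} found {child.generation}")
    return False
```

In the dump above, block 31288448466944 carries generation 135575 while its parent expects 135681: roughly 106 transactions' worth of updates to that block silently never hit the disk, which is why Chris suspects the storage stack rather than btrfs itself.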
* Re: Nasty corruption on large array, ideas welcome 2019-01-23 17:30 ` Thiago Ramon @ 2019-01-25 2:36 ` Anand Jain 2019-01-25 2:47 ` Thiago Ramon 0 siblings, 1 reply; 9+ messages in thread From: Anand Jain @ 2019-01-25 2:36 UTC (permalink / raw) To: Thiago Ramon, Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS On 01/24/2019 01:30 AM, Thiago Ramon wrote: > Back again with more info. > > I've done a dump-tree on my array, and from the 71GB of tree data, > I've only got the following errors: > parent transid verify failed on 31288448466944 wanted 135681 found 135575 > parent transid verify failed on 31288448466944 wanted 135681 found 135575 > checksum verify failed on 31288448466944 found 9D027AC2 wanted 79E1C774 > checksum verify failed on 31288448466944 found 9A1BA23D wanted CFE05F82 > checksum verify failed on 31288448466944 found 9A1BA23D wanted CFE05F82 > bad tree block 31288448466944, bytenr mismatch, want=31288448466944, > have=198834197030656 > failed to read 31288448466944 in tree 2 > parent transid verify failed on 31288448483328 wanted 135681 found 135575 > parent transid verify failed on 31288448483328 wanted 135681 found 135575 > checksum verify failed on 31288448483328 found D1816F8A wanted 3562D23C > checksum verify failed on 31288448483328 found 51B89701 wanted EECC5745 > checksum verify failed on 31288448483328 found 51B89701 wanted EECC5745 > bad tree block 31288448483328, bytenr mismatch, want=31288448483328, > have=198834197025536 > failed to read 31288448483328 in tree 2 > parent transid verify failed on 17412886069248 wanted 31316 found 31040 > parent transid verify failed on 17412886069248 wanted 31316 found 31040 > checksum verify failed on 17412886069248 found 5A0CE056 wanted BEEF5DE0 > checksum verify failed on 17412886069248 found 52BE5880 wanted A8702E6B > checksum verify failed on 17412886069248 found 5A0CE056 wanted BEEF5DE0 > bad tree block 17412886069248, bytenr mismatch, want=17412886069248, have=0 > failed to read 17412886069248 in tree 7 
If there are sufficient mirrors, a parent transid verify failure should be followed by errors-fixed logs. I wonder what type of chunks these are; 'btrfs fi df <mnt>' would show. Thanks. > If I'm reading this correctly, it's just a couple bad nodes in the > extent tree and one in the checksum tree (assuming they aren't hiding > lost nodes behind them). Would it be possible to manually fix those > nodes, and if so, how should I proceed? (I have experience in > data-recovery on other filesystems, but my knowledge of BTRFS isn't > deep enough yet to allow me to even guess where I should start poking, > so any pointers are welcome) > > And thanks again for all help so far > > On Tue, Jan 22, 2019 at 9:28 PM Thiago Ramon <thiagoramon@gmail.com> wrote: >> >> On Tue, Jan 22, 2019 at 6:43 PM Chris Murphy <lists@colorremedies.com> wrote: >>> >>> On Tue, Jan 22, 2019 at 9:41 AM Thiago Ramon <thiagoramon@gmail.com> wrote: >>>> >>>> Back again with pretty much the same problem, but now without a >>>> reasonable cause: >>>> I've bought a couple new 8TB disks, recovered everything I needed from >>>> my previously damaged FS to a new BTRFS on those 2 drives (single copy >>>> mode), double-checked if everything was fine, then wipefs'd the old >>>> disks and added the ones that didn't have any issues previously to the >>>> new array and rebalanced to RAID6. 
>>>> Everything was running fine through the weekend and I was about 50% >>>> done when today: >>>> [ +7.733525] BTRFS info (device bcache0): relocating block group >>>> 8358036766720 flags data >>>> [Jan22 09:20] BTRFS warning (device bcache0): bcache0 checksum verify >>>> failed on 31288448499712 wanted A3746F78 found 44D6AEB0 level 1 >>>> [ +0.460086] BTRFS info (device bcache0): read error corrected: ino 0 >>>> off 31288448499712 (dev /dev/bcache4 sector 7401171296) >>>> [ +0.000199] BTRFS info (device bcache0): read error corrected: ino 0 >>>> off 31288448503808 (dev /dev/bcache4 sector 7401171304) >>>> [ +0.000181] BTRFS info (device bcache0): read error corrected: ino 0 >>>> off 31288448507904 (dev /dev/bcache4 sector 7401171312) >>>> [ +0.000158] BTRFS info (device bcache0): read error corrected: ino 0 >>>> off 31288448512000 (dev /dev/bcache4 sector 7401171320) >>> >>> This is corruption being detected and corrected on those listed >>> sectors. As this is a bcache device, it's a virtual sector so it's >>> hard to tell if it's coming from bcache itself, or the cache device, >>> or the backing device. >>> >> I was using bcache in writeback mode with my old FS, but I've learned >> THAT lesson the hard way. This one was just using writearound, unless >> bcache REALLY screwed it up I find it hard that it's the source of the >> corruption. There were no read or write errors from bcache since the >> time the new array went up, and each bcache* device is just a thin >> layer over a whole raw disk now. 
>> >>> >>>> [Jan22 09:21] BTRFS info (device bcache0): found 2050 extents >>>> [ +8.055456] BTRFS info (device bcache0): found 2050 extents >>>> [Jan22 09:22] BTRFS info (device bcache0): found 2050 extents >>>> [ +0.846627] BTRFS info (device bcache0): relocating block group >>>> 8356963024896 flags data >>>> [Jan22 09:23] BTRFS info (device bcache0): found 2052 extents >>>> [ +6.983072] BTRFS info (device bcache0): found 2052 extents >>>> [ +0.844419] BTRFS info (device bcache0): relocating block group >>>> 8355889283072 flags data >>>> [ +33.906101] BTRFS info (device bcache0): found 2058 extents >>>> [ +4.664570] BTRFS info (device bcache0): found 2058 extents >>>> [Jan22 09:24] BTRFS info (device bcache0): relocating block group >>>> 8354815541248 flags data >>>> [Jan22 09:25] BTRFS info (device bcache0): found 2057 extents >>>> [ +17.650586] BTRFS error (device bcache0): parent transid verify >>>> failed on 31288448466944 wanted 135681 found 135575 >>> >>> >>> Over 100 generations have passed, and yet it's only finding stale data >>> on the desired btrfs byte nr (in btrfs linear space) so it might be >>> extent tree corruption again. >>> >>> It's not possible from the available information to do anything but >>> speculate how that much data is being lost or somehow being >>> overwritten. 
>>> >>> >>>> [ +0.088917] BTRFS error (device bcache0): parent transid verify >>>> failed on 31288448466944 wanted 135681 found 135575 >>>> [ +0.001381] BTRFS error (device bcache0): parent transid verify >>>> failed on 31288448466944 wanted 135681 found 135575 >>>> [ +0.003555] BTRFS error (device bcache0): parent transid verify >>>> failed on 31288448466944 wanted 135681 found 135575 >>>> [ +0.005478] BTRFS error (device bcache0): parent transid verify >>>> failed on 31288448466944 wanted 135681 found 135575 >>>> [ +0.003953] BTRFS error (device bcache0): parent transid verify >>>> failed on 31288448466944 wanted 135681 found 135575 >>>> [ +0.000917] BTRFS: error (device bcache0) in >>>> btrfs_run_delayed_refs:3013: errno=-5 IO failure >>>> [ +0.000017] BTRFS: error (device bcache0) in >>>> btrfs_drop_snapshot:9463: errno=-5 IO failure >>> >>> And -5 I/O error is not a Btrfs error either, it's the detection of an >>> IO error from the underlying block device, whether real or virtual. >>> >> Couldn't figure the source of the -5 either, no kernel logs from >> anything byt BTRFS complaining about it. After I umounted the array, >> it didn't shown up anymore, and I was able to remount the array with >> the skip_bg patch. >> >>> >>> >>>> [ +0.000895] BTRFS info (device bcache0): forced readonly >>>> [ +0.000902] BTRFS: error (device bcache0) in merge_reloc_roots:2429: >>>> errno=-5 IO failure >>>> [ +0.000387] BTRFS info (device bcache0): balance: ended with status: -30 >>>> >>>> Couldn't check anything even in RO mode scrub or btrfs check, when I >>>> unmounted the array I got a few kernel stack traces: >>>> [Jan22 13:58] WARNING: CPU: 3 PID: 9711 at fs/btrfs/extent-tree.c:5986 >>>> btrfs_free_block_groups+0x395/0x3b0 [btrfs] >>>> [ +0.000032] CPU: 3 PID: 9711 Comm: umount Not tainted >>>> 4.20.0-042000-generic #201812232030 >>>> [ +0.000001] Hardware name: Gigabyte Technology Co., Ltd. 
To be >>>> filled by O.E.M./H61M-DS2H, BIOS F6 12/14/2012 >>>> [ +0.000014] RIP: 0010:btrfs_free_block_groups+0x395/0x3b0 [btrfs] >>>> [ +0.000002] Code: 01 00 00 00 0f 84 a0 fe ff ff 0f 0b 48 83 bb d0 01 >>>> 00 00 00 0f 84 9e fe ff ff 0f 0b 48 83 bb 08 0$ >>>> 00 00 00 0f 84 9c fe ff ff <0f> 0b 48 83 bb 00 02 00 00 00 0f 84 9a >>>> fe ff ff 0f 0b e9 93 fe ff >>>> [ +0.000001] RSP: 0018:ffffa3c1c2997d88 EFLAGS: 00010206 >>>> [ +0.000001] RAX: 0000000020000000 RBX: ffff924aae380000 RCX: >>>> 0000000000000000 >>>> [ +0.000001] RDX: ffffffffe0000000 RSI: ffff924b85970600 RDI: >>>> ffff924b85970600 >>>> [ +0.000001] RBP: ffffa3c1c2997db8 R08: 0000000020000000 R09: >>>> ffff924b859706a8 >>>> [ +0.000000] R10: 0000000000000002 R11: ffff924b973a1c04 R12: >>>> ffff924aae380080 >>>> [ +0.000001] R13: ffff924b8dfe8400 R14: ffff924aae380090 R15: >>>> 0000000000000000 >>>> [ +0.000002] FS: 00007f1bd1076080(0000) GS:ffff924b97380000(0000) >>>> knlGS:0000000000000000 >>>> [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ +0.000000] CR2: 0000562d2eb13c10 CR3: 0000000156910006 CR4: >>>> 00000000001606e0 >>>> [ +0.000001] Call Trace: >>>> [ +0.000018] close_ctree+0x143/0x2e0 [btrfs] >>>> [ +0.000012] btrfs_put_super+0x15/0x20 [btrfs] >>>> [ +0.000004] generic_shutdown_super+0x72/0x110 >>>> [ +0.000001] kill_anon_super+0x18/0x30 >>>> [ +0.000012] btrfs_kill_super+0x16/0xa0 [btrfs] >>>> [ +0.000002] deactivate_locked_super+0x3a/0x80 >>>> [ +0.000001] deactivate_super+0x51/0x60 >>>> [ +0.000003] cleanup_mnt+0x3f/0x80 >>>> [ +0.000001] __cleanup_mnt+0x12/0x20 >>>> [ +0.000002] task_work_run+0x9d/0xc0 >>>> [ +0.000002] exit_to_usermode_loop+0xf2/0x100 >>>> [ +0.000002] do_syscall_64+0xda/0x110 >>>> [ +0.000003] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>> [ +0.000001] RIP: 0033:0x7f1bd14bae27 >>>> [ +0.000001] Code: 90 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 >>>> 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 >>>> 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff 
ff 73 01 c3 48 8b 0d 39 >>>> 90 0c 00 f7 d8 64 89 01 48 >>>> [ +0.000001] RSP: 002b:00007ffdb15a75a8 EFLAGS: 00000246 ORIG_RAX: >>>> 00000000000000a6 >>>> [ +0.000002] RAX: 0000000000000000 RBX: 000055df329eda40 RCX: >>>> 00007f1bd14bae27 >>>> [ +0.000000] RDX: 0000000000000001 RSI: 0000000000000000 RDI: >>>> 000055df329edc20 >>>> [ +0.000001] RBP: 0000000000000000 R08: 000055df329eea70 R09: >>>> 00000000ffffffff >>>> [ +0.000001] R10: 000000000000000b R11: 0000000000000246 R12: >>>> 000055df329edc20 >>>> [ +0.000001] R13: 00007f1bd15e18c4 R14: 0000000000000000 R15: >>>> 00007ffdb15a7818 >>>> >>>> Now I'm back in a very similar situation as before, btrfs check gets me: >>>> Opening filesystem to check... >>>> checksum verify failed on 24707469082624 found 451E87BF wanted >>>> A1FD3A09 >>>> checksum verify failed on 24707469082624 found 2C2AEBE0 wanted >>>> D6652D6A >>>> checksum verify failed on 24707469082624 found 2C2AEBE0 wanted >>>> D6652D6A >>>> bad tree block 24707469082624, bytenr mismatch, want=24707469082624, >>>> have=231524568072192 >>>> Couldn't read tree root >>>> ERROR: cannot open file system >>>> >>>> I could do it all again, but first, what can be wrong here? This array >>>> was working for some 4 years until it went bad a few weeks ago, and >>>> now the FS got badly corrupted again without any warnings. Any >>>> suggestions? Bad RAM, SAS controller going bad, some weirdly behaving >>>> disk? I need to figure out what can be failing before I try another >>>> recovery. >>> >>> I think it's specifically storage stack related. I think you'd have >>> more varied and weird problems if it were memory corruption, but >>> that's speculation on my part. >> >> I've done a quick memory test with stressapptest and it was fine, so >> if it's the memory it's something very localized. 
>>> >>> I'd honestly simplify the layout and not use bcache at all, only use >>> Btrfs directly on the whole drives, although I think it's reasonably >>> simple to use dmcrypt if needed/desired. But it's still better for >>> troubleshooting to make the storage stack as simple as possible. >>> Without more debugging information from all the layers, it's hard to >>> tell which layer to blame without just using the big stick called >>> process of elimination. >>> >>> Maybe Qu has some ideas based on the call trace though - I can't parse it. >>> >>> -- >>> Chris Murphy ^ permalink raw reply [flat|nested] 9+ messages in thread
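[Editorial sketch] The recurring "checksum verify failed on <bytenr> found X wanted Y" lines in this thread are btrfs recomputing the CRC32C (Castagnoli) checksum of a metadata block and comparing it with the value stored in the block header. A minimal pure-Python sketch of the CRC32C core follows; the on-disk specifics (seed, which header bytes are covered) are deliberately simplified away, so treat this as an illustration of the checksum family, not of btrfs's exact on-disk layout:

```python
# Table-driven CRC32C (Castagnoli), the checksum family btrfs uses for
# metadata blocks. Only the core polynomial arithmetic is shown here;
# real btrfs checksumming covers the tree block past the csum field.

CRC32C_POLY = 0x82F63B78  # bit-reflected Castagnoli polynomial

def _make_table():
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_table()

def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of data, optionally continuing a prior crc."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

A "found X wanted Y" log line corresponds to `crc32c(block_bytes)` disagreeing with the stored value, which means the block content on disk is not what was checksummed at write time.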
* Re: Nasty corruption on large array, ideas welcome 2019-01-25 2:36 ` Anand Jain @ 2019-01-25 2:47 ` Thiago Ramon 0 siblings, 0 replies; 9+ messages in thread From: Thiago Ramon @ 2019-01-25 2:47 UTC (permalink / raw) To: Anand Jain; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS On Fri, Jan 25, 2019 at 12:36 AM Anand Jain <anand.jain@oracle.com> wrote: > > > > On 01/24/2019 01:30 AM, Thiago Ramon wrote: > > Back again with more info. > > > > I've done a dump-tree on my array, and from the 71GB of tree data, > > I've only got the following errors: > > parent transid verify failed on 31288448466944 wanted 135681 found 135575 > > parent transid verify failed on 31288448466944 wanted 135681 found 135575 > > checksum verify failed on 31288448466944 found 9D027AC2 wanted 79E1C774 > > checksum verify failed on 31288448466944 found 9A1BA23D wanted CFE05F82 > > checksum verify failed on 31288448466944 found 9A1BA23D wanted CFE05F82 > > bad tree block 31288448466944, bytenr mismatch, want=31288448466944, > > have=198834197030656 > > failed to read 31288448466944 in tree 2 > > parent transid verify failed on 31288448483328 wanted 135681 found 135575 > > parent transid verify failed on 31288448483328 wanted 135681 found 135575 > > checksum verify failed on 31288448483328 found D1816F8A wanted 3562D23C > > checksum verify failed on 31288448483328 found 51B89701 wanted EECC5745 > > checksum verify failed on 31288448483328 found 51B89701 wanted EECC5745 > > bad tree block 31288448483328, bytenr mismatch, want=31288448483328, > > have=198834197025536 > > failed to read 31288448483328 in tree 2 > > parent transid verify failed on 17412886069248 wanted 31316 found 31040 > > parent transid verify failed on 17412886069248 wanted 31316 found 31040 > > checksum verify failed on 17412886069248 found 5A0CE056 wanted BEEF5DE0 > > checksum verify failed on 17412886069248 found 52BE5880 wanted A8702E6B > > checksum verify failed on 17412886069248 found 5A0CE056 wanted BEEF5DE0 > > bad tree 
block 17412886069248, bytenr mismatch, want=17412886069248, have=0 > > failed to read 17412886069248 in tree 7 > > If there are sufficient mirrors, 'parent transid verify failed' should > be followed by errors-fixed logs. Whatever caused the corruption did enough damage that it couldn't repair by itself. I'll be spending the next couple days poking at the trees to see if I can figure out what happened and how to fix it, but I still have to learn a lot of details about the BTRFS disk format. > I wonder what type of chunks these are. The array is currently in a happy mix of RAID1, single and RAID6. I don't know if it was a coincidence, but a couple minutes before the first error it had just balanced a metadata block group, possibly the first one. > 'btrfs fi df <mnt>' Data, single: total=7.58TiB, used=7.58TiB Data, RAID6: total=13.85TiB, used=13.85TiB System, RAID1: total=8.00MiB, used=8.00MiB Metadata, RAID1: total=21.00GiB, used=21.00GiB Metadata, RAID6: total=18.00GiB, used=18.00GiB GlobalReserve, single: total=0.00B, used=0.00B > > Thanks. > > > If I'm reading this correctly, it's just a couple bad nodes in the > > extent tree and one in the checksum tree (assuming they aren't hiding > > lost nodes behind them). Would it be possible to manually fix those > > nodes, and if so, how should I proceed? 
(I have experience in > > data-recovery on other filesystems, but my knowledge of BTRFS isn't > > deep enough yet to allow me to even guess where I should start poking, > > so any pointers are welcome) > > > > And thanks again for all help so far > > > > On Tue, Jan 22, 2019 at 9:28 PM Thiago Ramon <thiagoramon@gmail.com> wrote: > >> > >> On Tue, Jan 22, 2019 at 6:43 PM Chris Murphy <lists@colorremedies.com> wrote: > >>> > >>> On Tue, Jan 22, 2019 at 9:41 AM Thiago Ramon <thiagoramon@gmail.com> wrote: > >>>> > >>>> Back again with pretty much the same problem, but now without a > >>>> reasonable cause: > >>>> I've bought a couple new 8TB disks, recovered everything I needed from > >>>> my previously damaged FS to a new BTRFS on those 2 drives (single copy > >>>> mode), double-checked if everything was fine, then wipefs'd the old > >>>> disks and added the ones that didn't have any issues previously to the > >>>> new array and rebalanced to RAID6. > >>>> Everything was running fine through the weekend and I was about 50% > >>>> done when today: > >>>> [ +7.733525] BTRFS info (device bcache0): relocating block group > >>>> 8358036766720 flags data > >>>> [Jan22 09:20] BTRFS warning (device bcache0): bcache0 checksum verify > >>>> failed on 31288448499712 wanted A3746F78 found 44D6AEB0 level 1 > >>>> [ +0.460086] BTRFS info (device bcache0): read error corrected: ino 0 > >>>> off 31288448499712 (dev /dev/bcache4 sector 7401171296) > >>>> [ +0.000199] BTRFS info (device bcache0): read error corrected: ino 0 > >>>> off 31288448503808 (dev /dev/bcache4 sector 7401171304) > >>>> [ +0.000181] BTRFS info (device bcache0): read error corrected: ino 0 > >>>> off 31288448507904 (dev /dev/bcache4 sector 7401171312) > >>>> [ +0.000158] BTRFS info (device bcache0): read error corrected: ino 0 > >>>> off 31288448512000 (dev /dev/bcache4 sector 7401171320) > >>> > >>> This is corruption being detected and corrected on those listed > >>> sectors. 
As this is a bcache device, it's a virtual sector so it's > >>> hard to tell if it's coming from bcache itself, or the cache device, > >>> or the backing device. > >>> > >> I was using bcache in writeback mode with my old FS, but I've learned > >> THAT lesson the hard way. This one was just using writearound; unless > >> bcache REALLY screwed it up, I find it hard to believe that it's the source of the > >> corruption. There were no read or write errors from bcache since the > >> time the new array went up, and each bcache* device is just a thin > >> layer over a whole raw disk now. > >> > >>> > >>>> [Jan22 09:21] BTRFS info (device bcache0): found 2050 extents > >>>> [ +8.055456] BTRFS info (device bcache0): found 2050 extents > >>>> [Jan22 09:22] BTRFS info (device bcache0): found 2050 extents > >>>> [ +0.846627] BTRFS info (device bcache0): relocating block group > >>>> 8356963024896 flags data > >>>> [Jan22 09:23] BTRFS info (device bcache0): found 2052 extents > >>>> [ +6.983072] BTRFS info (device bcache0): found 2052 extents > >>>> [ +0.844419] BTRFS info (device bcache0): relocating block group > >>>> 8355889283072 flags data > >>>> [ +33.906101] BTRFS info (device bcache0): found 2058 extents > >>>> [ +4.664570] BTRFS info (device bcache0): found 2058 extents > >>>> [Jan22 09:24] BTRFS info (device bcache0): relocating block group > >>>> 8354815541248 flags data > >>>> [Jan22 09:25] BTRFS info (device bcache0): found 2057 extents > >>>> [ +17.650586] BTRFS error (device bcache0): parent transid verify > >>>> failed on 31288448466944 wanted 135681 found 135575 > >>> > >>> > >>> Over 100 generations have passed, and yet it's only finding stale data > >>> on the desired btrfs byte nr (in btrfs linear space), so it might be > >>> extent tree corruption again. > >>> > >>> It's not possible from the available information to do anything but > >>> speculate how that much data is being lost or somehow being > >>> overwritten. 
> >>> > >>> > >>>> [ +0.088917] BTRFS error (device bcache0): parent transid verify > >>>> failed on 31288448466944 wanted 135681 found 135575 > >>>> [ +0.001381] BTRFS error (device bcache0): parent transid verify > >>>> failed on 31288448466944 wanted 135681 found 135575 > >>>> [ +0.003555] BTRFS error (device bcache0): parent transid verify > >>>> failed on 31288448466944 wanted 135681 found 135575 > >>>> [ +0.005478] BTRFS error (device bcache0): parent transid verify > >>>> failed on 31288448466944 wanted 135681 found 135575 > >>>> [ +0.003953] BTRFS error (device bcache0): parent transid verify > >>>> failed on 31288448466944 wanted 135681 found 135575 > >>>> [ +0.000917] BTRFS: error (device bcache0) in > >>>> btrfs_run_delayed_refs:3013: errno=-5 IO failure > >>>> [ +0.000017] BTRFS: error (device bcache0) in > >>>> btrfs_drop_snapshot:9463: errno=-5 IO failure > >>> > >>> And -5 I/O error is not a Btrfs error either, it's the detection of an > >>> IO error from the underlying block device, whether real or virtual. > >>> > >> Couldn't figure the source of the -5 either, no kernel logs from > >> anything byt BTRFS complaining about it. After I umounted the array, > >> it didn't shown up anymore, and I was able to remount the array with > >> the skip_bg patch. > >> > >>> > >>> > >>>> [ +0.000895] BTRFS info (device bcache0): forced readonly > >>>> [ +0.000902] BTRFS: error (device bcache0) in merge_reloc_roots:2429: > >>>> errno=-5 IO failure > >>>> [ +0.000387] BTRFS info (device bcache0): balance: ended with status: -30 > >>>> > >>>> Couldn't check anything even in RO mode scrub or btrfs check, when I > >>>> unmounted the array I got a few kernel stack traces: > >>>> [Jan22 13:58] WARNING: CPU: 3 PID: 9711 at fs/btrfs/extent-tree.c:5986 > >>>> btrfs_free_block_groups+0x395/0x3b0 [btrfs] > >>>> [ +0.000032] CPU: 3 PID: 9711 Comm: umount Not tainted > >>>> 4.20.0-042000-generic #201812232030 > >>>> [ +0.000001] Hardware name: Gigabyte Technology Co., Ltd. 
To be > >>>> filled by O.E.M./H61M-DS2H, BIOS F6 12/14/2012 > >>>> [ +0.000014] RIP: 0010:btrfs_free_block_groups+0x395/0x3b0 [btrfs] > >>>> [ +0.000002] Code: 01 00 00 00 0f 84 a0 fe ff ff 0f 0b 48 83 bb d0 01 > >>>> 00 00 00 0f 84 9e fe ff ff 0f 0b 48 83 bb 08 0$ > >>>> 00 00 00 0f 84 9c fe ff ff <0f> 0b 48 83 bb 00 02 00 00 00 0f 84 9a > >>>> fe ff ff 0f 0b e9 93 fe ff > >>>> [ +0.000001] RSP: 0018:ffffa3c1c2997d88 EFLAGS: 00010206 > >>>> [ +0.000001] RAX: 0000000020000000 RBX: ffff924aae380000 RCX: > >>>> 0000000000000000 > >>>> [ +0.000001] RDX: ffffffffe0000000 RSI: ffff924b85970600 RDI: > >>>> ffff924b85970600 > >>>> [ +0.000001] RBP: ffffa3c1c2997db8 R08: 0000000020000000 R09: > >>>> ffff924b859706a8 > >>>> [ +0.000000] R10: 0000000000000002 R11: ffff924b973a1c04 R12: > >>>> ffff924aae380080 > >>>> [ +0.000001] R13: ffff924b8dfe8400 R14: ffff924aae380090 R15: > >>>> 0000000000000000 > >>>> [ +0.000002] FS: 00007f1bd1076080(0000) GS:ffff924b97380000(0000) > >>>> knlGS:0000000000000000 > >>>> [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>> [ +0.000000] CR2: 0000562d2eb13c10 CR3: 0000000156910006 CR4: > >>>> 00000000001606e0 > >>>> [ +0.000001] Call Trace: > >>>> [ +0.000018] close_ctree+0x143/0x2e0 [btrfs] > >>>> [ +0.000012] btrfs_put_super+0x15/0x20 [btrfs] > >>>> [ +0.000004] generic_shutdown_super+0x72/0x110 > >>>> [ +0.000001] kill_anon_super+0x18/0x30 > >>>> [ +0.000012] btrfs_kill_super+0x16/0xa0 [btrfs] > >>>> [ +0.000002] deactivate_locked_super+0x3a/0x80 > >>>> [ +0.000001] deactivate_super+0x51/0x60 > >>>> [ +0.000003] cleanup_mnt+0x3f/0x80 > >>>> [ +0.000001] __cleanup_mnt+0x12/0x20 > >>>> [ +0.000002] task_work_run+0x9d/0xc0 > >>>> [ +0.000002] exit_to_usermode_loop+0xf2/0x100 > >>>> [ +0.000002] do_syscall_64+0xda/0x110 > >>>> [ +0.000003] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>> [ +0.000001] RIP: 0033:0x7f1bd14bae27 > >>>> [ +0.000001] Code: 90 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 > >>>> 00 00 31 f6 e9 09 
00 00 00 66 0f 1f 84 00 00 00 > >>>> 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 > >>>> 90 0c 00 f7 d8 64 89 01 48 > >>>> [ +0.000001] RSP: 002b:00007ffdb15a75a8 EFLAGS: 00000246 ORIG_RAX: > >>>> 00000000000000a6 > >>>> [ +0.000002] RAX: 0000000000000000 RBX: 000055df329eda40 RCX: > >>>> 00007f1bd14bae27 > >>>> [ +0.000000] RDX: 0000000000000001 RSI: 0000000000000000 RDI: > >>>> 000055df329edc20 > >>>> [ +0.000001] RBP: 0000000000000000 R08: 000055df329eea70 R09: > >>>> 00000000ffffffff > >>>> [ +0.000001] R10: 000000000000000b R11: 0000000000000246 R12: > >>>> 000055df329edc20 > >>>> [ +0.000001] R13: 00007f1bd15e18c4 R14: 0000000000000000 R15: > >>>> 00007ffdb15a7818 > >>>> > >>>> Now I'm back in a very similar situation as before, btrfs check gets me: > >>>> Opening filesystem to check... > >>>> checksum verify failed on 24707469082624 found 451E87BF wanted > >>>> A1FD3A09 > >>>> checksum verify failed on 24707469082624 found 2C2AEBE0 wanted > >>>> D6652D6A > >>>> checksum verify failed on 24707469082624 found 2C2AEBE0 wanted > >>>> D6652D6A > >>>> bad tree block 24707469082624, bytenr mismatch, want=24707469082624, > >>>> have=231524568072192 > >>>> Couldn't read tree root > >>>> ERROR: cannot open file system > >>>> > >>>> I could do it all again, but first, what can be wrong here? This array > >>>> was working for some 4 years until it went bad a few weeks ago, and > >>>> now the FS got badly corrupted again without any warnings. Any > >>>> suggestions? Bad RAM, SAS controller going bad, some weirdly behaving > >>>> disk? I need to figure out what can be failing before I try another > >>>> recovery. > >>> > >>> I think it's specifically storage stack related. I think you'd have > >>> more varied and weird problems if it were memory corruption, but > >>> that's speculation on my part. > >> > >> I've done a quick memory test with stressapptest and it was fine, so > >> if it's the memory it's something very localized. 
> >>> > >>> I'd honestly simplify the layout and not use bcache at all, only use > >>> Btrfs directly on the whole drives, although I think it's reasonably > >>> simple to use dmcrypt if needed/desired. But it's still better for > >>> troubleshooting to make the storage stack as simple as possible. > >>> Without more debugging information from all the layers, it's hard to > >>> tell which layer to blame without just using the big stick called > >>> process of elimination. > >>> > >>> Maybe Qu has some ideas based on the call trace though - I can't parse it. > >>> > >>> -- > >>> Chris Murphy ^ permalink raw reply [flat|nested] 9+ messages in thread
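[Editorial sketch] The other recurring error in the thread, "parent transid verify failed on X wanted N found M", is btrfs's generation check: each parent node records the transaction id (generation) it expects its child block to carry, and a child that reads back with an older generation fails the check — exactly what happens when a cached write never reaches the backing device. A toy model of that logic (names are illustrative, not the kernel's):

```python
# Toy model of btrfs's parent-transid check. A parent node stores the
# generation it expects for each child; a stale child (older generation
# on disk than the parent recorded) trips the verification.

class TreeBlock:
    def __init__(self, bytenr: int, generation: int):
        self.bytenr = bytenr        # logical byte number of the block
        self.generation = generation  # transid the block was written in

def verify_parent_transid(child: TreeBlock, wanted_gen: int) -> bool:
    """Return True if the child carries the generation the parent expects."""
    if child.generation == wanted_gen:
        return True
    print(f"parent transid verify failed on {child.bytenr} "
          f"wanted {wanted_gen} found {child.generation}")
    return False
```

With the numbers from the logs above, a block written in transaction 135575 whose parent expects 135681 fails the check — over 100 transactions of updates to that block never made it to stable storage.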
Thread overview: 9+ messages
2019-01-08 19:33 Nasty corruption on large array, ideas welcome Thiago Ramon
2019-01-09  0:05 ` Qu Wenruo
2019-01-10 15:50 ` Thiago Ramon
2019-01-22 16:41 ` Thiago Ramon
2019-01-22 20:42 ` Chris Murphy
2019-01-22 23:28 ` Thiago Ramon
2019-01-23 17:30 ` Thiago Ramon
2019-01-25  2:36 ` Anand Jain
2019-01-25  2:47 ` Thiago Ramon