On 2020/8/23 10:49 AM, Tyler Richmond wrote:
> Well, I can guarantee that I didn't create this fs before 2015 (just
> checked the order confirmation from when I bought the server), but I
> may have just used whatever was in the Ubuntu package manager at the
> time. So maybe I don't have a v0 ref?

Then btrfs-image shouldn't report that.

That error means it hit an item smaller than any valid btrfs item,
which normally indicates a v0 ref. If not, it could be a bigger problem.

Could you please provide the full "btrfs check" output?
If possible, the result of "btrfs check --mode=lowmem" would also help.

Also, if you really go with "--repair", the full output would be needed
to determine what's going wrong.

There is a report that "btrfs check --repair" didn't repair the inode
generation; if that's the case, we must have a bug.

Thanks,
Qu
>
> On Sat, Aug 22, 2020 at 10:31 PM Qu Wenruo wrote:
>>
>>
>>
>> On 2020/8/23 9:51 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/8/23 9:15 AM, Tyler Richmond wrote:
>>>> Is my best bet just to downgrade the kernel and then try to delete the
>>>> broken files? Or should I rebuild from scratch? Just don't know
>>>> whether it's worth the time to try and figure this out or if the
>>>> problems stem from the FS being too old and it's beyond trying to
>>>> repair.
>>>
>>> All invalid inode generations should be repairable by the latest
>>> btrfs check.
>>>
>>> If not, please provide the btrfs-image dump for us to determine what's
>>> going wrong.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> On Tue, Aug 18, 2020 at 8:18 AM Tyler Richmond wrote:
>>>>>
>>>>> I didn't check dmesg during the btrfs check, but that was the only
>>>>> output during the rm -f before it was forced readonly. I just checked
>>>>> dmesg for inode generation values, and there are a lot of them.
>>>>>
>>>>> https://pastebin.com/stZdN0ta
>>>>> The dmesg output had 990 lines containing inode generation.
>>>>>
>>>>> However, these at least came later. I tried to do a btrfs balance
>>>>> -mconvert raid1 and it failed with an I/O error. That is probably what
>>>>> generated these specific errors, but maybe they were also happening
>>>>> during the btrfs repair.
>>>>>
>>>>> The FS is ~45TB, but btrfs-image -c9 failed anyway with:
>>>>> ERROR: either extent tree is corrupted or deprecated extent ref format
>>>>> ERROR: create failed: -5
>>
>> Oh, forgot this part.
>>
>> This means you have a v0 ref?!
>>
>> Then the fs is too old; no current progs/kernel supports it after all.
>>
>> In that case, please roll back to the last working kernel and copy
>> your data.
>>
>> In fact, that v0 ref format was only in the code base for several weeks
>> before 2010, thus it's really too old.
>>
>> The good news is, with the tree-checker, we should never hit such a
>> too-old-to-be-usable problem again (at least I hope so).
>>
>> Thanks,
>> Qu
>>
>>>>>
>>>>>
>>>>> On Tue, Aug 18, 2020 at 2:07 AM Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2020/8/18 11:35 AM, Tyler Richmond wrote:
>>>>>>> Qu,
>>>>>>>
>>>>>>> Sorry to resurrect this thread, but I just ran into something that I
>>>>>>> can't really just ignore. I've found a folder that is full of files
>>>>>>> which I guess have been broken somehow. I found a backup and restored
>>>>>>> them, but I want to delete this folder of broken files. But whenever I
>>>>>>> try, the fs is forced into readonly mode again. I just finished another
>>>>>>> btrfs check --repair but it didn't fix the problem.
>>>>>>>
>>>>>>> https://pastebin.com/eTV3s3fr
>>>>>>
>>>>>> Is that the full output?
>>>>>>
>>>>>> No inode generation bugs?
>>>>>>>
>>>>>>> I'm already on btrfs-progs v5.7. Any new suggestions?
>>>>>>
>>>>>> Strange.
>>>>>>
>>>>>> The detection and repair should have been merged in v5.5.
>>>>>>
>>>>>> If your fs is small enough, would you please provide the "btrfs-image
>>>>>> -c9" dump?
>>>>>>
>>>>>> It would contain the file and directory names, but not the file
>>>>>> contents.
>>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>>
>>>>>>> On Fri, May 8, 2020 at 9:52 AM Tyler Richmond wrote:
>>>>>>>
>>>>>>> 5.6.1 also failed the same way. Here's the usage output. This is the
>>>>>>> part where you see I've been using RAID5 haha
>>>>>>>
>>>>>>> WARNING: RAID56 detected, not implemented
>>>>>>> Overall:
>>>>>>>     Device size:          60.03TiB
>>>>>>>     Device allocated:     98.06GiB
>>>>>>>     Device unallocated:   59.93TiB
>>>>>>>     Device missing:       0.00B
>>>>>>>     Used:                 92.56GiB
>>>>>>>     Free (estimated):     0.00B  (min: 8.00EiB)
>>>>>>>     Data ratio:           0.00
>>>>>>>     Metadata ratio:       2.00
>>>>>>>     Global reserve:       512.00MiB  (used: 0.00B)
>>>>>>>     Multiple profiles:    no
>>>>>>>
>>>>>>> Data,RAID5: Size:40.35TiB, Used:40.12TiB (99.42%)
>>>>>>>     /dev/sdh    8.07TiB
>>>>>>>     /dev/sdf    8.07TiB
>>>>>>>     /dev/sdg    8.07TiB
>>>>>>>     /dev/sdd    8.07TiB
>>>>>>>     /dev/sdc    8.07TiB
>>>>>>>     /dev/sde    8.07TiB
>>>>>>>
>>>>>>> Metadata,RAID1: Size:49.00GiB, Used:46.28GiB (94.44%)
>>>>>>>     /dev/sdh   34.00GiB
>>>>>>>     /dev/sdf   32.00GiB
>>>>>>>     /dev/sdg   32.00GiB
>>>>>>>
>>>>>>> System,RAID1: Size:32.00MiB, Used:2.20MiB (6.87%)
>>>>>>>     /dev/sdf   32.00MiB
>>>>>>>     /dev/sdg   32.00MiB
>>>>>>>
>>>>>>> Unallocated:
>>>>>>>     /dev/sdh    2.81TiB
>>>>>>>     /dev/sdf    2.81TiB
>>>>>>>     /dev/sdg    2.81TiB
>>>>>>>     /dev/sdd    1.03TiB
>>>>>>>     /dev/sdc    1.03TiB
>>>>>>>     /dev/sde    1.03TiB
>>>>>>>
>>>>>>> On Fri, May 8, 2020 at 1:47 AM Qu Wenruo wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On 2020/5/8 1:12 PM, Tyler Richmond wrote:
>>>>>>> > > If this is saying there's no extra space for metadata, is that why
>>>>>>> > > adding more files often makes the system hang for 30-90s? Is there
>>>>>>> > > anything I should do about that?
>>>>>>> >
>>>>>>> > I'm not sure about the hang, though.
>>>>>>> >
>>>>>>> > It would be nice to have more info for diagnosis.
>>>>>>> > The output of 'btrfs fi usage' is useful for space usage problems.
>>>>>>> >
>>>>>>> > But the common advice is to keep 1~2 GiB unallocated (this is not
>>>>>>> > the "available" space shown by the vanilla df command) for btrfs.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Qu
>>>>>>> >
>>>>>>> > >
>>>>>>> > > Thank you so much for all of your help. I love how flexible BTRFS is
>>>>>>> > > but when things go wrong it's very hard for me to troubleshoot.
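[For reference, the unallocated space discussed above can be checked at
any time on the mounted fs; /mountpoint below is a placeholder:

    # per-device "Unallocated:" figures, same as in the usage output above
    btrfs filesystem usage /mountpoint
    # or the same data as a compact per-device table
    btrfs filesystem usage -T /mountpoint
]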
>>>>>>> > >
>>>>>>> > > On Fri, May 8, 2020 at 1:07 AM Qu Wenruo wrote:
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> On 2020/5/8 12:23 PM, Tyler Richmond wrote:
>>>>>>> > >>> Something went wrong:
>>>>>>> > >>>
>>>>>>> > >>> Reinitialize checksum tree
>>>>>>> > >>> Unable to find block group for 0
>>>>>>> > >>> Unable to find block group for 0
>>>>>>> > >>> Unable to find block group for 0
>>>>>>> > >>> ctree.c:2272: split_leaf: BUG_ON `1` triggered, value 1
>>>>>>> > >>> btrfs(+0x6dd94)[0x55a933af7d94]
>>>>>>> > >>> btrfs(+0x71b94)[0x55a933afbb94]
>>>>>>> > >>> btrfs(btrfs_search_slot+0x11f0)[0x55a933afd6c8]
>>>>>>> > >>> btrfs(btrfs_csum_file_block+0x432)[0x55a933b19d09]
>>>>>>> > >>> btrfs(+0x360b2)[0x55a933ac00b2]
>>>>>>> > >>> btrfs(+0x46a3e)[0x55a933ad0a3e]
>>>>>>> > >>> btrfs(main+0x98)[0x55a933a9fe88]
>>>>>>> > >>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f263ed550b3]
>>>>>>> > >>> btrfs(_start+0x2e)[0x55a933a9fa0e]
>>>>>>> > >>> Aborted
>>>>>>> > >>
>>>>>>> > >> This means there is no space for extra metadata...
>>>>>>> > >>
>>>>>>> > >> Anyway, the csum tree problem shouldn't be a big thing; you could
>>>>>>> > >> leave it and call it a day.
>>>>>>> > >>
>>>>>>> > >> BTW, as long as btrfs check reports no extra problem for the inode
>>>>>>> > >> generation, it should be pretty safe to use the fs.
>>>>>>> > >>
>>>>>>> > >> Thanks,
>>>>>>> > >> Qu
>>>>>>> > >>>
>>>>>>> > >>> I just noticed I have btrfs-progs 5.6 installed and 5.6.1 is
>>>>>>> > >>> available. I'll let that try overnight?
>>>>>>> > >>>
>>>>>>> > >>> On Thu, May 7, 2020 at 8:11 PM Qu Wenruo wrote:
>>>>>>> > >>>>
>>>>>>> > >>>>
>>>>>>> > >>>>
>>>>>>> > >>>> On 2020/5/7 11:52 PM, Tyler Richmond wrote:
>>>>>>> > >>>>> Thank you for helping. The end result of the scan was:
>>>>>>> > >>>>>
>>>>>>> > >>>>>
>>>>>>> > >>>>> [1/7] checking root items
>>>>>>> > >>>>> [2/7] checking extents
>>>>>>> > >>>>> [3/7] checking free space cache
>>>>>>> > >>>>> [4/7] checking fs roots
>>>>>>> > >>>>
>>>>>>> > >>>> Good news is, your fs is still mostly fine.
>>>>>>> > >>>>
>>>>>>> > >>>>> [5/7] checking only csums items (without verifying data)
>>>>>>> > >>>>> there are no extents for csum range 0-69632
>>>>>>> > >>>>> csum exists for 0-69632 but there is no extent record
>>>>>>> > >>>>> ...
>>>>>>> > >>>>> ...
>>>>>>> > >>>>> there are no extents for csum range 946692096-946827264
>>>>>>> > >>>>> csum exists for 946692096-946827264 but there is no extent record
>>>>>>> > >>>>> there are no extents for csum range 946831360-947912704
>>>>>>> > >>>>> csum exists for 946831360-947912704 but there is no extent record
>>>>>>> > >>>>> ERROR: errors found in csum tree
>>>>>>> > >>>>
>>>>>>> > >>>> Only the extent tree is corrupted.
>>>>>>> > >>>>
>>>>>>> > >>>> Normally btrfs check --init-csum-tree should be able to
>>>>>>> > >>>> handle it.
>>>>>>> > >>>>
>>>>>>> > >>>> But still, please be sure you're using the latest btrfs-progs
>>>>>>> > >>>> to fix it.
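[For reference, the csum tree rebuild suggested above is run against the
unmounted filesystem, e.g. as follows; the device name is taken from this
thread, and since this rewrites metadata, backups and current btrfs-progs
are strongly advised:

    # fs must be unmounted; any one member device of the array is enough
    btrfs check --init-csum-tree /dev/sdh
]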
>>>>>>> > >>>>
>>>>>>> > >>>> Thanks,
>>>>>>> > >>>> Qu
>>>>>>> > >>>>
>>>>>>> > >>>>> [6/7] checking root refs
>>>>>>> > >>>>> [7/7] checking quota groups skipped (not enabled on this FS)
>>>>>>> > >>>>> found 44157956026368 bytes used, error(s) found
>>>>>>> > >>>>> total csum bytes: 42038602716
>>>>>>> > >>>>> total tree bytes: 49688616960
>>>>>>> > >>>>> total fs tree bytes: 1256427520
>>>>>>> > >>>>> total extent tree bytes: 1709105152
>>>>>>> > >>>>> btree space waste bytes: 3172727316
>>>>>>> > >>>>> file data blocks allocated: 261625653436416
>>>>>>> > >>>>>  referenced 47477768499200
>>>>>>> > >>>>>
>>>>>>> > >>>>> What do I need to do to fix all of this?
>>>>>>> > >>>>>
>>>>>>> > >>>>> On Thu, May 7, 2020 at 1:52 AM Qu Wenruo wrote:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> On 2020/5/7 1:43 PM, Tyler Richmond wrote:
>>>>>>> > >>>>>>> Well, the repair doesn't look terribly successful.
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> This means there are more problems, not only the name hash
>>>>>>> > >>>>>> mismatch.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> This means the fs is already corrupted; the name hash is just
>>>>>>> > >>>>>> one unrelated symptom.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> The only good news is that btrfs-progs aborted the transaction,
>>>>>>> > >>>>>> thus no further damage was done to the fs.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Please run a plain btrfs check to show what the problem is
>>>>>>> > >>>>>> first.
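[For reference, a plain read-only check, plus the lowmem-mode variant Qu
asks for at the top of this thread, would look something like this; the
device name is from the thread, and the fs should be unmounted:

    # read-only by default; capture the full output for the mailing list
    btrfs check /dev/sdh 2>&1 | tee btrfs-check.log
    # the lowmem mode, useful as an independent cross-check
    btrfs check --mode=lowmem /dev/sdh 2>&1 | tee btrfs-check-lowmem.log
]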
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Thanks,
>>>>>>> > >>>>>> Qu
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> parent transid verify failed on 218620880703488 wanted 6875841 found 6876224
>>>>>>> > >>>>>>> Ignoring transid failure
>>>>>>> > >>>>>>> ERROR: child eb corrupted: parent bytenr=225049956061184 item=84 parent level=1 child level=4
>>>>>>> > >>>>>>> ERROR: failed to zero log tree: -17
>>>>>>> > >>>>>>> ERROR: attempt to start transaction over already running one
>>>>>>> > >>>>>>> WARNING: reserved space leaked, flag=0x4 bytes_reserved=4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066086400 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066086400 len 4096
>>>>>>> > >>>>>>> WARNING: dirty eb leak (aborted trans): start 225049066086400 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066094592 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066094592 len 4096
>>>>>>> > >>>>>>> WARNING: dirty eb leak (aborted trans): start 225049066094592 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066102784 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066102784 len 4096
>>>>>>> > >>>>>>> WARNING: dirty eb leak (aborted trans): start 225049066102784 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066131456 len 4096
>>>>>>> > >>>>>>> extent buffer leak: start 225049066131456 len 4096
>>>>>>> > >>>>>>> WARNING: dirty eb leak (aborted trans): start 225049066131456 len 4096
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> What is going on?
>>>>>>> > >>>>>>>
>>>>>>> > >>>>>>> On Wed, May 6, 2020 at 9:30 PM Tyler Richmond wrote:
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>> Chris, I had used the correct mountpoint in the command. I just
>>>>>>> > >>>>>>>> edited it in the email to be /mountpoint for consistency.
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>> Qu, I'll try the repair. Fingers crossed!
>>>>>>> > >>>>>>>>
>>>>>>> > >>>>>>>> On Wed, May 6, 2020 at 9:13 PM Qu Wenruo wrote:
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> On 2020/5/7 5:54 AM, Tyler Richmond wrote:
>>>>>>> > >>>>>>>>>> Hello,
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> I looked up this error and it basically says to ask a developer
>>>>>>> > >>>>>>>>>> to determine whether it's a false error or not. I just started
>>>>>>> > >>>>>>>>>> getting some slow response times, and looked at the dmesg log to
>>>>>>> > >>>>>>>>>> find a ton of these errors.
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> [192088.446299] BTRFS critical (device sdh): corrupt leaf: root=5 block=203510940835840 slot=4 ino=1311670, invalid inode generation: has 18446744073709551492 expect [0, 6875827]
>>>>>>> > >>>>>>>>>> [192088.449823] BTRFS error (device sdh): block=203510940835840 read time tree block corruption detected
>>>>>>> > >>>>>>>>>> [192088.459238] BTRFS critical (device sdh): corrupt leaf: root=5 block=203510940835840 slot=4 ino=1311670, invalid inode generation: has 18446744073709551492 expect [0, 6875827]
>>>>>>> > >>>>>>>>>> [192088.462773] BTRFS error (device sdh): block=203510940835840 read time tree block corruption detected
>>>>>>> > >>>>>>>>>> [192088.464711] BTRFS critical (device sdh): corrupt leaf: root=5 block=203510940835840 slot=4 ino=1311670, invalid inode generation: has 18446744073709551492 expect [0, 6875827]
>>>>>>> > >>>>>>>>>> [192088.468457] BTRFS error (device sdh): block=203510940835840 read time tree block corruption detected
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> btrfs device stats, however, doesn't show any errors.
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> Is there anything I should do about this, or should I just
>>>>>>> > >>>>>>>>>> continue using my array as normal?
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> This is caused by an older kernel underflowing the inode
>>>>>>> > >>>>>>>>> generation.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> The latest btrfs-progs can fix it, using btrfs check --repair.
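[For example, with the fs unmounted and the device name as used earlier
in the thread; --repair rewrites metadata, so make sure backups exist
and btrfs-progs is current:

    # DANGEROUS on a corrupted fs: only run with up-to-date btrfs-progs
    btrfs check --repair /dev/sdh
]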
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> Or you can go the safer route: manually locate the inode using
>>>>>>> > >>>>>>>>> its inode number (1311670), copy it to some new location using a
>>>>>>> > >>>>>>>>> previously working kernel, then delete the old file and copy the
>>>>>>> > >>>>>>>>> new one back to fix it.
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>> Thanks,
>>>>>>> > >>>>>>>>> Qu
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>> Thank you!
>>>>>>> > >>>>>>>>>>
>>>>>>> > >>>>>>>>>
>>>>>>> > >>>>>>
>>>>>>> > >>>>
>>>>>>> > >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>
>>
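[A rough sketch of the safer, file-level route described above; all paths
are placeholders, and this assumes the fs is mounted under a previously
working kernel:

    # find the path of the affected file from its inode number
    find /mountpoint -inum 1311670
    # copy it somewhere safe, preserving attributes
    cp -a /mountpoint/path/to/file /safe/place/
    # remove the file carrying the bad inode generation
    rm /mountpoint/path/to/file
    # copy the rescued data back as a freshly created inode
    cp -a /safe/place/file /mountpoint/path/to/file
]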