* btrfs problems
@ 2018-09-16 13:58 Adrian Bastholm
  2018-09-16 14:50 ` Qu Wenruo
  2018-09-16 18:35 ` Chris Murphy
  0 siblings, 2 replies; 16+ messages in thread

From: Adrian Bastholm @ 2018-09-16 13:58 UTC (permalink / raw)
To: linux-btrfs

Hello all,
Actually I'm not trying to get any help any more; I gave up on BTRFS on
the desktop. But I'd like to share my efforts at fixing my problems, in
the hope I can help some poor noob like me.

I decided to use BTRFS after reading the Ars Technica article about
next-gen filesystems, and BTRFS seemed like the natural choice: open
source, built into Linux, etc. I even bought an HP MicroServer to keep
everything on, because none of the commercial NASes supported BTRFS.
What a mistake. I wasted weeks in total managing something that could
have taken a day to set up, and I'd have MUCH more functionality now
(if I hadn't been hit by some ransomware, that is).

I had three 1TB drives, chose to use RAID, and all was good for a
while, until I started fiddling with Motion, the image-capture
software. When you kill that process (my take on it), a file can be
written but it ends up with question marks instead of attributes, and
it's impossible to remove. btrfs check --repair is not recommended: it
crashes, it doesn't fix all the problems, and I later found out that my
lost+found dir held about 39G of lost files and dirs.

I spent about two days trying to fix everything: removing a disk,
adding it again, checking, you name it. I ended up removing one disk,
reformatting it, and moving the data there. Now I've removed BTRFS
entirely and replaced it with an OpenZFS mirror array, to which I'll
add the third disk once I've transferred everything over.

Please have a look at the console logs. I've been running Linux on the
desktop for the past 15 years, so I'm not a noob, but to run BTRFS you
had better be involved in its development. In my humble opinion, it's
not for us "users" just yet. Not even for power users.
For those of you considering building a NAS without special purposes,
don't. Buy a Synology, pop in a couple of drives, and enjoy the ride.

------------
root /home/storage/motion/2017-05-24 1 ls -al
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
total 4
drwxrwxrwx 1 motion motion 114 Sep 14 12:48 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-????????? ? ? ? ? ? 36-20170524201346-02.jpg
-????????? ? ? ? ? ? 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py

root /home/storage/motion/2017-05-24 1 touch test.raw
root /home/storage/motion/2017-05-24 cat /dev/random > test.raw
^C
root /home/storage/motion/2017-05-24 ls -al
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
total 8
drwxrwxrwx 1 motion motion 130 Sep 14 13:12 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-????????? ? ? ? ? ? 36-20170524201346-02.jpg
-????????? ? ? ? ? ? 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

root /home/storage/motion/2017-05-24 1 cp test.raw 36-20170524201346-02.jpg
'test.raw' -> '36-20170524201346-02.jpg'

root /home/storage/motion/2017-05-24 ls -al
total 20
drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

root /home/storage/motion/2017-05-24 chmod 777 36-20170524201346-02.jpg

root /home/storage/motion/2017-05-24 ls -al
total 20
drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

root /home/storage/motion/2017-05-24 unlink 36-20170524201346-02.jpg
unlink: cannot unlink '36-20170524201346-02.jpg': No such file or directory

root /home/storage/motion/2017-05-24 1 ls -al
total 20
drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

root /home/storage/motion/2017-05-24 journalctl -k | grep BTRFS
Sep 14 09:41:58 jenna kernel: BTRFS: device label BTRFS Redundant storage devid 4 transid 348450 /dev/sdd
Sep 14 09:41:58 jenna kernel: BTRFS: device label BTRFS Redundant storage devid 2 transid 348450 /dev/sdb
Sep 14 09:41:58 jenna kernel: BTRFS: device label BTRFS Redundant storage devid 3 transid 348450 /dev/sdc
Sep 14 09:41:58 jenna kernel: BTRFS info (device sdc): enabling auto defrag
Sep 14 09:41:58 jenna kernel: BTRFS info (device sdc): disabling disk space caching
Sep 14 12:52:36 jenna kernel: BTRFS: Transaction aborted (error -2)
Sep 14 12:52:36 jenna kernel: BTRFS: error (device sdc) in btrfs_rename:9943: errno=-2 No such entry
Sep 14 12:52:36 jenna kernel: BTRFS info (device sdc): forced readonly
Sep 14 13:02:26 jenna kernel: BTRFS error (device sdc): cleaner transaction attach returned -30
Sep 14 13:03:41 jenna kernel: BTRFS info (device sdc): disk space caching is enabled
root /home/storage/motion/2017-05-24

root ~ btrfs scrub status /home/storage/
scrub status for 72ea6622-5098-4a0f-bea1-9a5e5a325735
        scrub started at Fri Sep 14 13:06:46 2018 and finished after 00:56:35
        total bytes scrubbed: 1.16TiB with 0 errors

root /home/storage/motion/2017-05-24 stat 36-20170524201346-02.jpg
  File: 36-20170524201346-02.jpg
  Size: 338             Blocks: 8          IO Block: 4096   regular file
Device: 29h/41d Inode: 12616879    Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-09-14 13:13:35.477264025 +0200
Modify: 2018-09-14 13:13:35.477264025 +0200
Change: 2018-09-14 13:14:02.025170343 +0200
 Birth: -

root /home/storage/motion/2017-05-24 1 find . -inum 12616879 -exec rm -i {} \;
rm: remove regular file './36-20170524201346-02.jpg'? y
rm: cannot remove './36-20170524201346-02.jpg': No such file or directory

root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 ls -al
total 20
drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

root /home/storage/motion/2017-05-24 rm 36-20170524201346-02.jpg
rm: cannot remove '36-20170524201346-02.jpg': No such file or directory

root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
root /home/storage/motion/2017-05-24
... more of the same

root /home/storage/motion rm -rf 2017-05-24/
rm: cannot remove '2017-05-24/': Directory not empty
root /home/storage/motion 1 ls -al 2017-05-24/
ls: cannot access '2017-05-24/36-20170524201346-02.jpg': No such file or directory
ls: cannot access '2017-05-24/36-20170524201346-02.jpg': No such file or directory
ls: cannot access '2017-05-24/36-20170524201346-02.jpg': No such file or directory
total 0
drwxrwxrwx 1 motion motion 144 Sep 14 14:25 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-????????? ? ? ? ? ? 36-20170524201346-02.jpg
-????????? ? ? ? ? ? 36-20170524201346-02.jpg
-????????? ? ? ? ? ? 36-20170524201346-02.jpg

root ~ btrfs check /dev/sdb
warning, device 3 is missing
warning, device 3 is missing
Checking filesystem on /dev/sdb
UUID: 72ea6622-5098-4a0f-bea1-9a5e5a325735
checking extents
checking free space cache
failed to load free space cache for block group 9998483259392
failed to load free space cache for block group 10388251541504
failed to load free space cache for block group 10483848118272
checking fs roots
root 5 inode 11189411 errors 200, dir isize wrong
unresolved ref dir 11189411 index 0 namelen 0 name filetype 0 errors 6, no dir index, no inode ref
unresolved ref dir 11189411 index 9477 namelen 24 name 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
root 5 inode 12616877 errors 2000, link count wrong
unresolved ref dir 11189411 index 9482 namelen 24 name 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
root 5 inode 12616879 errors 2000, link count wrong
unresolved ref dir 11189411 index 9484 namelen 24 name 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
found 639613362176 bytes used err is 1
total csum bytes: 605048928
total tree bytes: 828735488
total fs tree bytes: 182419456
total extent tree bytes: 18399232
btree space waste bytes: 47806043
file data blocks allocated: 969656111104
 referenced 634590535680

root ~ 1 btrfs check --repair /dev/sdb
enabling repair mode
warning, device 3 is missing
warning, device 3 is missing
Checking filesystem on /dev/sdb
UUID: 72ea6622-5098-4a0f-bea1-9a5e5a325735
checking extents
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x43e418]
btrfs(btrfs_reserve_extent+0x5c9)[0x4425df]
btrfs(btrfs_alloc_free_block+0x63)[0x44297c]
btrfs(__btrfs_cow_block+0xfc)[0x436636]
btrfs(btrfs_cow_block+0x8b)[0x436bd8]
btrfs[0x43ad82]
btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc]
btrfs[0x4268b4]
btrfs(cmd_check+0x1111)[0x427d6d]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd7a78002e1]
btrfs(_start+0x2a)[0x40a37a]

root ~ 1 btrfs check --repair /dev/sdc
enabling repair mode
warning, device 2 is missing
Checking filesystem on /dev/sdc
UUID: 72ea6622-5098-4a0f-bea1-9a5e5a325735
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
reset isize for dir 11189411 root 5
unresolved ref dir 11189411 index 0 namelen 0 name filetype 0 errors 6, no dir index, no inode ref
unresolved ref dir 11189411 index 9477 namelen 24 name 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
invalid dir item size
Moving file '36-20170524201346-02.jpg' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 12616877
invalid dir item size
Moving file '36-20170524201346-02.jpg.12616879' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 12616879
unresolved ref dir 11189411 index 0 namelen 0 name filetype 0 errors 6, no dir index, no inode ref
unresolved ref dir 11189411 index 9477 namelen 24 name 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
checking csums
checking root refs
found 639613362176 bytes used err is 0
total csum bytes: 605048928
total tree bytes: 828735488
total fs tree bytes: 182419456
total extent tree bytes: 18399232
btree space waste bytes: 47806043
file data blocks allocated: 969656111104
 referenced 634590535680

root ~ 251 btrfs check /dev/sdb
warning, device 3 is missing
warning, device 3 is missing
parent transid verify failed on 9998522662912 wanted 348736 found 348741
parent transid verify failed on 9998522662912 wanted 348736 found 348741
Ignoring transid failure
Couldn't setup extent tree
Couldn't open file system

root ~ 251 mount /home/storage/
root ~ watch btrfs scrub status /home/storage/
root ~ ls /home/storage/motion/2017-05-24/
ls: cannot access '/home/storage/motion/2017-05-24/36-20170524201346-02.jpg': No such file or directory
36-20170524201346-02.jpg
total 0
drwxrwxrwx 1 motion motion 24 Sep 14 14:25 .
drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
-????????? ? ? ? ? ? 36-20170524201346-02.jpg

Back to square one

[12031.946724] BTRFS error (device sdc): cleaner transaction attach returned -30
[19272.100407] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
[19272.104100] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
[19272.120344] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 0, gen 3

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: btrfs problems
  2018-09-16 13:58 btrfs problems Adrian Bastholm
@ 2018-09-16 14:50 ` Qu Wenruo
  [not found]   ` <CAMrg+aTNK1cBG7rGVfudpydD6hMJz9UW0-3mdS8Yx4tqAQZE6Q@mail.gmail.com>
  2018-09-16 18:35 ` Chris Murphy
  1 sibling, 1 reply; 16+ messages in thread

From: Qu Wenruo @ 2018-09-16 14:50 UTC (permalink / raw)
To: Adrian Bastholm, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 16186 bytes --]

On 2018/9/16 9:58 PM, Adrian Bastholm wrote:
> Hello all
> Actually I'm not trying to get any help any more, I gave up BTRFS on
> the desktop, but I'd like to share my efforts of trying to fix my
> problems, in hope I can help some poor noob like me.
>
> I decided to use BTRFS after reading the ArsTechnica article about the
> next-gen filesystems, and BTRFS seemed like the natural choice, open
> source, built into linux, etc. I even bought a HP microserver to have
> everything on because none of the commercial NAS-es supported BTRFS.
> What a mistake, I wasted weeks in total managing something that could
> have taken a day to set up, and I'd have MUCH more functionality now
> (if I wasn't hit by some ransomware, that is).
>
> I had three 1TB drives, chose to use raid, and all was good for a
> while, until started fiddling with Motion, the image capturing
> software. When you kill that process (my take on it) a file can be
> written but it ends up with question marks instead of attributes, and
> it's impossible to remove.

At this point, your fs is already corrupted.
I'm not sure about the reason; it could be a failed CoW combined with
power loss, a corrupted free space cache, or some old kernel bug.

Anyway, the metadata itself is already corrupted, and I believe it
happened even before you noticed.

> BTRFS check --repair is not recommended, it
> crashes , doesn't fix all problems, and I later found out that my
> lost+found dir had about 39G of lost files and dirs.

lost+found is entirely created by btrfs check --repair.
> I spent about two days trying to fix everything, removing a disk,
> adding it again, checking , you name it. I ended up removing one disk,
> reformatting it, and moving the data there.

Well, I would recommend submitting such a problem to the mailing list
*BEFORE* doing any write operation to the fs (including btrfs check
--repair), as that would help us analyse the failure pattern and
further enhance btrfs.

> Now I removed BTRFS
> entirely and replaced it with a OpenZFS mirror array, to which I'll
> add the third disk later when I transferred everything over.

Understandable. It's really annoying when a fs just gets itself
corrupted, and without much btrfs-specific knowledge it would just be
hell to try every method of fixing it (a lot of them would only make
the case worse).

>
> Please have a look at the console logs. I've been running linux on the
> desktop for the past 15 years, so I'm not a noob, but for running
> BTRFS you better be involved in the development of it.

I'd say, yes.
For any unexpected btrfs behavior, don't use btrfs check --repair
unless you're a developer or a developer asked you to.

For any unexpected btrfs behavior, from strange ls output to an
aborted transaction, please consult the mailing list first.
(Of course, with the kernel version and btrfs-progs version, which are
missing from your console log.)

In fact, in recent kernel releases (IIRC starting from v4.15), btrfs
does much better error detection, so it detects such problems early on
and protects the fs from being further modified.

(This further shows the importance of using the latest mainline kernel
rather than some old kernel provided by a stable distribution.)

Thanks,
Qu

> In my humble
> opinion, it's not for us "users" just yet. Not even for power users.
>
> For those of you considering building a NAS without special purposes,
> don't. Buy a synology, pop in a couple of drives, and enjoy the ride.
>
> ------------
> root /home/storage/motion/2017-05-24 1 ls -al
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> total 4
> drwxrwxrwx 1 motion motion 114 Sep 14 12:48 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
> root /home/storage/motion/2017-05-24 1 touch test.raw
> root /home/storage/motion/2017-05-24 cat /dev/random > test.raw
> ^C
> root /home/storage/motion/2017-05-24 ls -al
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> total 8
> drwxrwxrwx 1 motion motion 130 Sep 14 13:12 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
> -rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
> root /home/storage/motion/2017-05-24 1 cp test.raw
> 36-20170524201346-02.jpg
> 'test.raw' -> '36-20170524201346-02.jpg'
>
> root /home/storage/motion/2017-05-24 ls -al
> total 20
> drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
> -rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
>
> root /home/storage/motion/2017-05-24 chmod 777 36-20170524201346-02.jpg
>
> root /home/storage/motion/2017-05-24 ls -al
> total 20
> drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
> -rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
> root /home/storage/motion/2017-05-24 unlink 36-20170524201346-02.jpg
> unlink: cannot unlink '36-20170524201346-02.jpg': No such file or directory
>
> root /home/storage/motion/2017-05-24 1 ls -al
> total 20
> drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
> -rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
>
> root /home/storage/motion/2017-05-24 journalctl -k | grep BTRFS
> Sep 14 09:41:58 jenna kernel: BTRFS: device label BTRFS Redundant
> storage devid 4 transid 348450 /dev/sdd
> Sep 14 09:41:58 jenna kernel: BTRFS: device label BTRFS Redundant
> storage devid 2 transid 348450 /dev/sdb
> Sep 14 09:41:58 jenna kernel: BTRFS: device label BTRFS Redundant
> storage devid 3 transid 348450 /dev/sdc
> Sep 14 09:41:58 jenna kernel: BTRFS info (device sdc): enabling auto defrag
> Sep 14 09:41:58 jenna kernel: BTRFS info (device sdc): disabling disk
> space caching
> Sep 14 12:52:36 jenna kernel: BTRFS: Transaction aborted (error -2)
> Sep 14 12:52:36 jenna kernel: BTRFS: error (device sdc) in
> btrfs_rename:9943: errno=-2 No such entry
> Sep 14 12:52:36 jenna kernel: BTRFS info (device sdc): forced readonly
> Sep 14 13:02:26 jenna kernel: BTRFS error (device sdc): cleaner
> transaction attach returned -30
> Sep 14 13:03:41 jenna kernel: BTRFS info (device sdc): disk space
> caching is enabled
> root /home/storage/motion/2017-05-24
>
> root ~ btrfs scrub status /home/storage/
> scrub status for 72ea6622-5098-4a0f-bea1-9a5e5a325735
> scrub started at Fri Sep 14 13:06:46 2018 and finished after 00:56:35
> total bytes scrubbed: 1.16TiB with 0 errors
>
> root /home/storage/motion/2017-05-24 stat 36-20170524201346-02.jpg
> File: 36-20170524201346-02.jpg
> Size: 338 Blocks: 8 IO Block: 4096 regular file
> Device: 29h/41d Inode: 12616879 Links: 1
> Access: (0777/-rwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2018-09-14 13:13:35.477264025 +0200
> Modify: 2018-09-14 13:13:35.477264025 +0200
> Change: 2018-09-14 13:14:02.025170343 +0200
> Birth: -
>
> root /home/storage/motion/2017-05-24 1 find . -inum 12616879
> -exec rm -i {} \;
> rm: remove regular file './36-20170524201346-02.jpg'? y
> rm: cannot remove './36-20170524201346-02.jpg': No such file or directory
>
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 ls -al
> total 20
> drwxrwxrwx 1 motion motion 178 Sep 14 13:13 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch 62 Sep 14 12:43 remove.py
> -rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
> root /home/storage/motion/2017-05-24 rm 36-20170524201346-02.jpg
> rm: cannot remove '36-20170524201346-02.jpg': No such file or directory
>
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24 rm -f 36-20170524201346-02.jpg
> root /home/storage/motion/2017-05-24
> ... more of the same
> root /home/storage/motion rm -rf 2017-05-24/
> rm: cannot remove '2017-05-24/': Directory not empty
> root /home/storage/motion 1 ls -al 2017-05-24/
> ls: cannot access '2017-05-24/36-20170524201346-02.jpg': No such file
> or directory
> ls: cannot access '2017-05-24/36-20170524201346-02.jpg': No such file
> or directory
> ls: cannot access '2017-05-24/36-20170524201346-02.jpg': No such file
> or directory
> total 0
> drwxrwxrwx 1 motion motion 144 Sep 14 14:25 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
>
> root ~ btrfs check /dev/sdb
> warning, device 3 is missing
> warning, device 3 is missing
> Checking filesystem on /dev/sdb
> UUID: 72ea6622-5098-4a0f-bea1-9a5e5a325735
> checking extents
> checking free space cache
> failed to load free space cache for block group 9998483259392
> failed to load free space cache for block group 10388251541504
> failed to load free space cache for block group 10483848118272
> checking fs roots
> root 5 inode 11189411 errors 200, dir isize wrong
> unresolved ref dir 11189411 index 0 namelen 0 name filetype 0
> errors 6, no dir index, no inode ref
> unresolved ref dir 11189411 index 9477 namelen 24 name
> 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
> root 5 inode 12616877 errors 2000, link count wrong
> unresolved ref dir 11189411 index 9482 namelen 24 name
> 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
> root 5 inode 12616879 errors 2000, link count wrong
> unresolved ref dir 11189411 index 9484 namelen 24 name
> 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
> found 639613362176 bytes used err is 1
> total csum bytes: 605048928
> total tree bytes: 828735488
> total fs tree bytes: 182419456
> total extent tree bytes: 18399232
> btree space waste bytes: 47806043
> file data blocks allocated: 969656111104
> referenced 634590535680
>
>
> root ~ 1 btrfs check --repair /dev/sdb
> enabling repair mode
> warning, device 3 is missing
> warning, device 3 is missing
> Checking filesystem on /dev/sdb
> UUID: 72ea6622-5098-4a0f-bea1-9a5e5a325735
> checking extents
> Unable to find block group for 0
> extent-tree.c:289: find_search_start: Assertion `1` failed.
> btrfs[0x43e418]
> btrfs(btrfs_reserve_extent+0x5c9)[0x4425df]
> btrfs(btrfs_alloc_free_block+0x63)[0x44297c]
> btrfs(__btrfs_cow_block+0xfc)[0x436636]
> btrfs(btrfs_cow_block+0x8b)[0x436bd8]
> btrfs[0x43ad82]
> btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc]
> btrfs[0x4268b4]
> btrfs(cmd_check+0x1111)[0x427d6d]
> btrfs(main+0x12f)[0x40a341]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd7a78002e1]
> btrfs(_start+0x2a)[0x40a37a]
>
>
> root ~ 1 btrfs check --repair /dev/sdc
> enabling repair mode
> warning, device 2 is missing
> Checking filesystem on /dev/sdc
> UUID: 72ea6622-5098-4a0f-bea1-9a5e5a325735
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> reset isize for dir 11189411 root 5
> unresolved ref dir 11189411 index 0 namelen 0 name filetype 0
> errors 6, no dir index, no inode ref
> unresolved ref dir 11189411 index 9477 namelen 24 name
> 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
> invalid dir item size
> Moving file '36-20170524201346-02.jpg' to 'lost+found' dir since it
> has no valid backref
> Fixed the nlink of inode 12616877
> invalid dir item size
> Moving file '36-20170524201346-02.jpg.12616879' to 'lost+found' dir
> since it has no valid backref
> Fixed the nlink of inode 12616879
> unresolved ref dir 11189411 index 0 namelen 0 name filetype 0
> errors 6, no dir index, no inode ref
> unresolved ref dir 11189411 index 9477 namelen 24 name
> 36-20170524201346-02.jpg filetype 1 errors 1, no dir item
> checking csums
> checking root refs
> found 639613362176 bytes used err is 0
> total csum bytes: 605048928
> total tree bytes: 828735488
> total fs tree bytes: 182419456
> total extent tree bytes: 18399232
> btree space waste bytes: 47806043
> file data blocks allocated: 969656111104
> referenced 634590535680
>
>
> root ~ 251 btrfs check /dev/sdb
> warning, device 3 is missing
> warning, device 3 is missing
> parent transid verify failed on 9998522662912 wanted 348736 found 348741
> parent transid verify failed on 9998522662912 wanted 348736 found 348741
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't open file system
>
> root ~ 251 mount /home/storage/
> root ~ watch btrfs scrub status /home/storage/
> root ~ ls /home/storage/motion/2017-05-24/
> ls: cannot access
> '/home/storage/motion/2017-05-24/36-20170524201346-02.jpg': No such
> file or directory
> 36-20170524201346-02.jpg
> total 0
> drwxrwxrwx 1 motion motion 24 Sep 14 14:25 .
> drwxrwxr-x 1 motion adyhasch 60 Sep 14 09:42 ..
> -????????? ? ? ? ? ? 36-20170524201346-02.jpg
>
> Back to square one
>
> [12031.946724] BTRFS error (device sdc): cleaner transaction attach returned -30
> [19272.100407] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd
> 0, flush 0, corrupt 0, gen 1
> [19272.104100] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd
> 0, flush 0, corrupt 0, gen 2
> [19272.120344] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd
> 0, flush 0, corrupt 0, gen 3
>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread
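Qu's advice above amounts to: capture the evidence before anything writes to the filesystem. A minimal, read-only collection pass might look like the following sketch (the /home/storage mountpoint is taken from this thread; adjust to your system; every command here only reads):

```shell
# Hedged sketch: collect the information the list asks for into a single
# report, before attempting any repair. None of these commands write to
# the filesystem; errors from missing tools are captured in the report.
MNT=/home/storage
REPORT=/tmp/btrfs-report.txt
{
  echo "== versions =="
  uname -a
  btrfs --version
  echo "== filesystem layout =="
  btrfs filesystem show "$MNT"
  btrfs filesystem df "$MNT"
  echo "== per-device error counters =="
  btrfs device stats "$MNT"
  echo "== recent kernel messages =="
  journalctl -k | grep -i btrfs | tail -n 50
} > "$REPORT" 2>&1
echo "report written to $REPORT"
```

Attaching such a report to the first mail to linux-btrfs@vger.kernel.org, before running `btrfs check --repair`, preserves the failure pattern Qu wants to analyse.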
[parent not found: <CAMrg+aTNK1cBG7rGVfudpydD6hMJz9UW0-3mdS8Yx4tqAQZE6Q@mail.gmail.com>]
* Fwd: btrfs problems
  [not found] ` <CAMrg+aTNK1cBG7rGVfudpydD6hMJz9UW0-3mdS8Yx4tqAQZE6Q@mail.gmail.com>
@ 2018-09-16 20:11   ` Adrian Bastholm
  2018-09-16 20:54     ` Chris Murphy
  [not found]     ` <ecac52ad-70ed-e0e3-5660-1717f0d4f5e0@gmx.com>
  1 sibling, 1 reply; 16+ messages in thread

From: Adrian Bastholm @ 2018-09-16 20:11 UTC (permalink / raw)
To: linux-btrfs

Thanks for answering, Qu.

> At this timing, your fs is already corrupted.
> I'm not sure about the reason, it can be a failed CoW combined with
> powerloss, or corrupted free space cache, or some old kernel bugs.
>
> Anyway, the metadata itself is already corrupted, and I believe it
> happens even before you noticed.

I suspected it had to be like that.

> > BTRFS check --repair is not recommended, it
> > crashes , doesn't fix all problems, and I later found out that my
> > lost+found dir had about 39G of lost files and dirs.
>
> lost+found is completely created by btrfs check --repair.
>
> > I spent about two days trying to fix everything, removing a disk,
> > adding it again, checking , you name it. I ended up removing one disk,
> > reformatting it, and moving the data there.
>
> Well, I would recommend to submit such problem to the mail list *BEFORE*
> doing any write operation to the fs (including btrfs check --repair).
> As it would help us to analyse the failure pattern to further enhance btrfs.

IMHO that's, how should I put it, a design flaw: the wrong way of
looking at how people think, with all respect to the very smart people
who put in countless hours of hard work. Users expect an fs check and
repair to repair, not to break stuff. Reading that --repair is
"destructive" is contradictory even to me.

This problem emerged in a directory where motion (the camera software)
was saving pictures. Either killing the process or a power loss could
have left these jpg files (or the fs metadata) in a bad state. Maybe
that's something to go on.
I was thinking that there's not much anyone can do without root access
to my box anyway, and I'm not sure I was prepared to give that to
anyone.

> > Now I removed BTRFS
> > entirely and replaced it with a OpenZFS mirror array, to which I'll
> > add the third disk later when I transferred everything over.
>
> Understandable, it's really annoying a fs just get itself corrupted, and
> without much btrfs specified knowledge it would just be a hell to try
> any method to fix it (even a lot of them would just make the case worse).

I now know for a fact that ZFS has its own set of problems, like
adding a vdev to an existing zpool being irreversible, or that you
can't grow an array by just adding a bigger drive, stuff that seems so
natural and that BTRFS is very good at.

> > Please have a look at the console logs. I've been running linux on the
> > desktop for the past 15 years, so I'm not a noob, but for running
> > BTRFS you better be involved in the development of it.
>
> I'd say, yes.
> For any btrfs unexpected behavior, don't use btrfs check --repair unless
> you're a developer or some developer asked to do.

Again, this is counterintuitive. The repair option has been there in
all other systems and has worked (more or less the same way): fixing
MBRs in Windows, chkdsk, etc. Most Linux fs variants create the
lost+found folder, but I've never before, with any other filesystem,
read anywhere "don't use it unless you're one of the developers, it
destroys your filesystem". In that case it should be called btrfs
check --destructive, so you don't get the impression that it'll
somehow give you an easy fix.

> Any btrfs unexpected behavior, from strange ls output to aborted
> transaction, please consult with the mail list first.
> (Of course, with kernel version and btrfs-progs version, which is
> missing in your console log though)

Linux jenna 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux
btrfs-progs is already the newest version (4.7.3-1).
> In fact, in recent (IIRC starting from v4.15) kernel releases, btrfs is
> already doing much better error detection thus it would detect such
> problem early on and protect the fs from being further modified.
>
> (This further shows that the importance of using the latest mainline
> kernel other than some old kernel provided by stable distribution).
> Thanks,
> Qu

Thank You very much Qu for the comments, even though I ranted a bit, the purpose was to give a bit of feedback. -- Vänliga hälsningar / Kind regards, Adrian Bastholm ``I would change the world, but they won't give me the sourcecode`` ^ permalink raw reply [flat|nested] 16+ messages in thread
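The information Qu asks for here (kernel and btrfs-progs versions, plus the filesystem's state) comes up on the list constantly. As a sketch of a minimal report to capture *before* attempting any write or repair operation (mount point and device paths are placeholders):

```shell
# Gather diagnostics before touching the filesystem with any write operation.
uname -a                          # kernel version
btrfs --version                   # btrfs-progs version
btrfs filesystem show             # devices in the fs, and any "missing" status
btrfs filesystem df /mnt/storage  # data/metadata allocation profiles
dmesg | grep -i btrfs             # kernel-side btrfs messages

# Read-only check: reports inconsistencies but does NOT modify anything
# (the filesystem should be unmounted first).
btrfs check --readonly /dev/sdb
```

None of these commands write to the filesystem, so running them cannot make a corruption worse, which is exactly the point of collecting them first.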
* Re: btrfs problems 2018-09-16 20:11 ` Fwd: " Adrian Bastholm @ 2018-09-16 20:54 ` Chris Murphy 0 siblings, 0 replies; 16+ messages in thread From: Chris Murphy @ 2018-09-16 20:54 UTC (permalink / raw) To: Adrian Bastholm; +Cc: Btrfs BTRFS On Sun, Sep 16, 2018 at 2:11 PM, Adrian Bastholm <adrian@javaguru.org> wrote: > Thanks for answering Qu. > >> At this timing, your fs is already corrupted. >> I'm not sure about the reason, it can be a failed CoW combined with >> powerloss, or corrupted free space cache, or some old kernel bugs. >> >> Anyway, the metadata itself is already corrupted, and I believe it >> happens even before you noticed. > I suspected it had to be like that >> >> > BTRFS check --repair is not recommended, it >> > crashes , doesn't fix all problems, and I later found out that my >> > lost+found dir had about 39G of lost files and dirs. >> >> lost+found is completely created by btrfs check --repair. >> >> > I spent about two days trying to fix everything, removing a disk, >> > adding it again, checking , you name it. I ended up removing one disk, >> > reformatting it, and moving the data there. >> >> Well, I would recommend to submit such problem to the mail list *BEFORE* >> doing any write operation to the fs (including btrfs check --repair). >> As it would help us to analyse the failure pattern to further enhance btrfs. > > IMHO that's a, how should I put it, a design flaw, the wrong way of > looking at how people think, with all respect to all the very smart > people that put in countless hours of hard work. Users expect and fs > check and repair to repair, not to break stuff. > Reading that --repair is "destructive" is contradictory even to me. It's contradictory to everyone including the developers. No developer set out to make --repair dangerous from the outset. It just turns out that it was a harder problem to solve and the thought was that it would keep getting better. 
Newer versions "should be safe" now, even if they can't fix everything. The far bigger issue I think the developers are aware of is that depending on repair at all for any Btrfs of appreciable size is simply not scalable. Taking a day or a week to run a repair on a large file system is unworkable. And that's why it's better to avoid inconsistencies in the first place, which is what Btrfs is supposed to do; if that's not happening, it's a bug somewhere in Btrfs and also sometimes in the hardware.

> This problem emerged in a direcory where motion (the camera software)
> was saving pictures. Either killing the process or a powerloss could
> have left these jpg files (or fs metadata) in a bad state. Maybe
> that's something to go on. I was thinking that there's not much anyone
> can do without root access to my box anyway, and I'm not sure I was
> prepared to give that to anyone.

I can't recommend raid56 for people new to Btrfs. It really takes qualified hardware to make sure there's no betrayal, as everything gets a lot more complicated with raid56. The general state of faulty device handling on Btrfs makes raid56 very much a hands-on approach; you can't turn your back on it. And then when jumping into raid5, I advise raid1 for metadata. It reduces problems. And that's true for raid6 also, except that raid1 metadata has less redundancy than raid6, so... it's not helpful if you end up losing 2 devices. If you need production-grade parity raid you should use openzfs, although I can't speak to how it behaves with respect to faulty devices on Linux.

>> Any btrfs unexpected behavior, from strange ls output to aborted
>> transaction, please consult with the mail list first.
>> (Of course, with kernel version and btrfs-progs version, which is
>> missing in your console log though)
>
> Linux jenna 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21)
> x86_64 GNU/Linux
> btrfs-progs is already the newest version (4.7.3-1).
Well, the newest versions are kernel 4.18.8 and btrfs-progs 4.17.1, so in Btrfs terms yours are kinda old. That is not inherently bad, but there are literally thousands of additions and deletions since kernel 4.9, so there's almost no way anyone on this list, except a developer familiar with backport status, can tell you if the problem you're seeing is a bug that's been fixed in that particular version. There aren't that many developers that familiar with that status who also have time to read user reports. Since this is an upstream list, most developers will want to know if you're able to reproduce the problem with a mainline kernel, because if you can, it's very probable it's a bug that needs to be fixed upstream first before it can be backported. That's just the nature of kernel development generally. And you'll find the same thing on ext4 and XFS lists...

The main reason why people use Debian and its older kernel bases is they're willing to accept certain bugginess in favor of stability. Transient bugs are really bad in that world. Consistent bugs they just find workarounds for (avoidance) until there's a known, highly tested backport, because they want "The Behavior" to be predictable, both good and bad. That is not a model well suited to a file system in as active a development state as Btrfs. It's better now than it was even a couple years ago, when I'd say: just don't use RHEL or Debian or anything with old kernels except for experimenting; it's not worth the hassle; you're inevitably gonna have to use a newer kernel because all the Btrfs devs are busy making metric shittonnes of fixes in the mainline version. Today, it's not as bad as that. But still, 4.9 is old in Btrfs terms. Should it be stable? For *your* problem, for sure, because that's just damn strange and something very goofy is going on. But is it possible there's a whole series of bugs happening in sequence that results in this kind of corruption? No idea. Maybe.
And that's the main reason why quite a lot of users on this list use Fedora, Arch, Gentoo - so they're using the newest stable or even mainline rc kernels. And so if you want to run any file system, including Btrfs, in production with older kernels, you pick a distro that's doing that work. And right now it's openSUSE and SUSE that have the most Btrfs developers supporting 4.9 and 4.14 kernels and Btrfs. Most of those users are getting distro support; I don't often see SUSE users on here.

OpenZFS is a different strategy because it's out-of-tree code. So you can run older kernels and compile the current OpenZFS code base against your older kernel. In effect you're using an older distro kernel, but with a new file system code base supported by that upstream. -- Chris Murphy ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <ecac52ad-70ed-e0e3-5660-1717f0d4f5e0@gmx.com>]
* Re: btrfs problems [not found] ` <ecac52ad-70ed-e0e3-5660-1717f0d4f5e0@gmx.com> @ 2018-09-17 11:55 ` Adrian Bastholm 2018-09-17 12:44 ` Qu Wenruo 0 siblings, 1 reply; 16+ messages in thread From: Adrian Bastholm @ 2018-09-17 11:55 UTC (permalink / raw) To: quwenruo.btrfs, linux-btrfs

> Well, I'd say Debian is really not your first choice for btrfs.
> The kernel is really old for btrfs.
>
> My personal recommend is to use rolling release distribution like
> vanilla Archlinux, whose kernel is already 4.18.7 now.

I just upgraded to Debian Testing, which has the 4.18 kernel.

> Anyway, enjoy your stable fs even it's not btrfs anymore.

My new stable fs is too rigid. Can't grow it, can't shrink it, can't remove vdevs from it, so I'm planning a comeback to BTRFS. I guess after the dust settled I realize I like the flexibility of BTRFS.

This time I'm considering BTRFS as rootfs as well. Can I do an in-place conversion? There's this guide (https://www.howtoforge.com/how-to-convert-an-ext3-ext4-root-file-system-to-btrfs-on-ubuntu-12.10) I was planning on following.

Another thing is I'd like to see a "first steps after getting started" section in the wiki. Something like: take your first snapshot, back up, how to think when running it - can I just set some cron jobs and forget about it, or does it need constant attention, and stuff like that.

BR Adrian -- Vänliga hälsningar / Kind regards, Adrian Bastholm ``I would change the world, but they won't give me the sourcecode`` ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: btrfs problems 2018-09-17 11:55 ` Adrian Bastholm @ 2018-09-17 12:44 ` Qu Wenruo 2018-09-17 12:59 ` Stefan K 2018-09-20 17:23 ` Adrian Bastholm 0 siblings, 2 replies; 16+ messages in thread From: Qu Wenruo @ 2018-09-17 12:44 UTC (permalink / raw) To: Adrian Bastholm, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2547 bytes --]

On 2018/9/17 下午7:55, Adrian Bastholm wrote:
>> Well, I'd say Debian is really not your first choice for btrfs.
>> The kernel is really old for btrfs.
>>
>> My personal recommend is to use rolling release distribution like
>> vanilla Archlinux, whose kernel is already 4.18.7 now.
>
> I just upgraded to Debian Testing which has the 4.18 kernel

Then I strongly recommend using the latest upstream kernel and progs for btrfs (thus using Debian Testing).

And if anything goes wrong, please report asap to the mail list.

Especially for fs corruption, that's the ghost I'm always chasing. So if any corruption happens again (although I hope it won't happen), I may have a chance to catch it.

>> Anyway, enjoy your stable fs even it's not btrfs anymore.
>
> My new stable fs is too rigid. Can't grow it, can't shrink it, can't
> remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
> after the dust settled I realize I like the flexibility of BTRFS.
>
> This time I'm considering BTRFS as rootfs as well, can I do an
> in-place conversion ? There's this guide
> (https://www.howtoforge.com/how-to-convert-an-ext3-ext4-root-file-system-to-btrfs-on-ubuntu-12.10)
> I was planning on following.

Btrfs-convert is recommended mostly for short-term trials (the ability to roll back to ext* with nothing modified).

From the code aspect, the biggest difference is the chunk layout. Due to the ext* block group usage, each block group header (except some sparse bg) is always used, thus btrfs can't use them.

This leads to a highly fragmented chunk layout.
We don't have error reports about such a layout yet, but if you want everything to be as stable as possible, I still recommend using a newly created fs.

> Another thing is I'd like to see a "first steps after getting started
> " section in the wiki. Something like take your first snapshot, back
> up, how to think when running it - can i just set some cron jobs and
> forget about it, or does it need constant attention, and stuff like
> that.

There are projects that do such things automatically, like snapper.

If your primary concern is to make the fs as stable as possible, then keep snapshots to a minimal amount and avoid any functionality you won't use, like qgroups, routine balance, or RAID5/6.

And keep the necessary btrfs-specific operations to a minimum, like subvolume/snapshot (don't keep more than, say, 20 snapshots), shrink, send/receive.

Thanks, Qu

> BR Adrian

[-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
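For reference, the short-term-trial workflow Qu alludes to looks roughly like this; a sketch with a placeholder device path, and as always, a backup first:

```shell
# The filesystem must be unmounted and clean before converting in place.
e2fsck -f /dev/sdb1
btrfs-convert /dev/sdb1

# After conversion, the original ext4 image is preserved in a subvolume
# named ext2_saved. As long as that subvolume exists, the conversion
# can be rolled back, discarding all changes made since:
btrfs-convert -r /dev/sdb1

# Once satisfied with btrfs, delete ext2_saved to reclaim the space
# (after which rollback is no longer possible):
# btrfs subvolume delete /mnt/ext2_saved
```

The rollback safety net is what makes btrfs-convert attractive for a trial, but as Qu notes above, the converted chunk layout is fragmented, so a fresh mkfs.btrfs plus a data copy is the more conservative path for long-term use.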
* Re: btrfs problems 2018-09-17 12:44 ` Qu Wenruo @ 2018-09-17 12:59 ` Stefan K 2018-09-20 17:23 ` Adrian Bastholm 1 sibling, 0 replies; 16+ messages in thread From: Stefan K @ 2018-09-17 12:59 UTC (permalink / raw) To: linux-btrfs > If your primary concern is to make the fs as stable as possible, then > keep snapshots to a minimal amount, avoid any functionality you won't > use, like qgroup, routinely balance, RAID5/6. > > And keep the necessary btrfs specific operations to minimal, like > subvolume/snapshot (and don't keep too many snapshots, say over 20), > shrink, send/receive. hehe, that sound like "hey use btrfs, its cool, but please - don't use any btrfs specific feature" ;) best Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: btrfs problems 2018-09-17 12:44 ` Qu Wenruo 2018-09-17 12:59 ` Stefan K @ 2018-09-20 17:23 ` Adrian Bastholm 2018-09-20 19:39 ` Chris Murphy 1 sibling, 1 reply; 16+ messages in thread From: Adrian Bastholm @ 2018-09-20 17:23 UTC (permalink / raw) To: quwenruo.btrfs, linux-btrfs On Mon, Sep 17, 2018 at 2:44 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > Then I strongly recommend to use the latest upstream kernel and progs > for btrfs. (thus using Debian Testing) > > And if anything went wrong, please report asap to the mail list. > > Especially for fs corruption, that's the ghost I'm always chasing for. > So if any corruption happens again (although I hope it won't happen), I > may have a chance to catch it. You got it > > > >> Anyway, enjoy your stable fs even it's not btrfs > > My new stable fs is too rigid. Can't grow it, can't shrink it, can't > > remove vdevs from it , so I'm planning a comeback to BTRFS. I guess > > after the dust settled I realize I like the flexibility of BTRFS. > > I'm back to btrfs. > From the code aspect, the biggest difference is the chunk layout. > Due to the ext* block group usage, each block group header (except some > sparse bg) is always used, thus btrfs can't use them. > > This leads to highly fragmented chunk layout. The only thing I really understood is "highly fragmented" == not good . I might need to google these "chunk" thingies > We doesn't have error report about such layout yet, but if you want > everything to be as stable as possible, I still recommend to use a newly > created fs. I guess I'll stick with ext4 on the rootfs > > Another thing is I'd like to see a "first steps after getting started > > " section in the wiki. Something like take your first snapshot, back > > up, how to think when running it - can i just set some cron jobs and > > forget about it, or does it need constant attention, and stuff like > > that. > > There are projects do such things automatically, like snapper. 
> > If your primary concern is to make the fs as stable as possible, then
> keep snapshots to a minimal amount, avoid any functionality you won't
> use, like qgroup, routinely balance, RAID5/6.

So, is RAID5 stable enough? Reading the wiki, there's a big fat warning about some parity issues, I read an article about silent corruption (written a while back), and Chris says he can't recommend raid56 to mere mortals.

> And keep the necessary btrfs specific operations to minimal, like
> subvolume/snapshot (and don't keep too many snapshots, say over 20),
> shrink, send/receive.
>
> Thanks,
> Qu
>
> > BR Adrian

-- Vänliga hälsningar / Kind regards, Adrian Bastholm ``I would change the world, but they won't give me the sourcecode`` ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: btrfs problems 2018-09-20 17:23 ` Adrian Bastholm @ 2018-09-20 19:39 ` Chris Murphy 2018-09-20 21:35 ` Adrian Bastholm 0 siblings, 1 reply; 16+ messages in thread From: Chris Murphy @ 2018-09-20 19:39 UTC (permalink / raw) To: Adrian Bastholm; +Cc: Qu Wenruo, Btrfs BTRFS

On Thu, Sep 20, 2018 at 11:23 AM, Adrian Bastholm <adrian@javaguru.org> wrote:
> On Mon, Sep 17, 2018 at 2:44 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>> Then I strongly recommend to use the latest upstream kernel and progs
>> for btrfs. (thus using Debian Testing)
>>
>> And if anything went wrong, please report asap to the mail list.
>>
>> Especially for fs corruption, that's the ghost I'm always chasing for.
>> So if any corruption happens again (although I hope it won't happen), I
>> may have a chance to catch it.
>
> You got it
>
>> >> Anyway, enjoy your stable fs even it's not btrfs
>
>> > My new stable fs is too rigid. Can't grow it, can't shrink it, can't
>> > remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
>> > after the dust settled I realize I like the flexibility of BTRFS.
>
> I'm back to btrfs.
>
>> From the code aspect, the biggest difference is the chunk layout.
>> Due to the ext* block group usage, each block group header (except some
>> sparse bg) is always used, thus btrfs can't use them.
>>
>> This leads to highly fragmented chunk layout.
>
> The only thing I really understood is "highly fragmented" == not good
> . I might need to google these "chunk" thingies

Chunks are synonymous with block groups. They're like a super extent, or extent of extents.

The block group is how Btrfs abstracts the logical address used most everywhere in Btrfs land, and the device + physical location of extents. It's how a file is referenced only by one logical address, and doesn't need to know either where the extent is located, or how many copies there are.
The block group allocation profile is what determines if there's one copy, duplicate copies, raid1, 10, 5, 6 copies of a chunk and where the copies are located. It's also fundamental to how device add, remove, replace, file system resize, and balance all interrelate. >> If your primary concern is to make the fs as stable as possible, then >> keep snapshots to a minimal amount, avoid any functionality you won't >> use, like qgroup, routinely balance, RAID5/6. > > So, is RAID5 stable enough ? reading the wiki there's a big fat > warning about some parity issues, I read an article about silent > corruption (written a while back), and chris says he can't recommend > raid56 to mere mortals. Depends on how you define stable. In recent kernels it's stable on stable hardware, i.e. no lying hardware (actually flushes when it claims it has), no power failures, and no failed devices. Of course it's designed to help protect against a clear loss of a device, but there's tons of stuff here that's just not finished including ejecting bad devices from the array like md and lvm raids will do. Btrfs will just keep trying, through all the failures. There are some patches to moderate this but I don't think they're merged yet. You'd also want to be really familiar with how to handle degraded operation, if you're going to depend on it, and how to replace a bad device. Last I refreshed my memory on it, it's advised to use "btrfs device add" followed by "btrfs device remove" for raid56; whereas "btrfs replace" is preferred for all other profiles. I'm not sure if the "btrfs replace" issues with parity raid were fixed. Metadata as raid56 shows a lot more problem reports than metadata raid1, so there's something goofy going on in those cases. I'm not sure how well understood they are. But other people don't have problems with it. It's worth looking through the archives about some things. 
Btrfs raid56 isn't exactly perfectly COW, there is read-modify-write code that means there can be overwrites. I vaguely recall that it's COW in the logical layer, but the physical writes can end up being RMW or not for sure COW. -- Chris Murphy ^ permalink raw reply [flat|nested] 16+ messages in thread
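Chris's description of chunks above can be sketched as a toy model (the names and layout are purely illustrative, not the real on-disk format): a file refers to a single logical address, and the chunk map resolves it to however many physical copies the chunk's allocation profile dictates.

```python
# Toy model of btrfs chunk (block group) address translation.
# All names and structures are illustrative only.
from dataclasses import dataclass

@dataclass
class Chunk:
    logical_start: int   # start of this chunk in the logical address space
    length: int          # chunk size in bytes
    profile: str         # allocation profile: "single", "raid1", ...
    stripes: list        # one (device, physical_offset) pair per copy

def resolve(chunks, logical):
    """Map a logical address to every physical copy holding that byte."""
    for c in chunks:
        if c.logical_start <= logical < c.logical_start + c.length:
            offset = logical - c.logical_start
            return [(dev, phys + offset) for dev, phys in c.stripes]
    raise ValueError("logical address not mapped")

chunks = [
    Chunk(0, 1 << 30, "raid1", [("sdb", 0), ("sdc", 0)]),  # two mirrored copies
    Chunk(1 << 30, 1 << 30, "single", [("sdd", 0)]),       # one copy
]

print(resolve(chunks, 4096))               # both mirrors of a raid1 chunk
print(resolve(chunks, (1 << 30) + 8192))   # the lone copy in a single chunk
```

This is also why device add/remove/replace and balance interrelate the way Chris describes: they all rewrite entries in this one mapping, while files keep pointing at unchanged logical addresses.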
* Re: btrfs problems 2018-09-20 19:39 ` Chris Murphy @ 2018-09-20 21:35 ` Adrian Bastholm 2018-09-20 22:15 ` Chris Murphy ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Adrian Bastholm @ 2018-09-20 21:35 UTC (permalink / raw) To: lists, linux-btrfs

Thanks a lot for the detailed explanation. About "stable hardware/no lying hardware": I'm not running any raid hardware, was planning on just software raid, three drives glued together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would this be a safer bet, or would you recommend running the sausage method instead, with "-d single" for safety? I'm guessing that if one of the drives dies, the data is completely lost.

Another variant I was considering is running a raid1 mirror on two of the drives and maybe a subvolume on the third, for less important stuff.

BR Adrian

On Thu, Sep 20, 2018 at 9:39 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Thu, Sep 20, 2018 at 11:23 AM, Adrian Bastholm <adrian@javaguru.org> wrote:
> > On Mon, Sep 17, 2018 at 2:44 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> >> Then I strongly recommend to use the latest upstream kernel and progs
> >> for btrfs. (thus using Debian Testing)
> >>
> >> And if anything went wrong, please report asap to the mail list.
> >>
> >> Especially for fs corruption, that's the ghost I'm always chasing for.
> >> So if any corruption happens again (although I hope it won't happen), I
> >> may have a chance to catch it.
> >
> > You got it
> >
> >> >> Anyway, enjoy your stable fs even it's not btrfs
> >
> >> > My new stable fs is too rigid. Can't grow it, can't shrink it, can't
> >> > remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
> >> > after the dust settled I realize I like the flexibility of BTRFS.
> >
> > I'm back to btrfs.
> >
> >> From the code aspect, the biggest difference is the chunk layout.
> >> Due to the ext* block group usage, each block group header (except some > >> sparse bg) is always used, thus btrfs can't use them. > >> > >> This leads to highly fragmented chunk layout. > > > > The only thing I really understood is "highly fragmented" == not good > > . I might need to google these "chunk" thingies > > Chunks are synonyms with block groups. They're like a super extent, or > extent of extents. > > The block group is how Btrfs abstracts the logical address used most > everywhere in Btrfs land, and device + physical location of extents. > It's how a file is referenced only by on logical address, and doesn't > need to know either where the extent is located, or how many copies > there are. The block group allocation profile is what determines if > there's one copy, duplicate copies, raid1, 10, 5, 6 copies of a chunk > and where the copies are located. It's also fundamental to how device > add, remove, replace, file system resize, and balance all interrelate. > > > >> If your primary concern is to make the fs as stable as possible, then > >> keep snapshots to a minimal amount, avoid any functionality you won't > >> use, like qgroup, routinely balance, RAID5/6. > > > > So, is RAID5 stable enough ? reading the wiki there's a big fat > > warning about some parity issues, I read an article about silent > > corruption (written a while back), and chris says he can't recommend > > raid56 to mere mortals. > > Depends on how you define stable. In recent kernels it's stable on > stable hardware, i.e. no lying hardware (actually flushes when it > claims it has), no power failures, and no failed devices. Of course > it's designed to help protect against a clear loss of a device, but > there's tons of stuff here that's just not finished including ejecting > bad devices from the array like md and lvm raids will do. Btrfs will > just keep trying, through all the failures. There are some patches to > moderate this but I don't think they're merged yet. 
> > You'd also want to be really familiar with how to handle degraded > operation, if you're going to depend on it, and how to replace a bad > device. Last I refreshed my memory on it, it's advised to use "btrfs > device add" followed by "btrfs device remove" for raid56; whereas > "btrfs replace" is preferred for all other profiles. I'm not sure if > the "btrfs replace" issues with parity raid were fixed. > > Metadata as raid56 shows a lot more problem reports than metadata > raid1, so there's something goofy going on in those cases. I'm not > sure how well understood they are. But other people don't have > problems with it. > > It's worth looking through the archives about some things. Btrfs > raid56 isn't exactly perfectly COW, there is read-modify-write code > that means there can be overwrites. I vaguely recall that it's COW in > the logical layer, but the physical writes can end up being RMW or not > for sure COW. > > > > -- > Chris Murphy -- Vänliga hälsningar / Kind regards, Adrian Bastholm ``I would change the world, but they won't give me the sourcecode`` ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: btrfs problems 2018-09-20 21:35 ` Adrian Bastholm @ 2018-09-20 22:15 ` Chris Murphy 2018-09-20 22:21 ` Remi Gauvin 2018-09-22 6:49 ` Duncan 2 siblings, 0 replies; 16+ messages in thread From: Chris Murphy @ 2018-09-20 22:15 UTC (permalink / raw) To: Adrian Bastholm; +Cc: Chris Murphy, Btrfs BTRFS On Thu, Sep 20, 2018 at 3:36 PM Adrian Bastholm <adrian@javaguru.org> wrote: > > Thanks a lot for the detailed explanation. > Aabout "stable hardware/no lying hardware". I'm not running any raid > hardware, was planning on just software raid. Yep. I'm referring to the drives, their firmware, cables, logic board, its firmware, the power supply, power, etc. Btrfs is by nature intolerant of corruption. Other file systems are more tolerant because they don't know about it (although recent versions of XFS and ext4 are now defaulting to checksummed metadata and journals). >three drives glued > together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would > this be a safer bet, or would You recommend running the sausage method > instead, with "-d single" for safety ? I'm guessing that if one of the > drives dies the data is completely lost > Another variant I was considering is running a raid1 mirror on two of > the drives and maybe a subvolume on the third, for less important > stuff RAID does not substantially reduce the chances of data loss. It's not anything like a backup. It's an uptime enhancer. If you have backups, and your primary storage dies, of course you can restore from backup no problem, but it takes time and while the restore is happening, you're not online - uptime is killed. If that's a negative, might want to run RAID so you can keep working during the degraded period, and instead of a restore you're doing a rebuild. But of course there is a chance of failure during the degraded period. So you have to have a backup anyway. 
At least with Btrfs/ZFS, there is another reason to run with some replication like raid1 or raid5 and that's so that if there's corruption or a bad sector, Btrfs doesn't just detect it, it can fix it up with the good copy. For what it's worth, make sure the drives have lower SCT ERC time than the SCSI command timer. This is the same for Btrfs as it is for md and LVM RAID. The command timer default is 30 seconds, and most drives have SCT ERC disabled with very high recovery times well over 30 seconds. So either set SCT ERC to something like 70 deciseconds. Or increase the command timer to something like 120 or 180 (either one is absurdly high but what you want is for the drive to eventually give up and report a discrete error message which Btrfs can do something about, rather than do a SATA link reset in which case Btrfs can't do anything about it). -- Chris Murphy ^ permalink raw reply [flat|nested] 16+ messages in thread
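The SCT ERC / command-timer relationship Chris describes can be configured along these lines (a sketch; the device name is a placeholder, and not all drives accept SCT ERC commands):

```shell
# Tell the drive to give up on internal sector recovery after 7 seconds
# (70 deciseconds, for both reads and writes), comfortably under the
# kernel's default 30-second SCSI command timer:
smartctl -l scterc,70,70 /dev/sda

# Alternatively, if the drive ignores SCT ERC, raise the kernel's command
# timer instead, so the drive's long recovery finishes and a discrete
# error is reported rather than the link being reset:
echo 180 > /sys/block/sda/device/timeout
```

Either way the goal is the same: the drive must report a concrete read error before the kernel gives up on the command, because a discrete error is something Btrfs can repair from a good copy, while a link reset is not. Note these settings don't persist across reboots, so they're typically applied from a udev rule or boot script.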
* Re: btrfs problems 2018-09-20 21:35 ` Adrian Bastholm 2018-09-20 22:15 ` Chris Murphy @ 2018-09-20 22:21 ` Remi Gauvin 2018-09-22 6:49 ` Duncan 2 siblings, 0 replies; 16+ messages in thread From: Remi Gauvin @ 2018-09-20 22:21 UTC (permalink / raw) To: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 1024 bytes --]

On 2018-09-20 05:35 PM, Adrian Bastholm wrote:
> Thanks a lot for the detailed explanation.
> Aabout "stable hardware/no lying hardware". I'm not running any raid
> hardware, was planning on just software raid. three drives glued
> together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
> this be a safer bet, or would You recommend running the sausage method
> instead, with "-d single" for safety ? I'm guessing that if one of the
> drives dies the data is completely lost
> Another variant I was considering is running a raid1 mirror on two of
> the drives and maybe a subvolume on the third, for less important
> stuff

In case you were not aware, it's perfectly acceptable with BTRFS to use Raid 1 over 3 devices. Even more amazing, regardless of how many devices you start with (2, 3, 4, whatever), you can add a single drive to the array to increase capacity (at 50%, of course; i.e., adding a 4TB drive will give you 2TB usable space, assuming the other drives add up to at least 4TB to match it).

[-- Attachment #2: remi.vcf --] [-- Type: text/x-vcard, Size: 193 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
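Remi's 50% figure follows from the usual btrfs raid1 capacity rule, sketched below as a simplification that ignores metadata and allocation overhead: every chunk needs two copies on two different devices, so usable space is half the total, capped because any capacity on the largest drive that exceeds all the others combined has nowhere to be mirrored.

```python
def raid1_usable(sizes):
    """Approximate usable bytes for btrfs raid1 over mixed-size devices."""
    total = sum(sizes)
    largest = max(sizes)
    rest = total - largest
    # Excess on the largest device can't be mirrored anywhere, so it's wasted.
    if largest > rest:
        return rest
    return total // 2

TB = 10**12
print(raid1_usable([4 * TB, 2 * TB, 2 * TB]) / TB)  # 4.0: the 4TB drive is fully matched
print(raid1_usable([6 * TB, 2 * TB, 2 * TB]) / TB)  # 4.0: capped by the smaller pair
```

The first case is Remi's example: a 4TB drive added to 4TB of other capacity contributes its full size at 50% efficiency; in the second case the oversized drive is capped by what the rest of the array can mirror.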
* Re: btrfs problems 2018-09-20 21:35 ` Adrian Bastholm 2018-09-20 22:15 ` Chris Murphy 2018-09-20 22:21 ` Remi Gauvin @ 2018-09-22 6:49 ` Duncan 2 siblings, 0 replies; 16+ messages in thread From: Duncan @ 2018-09-22 6:49 UTC (permalink / raw) To: linux-btrfs Adrian Bastholm posted on Thu, 20 Sep 2018 23:35:57 +0200 as excerpted: > Thanks a lot for the detailed explanation. > Aabout "stable hardware/no lying hardware". I'm not running any raid > hardware, was planning on just software raid. three drives glued > together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would > this be a safer bet, or would You recommend running the sausage method > instead, with "-d single" for safety ? I'm guessing that if one of the > drives dies the data is completely lost Another variant I was > considering is running a raid1 mirror on two of the drives and maybe a > subvolume on the third, for less important stuff Agreed with CMurphy's reply, but he didn't mention... As I wrote elsewhere recently, don't remember if it was in a reply to you before you tried zfs and came back, or to someone else, so I'll repeat here, briefer this time... Keep in mind that on btrfs, it's possible (and indeed the default with multiple devices) to run data and metadata at different raid levels. IMO, as long as you're following an appropriate backup policy that backs up anything valuable enough to be worth the time/trouble/resources of doing so, so if you /do/ lose the array you still have a backup of anything you considered valuable enough to worry about (and that caveat is always the case, no matter where or how it's stored, value of data is in practice defined not by arbitrary claims but by the number of backups it's considered worth having of it)... 
With that backups caveat, I'm now confident /enough/ about raid56 mode to be comfortable cautiously recommending it for data, tho I'd still /not/ recommend it for metadata, which I'd recommend should remain the multi- device default raid1 level. That way, you're only risking a limited amount of raid5 data to the not yet as mature and well tested raid56 mode, the metadata remains protected by the more mature raid1 mode, and if something does go wrong, it's much more likely to be only a few files lost instead of the entire filesystem, as is at risk if your metadata is raid56 as well, the metadata including checksums will be intact so scrub should tell you what files are bad, and if those few files are valuable they'll be on the backup and easy enough to restore, compared to restoring the entire filesystem. But for most use-cases, metadata should be relatively small compared to data, so duplicating metadata as raid1 while doing raid5 for data should go much easier on the capacity needs than raid1 for both would. Tho I'd still recommend raid1 data as well for higher maturity and tested ability to use the good copy to rewrite the bad one if one copy goes bad (in theory, raid56 mode can use parity to rewrite as well, but that's not yet as well tested and there's still the narrow degraded-mode crash write hole to worry about), if it's not cost-prohibitive for the amount of data you need to store. But for people on a really tight budget or who are storing double-digit TB of data or more, I can understand why they prefer raid5, and I do think raid5 is stable enough for data now, as long as the metadata remains raid1, AND they're actually executing on a good backup policy. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 16+ messages in thread
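The split Duncan recommends, raid5 for data with raid1 for metadata, is expressed at mkfs time; a sketch with placeholder device paths:

```shell
# Data striped with parity across three drives, metadata mirrored:
mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# After mounting, verify the profiles actually in use:
btrfs filesystem df /mnt
```

Because data and metadata profiles are independent in btrfs, an existing filesystem can also be migrated to this layout with a filtered balance (e.g. `btrfs balance start -mconvert=raid1 /mnt`), rather than recreating it.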
* Re: btrfs problems
  2018-09-16 13:58 btrfs problems Adrian Bastholm
  2018-09-16 14:50 ` Qu Wenruo
@ 2018-09-16 18:35 ` Chris Murphy
  [not found] ` <CAMrg+aQw-sjXpaff=cS6X2-CWDRfOy1f8orQsEsy48xrsuPe3g@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread

From: Chris Murphy @ 2018-09-16 18:35 UTC (permalink / raw)
To: Adrian Bastholm; +Cc: Btrfs BTRFS

On Sun, Sep 16, 2018 at 7:58 AM, Adrian Bastholm <adrian@javaguru.org> wrote:
> Hello all
> Actually I'm not trying to get any help any more, I gave up BTRFS on
> the desktop, but I'd like to share my efforts of trying to fix my
> problems, in hope I can help some poor noob like me.

There's almost no useful information provided for someone to even try
to reproduce your results, isolate cause, and figure out the bugs.

No kernel version. No btrfs-progs version. No description of the
hardware and how it's laid out, or what mkfs and mount options are
being used. No one really has the time to speculate.

> BTRFS check --repair is not recommended

Right. So why did you run it anyway?

man btrfs check:

    Warning
    Do not use --repair unless you are advised to do so by a
    developer or an experienced user

It is always a legitimate complaint, despite this warning, if btrfs
check --repair makes things worse, because --repair shouldn't ever
make things worse. But Btrfs repairs are complicated, and that's why
the warning is there. I suppose the devs could have made the flag
--riskyrepair, but I doubt this would really slow users down that much.

A big part of the --repair fixes weren't known to make things worse at
the time, and edge cases where they did kept popping up, so only in
hindsight does it make sense that --repair maybe could have been
called something different to catch the user's attention.

But anyway, I see this same sort of thing on the linux-raid list all
the time. People run into trouble, and they press full forward making
all kinds of changes, and each change increases the chance of data loss.
And then they come on the list with WTF messages. And it's always a
lesson in patience for the list regulars and developers... if only
you'd come to us with questions sooner.

> Please have a look at the console logs.

These aren't logs. It's a record of shell commands. Logs would include
kernel messages, ideally all of them. Why is device 3 missing? We have
no idea. Most of the Btrfs code is in the kernel, and problems are
reported by the kernel, so we need kernel messages; user space
messages aren't enough.

Anyway, good luck with openzfs, cool project.

--
Chris Murphy

^ permalink raw reply [flat|nested] 16+ messages in thread
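The missing details Chris lists can be collected in one pass. A sketch of such a report, using standard btrfs-progs and kernel tooling (the mount point `/mnt` is a placeholder for wherever the filesystem is mounted):

```shell
# Versions: most btrfs code lives in the kernel, so both matter.
uname -r
btrfs --version

# Device layout as btrfs sees it; this also shows missing devices.
btrfs filesystem show
btrfs filesystem df /mnt

# Kernel messages, which is where btrfs actually reports problems.
dmesg | grep -i btrfs
```

Attaching the full unfiltered `dmesg` (or `journalctl -k` output) is safer than the grep, since related block-layer errors won't contain the word "btrfs".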
[parent not found: <CAMrg+aQw-sjXpaff=cS6X2-CWDRfOy1f8orQsEsy48xrsuPe3g@mail.gmail.com>]
* Fwd: btrfs problems
  [not found] ` <CAMrg+aQw-sjXpaff=cS6X2-CWDRfOy1f8orQsEsy48xrsuPe3g@mail.gmail.com>
@ 2018-09-16 20:12 ` Adrian Bastholm
  [not found] ` <CAJCQCtQPniv4eJSsbT24bma9Gv6_T44zoq9owSYmPNmKO7hXaA@mail.gmail.com>
  1 sibling, 0 replies; 16+ messages in thread

From: Adrian Bastholm @ 2018-09-16 20:12 UTC (permalink / raw)
To: linux-btrfs

Hi Chris

> There's almost no useful information provided for someone to even try
> to reproduce your results, isolate cause and figure out the bugs.

I realize that. That's why I wasn't really asking for help, I was
merely giving some feedback.

> No kernel version. No btrfs-progs version. No description of the
> hardware and how it's laid out, and what mkfs and mount options are
> being used. No one really has the time to speculate.

I understand, and I apologize. I could have added more detail.

> > BTRFS check --repair is not recommended
>
> Right. So why did you run it anyway?

Because "repair" implies it does something to help you. That's how
most people's brains work: my fs is broken, I'll try "REPAIR".

> man btrfs check:
>
>     Warning
>     Do not use --repair unless you are advised to do so by a
>     developer or an experienced user
>
> It is always a legitimate complaint, despite this warning, if btrfs
> check --repair makes things worse, because --repair shouldn't ever
> make things worse.

I don't think it made things worse. It's more like it didn't do
anything. That's when I started trying to copy a new file over the
file with the question-mark attributes (lame, I know) to see what
happens. The "corrupted" file suddenly had attributes, and so on.
check --repair removed the extra files and left me at square one, so
not worse.

> But Btrfs repairs are complicated, and that's why
> the warning is there. I suppose the devs could have made the flag
> --riskyrepair but I doubt this would really slow users down that much.
Calling it --destructive or --deconstruct, or something even more
scary, would slow people down.

> A big part of --repair fixes weren't known to make things worse at the
> time, and edge cases where it made things worse kept popping up, so
> only in hindsight does it make sense --repair maybe could have been
> called something different to catch the user's attention.

Exactly. It's not too late to rename it. And maybe make it dump a
filesystem report with everything a developer would need (within
reason) to trace the error.

> But anyway, I see this same sort of thing on the linux-raid list all
> the time. People run into trouble, and they press full forward making
> all kinds of changes, each change increases the chance of data loss.
> And then they come on the list with WTF messages. And it's always a
> lesson in patience for the list regulars and developers... if only
> you'd come to us with questions sooner.

True. I found the list a bit late. I tried the IRC channel but I
couldn't post messages.

> > Please have a look at the console logs.
>
> These aren't logs. It's a record of shell commands. Logs would include
> kernel messages, ideally all of them. Why is device 3 missing?

It was a RAID5 array of three drives. When doing btrfs check on two of
the drives I got "drive x is missing". I figured that maybe it had
something to do with which one was the "first" drive or something. In
the same way, btrfs check crashed when I was running it against the
drives where I got the "drive x missing" message.

> We have no idea. Most of Btrfs code is in the kernel, problems are
> reported by the kernel. So we need kernel messages, user space
> messages aren't enough.

> Anyway, good luck with openzfs, cool project.

Cool project, not so cool pitfalls. I might head back to BTRFS after
all... see the response to Qu.
Thanks for answering, and sorry for the shortcomings of my feedback.
/A

> --
> Chris Murphy

--
Vänliga hälsningar / Kind regards,
Adrian Bastholm

``I would change the world, but they won't give me the sourcecode``

^ permalink raw reply [flat|nested] 16+ messages in thread
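The non-destructive path the man page implies, which the thread never spells out, is to run the check in its default read-only mode first and only escalate on a developer's advice. A sketch, with `/dev/sdb` standing in for one device of the unmounted array:

```shell
# Read-only check (the default mode): reports problems without
# modifying anything. The filesystem must be unmounted first.
umount /mnt
btrfs check --readonly /dev/sdb

# Only after developer advice, and ideally with a full image of the
# device saved first, should the repair mode be attempted:
#   btrfs check --repair /dev/sdb
```

On a multi-device filesystem, pointing `btrfs check` at any one member device checks the whole filesystem, provided the other members are visible to the kernel.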
[parent not found: <CAJCQCtQPniv4eJSsbT24bma9Gv6_T44zoq9owSYmPNmKO7hXaA@mail.gmail.com>]
* Fwd: btrfs problems
  [not found] ` <CAJCQCtQPniv4eJSsbT24bma9Gv6_T44zoq9owSYmPNmKO7hXaA@mail.gmail.com>
@ 2018-09-16 20:13 ` Adrian Bastholm
  0 siblings, 0 replies; 16+ messages in thread

From: Adrian Bastholm @ 2018-09-16 20:13 UTC (permalink / raw)
To: linux-btrfs

...

And also raid56 is still considered experimental, and has various
problems if the hardware lies (like if some writes happen out of order
or faster on some devices than others), and it's much harder to repair
because the repair tools aren't raid56 feature complete.

https://btrfs.wiki.kernel.org/index.php/Status

I think it's less scary than "dangerous" or "unstable", but anyway,
there are known problems unique to raid56 that will need future
features to make it as reliable as single, raid1, raid10. And like any
parity raid it sucks performance-wise for random writes, especially
when using hard drives.

On Sun, Sep 16, 2018 at 1:40 PM, Adrian Bastholm <adrian@javaguru.org> wrote:
> Hi Chris
>
>> There's almost no useful information provided for someone to even try
>> to reproduce your results, isolate cause and figure out the bugs.
>
> I realize that. That's why I wasn't really asking for help, I was
> merely giving some feedback.
>
>> No kernel version. No btrfs-progs version. No description of the
>> hardware and how it's laid out, and what mkfs and mount options are
>> being used. No one really has the time to speculate.
>
> I understand, and I apologize. I could have added more detail.
>
>> > BTRFS check --repair is not recommended
>>
>> Right. So why did you run it anyway?
>
> Because "repair" implies it does something to help you. That's how
> most people's brains work. My fs is broken. I'll try "REPAIR"
>
>> man btrfs check:
>>
>>     Warning
>>     Do not use --repair unless you are advised to do so by a
>>     developer or an experienced user
>>
>> It is always a legitimate complaint, despite this warning, if btrfs
>> check --repair makes things worse, because --repair shouldn't ever
>> make things worse.
>
> I don't think it made things worse. It's more like it didn't do
> anything. That's when I started trying to copy a new file to the file
> with the question mark attributes (lame, I know) to see what happens.
> The "corrupted" file suddenly had attributes, and so on.
> check --repair removed the extra files and left me at square one, so not worse.
>
>> But Btrfs repairs are complicated, and that's why
>> the warning is there. I suppose the devs could have made the flag
>> --riskyrepair but I doubt this would really slow users down that much.
>
> Calling it --destructive or --deconstruct, or something even more
> scary would slow people down
>
>> A big part of --repair fixes weren't known to make things worse at the
>> time, and edge cases where it made things worse kept popping up, so
>> only in hindsight does it make sense --repair maybe could have been
>> called something different to catch the user's attention.
>
> Exactly. It's not too late to rename it. And maybe make it dump a
> filesystem report with everything a developer would need (within
> reason) to trace the error
>
>> But anyway, I see this same sort of thing on the linux-raid list all
>> the time. People run into trouble, and they press full forward making
>> all kinds of changes, each change increases the chance of data loss.
>> And then they come on the list with WTF messages. And it's always a
>> lesson in patience for the list regulars and developers... if only
>> you'd come to us with questions sooner.
>
> True. I found the list a bit late. I tried the IRC channel but I
> couldn't post messages.
>
>> > Please have a look at the console logs.
>>
>> These aren't logs. It's a record of shell commands. Logs would include
>> kernel messages, ideally all of them. Why is device 3 missing?
>
> It was a RAID5 array of three drives. When doing btrfs check on two of
> the drives I got the drive x is missing. I figured that maybe it had
> to do something with which one was the "first" drive or something. The
> same way, btrfs check crashed when I was running it against the drives
> where I got the "drive x missing" message
>
>> We have no idea. Most of Btrfs code is in the kernel, problems are reported by
>> the kernel. So we need kernel messages, user space messages aren't
>> enough.
>
>> Anyway, good luck with openzfs, cool project.
>
> Cool project, not so cool pitfalls. I might head back to BTRFS after
> all... see the response to Qu.
>
> Thanks for answering, and sorry for the shortcomings of my feedback
> /A
>
>> --
>> Chris Murphy
>
> --
> Vänliga hälsningar / Kind regards,
> Adrian Bastholm
>
> ``I would change the world, but they won't give me the sourcecode``

--
Chris Murphy

--
Vänliga hälsningar / Kind regards,
Adrian Bastholm

``I would change the world, but they won't give me the sourcecode``

^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-09-22 12:43 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-16 13:58 btrfs problems Adrian Bastholm
2018-09-16 14:50 ` Qu Wenruo
     [not found] ` <CAMrg+aTNK1cBG7rGVfudpydD6hMJz9UW0-3mdS8Yx4tqAQZE6Q@mail.gmail.com>
2018-09-16 20:11   ` Fwd: " Adrian Bastholm
2018-09-16 20:54     ` Chris Murphy
     [not found] ` <ecac52ad-70ed-e0e3-5660-1717f0d4f5e0@gmx.com>
2018-09-17 11:55   ` Adrian Bastholm
2018-09-17 12:44     ` Qu Wenruo
2018-09-17 12:59       ` Stefan K
2018-09-20 17:23         ` Adrian Bastholm
2018-09-20 19:39           ` Chris Murphy
2018-09-20 21:35             ` Adrian Bastholm
2018-09-20 22:15               ` Chris Murphy
2018-09-20 22:21               ` Remi Gauvin
2018-09-22  6:49               ` Duncan
2018-09-16 18:35 ` Chris Murphy
     [not found] ` <CAMrg+aQw-sjXpaff=cS6X2-CWDRfOy1f8orQsEsy48xrsuPe3g@mail.gmail.com>
2018-09-16 20:12   ` Fwd: " Adrian Bastholm
     [not found]     ` <CAJCQCtQPniv4eJSsbT24bma9Gv6_T44zoq9owSYmPNmKO7hXaA@mail.gmail.com>
2018-09-16 20:13       ` Adrian Bastholm