* Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Tom Hunt @ 2016-01-24 17:52 UTC
To: linux-btrfs

I've been running for a week or two using a single-drive 6TB btrfs
volume. For some of this time, the machine running had bad memory,
which led to various checksum errors. For most of these, I just
deleted the relevant file and reacquired it (the errors fortuitously
never occurring in files which were not easily replaceable). However,
there currently remains a single error which does not appear to be in
any file:

# btrfs scrub status /
scrub status for 85f5b744-f68c-4194-aa90-d6fe238115a3
scrub started at Fri Jan 22 09:49:02 2016 and finished after 11:55:08
total bytes scrubbed: 4.27TiB with 1 errors
error details: csum=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0

# dmesg
(...)
[52841.310422] BTRFS warning (device dm-0): csum failed ino 515 off 15118336 csum 2629660496 expected csum 54021641
[52841.335656] BTRFS warning (device dm-0): csum failed ino 515 off 15118336 csum 2629660496 expected csum 54021641
[95071.256448] BTRFS: bdev /dev/mapper/rootvol_1 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
[95071.256532] BTRFS: unable to fixup (regular) error at logical 4450167468032 on dev /dev/mapper/rootvol_1

I've searched for ino 515, and the file there does not have any
apparent error (can read the whole thing without problem; deleting and
recreating it does not make the error go away). The error is, of
course, uncorrectable, because it's a single-drive volume. However,
having put in a second drive, the balance filter to convert to raid1
fails because of the I/O error.

How do I deal with this?

--
Tom Hunt
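The kernel log lines in the post above carry the inode number, the file offset, and the logical address that the scrub complained about. A minimal sketch, assuming the two message formats shown there, of pulling those numbers out for follow-up. The function names `csum_failures` and `fixup_logicals` are made up for illustration.

```shell
# Hypothetical helper: read kernel log text on stdin and print one
# "INODE OFFSET" pair per distinct "csum failed" message.
csum_failures() {
    sed -n 's/.*csum failed ino \([0-9][0-9]*\) off \([0-9][0-9]*\) .*/\1 \2/p' | sort -u
}

# Hypothetical helper: read kernel log text on stdin and print each distinct
# logical address named in an "unable to fixup (regular) error" message.
fixup_logicals() {
    sed -n 's/.*unable to fixup (regular) error at logical \([0-9][0-9]*\) on dev .*/\1/p' | sort -u
}

# Intended use (not run here):
#   dmesg | csum_failures
#   dmesg | fixup_logicals
```

A logical address found this way can be passed to `btrfs inspect-internal logical-resolve <logical> <mountpoint>`, which may or may not report an owning path for this particular extent.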
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Chris Murphy @ 2016-01-24 20:34 UTC
To: Tom Hunt; +Cc: Btrfs BTRFS

On Sun, Jan 24, 2016 at 10:52 AM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> I've been running for a week or two using a single-drive 6TB btrfs
> volume. For some of this time, the machine running had bad memory,
> which led to various checksum errors. For most of these, I just
> deleted the relevant file and reacquired it (the errors fortuitously
> never occurring in files which were not easily replaceable). However,
> there currently remains a single error which does not appear to be in
> any file:
>
> # btrfs scrub status /
> scrub status for 85f5b744-f68c-4194-aa90-d6fe238115a3
> scrub started at Fri Jan 22 09:49:02 2016 and finished after 11:55:08
> total bytes scrubbed: 4.27TiB with 1 errors
> error details: csum=1
> corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
>
> # dmesg
> (...)
> [52841.310422] BTRFS warning (device dm-0): csum failed ino 515 off 15118336 csum 2629660496 expected csum 54021641
> [52841.335656] BTRFS warning (device dm-0): csum failed ino 515 off 15118336 csum 2629660496 expected csum 54021641
> [95071.256448] BTRFS: bdev /dev/mapper/rootvol_1 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
> [95071.256532] BTRFS: unable to fixup (regular) error at logical 4450167468032 on dev /dev/mapper/rootvol_1
>
> I've searched for ino 515, and the file there does not have any
> apparent error (can read the whole thing without problem; deleting and
> recreating it does not make the error go away). The error is, of
> course, uncorrectable, because it's a single-drive volume. However,
> having put in a second drive, the balance filter to convert to raid1
> fails because of the I/O error.

You delete the file and yet the scrub still says inode 515 exists and
has an error? Or there are no errors, but then after copying the same
file back to the volume, the problem reoccurs? Are there any snapshots
or subvolumes? Because if there are any subvolumes/snapshots, each is
its own fs tree with its own set of inodes. So an inode can be used
more than once for different files so I wonder off hand if you haven't
found the actual problematic file.

Or possibly it's a directory, and not a file.

# find /brick2 -inum 60724

On my system, it returns six results, four are files, two of which are
in common to the others due to snapshotting, and two are directories
one of which is also a snapshot of the other. So a single inode can
not only appear multiple times on a Btrfs volume, but can be pointing
to a file or a directory. The scrub not saying what the file path is
suggests it could be a directory.

--
Chris Murphy
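Since each subvolume or snapshot has its own set of inode numbers, a search for one number has to cover every place a subvolume is mounted. A minimal sketch of that search, using a made-up helper name `find_inum`; it deliberately leaves out `-xdev`, on the assumption that `find` would otherwise stop at subvolume boundaries.

```shell
# Hypothetical helper: print every path whose inode number is $1 under each
# of the remaining arguments (mount points or subvolume directories).
find_inum() {
    local ino=$1 root
    shift
    for root in "$@"; do
        # Files and directories both match; errors such as unreadable
        # directories are silenced so only hits are printed.
        find "$root" -inum "$ino" 2>/dev/null
    done
}

# Intended use (not run here), with paths chosen by the reader:
#   find_inum 515 / /home
```

More than one hit is normal on Btrfs, for the reasons given above. `btrfs inspect-internal inode-resolve <ino> <path>` is another way to ask for the paths of an inode within the subvolume containing `<path>`.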
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Tom Hunt @ 2016-01-24 20:58 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS

> You delete the file and yet the scrub still says inode 515 exists and
> has an error? Or there are no errors, but then after copying the same
> file back to the volume, the problem reoccurs? Are there any snapshots
> or subvolumes? Because if there are any subvolumes/snapshots, each is
> its own fs tree with its own set of inodes. So an inode can be used
> more than once for different files so I wonder off hand if you haven't
> found the actual problematic file.
>
> Or possibly it's a directory, and not a file.

There are no snapshots, but there are subvolumes. I did the same procedure on
the file at inode 515 in each subvolume, which was:

# cp $file ~
# rm $file
# mv ~/$file {old_file_path}

This concluded without any errors. After doing this, the inode number is
different, and 'find / -inum 515' no longer finds anything on either subvolume.
However, initiating a scrub after this still shows the error at ino 515.

On Sun, Jan 24, 2016 at 01:34:20PM -0700, Chris Murphy wrote:
> On Sun, Jan 24, 2016 at 10:52 AM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> > I've been running for a week or two using a single-drive 6TB btrfs
> > volume. For some of this time, the machine running had bad memory,
> > which led to various checksum errors. For most of these, I just
> > deleted the relevant file and reacquired it (the errors fortuitously
> > never occurring in files which were not easily replaceable). However,
> > there currently remains a single error which does not appear to be in
> > any file:
> >
> > # btrfs scrub status /
> > scrub status for 85f5b744-f68c-4194-aa90-d6fe238115a3
> > scrub started at Fri Jan 22 09:49:02 2016 and finished after 11:55:08
> > total bytes scrubbed: 4.27TiB with 1 errors
> > error details: csum=1
> > corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
> >
> > # dmesg
> > (...)
> > [52841.310422] BTRFS warning (device dm-0): csum failed ino 515 off 15118336 csum 2629660496 expected csum 54021641
> > [52841.335656] BTRFS warning (device dm-0): csum failed ino 515 off 15118336 csum 2629660496 expected csum 54021641
> > [95071.256448] BTRFS: bdev /dev/mapper/rootvol_1 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
> > [95071.256532] BTRFS: unable to fixup (regular) error at logical 4450167468032 on dev /dev/mapper/rootvol_1
> >
> > I've searched for ino 515, and the file there does not have any
> > apparent error (can read the whole thing without problem; deleting and
> > recreating it does not make the error go away). The error is, of
> > course, uncorrectable, because it's a single-drive volume. However,
> > having put in a second drive, the balance filter to convert to raid1
> > fails because of the I/O error.
>
> # find /brick2 -inum 60724
>
> On my system, it returns six results, four are files, two of which are
> in common to the others due to snapshotting, and two are directories
> one of which is also a snapshot of the other. So a single inode can
> not only appear multiple times on a Btrfs volume, but can be pointing
> to a file or a directory. The scrub not saying what the file path is
> suggests it could be a directory.
>
>
> --
> Chris Murphy

--
Tom Hunt
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Chris Murphy @ 2016-01-24 21:07 UTC
To: Tom Hunt; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Jan 24, 2016 at 1:58 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
>> You delete the file and yet the scrub still says inode 515 exists and
>> has an error? Or there are no errors, but then after copying the same
>> file back to the volume, the problem reoccurs? Are there any snapshots
>> or subvolumes? Because if there are any subvolumes/snapshots, each is
>> its own fs tree with its own set of inodes. So an inode can be used
>> more than once for different files so I wonder off hand if you haven't
>> found the actual problematic file.
>>
>> Or possibly it's a directory, and not a file.
>
> There are no snapshots, but there are subvolumes. I did the same procedure on
> the file at inode 515 in each subvolume, which was:
>
> # cp $file ~
> # rm $file
> # mv ~/$file {old_file_path}
>
> This concluded without any errors. After doing this, the inode number is
> different, and 'find / -inum 515' no longer finds anything on either subvolume.
> However, initiating a scrub after this still shows the error at ino 515.

Try pointing the find command at the mountpoint, rather than at a subvolume.

--
Chris Murphy
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Tom Hunt @ 2016-01-24 21:17 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS

My arrangement is two subvolumes, 'root' and 'home', directly under the root
subvolume. The 'root' subvolume is mounted on /, the 'home' subvolume on /home.
I've tried both pointing the find command at /, and mounting the root subvolume
on a mount directory and pointing it there. Neither show any file or directory
with inum 515.

On Sun, Jan 24, 2016 at 02:07:58PM -0700, Chris Murphy wrote:
> On Sun, Jan 24, 2016 at 1:58 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> >> You delete the file and yet the scrub still says inode 515 exists and
> >> has an error? Or there are no errors, but then after copying the same
> >> file back to the volume, the problem reoccurs? Are there any snapshots
> >> or subvolumes? Because if there are any subvolumes/snapshots, each is
> >> its own fs tree with its own set of inodes. So an inode can be used
> >> more than once for different files so I wonder off hand if you haven't
> >> found the actual problematic file.
> >>
> >> Or possibly it's a directory, and not a file.
> >
> > There are no snapshots, but there are subvolumes. I did the same procedure on
> > the file at inode 515 in each subvolume, which was:
> >
> > # cp $file ~
> > # rm $file
> > # mv ~/$file {old_file_path}
> >
> > This concluded without any errors. After doing this, the inode number is
> > different, and 'find / -inum 515' no longer finds anything on either subvolume.
> > However, initiating a scrub after this still shows the error at ino 515.
>
> Try pointing the find command at the mountpoint, rather than at a subvolume.
>
>
> --
> Chris Murphy

--
Tom Hunt
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Chris Murphy @ 2016-01-24 22:04 UTC
To: Tom Hunt; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Jan 24, 2016 at 2:17 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> My arrangement is two subvolumes, 'root' and 'home', directly under the root
> subvolume. The 'root' subvolume is mounted on /, the 'home' subvolume on /home.
> I've tried both pointing the find command at /, and mounting the root subvolume
> on a mount directory and pointing it there. Neither show any file or directory
> with inum 515.

Inode 515 isn't special as far as I can tell. That you get zero
results for 'find <mp> -inum 515' and yet it appears with an error
during a scrub is weird.

What do you get for btrfs check without --repair?

--
Chris Murphy
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Tom Hunt @ 2016-01-24 22:16 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS

I get "currently mounted, aborting".

If I must bring down the machine over this, I can, but I'd prefer not to.

On Sun, Jan 24, 2016 at 03:04:25PM -0700, Chris Murphy wrote:
> On Sun, Jan 24, 2016 at 2:17 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> > My arrangement is two subvolumes, 'root' and 'home', directly under the root
> > subvolume. The 'root' subvolume is mounted on /, the 'home' subvolume on /home.
> > I've tried both pointing the find command at /, and mounting the root subvolume
> > on a mount directory and pointing it there. Neither show any file or directory
> > with inum 515.
>
> Inode 515 isn't special as far as I can tell. That you get zero
> results for 'find <mp> -inum 515' and yet it appears with an error
> during a scrub is weird.
>
> What do you get for btrfs check without --repair?
>
>
>
> --
> Chris Murphy

--
Tom Hunt
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Chris Murphy @ 2016-01-24 22:45 UTC
To: Tom Hunt; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Jan 24, 2016 at 3:16 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> I get "currently mounted, aborting".
>
> If I must bring down the machine over this, I can, but I'd prefer not to.

Now is a good time to refresh its backup, while it's online. It's also
a good idea to maintain readonly snapshots of each subvolume you want
to keep in case you need to depend on using btrfs send/receive to move
the data to a new file system, since only ro snapshots can be used for
send/receive.

Maybe it's an orphaned item. I don't know much about those, whether
btrfs check finds or removes them. But you can safely use
btrfs-debug-tree while the fs is mounted.

btrfs-debug-tree <dev> | grep -A3 -B3 ORPHAN

That'd find object type ORPHAN and item type ORPHAN_ITEM. Again, no
idea what to do about it if either are found, and matches inode 515.
You could also do a search using

btrfs-debug-tree <dev> | grep -A3 -B3 "(515 "

and see if that reveals anything.

While the fs has to be umounted for this, the --init-csum-tree option
for btrfs check will obliterate the current csum tree (and all file
csums) and then compute new csums for everything. That might fix it,
at least if it's a stuck orphan item then this ought to give it a
valid csum so you can proceed with conversion.

--
Chris Murphy
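The `grep -A3 -B3` searches above print a fixed window of context, which can cut an item short or pull in its neighbours. A minimal sketch of an alternative that prints whole item blocks for one key objectid, assuming the dump layout is an `item N key (OBJECTID TYPE OFFSET)` header followed by that item's detail lines. The helper name `items_for_objectid` is made up for illustration.

```shell
# Hypothetical helper: read tree-dump text on stdin and print every item
# block whose key objectid equals $1 (for example 515 or ORPHAN).
items_for_objectid() {
    awk -v id="$1" '
        /^[ \t]*item [0-9]+ key \(/ {
            # Item header: strip everything up to the opening parenthesis,
            # leaving "OBJECTID TYPE OFFSET) itemoff ...".
            key = $0
            sub(/^[ \t]*item [0-9]+ key \(/, "", key)
            split(key, field, " ")
            show = (field[1] == id)
        }
        # A new leaf or node header ends the current item block.
        /^(leaf|node) / { show = 0 }
        show { print }
    '
}

# Intended use (not run here):
#   btrfs-debug-tree <dev> | items_for_objectid 515
```

Matching on the objectid field rather than on the raw text `(515 ` avoids hits where 515 only happens to appear elsewhere in a line, though the same number can still occur as an objectid in more than one tree.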
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Chris Murphy @ 2016-01-25 1:06 UTC
To: Btrfs BTRFS

On Sun, Jan 24, 2016 at 3:45 PM, Chris Murphy <lists@colorremedies.com> wrote:
> Maybe it's an orphaned item. I don't know much about those, whether
> btrfs check finds or removes them. But you can safely use
> btrfs-debug-tree while the fs is mounted.
>
> btrfs-debug-tree <dev> | grep -A3 -B3 ORPHAN
>
> That'd find object type ORPHAN and item type ORPHAN_ITEM.

OK I guess that's not implemented yet, nevermind.

--
Chris Murphy
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Duncan @ 2016-01-25 1:28 UTC
To: linux-btrfs

Chris Murphy posted on Sun, 24 Jan 2016 15:45:49 -0700 as excerpted:

> Maybe it's an orphaned item. I don't know much about those, whether
> btrfs check finds or removes them. But you can safely use
> btrfs-debug-tree while the fs is mounted.
>
> btrfs-debug-tree <dev> | grep -A3 -B3 ORPHAN
>
> That'd find object type ORPHAN and item type ORPHAN_ITEM. Again, no idea
> what to do about it if either are found, and matches inode 515.
> You could also do a search using
>
> btrfs-debug-tree <dev> | grep -A3 -B3 "(515 "

Orphans, in POSIX filesystem context, are files that are open, but
deleted. Think shared object libraries (*.so*) that are loaded by a
running program at the time they're updated, in which context it's a
security issue as well since some of those updates might have been
security related and existing processes continue to use the old,
vulnerable versions until they are restarted.

These files can be logically deleted so nothing else can access them, but
the filesystem can't remove all references to them until all running
programs holding them open have closed them. That's what makes them
orphan -- nothing new can access them, but existing references mean they
can't yet be fully removed from the filesystem, either. Typically such
orphans block unmounting or remounting the filesystem readonly (unless
it's forced read-only by filesystem error or kernel emergency-SRQ
sequence, etc, which override things and locks orphans in their existing
state on the filesystem, just as a crash would, except it doesn't
necessarily immediately crash, tho a livelock often follows pretty
quickly).

There are various tools available that can find these files and tell you
which processes are holding them open and/or kill or restart those
processes in order to release the locks and close the files, allowing
them to be fully deleted. I routinely use one after updates here called
lib_users, which simply tells me what processes are holding open these
deleted files, so I can decide whether I want to restart them, kill them,
or simply ignore the problem until later.

In a system reboot (or simple descent to single user or systemd emergency
mode to stop most services, systemd itself can then be reexecuted using
systemctl daemon-reexec, if it's holding open deleted files as well,
before returning to normal mode), normal processes and system services
are stopped, releasing these files, so the filesystem can clean them up
before unmount or remount-readonly.

(Here, my root, including /usr, remains mounted read-only by default,
only being mounted writable for updates. Thus my routine use of
lib_users to find services I need to restart, or if there's too many,
decide I want to temporarily go to systemd emergency mode, before
returning to normal mode, releasing all these deleted files in the
process, so I can cleanly remount / read-only once again. That of course
is the reason I know a bit more about this than many, since it's part of
my update routine now.)

Of course in an unclean shutdown situation, the filesystem will not have
had the chance to clean up the orphans before the umount or
remount-readonly, so these orphans remain around at reboot and must be
cleaned up then. Btrfs at least, does this automatically, so btrfs check
need not be run to do it, as the check has to be done on an unmounted
filesystem, and at mount, btrfs will handle it, itself. However, btrfs
check may still clean them up as well (I'm not sure), simply to avoid
other issues the orphan files might have that are easiest eliminated by
simply eliminating the files, since that's what would happen on mount
anyway.

If it's orphan files, just quitting X (or logging out and back in if you
use a *DM graphical login, I don't, preferring to login at the CLI and
run startx) tends to release a lot of them, no reboot needed. But
sometimes quitting X and restarting other services holding open deleted
files may be necessary as well, and if there's enough such services,
descending to emergency/single-user mode, and possibly restarting
init/systemd itself, if it is holding open any such files, may be
necessary. But a full reboot isn't normally needed.

Tho of course restarting X, and descending to emergency/single-user-mode
even more so, can in practice be as bad as having to reboot. Going to
emergency/single-user especially, pretty much all you save in practice is
your uptime statistics and caches that would be lost on a full reboot.
But if you're running over a year uptime (yeah, right, on a still not
fully stable btrfs where the recommendation is to keep to reasonably
current kernels, hardly the stuff of year-plus uptimes!), or have
spinning rust and lots of RAM and thus cache that would have to be
reloaded from the slow spinning rust, going single-user to avoid dropping
them can be well worth the hassle over simply rebooting.

Meanwhile, if it /is/ an orphan file, it's an interesting situation,
since that means the checksum was fine and btrfs was able to load the
file when whatever is holding it open/orphan was started, but that isn't
the case now. Which means we have evidence of near-real-time (well,
depending on how long the file has been open) block decay and checksum
invalidation, which has serious implications in terms of storage device
health. Be sure to run smartctl -AH, or similar, to see what's going on,
and track it closely for awhile, as you may well have a failing device!

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
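The open-but-deleted state described above can be observed directly on Linux, where `/proc/<pid>/fd` links to an unlinked file read as `<path> (deleted)`. A minimal sketch under that assumption, with a made-up helper name `deleted_open_files`. It only looks at open file descriptors, so deleted libraries that are merely memory-mapped (the case a tool like lib_users targets) would need `/proc/<pid>/maps` instead.

```shell
# Hypothetical helper: for each PID given, print "PID PATH (deleted)" for
# every open file descriptor that points at an unlinked file.
deleted_open_files() {
    local pid fd target
    for pid in "$@"; do
        for fd in /proc/"$pid"/fd/*; do
            # Skip descriptors that vanish or cannot be read.
            target=$(readlink "$fd" 2>/dev/null) || continue
            case $target in
                *' (deleted)') printf '%s %s\n' "$pid" "$target" ;;
            esac
        done
    done
}

# Intended use (not run here), with a PID chosen by the reader:
#   deleted_open_files 1234
```

Reading other users' processes generally needs root. `lsof +L1` reports a similar system-wide view of open files whose link count has dropped to zero.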
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Tom Hunt @ 2016-01-25 3:21 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS

The only output from btrfs-debug-tree containing ORPHAN:

item 151 key (ORPHAN ORPHAN_ITEM 150415) itemoff 4904 itemsize 0
    orphan item
item 152 key (ORPHAN ORPHAN_ITEM 150416) itemoff 4904 itemsize 0
    orphan item
item 153 key (ORPHAN ORPHAN_ITEM 175228) itemoff 4904 itemsize 0
    orphan item
item 154 key (ORPHAN ORPHAN_ITEM 175229) itemoff 4904 itemsize 0
    orphan item

Given the later talk about orphans, I don't think this is an orphan; the disks
are new, and I know the checksum error issues came from running with bad RAM,
which has since been replaced.

The output referring to 515:

item 49 key (515 INODE_ITEM 0) itemoff 10851 itemsize 160
    inode generation 132 transid 132 size 262144 nbytes 4194304
    block group 0 mode 100600 links 1 uid 0 gid 0
    rdev 0 flags 0x1b
item 50 key (515 EXTENT_DATA 0) itemoff 10798 itemsize 53
    extent data disk byte 603428720640 nr 262144
    extent data offset 0 nr 262144 ram 262144
    extent compression 0

Incidentally, btrfs-debug-tree produced about 1.5G of text output; I'm not sure
whether that's normal for a 6TB volume with ~4TB used, or what.

On Sun, Jan 24, 2016 at 03:45:49PM -0700, Chris Murphy wrote:
> On Sun, Jan 24, 2016 at 3:16 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> > I get "currently mounted, aborting".
> >
> > If I must bring down the machine over this, I can, but I'd prefer not to.
>
> Now is a good time to refresh its backup, while it's online. It's also
> a good idea to maintain readonly snapshots of each subvolume you want
> to keep in case you need to depend on using btrfs send/receive to move
> the data to a new file system, since only ro snapshots can be used for
> send/receive.
>
> Maybe it's an orphaned item. I don't know much about those, whether
> btrfs check finds or removes them. But you can safely use
> btrfs-debug-tree while the fs is mounted.
>
> btrfs-debug-tree <dev> | grep -A3 -B3 ORPHAN
>
> That'd find object type ORPHAN and item type ORPHAN_ITEM. Again, no
> idea what to do about it if either are found, and matches inode 515.
> You could also do a search using
>
> btrfs-debug-tree <dev> | grep -A3 -B3 "(515 "
>
> and see if that reveals anything.
>
> While the fs has to be umounted for this, the --init-csum-tree option
> for btrfs check will obliterate the current csum tree (and all file
> csums) and then compute new csums for everything. That might fix it,
> at least if it's a stuck orphan item then this ought to give it a
> valid csum so you can proceed with conversion.
>
>
>
> --
> Chris Murphy

--
Tom Hunt
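One cross-check the numbers in this thread allow: the kernel message reports a checksum failure at `off 15118336`, while the inode 515 item found in the dump has `size 262144` (and `nbytes 4194304`). A minimal sketch comparing the two, with a made-up helper name `offset_within_size`. It only compares the numbers and does not establish which tree the dumped item belongs to.

```shell
# Hypothetical helper: succeed only if byte offset $1 lies inside a file
# whose size is $2 bytes.
offset_within_size() {
    [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}

# Numbers taken from the thread: the offset in the "csum failed" line and
# the size in the inode 515 INODE_ITEM.
if offset_within_size 15118336 262144; then
    echo "offset is inside the item"
else
    echo "offset is past the end of the item"   # this branch is taken
fi
```

Since the offset lies past both the size and the nbytes value, the item shown may well not be the inode the kernel message refers to; another tree could hold a different inode 515.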
* Re: Chicken-egg: uncorrectable checksum error prevents RAID1 rebalancing
From: Chris Murphy @ 2016-01-25 5:58 UTC
To: Tom Hunt; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Jan 24, 2016 at 8:21 PM, Tom Hunt <tomdicksonhunt@gmail.com> wrote:
> The only output from btrfs-debug-tree containing ORPHAN:
>
> item 151 key (ORPHAN ORPHAN_ITEM 150415) itemoff 4904 itemsize 0
>     orphan item
> item 152 key (ORPHAN ORPHAN_ITEM 150416) itemoff 4904 itemsize 0
>     orphan item
> item 153 key (ORPHAN ORPHAN_ITEM 175228) itemoff 4904 itemsize 0
>     orphan item
> item 154 key (ORPHAN ORPHAN_ITEM 175229) itemoff 4904 itemsize 0
>     orphan item
>
> Given the later talk about orphans, I don't think this is an orphan; the disks
> are new, and I know the checksum error issues came from running with bad RAM,
> which has since been replaced.
>
> The output referring to 515:
>
> item 49 key (515 INODE_ITEM 0) itemoff 10851 itemsize 160
>     inode generation 132 transid 132 size 262144 nbytes 4194304
>     block group 0 mode 100600 links 1 uid 0 gid 0
>     rdev 0 flags 0x1b
> item 50 key (515 EXTENT_DATA 0) itemoff 10798 itemsize 53
>     extent data disk byte 603428720640 nr 262144
>     extent data offset 0 nr 262144 ram 262144
>     extent compression 0

I'm kinda out of ideas. A full balance at least is an online process,
maybe it gets rid of the phantom missing inode reference. It'd be
interesting to see what 'btrfs check' shows. If you decide to try
--init-csum-tree, I'd first refresh a backup, and make current ro
snapshots of things you'd send/receive, should it become necessary to
create a new file system.

>
> Incidentally, btrfs-debug-tree produced about 1.5G of text output; I'm not sure
> whether that's normal for a 6TB volume with ~4TB used, or what.

Yeah, a very loose estimate is that btrfs-debug-tree output written to a
file is about 1/4 of the metadata size reported by 'fi usage/df',
although it could be much smaller if you have a lot of small files with
inline data extents.

--
Chris Murphy
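As a worked instance of the very loose 1/4 rule of thumb above: a 1.5G dump would correspond to roughly 6G of metadata. A minimal sketch of that arithmetic, with a made-up helper name `metadata_mib_from_dump`; the factor of 4 is only the estimate quoted in this message, not a measured value.

```shell
# Hypothetical helper: given a tree-dump size in MiB, print the metadata
# size in MiB implied by "dump is roughly 1/4 of metadata".
metadata_mib_from_dump() {
    echo $(( $1 * 4 ))
}

metadata_mib_from_dump 1536   # 1.5 GiB dump; prints 6144 (about 6 GiB)
```

Comparing that figure with the Metadata line of `btrfs filesystem df <mountpoint>` would show whether the dump size is in the expected range, bearing in mind the caveat about inline data extents.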