* RAID1 filesystem not mounting
From: Alan Hardman @ 2019-02-02  4:28 UTC
To: linux-btrfs

I have a Btrfs filesystem using 6 partitionless disks in RAID1 that is failing to mount. I've tried the commonly recommended safe check options, but I haven't gotten the filesystem to mount at all, even with -o ro,recovery. If necessary, I can try to recover the data to another filesystem, but the volume that won't mount holds around 18 TB of data, so I'd like to avoid that if there is some other way of recovering it.

Versions:
btrfs-progs v4.19.1
Linux localhost 4.20.6-arch1-1-ARCH #1 SMP PREEMPT Thu Jan 31 08:22:01 UTC 2019 x86_64 GNU/Linux

Based on my understanding of how RAID1 works with Btrfs, I would expect a single disk failure not to prevent the volume from mounting entirely, yet I'm only seeing one disk with errors in the dmesg output, so maybe I'm misinterpreting it:

[ 534.519437] BTRFS warning (device sdd): 'recovery' is deprecated, use 'usebackuproot' instead
[ 534.519441] BTRFS info (device sdd): trying to use backup root at mount time
[ 534.519443] BTRFS info (device sdd): disk space caching is enabled
[ 534.519446] BTRFS info (device sdd): has skinny extents
[ 536.306194] BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, rd 22208378, flush 1, corrupt 29486730, gen 2933
[ 556.126928] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
[ 556.134767] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
[ 556.150278] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
[ 556.150310] BTRFS error (device sdd): failed to read block groups: -5
[ 556.216418] BTRFS error (device sdd): open_ctree failed

If helpful, here is some lsblk output:

NAME   TYPE SIZE   FSTYPE MOUNTPOINT UUID
sda    disk 111.8G
├─sda1 part 1.9M
└─sda2 part 111.8G ext4   /          c598dfdf-d6e7-47d3-888a-10f5f53fa338
sdb    disk 7.3T   btrfs             8f26ae2d-84b5-47d7-8f19-64b0ef5a481b
sdc    disk 7.3T   btrfs             8f26ae2d-84b5-47d7-8f19-64b0ef5a481b
sdd    disk 7.3T   btrfs             8f26ae2d-84b5-47d7-8f19-64b0ef5a481b
sde    disk 7.3T   btrfs             8f26ae2d-84b5-47d7-8f19-64b0ef5a481b
sdf    disk 2.7T   btrfs             8f26ae2d-84b5-47d7-8f19-64b0ef5a481b
sdh    disk 2.7T   btrfs             8f26ae2d-84b5-47d7-8f19-64b0ef5a481b

My main system partition on sda mounts fine and can be used to work on the Btrfs filesystem that is having issues.

Running "btrfs check /dev/sdb" exits with this:

Opening filesystem to check...
Incorrect offsets 13898 13882
ERROR: cannot open file system

Also, "btrfs restore -Dv /dev/sdb /tmp" lists some of the files on the filesystem but not all of them. I'm not sure whether this is limited to the files on that physical disk, or whether there is a bigger issue with the filesystem. I'm not sure what the best approach from here is, so any advice would be great.
* Re: RAID1 filesystem not mounting
From: Bernhard K @ 2019-02-02  9:59 UTC
To: linux-btrfs

On 02.02.2019 05:28 Alan Hardman wrote:
> Also, "btrfs restore -Dv /dev/sdb /tmp" outputs some of the files on the filesystem but not all of them. I'm not sure if this is limited to the files on that physical disk, or if there's a bigger issue with the filesystem. I'm not sure what the best approach from here is, so any advice would be great.

You could check whether some of the older tree roots yield a better result, as described in
https://btrfs.wiki.kernel.org/index.php/Restore#Advanced_usage
In my case I had to go back 2 generations to get a suitable file list.

I am not an expert, though, and am only recommending this because, to my understanding, btrfs restore and btrfs-find-root are non-destructive to the original filesystem.
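A rough sketch of the sequence that wiki page describes, for reference only (the <bytenr> values are placeholders that have to come from btrfs-find-root output for this particular filesystem, and the destination path is just an example):

btrfs-find-root /dev/sdb                                # list candidate tree roots and their generations
btrfs restore -t <bytenr> -Dv /dev/sdb /tmp             # -D: dry run, only list what would be restored from that root
btrfs restore -t <bytenr> -iv /dev/sdb /mnt/recovery    # restore for real once a root with a complete file list is found

Both commands only read from the damaged filesystem; writes go to the destination path.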
* Re: RAID1 filesystem not mounting
From: Hugo Mills @ 2019-02-02 12:01 UTC
To: Alan Hardman; Cc: linux-btrfs

On Fri, Feb 01, 2019 at 11:28:27PM -0500, Alan Hardman wrote:
> I have a Btrfs filesystem using 6 partitionless disks in RAID1 that's failing to mount. I've tried the common recommended safe check options, but I haven't gotten the disk to mount at all, even with -o ro,recovery. If necessary, I can try to use the recovery to another filesystem, but I have around 18 TB of data on the filesystem that won't mount, so I'd like to avoid that if there's some other way of recovering it.
>
> Versions:
> btrfs-progs v4.19.1
> Linux localhost 4.20.6-arch1-1-ARCH #1 SMP PREEMPT Thu Jan 31 08:22:01 UTC 2019 x86_64 GNU/Linux
>
> Based on my understanding of how RAID1 works with Btrfs, I would expect a single disk failure to not prevent the volume from mounting entirely, but I'm only seeing one disk with errors according to dmesg output, maybe I'm misinterpreting it:
>
> [ 534.519437] BTRFS warning (device sdd): 'recovery' is deprecated, use 'usebackuproot' instead
> [ 534.519441] BTRFS info (device sdd): trying to use backup root at mount time
> [ 534.519443] BTRFS info (device sdd): disk space caching is enabled
> [ 534.519446] BTRFS info (device sdd): has skinny extents
> [ 536.306194] BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, rd 22208378, flush 1, corrupt 29486730, gen 2933
> [ 556.126928] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898

   It's worth noting that 13898 - 13882 = 16, which is a power of two. This means that you most likely have a single-bit error in your metadata. That, plus the absence of any checksum warning, strongly suggests that you have bad RAM. I would recommend checking your RAM before trying anything else that would write to your filesystem (including btrfs check --repair).

   Hugo.

> [ 556.134767] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
> [ 556.150278] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
> [ 556.150310] BTRFS error (device sdd): failed to read block groups: -5
> [ 556.216418] BTRFS error (device sdd): open_ctree failed
>
> [rest of the original report (lsblk, btrfs check and btrfs restore output) trimmed]

-- 
Hugo Mills             | If it's December 1941 in Casablanca, what time is it
hugo@... carfax.org.uk | in New York?
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                              Rick Blaine, Casablanca
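For what it's worth, the single-bit observation is easy to verify with shell arithmetic; nothing here reads or writes the filesystem:

$ printf 'diff=%d xor=0x%x\n' $((13898 - 13882)) $((13898 ^ 13882))
diff=16 xor=0x10

The stored and expected item offsets differ in exactly one bit (bit 4). A failing sector would normally mangle far more than one bit, and it would also be caught by the metadata checksum; a bit flipped in RAM before the checksum was computed produces exactly this pattern of "corrupt leaf" with no checksum complaint. A few passes of memtest86+ (or memtester on a running system) is the usual way to confirm the RAM suspicion.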
* Re: RAID1 filesystem not mounting
From: Chris Murphy @ 2019-02-03  0:26 UTC
To: Hugo Mills, Alan Hardman, Btrfs BTRFS

On Sat, Feb 2, 2019 at 5:02 AM Hugo Mills <hugo@carfax.org.uk> wrote:
>
> On Fri, Feb 01, 2019 at 11:28:27PM -0500, Alan Hardman wrote:
> > [...]
> > [ 536.306194] BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, rd 22208378, flush 1, corrupt 29486730, gen 2933
> > [ 556.126928] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
>
>    It's worth noting that 13898 - 13882 = 16, which is a power of two. This means that you most likely have a single-bit error in your metadata. That, plus the absence of any checksum warning, strongly suggests that you have bad RAM. I would recommend checking your RAM before trying anything else that would write to your filesystem (including btrfs check --repair).

Good catch!

I think that can account for the corrupt and generation errors. I don't know that memory errors can account for the large number of read and write errors, however. So there may be more than one problem.

-- 
Chris Murphy
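One read-only way to see which physical drive those accumulated read/write errors were actually hitting is to count block-layer error messages per device in the journal; a sketch, assuming the device names from the lsblk output above and that the exact kernel message wording (which varies by kernel version) includes the usual "I/O error" strings:

journalctl | grep 'kernel:' | grep -E 'I/O error|blk_update_request' | grep -oE 'sd[b-h]' | sort | uniq -c

A strongly skewed count toward one device would point at a drive or cabling problem rather than memory.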
* Re: RAID1 filesystem not mounting
From: Alan Hardman @ 2019-02-03  5:40 UTC
To: Hugo Mills, Btrfs BTRFS, Chris Murphy

Thanks for the quick response, Chris and Hugo!

After some testing, there *was* a RAM issue, which has now been resolved, so it should no longer be a factor going forward, but it could definitely have been related.

The high number of lifetime errors for the filesystem is expected and isn't related to this issue; it was caused by a bad power supply that took a disk completely offline during a balance operation. That was fully recovered via scrub, and the counters hadn't increased again until this new issue (several months and several TB written without an error).

I've attached the full output from Chris's recommendations; here are a couple of excerpts:

# btrfs rescue super -v /dev/sdb
...
All supers are valid, no need to recover

# journalctl | grep -A 15 exception
...
Jan 23 01:06:37 localhost kernel: ata3.00: status: { DRDY }
Jan 23 01:06:37 localhost kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Jan 23 01:06:37 localhost kernel: ata3.00: cmd 61/b0:98:ea:7a:48/00:00:0a:00:00/40 tag 19 ncq dma 90112 out
                                  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
--
Jan 31 19:24:32 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 31 19:24:32 localhost kernel: ata5.00: failed command: READ DMA EXT
Jan 31 19:24:32 localhost kernel: ata5.00: cmd 25/00:08:a8:2a:81/00:00:a3:03:00/e0 tag 0 dma 4096 in
                                  res 40/00:01:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Jan 31 19:24:32 localhost kernel: ata5.00: status: { DRDY }
Jan 31 19:24:32 localhost kernel: ata5: link is slow to respond, please be patient (ready=0)
Jan 31 19:24:32 localhost kernel: ata5: device not ready (errno=-16), forcing hardreset
Jan 31 19:24:32 localhost kernel: ata5: soft resetting link
Jan 31 19:24:32 localhost kernel: ata5.00: configured for UDMA/33
Jan 31 19:24:32 localhost kernel: ata5.01: configured for UDMA/33
Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : Illegal Request [current]
Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 Add. Sense: Unaligned write command
Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a a8 00 00 00 08 00 00

This last journalctl excerpt is from the first boot on which the filesystem stopped being mountable. The filesystem had been remounted read-only automatically after a few errors (see btrfs-journal.log in the attached archive). None of my other system log files looked relevant, so I limited this to journalctl's output.

I have been able to recover files via "btrfs restore ...", and nothing essential seems to be missing from its full -D output, so if offloading the entire filesystem turns out to be necessary, it at least looks possible even if the filesystem can't be repaired in place.

Thanks for the help!

On Sat, Feb 2, 2019, at 17:26, Chris Murphy wrote:
> On Sat, Feb 2, 2019 at 5:02 AM Hugo Mills <hugo@carfax.org.uk> wrote:
> > [...]
> >    It's worth noting that 13898 - 13882 = 16, which is a power of two. This means that you most likely have a single-bit error in your metadata. That, plus the absence of any checksum warning, strongly suggests that you have bad RAM. I would recommend checking your RAM before trying anything else that would write to your filesystem (including btrfs check --repair).
>
> Good catch!
>
> I think that can account for the corrupt and generation errors. I don't know that memory errors can account for the large number of read and write errors, however. So there may be more than one problem.

[-- Attachment: btrfs.tar.gz, application/x-gzip, 11799 bytes --]
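Given the READ DMA EXT timeout on ata5 (which this boot enumerated as /dev/sde), one low-risk follow-up once the data is safely backed up is a long SMART self-test plus a look at the sector-health counters. This is only a sketch; device names can change between boots, so the drive should be matched against the ata5 messages in the current dmesg first:

smartctl -t long /dev/sde      # offline surface scan; takes several hours on a large drive
smartctl -a /dev/sde | grep -iE 'reallocated|current_pending|offline_uncorrect|self-test'

Non-zero pending or uncorrectable counts, or a self-test that aborts with a read failure, would confirm a marginal-media problem independent of the RAM issue.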
* Re: RAID1 filesystem not mounting
From: Chris Murphy @ 2019-02-03 18:43 UTC
To: Alan Hardman; Cc: Hugo Mills, Btrfs BTRFS

On Sat, Feb 2, 2019 at 10:40 PM Alan Hardman <alanh@fastmail.com> wrote:
> # journalctl | grep -A 15 exception
> ...
> Jan 23 01:06:37 localhost kernel: ata3.00: status: { DRDY }
> Jan 23 01:06:37 localhost kernel: ata3.00: failed command: WRITE FPDMA QUEUED
> Jan 23 01:06:37 localhost kernel: ata3.00: cmd 61/b0:98:ea:7a:48/00:00:0a:00:00/40 tag 19 ncq dma 90112 out
>                                   res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> --
> Jan 31 19:24:32 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> Jan 31 19:24:32 localhost kernel: ata5.00: failed command: READ DMA EXT
> Jan 31 19:24:32 localhost kernel: ata5.00: cmd 25/00:08:a8:2a:81/00:00:a3:03:00/e0 tag 0 dma 4096 in
>                                   res 40/00:01:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
> Jan 31 19:24:32 localhost kernel: ata5.00: status: { DRDY }
> Jan 31 19:24:32 localhost kernel: ata5: link is slow to respond, please be patient (ready=0)
> Jan 31 19:24:32 localhost kernel: ata5: device not ready (errno=-16), forcing hardreset
> Jan 31 19:24:32 localhost kernel: ata5: soft resetting link

This kind of error on read is common when there is a marginally bad sector, the drive is doing deep recovery, and the time for recovery is longer than the SCSI command timer. The reset clears the command queue, so it's no longer possible to find out which sector is marginal, and Btrfs can't fix it up. Left alone for a long time, this allows bad sectors to accumulate, including during scrubbing. I don't actually know what Btrfs does in that case, whether the scrub is aborted or whether it continues and reports the error as uncorrectable.

This isn't the only possible explanation for this kind of error. But the bottom line is that it's generic, it indicates there is a problem, and you need to find out what the real cause is. It's asking for trouble to leave this kind of error floating in the wind, even if everything otherwise seems to be working.

> Jan 31 19:24:32 localhost kernel: ata5.00: configured for UDMA/33
> Jan 31 19:24:32 localhost kernel: ata5.01: configured for UDMA/33
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : Illegal Request [current]
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 Add. Sense: Unaligned write command
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a a8 00 00 00 08 00 00

I don't know what this portion of the error sequence means, "Illegal Request" and "Unaligned write command". On 512e Advanced Format hard drives, which have 512 byte logical and 4096 byte physical sectors, a 512 byte write has to be turned into a read-modify-write of the whole 4096 byte physical sector by the drive firmware; the drive can't overwrite just those 512 bytes, and if the read half of that cycle fails, it surfaces as a read error. But I'm not sure any of that is what this is: to my knowledge Btrfs never does 512 byte writes; the minimum read or write is 4096 bytes.

Moving on... From your attached log, there are lots of failed writes. That really must be sorted out, because for all RAID, write failures are fatal. With md-based RAID the kernel ejects a drive as faulty on a single write failure, whereas Btrfs keeps trying, as it still has no concept of a faulty drive. So you've got recent missing writes on whatever drive has all these write errors. The 'grep -A 15' limited the output, so I can't tell how the errors ended up being handled by either libata or Btrfs. Finding out which device has write problems is the next priority, as this is fairly likely to prevent any 'btrfs check --repair' from succeeding.

Next, the SCSI command timer (a per-block-device kernel timeout) is at the default of 30 seconds for all devices. Most of the drives report an SCT ERC of 70 deciseconds, which is *good*. But two drives do not, and instead report:

SCT Error Recovery Control command not supported

That's not good, because a marginally bad sector can put the drive firmware into deep recovery for longer than the 30 second command timer, producing exactly the kind of link reset seen above instead of a clean read error that Btrfs could repair from the other copy.

Next, one of those drives has some UDMA errors, which suggests an actual link problem between the drive's controller and the logic board's controller. The most likely fix is simply reseating the cable on both ends, but the cable may need replacing. Keep an eye on whether these errors continue; each one triggers a libata error and either a retry or a link reset, and might be what caused one of your link reset errors.

199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    19

OK, so you have a corrupt leaf, probably from bad memory, which probably corrupted both copies of the leaf, which is why we don't see any fixup messages. I don't actually know whether 'btrfs check --repair' can fix this kind of memory-induced corruption. In any case, the other hardware problems have to be sorted out before a repair can be expected to stick.

> I have been able to recover files via "btrfs restore ...", and nothing essential seems to be missing from its full -D output, so if offloading the entire filesystem turns out to be necessary, it at least looks possible even if the filesystem can't be repaired in place.

If the data is important, and this is the only copy, I always argue in favor of urgently making a backup. Set aside all other troubleshooting.

-- 
Chris Murphy
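The usual mitigation for the SCT ERC / command timer mismatch looks roughly like this (device letters are placeholders; which drives need which treatment comes from the smartctl -l scterc output):

# Drives that support SCT ERC: cap internal error recovery at 7 seconds,
# well under the kernel's 30 second command timer.
smartctl -l scterc,70,70 /dev/sdX

# Drives that don't support it: raise the kernel's command timer instead,
# so a long deep-recovery attempt doesn't trigger a link reset.
echo 180 > /sys/block/sdX/device/timeout

Neither setting persists across a reboot or power cycle, so they are typically reapplied from a udev rule or a boot-time script.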
* Re: RAID1 filesystem not mounting
From: Chris Murphy @ 2019-02-03  0:18 UTC
To: Alan Hardman; Cc: Btrfs BTRFS

On Fri, Feb 1, 2019 at 9:28 PM Alan Hardman <alanh@fastmail.com> wrote:
>
> I have a Btrfs filesystem using 6 partitionless disks in RAID1 that's failing to mount. I've tried the common recommended safe check options, but I haven't gotten the disk to mount at all, even with -o ro,recovery.

Try '-o ro,degraded,nologreplay'

If that works, update your backups before you do anything else. Then you can report the output of the following, which are all read-only commands and should work whether or not the filesystem is mounted:

btrfs fi show
btrfs insp dump-s -f <pick any btrfs device>
btrfs rescue super -v <pick any btrfs device>
smartctl -x <each drive>
smartctl -l scterc <each drive>
cat /sys/block/sdX/device/timeout   # also for each drive X

Also search all the system logs you have with 'grep -A 15 exception' so we can see whether there are any nasty libata messages.

> [ 534.519437] BTRFS warning (device sdd): 'recovery' is deprecated, use 'usebackuproot' instead
> [ 534.519441] BTRFS info (device sdd): trying to use backup root at mount time
> [ 534.519443] BTRFS info (device sdd): disk space caching is enabled
> [ 534.519446] BTRFS info (device sdd): has skinny extents
> [ 536.306194] BTRFS info (device sdd): bdev /dev/sdc errs: wr 23038942, rd 22208378, flush 1, corrupt 29486730, gen 2933

That's a lot of errors. These statistics are kept for the life of the filesystem until reset with 'btrfs dev stats -z', so it's possible they all date from a previous problem you've since recovered from. But given that you now have a new problem, it isn't clear to what degree it is the result of those read, write, corruption and generation errors.

> [ 556.126928] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
> [ 556.134767] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898
> [ 556.150278] BTRFS critical (device sdd): corrupt leaf: root=2 block=25540634836992 slot=45, unexpected item end, have 13882 expect 13898

The fact that this is a raid1 volume and there are no fixup messages tells me this is bad news: either both copies are bad, or the good copy can't be found (a missing device, or more than one). In any case, the less you modify the filesystem with repair attempts or by trying to mount it read-write, the better the chance of recovery. Right now there isn't enough information to tell you what to do, other than to do as little as possible.

-- 
Chris Murphy
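A minimal way to collect all of that in one pass, as a sketch that assumes the six Btrfs member drives are sdb, sdc, sdd, sde, sdf and sdh as in the lsblk output from the first message:

for d in sdb sdc sdd sde sdf sdh; do
    echo "=== /dev/$d ==="
    smartctl -x /dev/$d                  # full SMART data, error log, phy event counters
    smartctl -l scterc /dev/$d           # SCT ERC setting, if the drive supports it
    cat /sys/block/$d/device/timeout     # kernel SCSI command timer for this device
done > drive-report.txt 2>&1

btrfs fi show
btrfs insp dump-s -f /dev/sdb
btrfs rescue super -v /dev/sdb

Everything here is read-only with respect to the filesystem and the drives.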