From: bw <bwakkie@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: cannot mount after freeze
Date: Fri, 16 Jul 2021 17:13:27 +0200
Message-ID: <CAKqYf_+cyjnhFxpqdx1CU=0mGdagZu_bwOeW7ZesgkF29TTvqQ@mail.gmail.com>
In-Reply-To: <98b97dff-c165-92f6-9392-ebea55387814@gmx.com>

[-- Attachment #1: Type: text/plain, Size: 7024 bytes --]

On Fri, 16 Jul 2021 at 14:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2021/7/16 7:12 PM, bw wrote:
> > Hello,
> >
> > My RAID1 with three HDDs, which holds /home and /data, cannot be
> > mounted after a freeze/restart on the 14th of July.
> >
> > My root / (Ubuntu 20.10) is on a RAID with two SSDs, so I can boot, but
> > at the moment I always end up in rescue mode. After disabling the two
> > mounts (/data and /home) in fstab I can log in as root. (I first had to
> > change the root password via a rescue USB in order to log in.)
>
> /dev/sde has a corrupted chunk root, which is pretty rare.
>
> [    8.175417] BTRFS error (device sde): parent transid verify failed on
> 5028524228608 wanted 1427600 found 1429491
> [    8.175729] BTRFS error (device sde): bad tree block start, want
> 5028524228608 have 0
> [    8.175771] BTRFS error (device sde): failed to read chunk root
>
> The chunk tree is the most essential tree: it holds the logical bytenr ->
> physical device mapping.
>
> If anything is wrong with it, that is a big problem.
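
(Side note from me: if I read the btrfs-inspect-internal manpage right, the
chunk tree can be dumped read-only for inspection -- untested on my side,
so just a sketch:)

~# btrfs inspect-internal dump-tree -t chunk /dev/sde   # read-only; does not write to the device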
>
> But normally such a transid error indicates that the HDD or the hardware
> RAID controller mishandles the barrier/flush command.
> Mostly it means the disk or the hardware controller is lying about
> completing its FLUSH command.
>
>
> You can try "btrfs rescue chunk-recover", but I doubt the chance of
> success, as such transid errors never show up in just one tree.

For a long time the command was chunk-ing away, and I had hopes when it
reported DONE for each device, but then came an error:
~# btrfs rescue chunk-recover /dev/sde
Scanning: DONE in dev0, DONE in dev1, DONE in dev2
parent transid verify failed on 4146261671936 wanted 1427658 found 1439315
parent transid verify failed on 4146261721088 wanted 1427658 found 1439317
parent transid verify failed on 4146236669952 wanted 1427658 found 1439310
parent transid verify failed on 4146174771200 wanted 1427600 found 1439310
parent transid verify failed on 4146258919424 wanted 1427656 found 1439317
parent transid verify failed on 4146238095360 wanted 1427658 found 1439317
parent transid verify failed on 4146260951040 wanted 1427656 found 1439317
parent transid verify failed on 4146266193920 wanted 1427656 found 1439065
parent transid verify failed on 4146067701760 wanted 1427599 found 1439304
parent transid verify failed on 4146246123520 wanted 1427599 found 1439316
parent transid verify failed on 4146246139904 wanted 1427599 found 1439312
parent transid verify failed on 4146246238208 wanted 1427599 found 1439317
parent transid verify failed on 4146246254592 wanted 1427599 found 1439317
parent transid verify failed on 4146246303744 wanted 1427599 found 1439317
parent transid verify failed on 4146246320128 wanted 1427599 found 1439317
parent transid verify failed on 4146246336512 wanted 1427599 found 1439317
parent transid verify failed on 4146246352896 wanted 1427599 found 1438647
parent transid verify failed on 4146246369280 wanted 1427599 found 1439312
parent transid verify failed on 4146237063168 wanted 1427604 found 1439314
parent transid verify failed on 4146236637184 wanted 1427603 found 1439316
parent transid verify failed on 4146260754432 wanted 1427604 found 1439317
parent transid verify failed on 4146246516736 wanted 1427599 found 1439317
parent transid verify failed on 4146246533120 wanted 1427599 found 1439065
parent transid verify failed on 4146268749824 wanted 1427602 found 1439316
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
Ignoring transid failure
leaf parent key incorrect 5141904293888
ERROR: failed to read block groups: Operation not permitted
open with broken chunk error
Chunk tree recovery failed

... and I have attached a couple of btrfs check runs, done without any
dangerous options.
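
For reference, the checks behind the attachments were run roughly like this
(from memory, so the exact flags are my best guess):

~# btrfs check -Q /dev/sde                  # qgroup report
~# btrfs check -b /dev/sde                  # use the first valid backup root
~# btrfs check --check-data-csum /dev/sde   # also verify data checksums
~# btrfs check -s 2 /dev/sde                # read superblock copy 2
~# btrfs check -r /dev/sde                  # my mistake: -r expects a tree root bytenr, not a device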


>
>
> Now let's talk about the other device, /dev/sdb.
>
> This is more straightforward:
>
> [    3.165790] ata2.00: exception Emask 0x10 SAct 0x10000 SErr 0x680101
> action 0x6 frozen
> [    3.165846] ata2.00: irq_stat 0x08000000, interface fatal error
> [    3.165892] ata2: SError: { RecovData UnrecovData 10B8B BadCRC Handshk }
> [    3.165940] ata2.00: failed command: READ FPDMA QUEUED
> [    3.165987] ata2.00: cmd 60/f8:80:08:01:00/00:00:00:00:00/40 tag 16
> ncq dma 126976 in
>                          res 40/00:80:08:01:00/00:00:00:00:00/40 Emask
> 0x10 (ATA bus error)
> [    3.166055] ata2.00: status: { DRDY }
>
> The read command simply failed, with the hardware reporting an internal
> checksum error.
>
> This definitely means the device is not working properly.
>
>
> And later we got an even stranger error message:
>
> [    3.571793] sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE cmd_age=0s
> [    3.571848] sd 1:0:0:0: [sdb] tag#16 Sense Key : Illegal Request
> [current]
> [    3.571895] sd 1:0:0:0: [sdb] tag#16 Add. Sense: Unaligned write command
> [    3.571943] sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 01 08
> 00 00 f8 00
> [    3.571996] blk_update_request: I/O error, dev sdb, sector 264 op
> 0x0:(READ) flags 0x80700 phys_seg 30 prio class 0
>
> The disk reports that it got an unaligned write, but the block layer says
> the failed operation is a READ.
>
> I'm not sure the device is really sane.
>
>
> All these disks are the same model, ST2000DM008; I think that more or
> less indicates there is something wrong with this model...
>
> Recently at least two friends have reported various problems with their
> Seagate HDDs.
> I'm not sure whether they are using the same model.
>
> >
> > smartctl seems to say the disks are OK, but I'm still unable to mount.
> > scrub doesn't see any errors.
>
> Well, you already have /dev/sdb reporting an internal checksum error, i.e.
> data corruption inside the disk, and yet your smartctl reports everything
> is fine.
>
> Then I guess the disk is lying in its SMART info as well.
> (Now I'm even more convinced the disk is lying about FLUSH, or at least
> has something wrong in its FLUSH handling.)
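
(If the drives' volatile write cache is the suspect, I guess I can at least
query and disable it -- a sketch from the hdparm/smartctl manpages, not
something I have verified to help here:)

~# hdparm -W /dev/sde     # query the current write-cache setting
~# hdparm -W0 /dev/sde    # disable the volatile write cache
~# smartctl -x /dev/sde   # extended device and SMART info, beyond -a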
>
> >
> > I have installed btrfsmaintenance, by the way.
> >
> > Can anyone advise me which steps to take in order to save the data?
> > There is no backup (yes, I'm a fool, but I was under the impression
> > that with a copy of each file on two different disks I would survive).
>
> For /dev/sdb I have no idea at all; since we failed to read from the disk
> itself, it's a complete disk failure.
>
> For /dev/sde, as mentioned, you can try "btrfs rescue chunk-recover",
> and then let "btrfs check" find out what's wrong.
>
> And I'm pretty sure you won't buy the same disks from Seagate next time.
>
> Thanks,
> Qu
> >
> > Attached are all(?) the important files, plus a history of my attempts
> > over the past days. My attempts from the system rescue USB are not
> > included, though.
> >
> > kind regards,
> > Bastiaan
> >

thx,
Bastiaan

[-- Attachment #2: btrfs_check-Q --]
[-- Type: application/octet-stream, Size: 980 bytes --]

Script started on 2021-07-16 16:58:48+02:00 [TERM="linux" TTY="/dev/tty3" COLUMNS="210" LINES="65"]
Opening filesystem to check...
checksum verify failed on 5028524228608 found 000000B6 wanted 00000000
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
Ignoring transid failure
checksum verify failed on 5028524261376 found 000000B6 wanted 00000000
checksum verify failed on 5028524277760 found 000000B6 wanted 00000000
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
Ignoring transid failure
leaf parent key incorrect 5141904293888
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system

Script done on 2021-07-16 16:58:48+02:00 [COMMAND_EXIT_CODE="1"]

[-- Attachment #3: btrfs_check-b --]
[-- Type: application/octet-stream, Size: 980 bytes --]

Script started on 2021-07-16 16:56:10+02:00 [TERM="linux" TTY="/dev/tty3" COLUMNS="210" LINES="65"]
Opening filesystem to check...
checksum verify failed on 5028524228608 found 000000B6 wanted 00000000
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
Ignoring transid failure
checksum verify failed on 5028524261376 found 000000B6 wanted 00000000
checksum verify failed on 5028524277760 found 000000B6 wanted 00000000
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
Ignoring transid failure
leaf parent key incorrect 5141904293888
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system

Script done on 2021-07-16 16:56:11+02:00 [COMMAND_EXIT_CODE="1"]

[-- Attachment #4: btrfs_check-data-csum --]
[-- Type: application/octet-stream, Size: 980 bytes --]

Script started on 2021-07-16 16:57:08+02:00 [TERM="linux" TTY="/dev/tty3" COLUMNS="210" LINES="65"]
Opening filesystem to check...
checksum verify failed on 5028524228608 found 000000B6 wanted 00000000
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
Ignoring transid failure
checksum verify failed on 5028524261376 found 000000B6 wanted 00000000
checksum verify failed on 5028524277760 found 000000B6 wanted 00000000
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
Ignoring transid failure
leaf parent key incorrect 5141904293888
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system

Script done on 2021-07-16 16:57:09+02:00 [COMMAND_EXIT_CODE="1"]

[-- Attachment #5: btrfs_check-s --]
[-- Type: application/octet-stream, Size: 1018 bytes --]

Script started on 2021-07-16 16:59:46+02:00 [TERM="linux" TTY="/dev/tty3" COLUMNS="210" LINES="65"]
using SB copy 2, bytenr 274877906944
Opening filesystem to check...
checksum verify failed on 5028524228608 found 000000B6 wanted 00000000
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
Ignoring transid failure
checksum verify failed on 5028524261376 found 000000B6 wanted 00000000
checksum verify failed on 5028524277760 found 000000B6 wanted 00000000
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
parent transid verify failed on 5141904293888 wanted 1419828 found 1439215
Ignoring transid failure
leaf parent key incorrect 5141904293888
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system

Script done on 2021-07-16 16:59:46+02:00 [COMMAND_EXIT_CODE="1"]

[-- Attachment #6: btrfs_check-r --]
[-- Type: application/octet-stream, Size: 213 bytes --]

Script started on 2021-07-16 16:58:23+02:00 [TERM="linux" TTY="/dev/tty3" COLUMNS="210" LINES="65"]
ERROR: /dev/sde is not a valid numeric value.

Script done on 2021-07-16 16:58:23+02:00 [COMMAND_EXIT_CODE="1"]
