linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: bw <bwakkie@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: cannot mount after freeze
Date: Fri, 16 Jul 2021 20:26:55 +0800
Message-ID: <98b97dff-c165-92f6-9392-ebea55387814@gmx.com>
In-Reply-To: <CAKqYf_KZ_fWN55adSCsf6VuaaYa3FSz4XVmE8gL7N45DDO+CBA@mail.gmail.com>



On 2021/7/16 7:12 PM, bw wrote:
> Hello,
>
> My raid1 with 3 HDDs that contains /home and /data cannot be mounted
> after a freeze/restart on the 14th of July.
>
> My root / (ubuntu 20.10) is on a raid with 2 SSDs so I can boot, but I
> always end up in rescue mode at the moment. When disabling the two
> mounts (/data and /home) in fstab I can log in as root. (I first had to
> change the root password via a rescue USB in order to log in.)

/dev/sde has a corrupted chunk root, which is pretty rare.

[    8.175417] BTRFS error (device sde): parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
[    8.175729] BTRFS error (device sde): bad tree block start, want 5028524228608 have 0
[    8.175771] BTRFS error (device sde): failed to read chunk root

The chunk tree is an essential tree: it handles the logical bytenr ->
physical device mapping.

If something is wrong with it, that is a big problem.
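
To illustrate why that mapping matters, here is a rough sketch of a
logical -> physical lookup for RAID1-style chunks. It is a simplified
model only, not the btrfs on-disk format: the Chunk class, the
map_logical() helper and all numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical, simplified chunk item (not the real btrfs layout)."""
    logical_start: int
    length: int
    # One (devid, physical_start) pair per mirror copy.
    stripes: List[Tuple[int, int]]


def map_logical(chunks: List[Chunk], logical: int) -> List[Tuple[int, int]]:
    """Return (devid, physical offset) for every mirror holding `logical`."""
    for chunk in chunks:
        if chunk.logical_start <= logical < chunk.logical_start + chunk.length:
            offset = logical - chunk.logical_start
            return [(devid, phys + offset) for devid, phys in chunk.stripes]
    # Without a matching chunk the filesystem cannot locate the data at all.
    raise LookupError(f"no chunk covers logical address {logical}")


GIB = 1 << 30
# Made-up example: one 1 GiB chunk mirrored on devices 1 and 2.
example_chunks = [Chunk(1 * GIB, 1 * GIB, [(1, 5 * GIB), (2, 7 * GIB)])]
print(map_logical(example_chunks, 1 * GIB + 4096))
```

If the tree holding these entries cannot be read, no logical address can
be resolved, which is why the mount stops right at "failed to read chunk
root".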

But normally, such a transid error indicates that the HDD or the hardware
RAID controller is mishandling barrier/flush commands.
Mostly it means the disk or the hardware controller is lying about having
completed a FLUSH command.
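
For reference, a small sketch that pulls the numbers out of the transid
line quoted above and shows how far apart the two generations are. The
regex and the parse_transid_error() helper are my own illustration, not
anything shipped with btrfs-progs.

```python
import re
from typing import NamedTuple, Optional


class TransidError(NamedTuple):
    bytenr: int   # logical address of the tree block
    wanted: int   # generation the parent pointer expects
    found: int    # generation actually stored in the block


PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_error(line: str) -> Optional[TransidError]:
    """Extract bytenr/wanted/found from a kernel log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return TransidError(*(int(group) for group in match.groups()))


line = ("[    8.175417] BTRFS error (device sde): parent transid verify "
        "failed on 5028524228608 wanted 1427600 found 1429491")
err = parse_transid_error(line)
print(err.found - err.wanted)  # generation gap: 1891
```

The expected and the on-disk generation differ by 1891 transactions, so
this is not a matter of just the last commit being lost.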


You can try "btrfs rescue chunk-recover", but I doubt it will succeed,
as such transid errors never show up in just one tree.


Now let's talk about the other device, /dev/sdb.

This is more straightforward:

[    3.165790] ata2.00: exception Emask 0x10 SAct 0x10000 SErr 0x680101 action 0x6 frozen
[    3.165846] ata2.00: irq_stat 0x08000000, interface fatal error
[    3.165892] ata2: SError: { RecovData UnrecovData 10B8B BadCRC Handshk }
[    3.165940] ata2.00: failed command: READ FPDMA QUEUED
[    3.165987] ata2.00: cmd 60/f8:80:08:01:00/00:00:00:00:00/40 tag 16 ncq dma 126976 in
                         res 40/00:80:08:01:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[    3.166055] ata2.00: status: { DRDY }

The read command simply failed, with the hardware reporting an internal
checksum error.

This definitely means the device is not working properly.
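
As a cross-check, the SErr value in the first line can be decoded by
hand. The sketch below assumes the usual SATA SError bit layout and uses
the short names the kernel prints; the table and the decode_serror()
helper are my own illustration, so treat the bit positions as an
assumption to verify against the SATA spec.

```python
# Assumed SError bit positions, named as the kernel log prints them.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list:
    """Return the names of all known bits set in an SError value."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


print(decode_serror(0x680101))
# ['RecovData', 'UnrecovData', '10B8B', 'BadCRC', 'Handshk']
```

Under that assumption, 0x680101 gives exactly the five flags from the
"SError:" line, including BadCRC.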


And later we got an even stranger error message:

[    3.571793] sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[    3.571848] sd 1:0:0:0: [sdb] tag#16 Sense Key : Illegal Request [current]
[    3.571895] sd 1:0:0:0: [sdb] tag#16 Add. Sense: Unaligned write command
[    3.571943] sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 01 08 00 00 f8 00
[    3.571996] blk_update_request: I/O error, dev sdb, sector 264 op 0x0:(READ) flags 0x80700 phys_seg 30 prio class 0

The disk reports that it got an unaligned write, but the block layer says
the failed operation is a READ.

Not sure if the device is really sane.
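
To double-check that the failed command really is a read, the CDB bytes
from the log can be decoded with the standard SCSI READ(10) layout
(opcode 0x28, 4-byte big-endian LBA, 2-byte big-endian transfer length).
This is only a sketch: decode_read10() is a made-up helper, and the
512-byte logical sector size is an assumption.

```python
def decode_read10(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


print(decode_read10("28 00 00 00 01 08 00 00 f8 00"))
# {'lba': 264, 'blocks': 248, 'bytes': 126976}
```

LBA 264 matches "sector 264" in the blk_update_request line, and 248
sectors of 512 bytes is the same 126976 bytes as in the earlier ata2.00
line, so both messages are about the same read request.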


All these disks are the same model, ST2000DM008, which I think more or
less indicates there is something wrong with the model...

Recently at least two friends have reported various problems with
Seagate HDDs.
Not sure if they are using the same model.

>
> smartctl seems to say the disks are OK, but I'm still unable to mount.
> scrub doesn't see any errors.

Well, you already have /dev/sdb reporting an internal checksum error, aka
data corruption inside the disk, and yet smartctl reports everything is
fine.

Then I guess the disk is lying again, this time in the SMART info.
(Now I'm more convinced the disk is lying about FLUSH, or at least is
doing something wrong when handling FLUSH.)

>
> I have installed btrfsmaintenance btw.
>
> Can anyone advise me which steps to take in order to save the data?
> There is no backup (yes I'm a fool but was under the impression that
> with a copy of each file on 2 different disks I'll survive)

For /dev/sdb, I have no idea at all; since we failed to read from the
disk, it's a complete disk failure.

For /dev/sde, as mentioned, you can try "btrfs rescue chunk-recover",
and then let "btrfs check" find out what's wrong.

And I'm pretty sure you won't buy the same disks from Seagate next time.

Thanks,
Qu
>
> Attached all(?) important files + a history of my attempts the past
> days. My attempts from the system rescue usb are not in though.
>
> kind regards,
> Bastiaan
>
