linux-btrfs.vger.kernel.org archive mirror
From: Su Yue <l@damenly.su>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: bw <bwakkie@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: cannot mount after freeze
Date: Fri, 16 Jul 2021 21:00:17 +0800
Message-ID: <o8b2wg5i.fsf@damenly.su>
In-Reply-To: <98b97dff-c165-92f6-9392-ebea55387814@gmx.com>


On Fri 16 Jul 2021 at 20:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> On 2021/7/16 7:12 PM, bw wrote:
>> Hello,
>>
>> My raid1 with 3 HDDs that contains /home and /data cannot be mounted
>> after a freeze/restart on the 14th of July.
>>
>> My root / (Ubuntu 20.10) is on a raid with 2 SSDs, so I can boot, but I
>> always end up in rescue mode at the moment. When disabling the two mounts
>> (/data and /home) in fstab I can log in as root. (I first had to change
>> the root password via a rescue USB in order to log in.)
>
> /dev/sde has corrupted chunk root, which is pretty rare.
>
> [    8.175417] BTRFS error (device sde): parent transid verify failed on 5028524228608 wanted 1427600 found 1429491
> [    8.175729] BTRFS error (device sde): bad tree block start, want 5028524228608 have 0
> [    8.175771] BTRFS error (device sde): failed to read chunk root
>
> The chunk tree is an essential tree: it handles the logical bytenr ->
> physical device mapping.
>
> If something is wrong with it, it's a big problem.
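
To illustrate why a broken chunk tree blocks everything else, here is a minimal sketch of a logical -> physical lookup for a mirrored (raid1-style) chunk. The `Chunk` and `Stripe` classes and `map_logical` are hypothetical names invented for this example, and the numbers in the usage line are made up; this is not btrfs code and it ignores striped profiles entirely.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    devid: int    # device holding this copy (hypothetical model)
    offset: int   # physical byte offset of the chunk on that device


@dataclass
class Chunk:
    start: int            # logical start address of the chunk
    length: int           # chunk length in bytes
    stripes: List[Stripe]  # one entry per mirror copy


def map_logical(chunks: List[Chunk], logical: int) -> List[Tuple[int, int]]:
    """Return (devid, physical offset) for every mirror of `logical`.

    Assumes `chunks` is sorted by `start` and every chunk is mirrored,
    so each stripe holds a full copy at the same relative offset.
    """
    starts = [c.start for c in chunks]
    i = bisect.bisect_right(starts, logical) - 1
    if i < 0:
        raise LookupError("logical address below first chunk")
    chunk = chunks[i]
    rel = logical - chunk.start
    if rel >= chunk.length:
        raise LookupError("logical address not covered by any chunk")
    return [(s.devid, s.offset + rel) for s in chunk.stripes]


# Made-up example: one 1 GiB chunk mirrored on devices 1 and 2.
example = [Chunk(start=1 << 30, length=1 << 30,
                 stripes=[Stripe(1, 4096), Stripe(2, 8192)])]
locations = map_logical(example, (1 << 30) + 100)
```

Without a readable chunk tree there is no such table to search, so no other tree block can be located on disk.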
>
> But normally, such a transid error indicates that the HDD or the
> hardware RAID controller is mishandling barrier/flush commands.
> Mostly it means the disk or the hardware controller is lying about its
> FLUSH command.
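
The two messages in the log above can be read as two checks on a tree block that was just read from disk. The sketch below is a simplified model under that assumption, not the kernel's implementation; `BlockHeader` and `check_tree_block` are hypothetical names, and the field names are chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockHeader:
    bytenr: int       # logical address the block claims to live at
    generation: int   # transaction id that last wrote the block


def check_tree_block(header: BlockHeader, expected_bytenr: int,
                     parent_transid: int) -> Optional[str]:
    """Return an error string shaped like the log lines, or None if OK.

    The parent pointer records which generation the child should have;
    a mismatch means the copy on disk is not the one the parent expects.
    """
    if header.bytenr != expected_bytenr:
        return (f"bad tree block start, want {expected_bytenr} "
                f"have {header.bytenr}")
    if header.generation != parent_transid:
        return (f"parent transid verify failed on {expected_bytenr} "
                f"wanted {parent_transid} found {header.generation}")
    return None


# Values taken from the log lines quoted above.
first = check_tree_block(BlockHeader(5028524228608, 1429491),
                         5028524228608, 1427600)
second = check_tree_block(BlockHeader(0, 0), 5028524228608, 1427600)
```

In this model the first result reproduces the transid message and the second reproduces the "have 0" message, which is what a block of zeroes would produce.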
>
>
> You can try "btrfs rescue chunk-recover", but I doubt the chance of
> success, as such transid errors never show up in just one tree.
>
>
> Now let's talk about the other device, /dev/sdb.
>
> This is more straightforward:
>
> [    3.165790] ata2.00: exception Emask 0x10 SAct 0x10000 SErr 0x680101 action 0x6 frozen
> [    3.165846] ata2.00: irq_stat 0x08000000, interface fatal error
> [    3.165892] ata2: SError: { RecovData UnrecovData 10B8B BadCRC Handshk }
> [    3.165940] ata2.00: failed command: READ FPDMA QUEUED
> [    3.165987] ata2.00: cmd 60/f8:80:08:01:00/00:00:00:00:00/40 tag 16 ncq dma 126976 in
>                          res 40/00:80:08:01:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
> [    3.166055] ata2.00: status: { DRDY }
>
> The read command simply failed, with the hardware reporting an internal
> checksum error.
>
> This definitely means the device is not working properly.
>
>
> And later we got the even stranger error message:
>
> [    3.571793] sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
> [    3.571848] sd 1:0:0:0: [sdb] tag#16 Sense Key : Illegal Request [current]
> [    3.571895] sd 1:0:0:0: [sdb] tag#16 Add. Sense: Unaligned write command
> [    3.571943] sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 01 08 00 00 f8 00
> [    3.571996] blk_update_request: I/O error, dev sdb, sector 264 op 0x0:(READ) flags 0x80700 phys_seg 30 prio class 0
>
> The disk reports that it got an unaligned write, but the block layer
> says the failed operation is a READ.
>
> Not sure if the device is really sane.
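
The CDB bytes in the log can be decoded by hand to confirm that the failed command really is a read of sector 264. The sketch below follows the standard READ(10) layout (opcode 0x28, big-endian 32-bit LBA in bytes 2-5, big-endian 16-bit transfer length in bytes 7-8); the function name `decode_read10` is made up for this example, and the 512-byte block size is an assumption.

```python
import struct


def decode_read10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a READ(10) CDB given as space-separated hex bytes.

    block_size is assumed to be 512 bytes; adjust it for devices that
    expose a different logical sector size.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])      # bytes 2..5: start LBA
    (blocks,) = struct.unpack(">H", cdb[7:9])   # bytes 7..8: block count
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# CDB copied from the log line above.
decoded = decode_read10("28 00 00 00 01 08 00 00 f8 00")
```

The decoded LBA is 264, matching "sector 264" in the blk_update_request line, and 248 blocks of 512 bytes is 126976 bytes, matching "ncq dma 126976 in" in the earlier ata2.00 line. So both layers agree that the command was a read, which makes the "Unaligned write command" sense data look all the more odd.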
>
>
> All these disks are the same model, ST2000DM008; I think that more or
> less indicates there is something wrong with the model...
>
> Recently at least two friends have reported various problems with
> Seagate HDDs.
> Not sure if they are using the same model.
>
Just to say what I have seen.

Right, the model is ST2000DM008, the 2T version. For background: there
are three ST2000DM008 disks, each running xfs in its own Dell OEM
machine. I think the other Dell OEM hardware should be stable enough,
which leaves the suspicious Seagate SMR disks.
All three disks show abnormal error entries in their SMART info after
running only 10,000 hours. One disk even has 609 error log entries.

--
Su
>>
>> smartctl seems to say the disks are OK, but I'm still unable to mount.
>> scrub doesn't see any errors.
>
> Well, you already have /dev/sdb reporting an internal checksum error,
> aka data corruption inside the disk, and your smartctl is reporting
> everything is fine.
>
> Then I guess the disk is lying again in the SMART info.
> (Now I'm more convinced the disk is lying about FLUSH, or at least has
> something wrong when doing FLUSH.)
>
>>
>> I have installed btrfsmaintenance, btw.
>>
>> Can anyone advise me which steps to take in order to save the data?
>> There is no backup (yes, I'm a fool, but I was under the impression that
>> with a copy of each file on 2 different disks I'd survive).
>
> For /dev/sdb, I have no idea at all; as we failed to read something
> from the disk, it's a complete disk failure.
>
> For /dev/sde, as mentioned, you can try "btrfs rescue chunk-recover",
> and then let "btrfs check" find out what's wrong.
>
> And I'm pretty sure you won't buy the same disks from Seagate next time.
>
> Thanks,
> Qu
>>
>> Attached are all(?) the important files plus a history of my attempts
>> over the past days. My attempts from the system rescue USB are not
>> included, though.
>>
>> kind regards,
>> Bastiaan
>>
