From: Chris Murphy <lists@colorremedies.com>
To: Andrey Zhunev <a-j@a-j.ru>
Cc: Chris Murphy <lists@colorremedies.com>,
	xfs list <linux-xfs@vger.kernel.org>
Subject: Re: Need help to recover root filesystem after a power supply issue
Date: Wed, 10 Jul 2019 10:46:12 -0600
Message-ID: <CAJCQCtSTPaor-Wo6b1NF3QT_Pi2rO7B9QMbfudZS=9TEt-Oemw@mail.gmail.com>
In-Reply-To: <1015034894.20190710190746@a-j.ru>

On Wed, Jul 10, 2019 at 10:08 AM Andrey Zhunev <a-j@a-j.ru> wrote:
>
>
> Wednesday, July 10, 2019, 6:45:28 PM, you wrote:
>
> > On Wed, Jul 10, 2019 at 9:29 AM Andrey Zhunev <a-j@a-j.ru> wrote:
> >>
> >> Well, this machine is always online (24/7, with a UPS backup power).
> >> Yesterday we found it switched OFF, without any signs of life. Trying
> >> to switch it on, the PSU made a humming noise and the machine didn't
> >> even try to start. So we replaced the PSU. After that, the machine
> >> powered on - but refused to boot... Something tells me these two
> >> failures are likely related...
>
> > Most likely the drive is dying and the spin down from power failure
> > and subsequent spin up has increased the rate of degradation, and
> > that's why they seem related.
>
> > What do you get for:
>
> > # smartctl -x /dev/sda
>
>
> The '-x' option gives a lot of output.
> It's pasted here: https://pastebin.com/raw/yW3yDuSF

197 Current_Pending_Sector  -O--CK   200   200   000    -    68
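
To pull that number out of a saved report without scrolling through it, something like the sketch below should work. It assumes the brief attribute layout that '-x' prints, where the raw value is the last field on the line; the function name pending_sectors is made up for illustration.

```shell
# Print the raw value of SMART attribute 197 from `smartctl -x` style text on stdin.
# Assumption: the raw value is the last whitespace-separated field on the line.
pending_sectors() {
    awk '$1 == 197 && $2 == "Current_Pending_Sector" { print $NF }'
}

# Example using the attribute line quoted above (no drive access needed):
printf '%s\n' '197 Current_Pending_Sector  -O--CK   200   200   000    -    68' | pending_sectors
```

On a live system the same function could be fed from 'smartctl -x /dev/sda' instead of printf.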


> Well, if there is evidence the drive is really dying - so be it...
> I just need to recover the data, if possible.
> On the other hand, if the drive will work further - I will find some
> unimportant files to store...

I think 68 pending sectors is excessive, and I'd plan to have the drive
replaced under warranty or demote it to storing something you don't care
about. Chances are this is going to get worse. I don't know how many
reserve sectors drives have; I don't even have a guess. But I have
seen drives run out of reserve sectors, at which point you start to
see write failures because an LBA can't be remapped from a bad sector
that fails writes to a good one. At that point, the drive is
unusable.

Anyway, it's a bit tedious to fix 68 sectors manually, so if you have
the time to just wait for it, try this:


# smartctl -l scterc,900,100 /dev/sda
# echo 180 > /sys/block/sda/device/timeout

And now try to fsck.
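
A note on the units, since they differ between the two commands: the scterc values are in tenths of a second (so 900 means the drive may spend up to 90 seconds on a read), while the sysfs timeout is in seconds. The kernel timeout has to be the longer of the two, otherwise the kernel resets the link before the drive finishes trying. A rough sketch of that check, with a made-up helper name, looks like this:

```shell
# scterc values are in deciseconds (tenths of a second); the sysfs timeout is in seconds.
# Succeeds when the kernel command timeout is strictly longer than the drive's recovery limit.
timeout_outlasts_erc() {
    erc_ds=$1        # e.g. 900, as passed to smartctl -l scterc
    kernel_s=$2      # e.g. 180, as written to /sys/block/sda/device/timeout
    [ $((kernel_s * 10)) -gt "$erc_ds" ]
}

if timeout_outlasts_erc 900 180; then
    echo "ok: drive gives up after $((900 / 10))s, kernel waits 180s"
fi
```

With the values above the check passes; with the usual 30 second kernel default it would not.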

If it fails with I/O errors very quickly, as in less than 90 seconds,
then that means the drive firmware has concluded deep recovery won't
matter and is pretty much immediately giving up. At that point, those
sectors are lost. You could overwrite those sectors one by one with
zeros, and maybe xfs_repair will have enough information to reconstruct
and repair things well enough to copy data off. But you'll have to be
suspicious of every file, as any one of them could have been silently
corrupted, either by bad ECC reconstruction in the drive firmware or by
the overwriting with zeros.
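
For the overwrite itself, one possible approach is dd with direct I/O, one sector at a time. The sketch below only prints the command so it can be reviewed before anything is written. It assumes 512-byte logical sectors, GNU dd (for oflag=direct), and an LBA taken from the kernel log or the SMART error log; the helper name and the LBA 123456 are placeholders, not values from this drive. On a drive with 4096-byte physical sectors the whole physical sector would likely need to be written instead.

```shell
# Print (do not run) a dd command that overwrites one 512-byte logical sector with zeros.
# Assumptions: 512-byte logical sectors, GNU dd; the caller supplies device and LBA.
zero_sector_cmd() {
    dev=$1
    lba=$2
    echo "dd if=/dev/zero of=$dev bs=512 seek=$lba count=1 oflag=direct"
}

# Placeholder LBA for illustration only:
zero_sector_cmd /dev/sda 123456
```

Whatever was stored in that sector is gone once the printed command is actually run, so double-check the device and LBA first.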

I'd say there's a decent chance of recovery but it will be tedious.

If it seems like the system is hanging without errors, that's actually
a good sign deep recovery is working. But like I said, it could take
hours. And then in the end it might still find a totally unrecoverable
sector.


-- 
Chris Murphy
