linux-btrfs.vger.kernel.org archive mirror
From: Chris Murphy <lists@colorremedies.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Andrei Borzenkov <arvidjaar@gmail.com>,
	bolderbast@duckstad.net,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Need help with potential ~45TB dataloss
Date: Mon, 3 Dec 2018 20:16:05 -0700
Message-ID: <CAJCQCtRgzw=UPfHxeaiDzqKGzrbSKz-B3xfUpBjYM4GGe=W4YA@mail.gmail.com>
In-Reply-To: <7dac5577-2231-dcba-39fd-c229e4ed5e02@gmx.com>

Also useful for the autopsy, though perhaps not for fixing anything, is
knowing whether the SCT ERC value for every drive is less than the
kernel's SCSI block device command timeout. It's super important that
the drive reports an explicit read failure before the kernel considers
the read command failed. If the drive is still trying to do a read when
the kernel command timer expires, the kernel just resets the whole link
and we lose the outcome of the hanging command. Only upon an explicit
read error can Btrfs, or md RAID, know which device and physical sector
has a problem, and therefore how to reconstruct the block and fix the
bad sector by writing known good data over it.

smartctl -l scterc /device/
and
cat /sys/block/sda/device/timeout
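
A minimal sketch of that comparison for one drive. It assumes smartctl
prints the read ERC value in deciseconds on a line like
"Read: 70 (7.0 seconds)" and that the sysfs timeout is in seconds. The
function names are made up for illustration and do not come from any
existing tool.

```shell
#!/bin/sh
# Extract the SCT ERC read value (deciseconds) from `smartctl -l scterc`
# output on stdin. Prints nothing if ERC is disabled or unsupported.
parse_scterc_read() {
    awk '$1 == "Read:" && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# Succeeds only if ERC is set and shorter than the kernel command timeout.
# $1: ERC in deciseconds (may be empty), $2: kernel timeout in seconds.
erc_below_timeout() {
    [ -n "$1" ] && [ "$1" -lt $(( $2 * 10 )) ]
}

# Report on one drive, given its kernel name (for example: sda).
check_drive() {
    dev=$1
    erc=$(smartctl -l scterc "/dev/$dev" | parse_scterc_read)
    timeout=$(cat "/sys/block/$dev/device/timeout")
    if erc_below_timeout "$erc" "$timeout"; then
        echo "$dev: ok (ERC ${erc} ds is below timeout ${timeout} s)"
    else
        echo "$dev: MISMATCH (ERC '${erc:-disabled or unsupported}', timeout ${timeout} s)"
    fi
}
```

Run as root, something like
`for d in /sys/block/sd*; do check_drive "${d##*/}"; done` would cover
every sdX device.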

Only if SCT ERC is enabled with a value below 30 seconds, or if the
kernel command timer is changed to be well above 30 (like 180, which is
absolutely crazy but a separate conversation), can we be sure that
there haven't just been link resets going on for a while, preventing
bad sectors from being fixed up all along and contributing to the
problem. This comes up on the linux-raid (mainly md driver) list all
the time, and it contributes to lost RAID arrays all the time. And
arguably it leads to unnecessary data loss even in the single device
desktop/laptop use case.
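
As a rough sketch of the two remedies, the helper below only prints the
command it would suggest; it changes nothing. The name suggest_fix and
its yes/no argument are hypothetical, and the 7 second ERC value (70
deciseconds) is an assumed example, not a recommendation from this
thread. As far as I know neither setting survives a reboot or drive
power cycle, so it would have to be reapplied at boot.

```shell
#!/bin/sh
# Print (do not run) the command that brings a drive into line.
# $1: kernel device name such as sda
# $2: "yes" if the drive supports configurable SCT ERC, anything else if not
suggest_fix() {
    if [ "$2" = "yes" ]; then
        # 70 deciseconds = 7 s for reads and writes, under the 30 s timeout
        echo "smartctl -l scterc,70,70 /dev/$1"
    else
        # No configurable ERC: raise the kernel timer so the drive gets
        # a chance to report its own read error before the link is reset
        echo "echo 180 > /sys/block/$1/device/timeout"
    fi
}
```

For example, `suggest_fix sda yes` prints the smartctl command for
/dev/sda, and `suggest_fix sdb no` prints the sysfs write for sdb.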


Chris Murphy


Thread overview: 14+ messages
2018-11-30 13:53 Need help with potential ~45TB dataloss Patrick Dijkgraaf
2018-11-30 23:57 ` Qu Wenruo
2018-12-02  9:03   ` Patrick Dijkgraaf
2018-12-02 20:14     ` Patrick Dijkgraaf
2018-12-02 20:30       ` Andrei Borzenkov
2018-12-03  5:58         ` Qu Wenruo
2018-12-04  3:16           ` Chris Murphy [this message]
2018-12-04 10:09             ` Patrick Dijkgraaf
2018-12-04 19:38               ` Chris Murphy
2018-12-09  9:28                 ` Patrick Dijkgraaf
2018-12-03  0:35     ` Qu Wenruo
2018-12-03  0:45       ` Qu Wenruo
2018-12-04  9:58       ` Patrick Dijkgraaf
2018-12-09  9:32         ` Patrick Dijkgraaf
