linux-btrfs.vger.kernel.org archive mirror
From: Patrick Dijkgraaf <bolderbast@duckstad.net>
To: Chris Murphy <lists@colorremedies.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Andrei Borzenkov <arvidjaar@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Need help with potential ~45TB dataloss
Date: Tue, 04 Dec 2018 11:09:55 +0100
Message-ID: <8e5729f3c15997a13bdce73800146e91222ed89c.camel@duckstad.net>
In-Reply-To: <CAJCQCtRgzw=UPfHxeaiDzqKGzrbSKz-B3xfUpBjYM4GGe=W4YA@mail.gmail.com>

Hi Chris,

See the output below. Any suggestions based on it?
Thanks!

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Mon, 2018-12-03 at 20:16 -0700, Chris Murphy wrote:
> Also useful information for autopsy, perhaps not for fixing, is to
> know whether the SCT ERC value for every drive is less than the
> kernel's SCSI driver block device command timeout value. It's super
> important that the drive reports an explicit read failure before the
> read command is considered failed by the kernel. If the drive is still
> trying to do a read, and the kernel command timer times out, it'll
> just do a reset of the whole link and we lose the outcome for the
> hanging command. Upon explicit read error only, can Btrfs, or md RAID,
> know what device and physical sector has a problem, and therefore how
> to reconstruct the block, and fix the bad sector with a write of known
> good data.
> 
> smartctl -l scterc /device/

Seems to not work:

[root@cornelis ~]# for disk in /dev/sd{e..x}; do echo ${disk}; smartctl -l scterc ${disk}; done
/dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdf
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdg
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdh
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdi
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdj
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdk
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

Smartctl open device: /dev/sdk failed: No such device
/dev/sdl
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdm
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdn
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SCT Error Recovery Control command not supported

/dev/sdo
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdp
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdq
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdr
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sds
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdt
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SCT Error Recovery Control command not supported

/dev/sdu
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdv
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdw
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdx
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SCT Error Recovery Control command not supported

> and
> cat /sys/block/sda/device/timeout

[root@cornelis ~]# cat /sys/block/sd{e..x}/device/timeout
30
30
30
30
30
30
cat: /sys/block/sdk/device/timeout: No such file or directory
30
30
30
30
30
30
30
30
30
30
30
30
30
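The comparison Chris asks for (drive error-recovery limit versus kernel command timeout) can be sketched per device as below. This is a minimal sketch, not a tested tool: the function names are made up for illustration, and the parser assumes that a drive which does report SCT ERC prints a line of the form "Read: 70 (7.0 seconds)", with the value in tenths of a second. None of the drives above print such a line, so all of them would come out as "unknown".

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and output format are illustrative only.

# Read `smartctl -l scterc` output on stdin and print the read limit in
# deciseconds. Prints nothing if ERC is disabled, unsupported, or failed.
# Assumes a "Read: <number> (...)" line when a limit is set.
erc_read_ds() {
    awk '$1 == "Read:" && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# classify_erc ERC_DECISECONDS TIMEOUT_SECONDS
#   ok        drive gives up before the kernel command timer expires
#   too-long  drive may still be retrying when the kernel resets the link
#   unknown   no ERC limit known, so the drive's retry time is unbounded
classify_erc() {
    local erc_ds="$1" timeout_s="$2"
    if [ -z "$erc_ds" ] || [ "$erc_ds" -eq 0 ]; then
        echo "unknown"
    elif [ "$erc_ds" -lt $((timeout_s * 10)) ]; then
        echo "ok"
    else
        echo "too-long"
    fi
}

# check_all DEV...  Report one line per device; skips devices that are gone.
check_all() {
    local dev name timeout erc
    for dev in "$@"; do
        name=${dev##*/}
        if [ ! -b "$dev" ]; then
            echo "$dev: missing"
            continue
        fi
        timeout=$(cat "/sys/block/$name/device/timeout")
        erc=$(smartctl -l scterc "$dev" | erc_read_ds)
        echo "$dev: timeout=${timeout}s erc=${erc:-none} -> $(classify_erc "$erc" "$timeout")"
    done
}
```

Run as root with something like `check_all /dev/sd{e..x}`. A missing device such as /dev/sdk is reported and skipped rather than producing an error from cat.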

> Only if SCT ERC is enabled with a value below 30 seconds, or if the
> kernel command timer is changed to be well above 30 (like 180, which
> is absolutely crazy but a separate conversation) can we be sure that
> there haven't just been resets going on for a while, preventing bad
> sectors from being fixed up all along, which can contribute to the
> problem. This comes up on the linux-raid (mainly md driver) list all
> the time, and it contributes to lost RAID all the time. And arguably
> it leads to unnecessary data loss in even the single device
> desktop/laptop use case as well.
> 
> 
> Chris Murphy
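
The two remedies Chris describes could be applied per drive roughly as follows. This is a dry-run sketch under assumptions: the function name is hypothetical, the 7-second ERC limit (70 deciseconds) is a commonly used value and not one given in this thread, and 180 is the kernel timeout Chris mentions. It only prints the command it would suggest, so nothing is changed until the output is reviewed and run by hand.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a command, executes nothing.

# plan_fix DEV ERC_SUPPORTED
#   ERC_SUPPORTED=yes -> cap the drive's recovery time at 7.0 s (assumed value)
#   anything else     -> raise the kernel command timeout to 180 s instead
plan_fix() {
    local dev="$1" erc_supported="$2" name
    name=${dev##*/}
    if [ "$erc_supported" = "yes" ]; then
        # scterc takes read,write limits in tenths of a second
        echo "smartctl -l scterc,70,70 $dev"
    else
        echo "echo 180 > /sys/block/$name/device/timeout"
    fi
}
```

For example, `plan_fix /dev/sdn no` prints the sysfs command for a drive that reports ERC as not supported. Neither setting is persistent across reboots on most setups, so it would normally be reapplied from a boot script or udev rule.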

