From: Nix <nix@esperi.org.uk>
To: Tim Small <tim@buttersideup.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Fault tolerance with badblocks
Date: Tue, 09 May 2017 16:30:35 +0100
Message-ID: <874lwu59pg.fsf@esperi.org.uk>
In-Reply-To: <7dfa3eff-9194-002d-918b-42fbae865df3@buttersideup.com> (Tim Small's message of "Tue, 9 May 2017 13:15:46 +0100")
On 9 May 2017, Tim Small spake thusly:
> On 09/05/17 11:40, Nix wrote:
>> I've had disk failures without warning, and
>> non-failed disks with both read and write errors that would not go away,
>> but that SMART reallocation value just stayed stuck at zero through all
>> of it.
>
> Really? I see them pretty frequently... Let's see
>
> server1, RAID6 (4 disks), reallocated_sector_ct: 0 9 1 0
> server2, RAID5 (4 disks), reallocated_sector_ct: 0 0 0 0
> server3, RAID6 (5 disks), reallocated_sector_ct: 34 754 15 115 1
> server4, RAID5 (4 disks), reallocated_sector_ct: 0 0 0 0
> server5, RAID5 (4 disks), reallocated_sector_ct: 0 0 0 0
>
> Disk 2 in server3 (which has drives which are a bit long in the tooth)
> is scheduled to be replaced next time I visit that site.
>
> Are you looking at the 'raw' column in the smartctl output?
No, but since they all read zero:
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
this is pretty redundant.
I do see, on all my disks (regardless of hardware versus software RAID
or indeed age, and some of these disks are seven years old):
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age Offline - 0
One figure is much higher:
195 Hardware_ECC_Recovered -O-RC- 100 064 000 - 2067212
195 Hardware_ECC_Recovered -O-RC- 100 064 000 - 2088928
195 Hardware_ECC_Recovered -O-RC- 082 064 000 - 156528817
195 Hardware_ECC_Recovered -O-RC- 082 065 000 - 156513792
but this is on a bunch of three-month-old Seagate enterprise disks, and,
as with the seek error rate, Seagate use a deeply bizarre encoding for
this value, and none of the SeaChest programs seem to be able to decode
it.
It appears that the lower the decoded value, the worse things are -- I
have no idea why two of my drives are doing so much worse than two
others on this score. I guess I should keep an eye on them. In any case,
it's going up fast on those two even when the drives are totally idle
and even when I forcibly spin them down... I don't trust this figure to
tell me anything useful at all. SMART, borderline useless as ever.
Aside: in hex these are
001f8b0c
001fdfe0
095470b1
09543600
which rather suggests to me that the drives have two distinct encodings,
with two drives using one encoding and the other two another,
probably split at the four-hex-digit mark -- but the drives have
identical firmware and the same model number...
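
For anyone who wants to poke at the same numbers, here is a minimal sketch of the conversion above. It prints each raw value in hex and splits it at the four-hex-digit mark into a high and a low 16-bit half. The split is only my guess at the layout, not anything Seagate documents, and split_raw is just a name made up for this sketch.

```python
# Raw Hardware_ECC_Recovered values copied from the smartctl output above.
RAW_VALUES = [2067212, 2088928, 156528817, 156513792]


def split_raw(raw):
    """Return (8-digit hex string, high 16 bits, low 16 bits) of a raw value.

    Splitting at the four-hex-digit mark is an assumption about the
    encoding, not a documented format.
    """
    return f"{raw:08x}", raw >> 16, raw & 0xFFFF


for raw in RAW_VALUES:
    hex_str, high, low = split_raw(raw)
    print(f"{raw:>10}  {hex_str}  high=0x{high:04x}  low=0x{low:04x}")
```

On this reading the first two drives have a high half of 0x001f and the other two 0x0954, which is the two-group pattern noted above. Whether either half means anything on its own is still anyone's guess.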