* Re: Disk Monitoring
@ 2017-06-28 13:19 Wolfgang Denk
  2017-06-29  9:52 ` Gandalf Corvotempesta
  0 siblings, 1 reply; 23+ messages in thread
From: Wolfgang Denk @ 2017-06-28 13:19 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2945 bytes --]

Dear Gandalf,

In message <CAJH6TXgvrVckHDmh1oiN9mupLrsS2NP3J44bG1_wE9Nnx4=yHQ@mail.gmail.com> you wrote:
> 
> 1) All RAID controllers have proactive monitoring features, like
> patrol read, consistency check, and (more or less) some SMART
> integration.
> Any counterpart in mdadm?

As Wol already pointed out, you should use  smartctl  to monitor
the state of the disk drives, ideally on a regular basis.  Changes
(increases) in numbers like "Reallocated Sectors", "Current Pending
Sectors" or "Offline Uncorrectable Sectors" are always suspicious.
If they increase just by one and then stay constant for weeks, you
can probably ignore it.  But if you see I/O errors in the system
logs and/or "Reallocated Sectors" increasing every few days, then
you should not wait much longer before replacing the affected drive.

Attached are two very simple scripts I use for this purpose;
"disk-test" simply runs smartctl on all /dev/sd? devices and parses
the output.  The result is something like this:

$ sudo disk-test
=== /dev/sda : ST1000NM0011 S/N Z1N2RA6E *** ERRORS ***
        Reallocated Sectors:     1
=== /dev/sdb : ST2000NM0033-9ZM175 S/N Z1X1J1K9 OK
=== /dev/sdc : ST2000NM0033-9ZM175 S/N Z1X1JEF6 OK
=== /dev/sdd : ST2000NM0033-9ZM175 S/N Z1X4XSN9 OK
=== /dev/sde : ST2000NM0033-9ZM175 S/N Z1X4X6G8 OK
=== /dev/sdf : ST2000NM0033-9ZM175 S/N Z1X54EA1 OK
=== /dev/sdg : ST2000NM0033-9ZM175 S/N Z1X5443W OK
=== /dev/sdh : ST2000NM0033-9ZM175 S/N Z1X4XAHQ OK
=== /dev/sdi : ST2000NM0033-9ZM175 S/N Z1X4X6NB OK
=== /dev/sdj : TOSHIBA MK1002TSKB S/N 32E3K0K2F OK
=== /dev/sdk : TOSHIBA MK1002TSKB S/N 32F3K0PRF OK
=== /dev/sdl : TOSHIBA MK1002TSKB S/N 32H3K10CF *** ERRORS ***
        Reallocated Sectors:     1
=== /dev/sdm : TOSHIBA MK1002TSKB S/N 32H3K0ZLF OK
=== /dev/sdn : TOSHIBA MK1002TSKB S/N 32H3K104F OK
=== /dev/sdo : TOSHIBA MK1002TSKB S/N 32H1K31DF OK
=== /dev/sdp : TOSHIBA MK1002TSKB S/N 32F3K0PUF OK
=== /dev/sdq : TOSHIBA MK1002TSKB S/N 32E3K0JZF OK

Here I have two drives with 1 reallocated sector each, which I
consider harmless, as the count has stayed constant for several months.

The second script, "disk-watch", is intended to be run as a cron job
on a regular basis (here usually twice per day).  It will send out
email whenever the state changes (don't forget to adjust the MAIL_TO
setting).  You may also want to clean up the entries in /var/log/diskwatch
every now and then (or, better, add it to your logrotate
configuration).
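
For example, a cron entry like the following could drive it (the
times and the install path here are just assumptions):

	# /etc/cron.d/disk-watch: run the check twice per day
	0 6,18 * * *  root  /usr/local/sbin/disk-watch

and a simple age-based cleanup can stand in for logrotate:

	# weekly: drop reports older than 90 days
	30 5 * * 0  root  find /var/log/diskwatch -type f -mtime +90 -delete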

HTH.


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Yes, it's a technical challenge, and  you  have  to  kind  of  admire
people  who go to the lengths of actually implementing it, but at the
same time you wonder about their IQ...
         --  Linus Torvalds in <5phda5$ml6$1@palladium.transmeta.com>


[-- Attachment #2: disk-test --]
[-- Type: text/plain, Size: 1083 bytes --]

#!/bin/sh
# disk-test: run smartctl on every /dev/sd? device and summarize the
# attributes that indicate pending trouble.

DISKS="$(echo /dev/sd?)"

PATH=$PATH:/sbin:/usr/sbin

for i in ${DISKS}
do
	# Keep the identification lines plus the interesting attributes;
	# drop attributes whose raw value is still 0.
	SMARTDATA=$(smartctl -a "$i" | \
	egrep 'Device Model:|Serial Number:|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|failed|Unknown USB' | \
	grep -v ' -  *0$')
	LINES=$(echo "${SMARTDATA}" | wc -l)
	# Header: device model and serial number
	HEAD=$(echo "${SMARTDATA}" | \
	       sed -n -e 's/Device Model: //p' \
		      -e 's!Serial Number:!S/N!p')
	# Body: pretty-print any non-zero error counters / error states
	BODY=$(echo "${SMARTDATA}" | \
	       awk '$2 ~ /Reallocated_Sector_Ct/	{ printf "Reallocated Sectors:   %3d\n", $10 }
		    $2 ~ /Current_Pending_Sector/	{ printf "Current Pending Sect:  %3d\n", $10 }
		    $2 ~ /Offline_Uncorrectable/	{ printf "Offline Uncorrectable: %3d\n", $10 }
		    $0 ~ /failed:.*AMCC/		{ printf "Unsupported AMCC/3ware controller\n" }
		    $0 ~ /SMART command failed/		{ printf "Device does not support SMART\n" }
		    $0 ~ /Unknown USB bridge/		{ printf "Unknown USB bridge\n" }
		'
	     )
	# Exactly two lines left (model + serial) means all counters are zero
	if [ "${LINES}" -eq 2 ]
	then
		echo === "$i" : ${HEAD} OK
	else
		echo === "$i" : ${HEAD} "*** ERRORS ***"
		echo "${BODY}" | sed -e 's/^/	/'
	fi
done

[-- Attachment #3: disk-watch --]
[-- Type: text/plain, Size: 683 bytes --]

#!/bin/sh
# disk-watch: run disk-test from cron, keep a dated report in
# ${D_LOGDIR}, and mail a unified diff whenever the state changes.

D_TEST=/usr/local/sbin/disk-test
D_LOGDIR=/var/log/diskwatch
MAIL_TO="root"

[ -x ${D_TEST} ] || { echo "ERROR: cannot execute ${D_TEST}" >&2 ; exit 1 ; }

[ -d ${D_LOGDIR} ] || \
	mkdir -p ${D_LOGDIR} || \
		{ echo "ERROR: cannot create ${D_LOGDIR}" >&2 ; exit 1 ; }

cd ${D_LOGDIR} || { echo "ERROR: cannot cd ${D_LOGDIR}" >&2 ; exit 1 ; }

# Rotate: the last run's "latest" becomes "previous"
rm -f previous

[ -L latest ] && mv latest previous

NOW=$(date "+%F-%T")

${D_TEST} >"${NOW}"

ln -s "${NOW}" latest

# Mail only when something changed since the previous run
DIFF=''

[ -r previous ] && DIFF=$(diff -u previous latest)

[ -z "${DIFF}" ] && exit 0

mailx -s "$(hostname): SMART DISK WARNING" ${MAIL_TO} <<+++
Disk status change:
${DIFF}

Recent results:
$(cat latest)
+++


* Re: Disk Monitoring
  2017-06-28 13:19 Disk Monitoring Wolfgang Denk
@ 2017-06-29  9:52 ` Gandalf Corvotempesta
  2017-06-29 10:10   ` Reindl Harald
                     ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-06-29  9:52 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid

2017-06-28 15:19 GMT+02:00 Wolfgang Denk <wd@denx.de>:
> As Wol already pointed out, you should use  smartctl  to monitor
> the state of the disk drives, ideally on a regular basis.  Changes
> (increases) in numbers like "Reallocated Sectors", "Current Pending
> Sectors" or "Offline Uncorrectable Sectors" are always suspicious.
> If they increase just by one and then stay constant for weeks, you
> can probably ignore it.  But if you see I/O errors in the system
> logs and/or "Reallocated Sectors" increasing every few days, then
> you should not wait much longer before replacing the affected drive.

Sure, but SMART is not always reliable.
In my personal experience, the Patrol Read feature on LSI controllers [1]
has saved me multiple times. During a patrol read some bad sectors were
found (probably in an unallocated part of the array, so the OS knew
nothing about them).

I proactively replaced the drive, and the day after the resync another
disk failed totally.
Without Patrol Read, which scrubs all disks one by one (a consistency
check is a different thing), this server would probably have failed
completely, with data loss.

The Linux kernel detects something bad only when accessing the failed
sector. If that sector is not used, the kernel knows nothing. Hardware
RAID controllers like LSI's are able to read the whole disks, block by
block, even the unused parts, detecting failures before anything tries
to write to that sector.

If you have a bad sector on an unused part of the array, then when you
have to rebuild due to another disk failure, you'll hit a URE and the
whole array fails.

Simple example:

disk0 has a failed sector X, which is unused.
Because it's unused, the kernel knows nothing about it and operates
normally: no warning message or anything. If you don't access sector
X, you won't be notified.

Now disk1 hard-fails. You have to replace it.
During the resync you have to resync the whole array, but sector X
on disk0 is unreadable.
The resync will fail and the whole array is down.

Am I missing something?

[1] http://www.dell.com/downloads/global/power/ps1q06-20050212-Habas.pdf


* Re: Disk Monitoring
  2017-06-29  9:52 ` Gandalf Corvotempesta
@ 2017-06-29 10:10   ` Reindl Harald
  2017-06-29 10:14     ` Gandalf Corvotempesta
  2017-06-29 10:14   ` Andreas Klauer
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Reindl Harald @ 2017-06-29 10:10 UTC (permalink / raw)
  To: Gandalf Corvotempesta, Wolfgang Denk; +Cc: linux-raid



On 29.06.2017 at 11:52, Gandalf Corvotempesta wrote:
> The Linux kernel detects something bad only when accessing the failed
> sector. If that sector is not used, the kernel knows nothing. Hardware
> RAID controllers like LSI's are able to read the whole disks, block by
> block, even the unused parts, detecting failures before anything tries
> to write to that sector.
> 
> If you have a bad sector on an unused part of the array, then when you
> have to rebuild due to another disk failure, you'll hit a URE and the
> whole array fails.

There is no such thing as an "unused part of the array", since the
Linux RAID layer knows nothing about the filesystem on top; hence a
raid-check (scrub) reads every block, just as the hardware controllers
mentioned do.


* Re: Disk Monitoring
  2017-06-29  9:52 ` Gandalf Corvotempesta
  2017-06-29 10:10   ` Reindl Harald
@ 2017-06-29 10:14   ` Andreas Klauer
  2017-06-29 10:14   ` Mateusz Korniak
  2017-06-29 10:20   ` Mateusz Korniak
  3 siblings, 0 replies; 23+ messages in thread
From: Andreas Klauer @ 2017-06-29 10:14 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Wolfgang Denk, linux-raid

On Thu, Jun 29, 2017 at 11:52:01AM +0200, Gandalf Corvotempesta wrote:
> disk0 has a failed sector X, which is unused.
> Because it's unused, the kernel knows nothing about it and operates
> normally: no warning message or anything. If you don't access sector
> X, you won't be notified.
> 
> Now disk1 hard-fails. You have to replace it.
> During the resync you have to resync the whole array, but sector X
> on disk0 is unreadable.
> The resync will fail and the whole array is down.

> Am I missing something?

Not really. It's just that you have to set up the monitoring yourself, 
whichever way you feel comfortable with.

SMART has a selftest feature which causes the disk to read sectors.
You can do the whole disk at once (long selftest) or in segments
(selective selftest). I prefer the selective one, since it allows you
to place the selftest in the time window of least activity.

Instead of spending an entire day (or two) testing the whole drive 
you can put in an hour or two of testing every night and have it 
cover the entire drive over X days.
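
For example (a sketch; the device name and LBA span are placeholders):

	# test one segment of the drive right now
	smartctl -t select,0-200000000 /dev/sdX

	# from cron each night: continue where the last segment ended,
	# starting over from LBA 0 once the whole disk has been covered
	smartctl -t select,next /dev/sdX

	# inspect progress and results
	smartctl -l selective /dev/sdX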

mdadm can also perform RAID checks, reading everything including
parity; the RAID layer will then attempt to fix read errors, and you
can also check mismatch_cnt afterwards.

The mdadm checks can also be done region by region to distribute the
load over several days, though I think it's still not a direct option
for mdadm; the region can be set via /sys (see below)...
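
From memory, the relevant sysfs knobs look like this (sector offsets;
md0 is a placeholder):

	# check only the region between the two offsets
	echo 0          > /sys/block/md0/md/sync_min
	echo 2000000000 > /sys/block/md0/md/sync_max
	echo check      > /sys/block/md0/md/sync_action

	# next night: move sync_min/sync_max on to the following region,
	# or write "max" to sync_max to let the check run to the end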

Both smartmontools and mdadm should be set up to run such checks 
periodically, and instantly notify you by email if any problem occurs.
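
For instance, a smartd.conf directive along these lines (the schedule
regex format is T/MM/DD/d/HH; the device and address are placeholders):

	# short selftest daily at 02:00, long selftest Saturdays at 03:00,
	# monitor everything and mail on trouble
	/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root

and, for mdadm (with the distro's "mdadm --monitor" service running):

	# /etc/mdadm.conf
	MAILADDR root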

If a disk has problems, replace it; otherwise it's a gamble.
Whatever promises RAID makes regarding redundancy, it always
assumes that the other drives work 100%.

You are very unlikely to encounter read errors during a rebuild if
you run regular checks and don't forcibly keep bad drives.

Regards
Andreas Klauer


* Re: Disk Monitoring
  2017-06-29 10:10   ` Reindl Harald
@ 2017-06-29 10:14     ` Gandalf Corvotempesta
  2017-06-29 10:37       ` Reindl Harald
  2017-06-29 14:28       ` Wols Lists
  0 siblings, 2 replies; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-06-29 10:14 UTC (permalink / raw)
  To: Reindl Harald; +Cc: Wolfgang Denk, linux-raid

2017-06-29 12:10 GMT+02:00 Reindl Harald <h.reindl@thelounge.net>:
> There is no such thing as an "unused part of the array", since the
> Linux RAID layer knows nothing about the filesystem on top; hence a
> raid-check (scrub) reads every block, just as the hardware controllers
> mentioned do.

Yes, but only during a scrub.
Without a scrub, the kernel knows nothing about an unused part of the array.

Let's assume an unreadable sector is found during a scrub. What
happens? Does mdadm kick the disk out of the array? Or does it try to
reallocate the failed sector somewhere, keeping the disk up and running
(and thus resolving the URE that would prevent a successful resync)?


* Re: Disk Monitoring
  2017-06-29  9:52 ` Gandalf Corvotempesta
  2017-06-29 10:10   ` Reindl Harald
  2017-06-29 10:14   ` Andreas Klauer
@ 2017-06-29 10:14   ` Mateusz Korniak
  2017-06-29 10:16     ` Gandalf Corvotempesta
  2017-06-29 10:20   ` Mateusz Korniak
  3 siblings, 1 reply; 23+ messages in thread
From: Mateusz Korniak @ 2017-06-29 10:14 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Wolfgang Denk, linux-raid

On Thursday 29 of June 2017 11:52:01 Gandalf Corvotempesta wrote:
> During a patrol read some bad sectors were found
> (probably in an unallocated part of the array, so the OS knew
> nothing about them).

One can read all disks/partitions used by RAID arrays with the script
/usr/share/mdadm/checkarray.

In many distros this is set up to run monthly out of the box.
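
For example, a manual run across all arrays:

	/usr/share/mdadm/checkarray --all

Debian's stock cron job invokes it with (roughly) "--cron --all --idle
--quiet" on the first Sunday of each month.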

-- 
Mateusz Korniak
"(...) mam brata - poważny, domator, liczykrupa, hipokryta, pobożniś,
 	krótko mówiąc - podpora społeczeństwa."
				Nikos Kazantzakis - "Grek Zorba"



* Re: Disk Monitoring
  2017-06-29 10:14   ` Mateusz Korniak
@ 2017-06-29 10:16     ` Gandalf Corvotempesta
  2017-06-29 14:33       ` Wols Lists
  0 siblings, 1 reply; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-06-29 10:16 UTC (permalink / raw)
  To: Mateusz Korniak; +Cc: Wolfgang Denk, linux-raid

2017-06-29 12:14 GMT+02:00 Mateusz Korniak <mateusz-lists@ant.gliwice.pl>:
> One can read all disks/partitions used by RAID arrays with the script
> /usr/share/mdadm/checkarray.
>
> In many distros this is set up to run monthly out of the box.

I know this, and I'm running it weekly (just to be sure).
What is unclear to me is why a hardware RAID controller like LSI's has
two different kinds of checks: patrol read and consistency check.


* Re: Disk Monitoring
  2017-06-29  9:52 ` Gandalf Corvotempesta
                     ` (2 preceding siblings ...)
  2017-06-29 10:14   ` Mateusz Korniak
@ 2017-06-29 10:20   ` Mateusz Korniak
  2017-06-29 10:25     ` Gandalf Corvotempesta
  3 siblings, 1 reply; 23+ messages in thread
From: Mateusz Korniak @ 2017-06-29 10:20 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Wolfgang Denk, linux-raid

On Thursday 29 of June 2017 11:52:01 Gandalf Corvotempesta wrote:
> During a patrol read some bad sectors were found
> (probably in an unallocated part of the array, so the OS knew
> nothing about them).

Also, a SMART long test should perform a "read/verify scan of all of the
user data area", so it is a good idea to run it periodically and monitor
the results.

-- 
Mateusz Korniak
"(...) mam brata - poważny, domator, liczykrupa, hipokryta, pobożniś,
 	krótko mówiąc - podpora społeczeństwa."
				Nikos Kazantzakis - "Grek Zorba"



* Re: Disk Monitoring
  2017-06-29 10:20   ` Mateusz Korniak
@ 2017-06-29 10:25     ` Gandalf Corvotempesta
  2017-06-29 10:34       ` Reindl Harald
  0 siblings, 1 reply; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-06-29 10:25 UTC (permalink / raw)
  To: Mateusz Korniak; +Cc: Wolfgang Denk, linux-raid

2017-06-29 12:20 GMT+02:00 Mateusz Korniak <mateusz-lists@ant.gliwice.pl>:
> Also, a SMART long test should perform a "read/verify scan of all of the
> user data area", so it is a good idea to run it periodically and monitor
> the results.

I'm already using this, but it is not very reliable.
I've had disks that were failed by the hardware RAID controller due to
unreadable sectors (3/11) while still passing the SMART long test, and
some that failed the SMART long test while still passing the hardware
RAID check.


* Re: Disk Monitoring
  2017-06-29 10:25     ` Gandalf Corvotempesta
@ 2017-06-29 10:34       ` Reindl Harald
  0 siblings, 0 replies; 23+ messages in thread
From: Reindl Harald @ 2017-06-29 10:34 UTC (permalink / raw)
  To: Gandalf Corvotempesta, Mateusz Korniak; +Cc: Wolfgang Denk, linux-raid



On 29.06.2017 at 12:25, Gandalf Corvotempesta wrote:
> 2017-06-29 12:20 GMT+02:00 Mateusz Korniak <mateusz-lists@ant.gliwice.pl>:
>> Also, a SMART long test should perform a "read/verify scan of all of the
>> user data area", so it is a good idea to run it periodically and monitor
>> the results.
> 
> I'm already using this, but it is not very reliable.
> I've had disks that were failed by the hardware RAID controller due to
> unreadable sectors (3/11) while still passing the SMART long test, and
> some that failed the SMART long test while still passing the hardware
> RAID check.

Hence you do both - in any case, for software RAID as well as hardware RAID:

* scheduled SMART check
* scheduled scrub

I've had enough cases where, while the scrub was running, smartd alerted
about a disk going bad. At the end of the day it doesn't matter which of
the two cries out; the point is that you (hopefully) get an alert telling
you to replace a specific disk before a second one starts to go bad.


* Re: Disk Monitoring
  2017-06-29 10:14     ` Gandalf Corvotempesta
@ 2017-06-29 10:37       ` Reindl Harald
  2017-06-29 14:28       ` Wols Lists
  1 sibling, 0 replies; 23+ messages in thread
From: Reindl Harald @ 2017-06-29 10:37 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Wolfgang Denk, linux-raid



On 29.06.2017 at 12:14, Gandalf Corvotempesta wrote:
> 2017-06-29 12:10 GMT+02:00 Reindl Harald <h.reindl@thelounge.net>:
>> There is no such thing as an "unused part of the array", since the
>> Linux RAID layer knows nothing about the filesystem on top; hence a
>> raid-check (scrub) reads every block, just as the hardware controllers
>> mentioned do.
> 
> Yes, but only during a scrub.
> Without a scrub, the kernel knows nothing about an unused part of the array.
> 
> Let's assume an unreadable sector is found during a scrub. What
> happens? Does mdadm kick the disk out of the array? Or does it try to
> reallocate the failed sector somewhere, keeping the disk up and running
> (and thus resolving the URE that would prevent a successful resync)?

It tries to re-write the data block (depending on whether repair is
enabled).

Normally the disk firmware itself is aware of that event and reallocates
a spare block, so the rewrite simply succeeds, until the disk runs out
of spare sectors.

But even if md just kicks the disk out, as long as that happens soon
enough you can replace it, the array rebuilds, and nothing happened.
Without SMART checks/scrubs, the problem is detected sooner or later
during the rebuild after another disk has failed, and the array is gone.


* Re: Disk Monitoring
  2017-06-29 10:14     ` Gandalf Corvotempesta
  2017-06-29 10:37       ` Reindl Harald
@ 2017-06-29 14:28       ` Wols Lists
  1 sibling, 0 replies; 23+ messages in thread
From: Wols Lists @ 2017-06-29 14:28 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: linux-raid

On 29/06/17 11:14, Gandalf Corvotempesta wrote:
> 2017-06-29 12:10 GMT+02:00 Reindl Harald <h.reindl@thelounge.net>:
>> There is no such thing as an "unused part of the array", since the
>> Linux RAID layer knows nothing about the filesystem on top; hence a
>> raid-check (scrub) reads every block, just as the hardware controllers
>> mentioned do.
> 
> Yes, but only during a scrub.
> Without a scrub, the kernel knows nothing about an unused part of the array.
> 
> Let's assume an unreadable sector is found during a scrub. What
> happens? Does mdadm kick the disk out of the array? Or does it try to
> reallocate the failed sector somewhere, keeping the disk up and running
> (and thus resolving the URE that would prevent a successful resync)?

As I said, read up on disk failure modes. If mdadm finds an unreadable
sector, it does a full stripe read, recalculates the unreadable sector,
and writes it back.

If the read failed because the magnetism has decayed, this will reset it
and everything will be hunky-dory again.

If the read failed because the magnetic layer is failing, it will
trigger a relocate (which SMART should pick up) and alarm bells should
start ringing. It might be benign: the magnetic properties of the layer
might be failing such that that bit can no longer record properly. Or it
could be that the layer itself is starting to fail, at which point you
are heading for platter failure, a head crash, and serious pain.

Cheers,
Wol



* Re: Disk Monitoring
  2017-06-29 10:16     ` Gandalf Corvotempesta
@ 2017-06-29 14:33       ` Wols Lists
  2017-06-30 12:35         ` Gandalf Corvotempesta
  0 siblings, 1 reply; 23+ messages in thread
From: Wols Lists @ 2017-06-29 14:33 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: linux-raid

On 29/06/17 11:16, Gandalf Corvotempesta wrote:
> 2017-06-29 12:14 GMT+02:00 Mateusz Korniak <mateusz-lists@ant.gliwice.pl>:
>> One can read all disks/partitions used by RAID arrays with the script
>> /usr/share/mdadm/checkarray.
>>
>> In many distros this is set up to run monthly out of the box.
> 
> I know this, and I'm running it weekly (just to be sure).
> What is unclear to me is why a hardware RAID controller like LSI's has
> two different kinds of checks: patrol read and consistency check.

Because they're two completely different things.

A patrol check reads the entire disk. It doesn't give two hoots what
the data is; it just cares that the data can actually be retrieved from
the disk.

A consistency check, on the other hand, wants to make sure that the
data is correct. It will read both copies of a mirror and compare them.
Or it will read the data from a RAID 4/5/6 array, calculate the
parities, then read the stored parities from disk and compare them.

In other words, a patrol check looks for a failing disk. A consistency
check looks for corrupt data. (A consistency check does a patrol check
as a side effect, but you might not want to do just that, as it is
computationally much more expensive. You might want to do a patrol check
every day, and a consistency check at the weekend.)
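
In md terms, the rough analogues would be (a sketch; md0 and sdX are
placeholders):

	# "patrol read" analogue: surface-read one member disk
	# (a SMART long selftest achieves much the same inside the drive)
	dd if=/dev/sdX of=/dev/null bs=1M

	# "consistency check" analogue: md scrub comparing mirrors/parity
	echo check > /sys/block/md0/md/sync_action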

Cheers,
Wol



* Re: Disk Monitoring
  2017-06-29 14:33       ` Wols Lists
@ 2017-06-30 12:35         ` Gandalf Corvotempesta
  2017-06-30 14:35           ` Phil Turmel
  0 siblings, 1 reply; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-06-30 12:35 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

2017-06-29 16:33 GMT+02:00 Wols Lists <antlists@youngman.org.uk>:
> In other words, a patrol check looks for a failing disk. A consistency
> check looks for corrupt data. (A consistency check does a patrol check
> as a side effect, but you might not want to do just that, as it is
> computationally much more expensive. You might want to do a patrol check
> every day, and a consistency check at the weekend.)

OK, so if resources are not a problem, one could run only a
consistency check and skip the patrol read entirely. Right?

What about md reliability? There are many detractors out there.

https://bugzilla.kernel.org/show_bug.cgi?id=99171

One of the most common complaints is the absence of a write-back
cache; if you force "writeback" you risk data loss in case of an
unclean shutdown (power failure and so on).


* Re: Disk Monitoring
  2017-06-30 12:35         ` Gandalf Corvotempesta
@ 2017-06-30 14:35           ` Phil Turmel
  2017-06-30 19:56             ` Anthony Youngman
  0 siblings, 1 reply; 23+ messages in thread
From: Phil Turmel @ 2017-06-30 14:35 UTC (permalink / raw)
  To: Gandalf Corvotempesta, Wols Lists; +Cc: linux-raid

On 06/30/2017 08:35 AM, Gandalf Corvotempesta wrote:
> 2017-06-29 16:33 GMT+02:00 Wols Lists <antlists@youngman.org.uk>:
>> In other words, a patrol check looks for a failing disk. A consistency
>> check looks for corrupt data. (A consistency check does a patrol check
>> as a side effect, but you might not want to do just that, as it is
>> computationally much more expensive. You might want to do a patrol check
>> every day, and a consistency check at the weekend.)
> 
> OK, so if resources are not a problem, one could run only a
> consistency check and skip the patrol read entirely. Right?

Hardware raid and MD raid are not directly comparable.  Based on your
description, patrol read and consistency check are separate functions in
your hardware raid.  (I don't do hardware raid, myself.)

In MD raid, you have a "check" scrub which reads all member devices to
find and rewrite any UREs (patrol read), while comparing mirrors/parity
for mismatches.  You end up with a "mismatch count" in sysfs when done.

You also have a "repair" scrub which also reads all data blocks in
parity arrays and first mirrors, and writes all parity and other mirrors.
This is only recommended when a "check" scrub finds mismatches, as this
type of scrub can miss developing bad sectors on parity blocks and other
mirrors.  (Drives can only *detect* failing sectors on *read*, and can
only relocate them on *write* *after* detection.)
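
A typical cycle, as a sketch (md0 is a placeholder):

	echo check > /sys/block/md0/md/sync_action
	# ... wait for the check to finish, then:
	cat /sys/block/md0/md/mismatch_cnt
	# only if mismatches were reported:
	echo repair > /sys/block/md0/md/sync_action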

> What about md reliability? There are many detractors out there.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=99171

That's hardly a great case.  Changing data in a chunk of memory in one
thread while another thread is writing it out is undefined behavior and
MD will give you back something that was there at the time of actual
write.  Do that with hardware raid and you will gain consistency, but
you'll still have jumbled data.  Real applications don't do this or they
have much bigger problems than raid mirror mismatches.

> One of the most common complaints is the absence of a write-back
> cache; if you force "writeback" you risk data loss in case of an
> unclean shutdown (power failure and so on).

The only good reason for hardware raid, in my opinion.  Balanced against
vendor lock-in, limited layout options, and a plethora of management
interfaces.

Phil


* Re: Disk Monitoring
  2017-06-30 14:35           ` Phil Turmel
@ 2017-06-30 19:56             ` Anthony Youngman
  2017-07-01 13:42               ` Drew
  0 siblings, 1 reply; 23+ messages in thread
From: Anthony Youngman @ 2017-06-30 19:56 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: linux-raid

On 30/06/17 15:35, Phil Turmel wrote:
> The only good reason for hardware raid, in my opinion.  Balanced against
> vendor lock-in, limited layout options, and a plethora of management
> interfaces.

And lost data when your controller quits, and you can't find another one 
that implements the same sort of raid ...

Cheers,
Wol


* Re: Disk Monitoring
  2017-06-30 19:56             ` Anthony Youngman
@ 2017-07-01 13:42               ` Drew
  2017-07-01 14:12                 ` Gandalf Corvotempesta
  0 siblings, 1 reply; 23+ messages in thread
From: Drew @ 2017-07-01 13:42 UTC (permalink / raw)
  To: Linux RAID Mailing List

On Fri, Jun 30, 2017 at 12:56 PM, Anthony Youngman
<antlists@youngman.org.uk> wrote:
> On 30/06/17 15:35, Phil Turmel wrote:
>>
>> The only good reason for hardware raid, in my opinion.  Balanced against
>> vendor lock-in, limited layout options, and a plethora of management
>> interfaces.
>
>
> And lost data when your controller quits, and you can't find another one
> that implements the same sort of raid ...

Back in the days of SCSI that was very true, but...

In defense of the hardware RAID vendor I've worked with quite a bit
(LSI), I've found you can migrate RAID arrays from controller to
controller quite easily. So if your RAID controller fails, you can
always grab the currently available controller that supports your
required RAID level. I've created arrays on the older 1068 controller
(a PCI-X card) and easily migrated them to a newer 2008- or 2108-based
controller with no fuss. It sees the older array as an 'external'
array and imports it just fine.


-- 
Drew


* Re: Disk Monitoring
  2017-07-01 13:42               ` Drew
@ 2017-07-01 14:12                 ` Gandalf Corvotempesta
  2017-07-01 15:36                   ` Drew
  0 siblings, 1 reply; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-07-01 14:12 UTC (permalink / raw)
  To: Drew; +Cc: Linux RAID Mailing List

2017-07-01 15:42 GMT+02:00 Drew <drew.kay@gmail.com>:
> In defense of the hardware RAID vendor I've worked with quite a bit
> (LSI), I've found you can migrate RAID arrays from controller to
> controller quite easily. So if your RAID controller fails, you can
> always grab the currently available controller that supports your
> required RAID level. I've created arrays on the older 1068 controller
> (a PCI-X card) and easily migrated them to a newer 2008- or 2108-based
> controller with no fuss. It sees the older array as an 'external'
> array and imports it just fine.

This is true for "real" LSI controllers.
But if you look at the DELL specs, DELL supports migration only from
one generation to the next.
DELL firmware is different from LSI's, and I'm unsure what would happen
when migrating a RAID array across multiple generations.

In addition to this, DELL doesn't provide any kind of support for
moving disks across more than one controller generation.


* Re: Disk Monitoring
  2017-07-01 14:12                 ` Gandalf Corvotempesta
@ 2017-07-01 15:36                   ` Drew
  0 siblings, 0 replies; 23+ messages in thread
From: Drew @ 2017-07-01 15:36 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Linux RAID Mailing List

> This is true for "real" LSI controllers.
> But if you look at the DELL specs, DELL supports migration only from
> one generation to the next.
> DELL firmware is different from LSI's, and I'm unsure what would happen
> when migrating a RAID array across multiple generations.
>
> In addition to this, DELL doesn't provide any kind of support for
> moving disks across more than one controller generation.

There's your problem. :-)

I have nothing good to say about DELL, and what I could say, well, it's
nothing I can print in a public forum without it being expletive-laced.

"Real" LSI, or IBM/Lenovo-branded LSI RAID cards, don't restrict their
users in such a fashion. I've taken IBM-branded drives out of a failed
IBM/LSI controller, imported them into a genuine LSI controller, and
then run it for years before the machine became too old and was
replaced. It's telling that when you take the IBM stickers off their
LSI RAID controllers you see the original OEM markings still on the
card. Add to that how easy it is to reflash 'genuine' LSI firmware onto
the IBM cards; the cards then run slightly faster, because all IBM
apparently inserted into the controller BIOS was interface code for
the BMC & IMM. :)

Try doing that with a PERC card. :P


-- 
Drew


* Re: Disk Monitoring
  2017-06-28 10:45 ` Johannes Truschnigg
@ 2017-07-06  3:31   ` NeilBrown
  0 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2017-07-06  3:31 UTC (permalink / raw)
  To: Johannes Truschnigg, Gandalf Corvotempesta; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

On Wed, Jun 28 2017, Johannes Truschnigg wrote:

> Hi Gandalf,
>
> On Wed, Jun 28, 2017 at 12:25:55PM +0200, Gandalf Corvotempesta wrote:
>> Hi to all,
>> I have always used hardware RAID, but with my next server I would like to use mdadm.
>> 
>> Some questions:
>> 
>> 1) All RAID controllers have proactive monitoring features, like
>> patrol read, consistency check, and (more or less) some SMART
>> integration.
>> Any counterpart in mdadm?
>
> mdmon(8) is what you seek. Also, I can highly recommend monitoring
> the kernel debug ringbuffer.

Not mdmon(8).
Possibly you mean "mdadm --monitor".
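
For instance, one way to run it (a sketch):

	# poll the arrays every 30 minutes and mail on events
	mdadm --monitor --scan --daemonise --delay 1800 --mail root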

NeilBrown



* Re: Disk Monitoring
  2017-06-28 10:25 Gandalf Corvotempesta
  2017-06-28 10:45 ` Johannes Truschnigg
@ 2017-06-28 12:43 ` Wols Lists
  1 sibling, 0 replies; 23+ messages in thread
From: Wols Lists @ 2017-06-28 12:43 UTC (permalink / raw)
  To: Gandalf Corvotempesta, linux-raid

On 28/06/17 11:25, Gandalf Corvotempesta wrote:
> Hi to all,
> I have always used hardware RAID, but with my next server I would like to use mdadm.
> 
> Some questions:
> 
> 1) All RAID controllers have proactive monitoring features, like
> patrol read, consistency check, and (more or less) some SMART
> integration.
> Any counterpart in mdadm?
> 
> 2) Thanks to these features, RAID controllers are usually able to detect
> disk issues before they cause data loss. What about mdadm?
> 
> How and when do you replace disks? Based on which parameters? Do you
> always wait for a total failure before replacing the disk?

Not wise. mdadm has the --replace option, which will copy a failing
drive. This ensures redundancy is not lost during a disk replacement
(unless other stuff goes wrong too).
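
For example (device names are placeholders):

	# add a spare, then copy sdb1's contents onto it in the background;
	# the failing device is marked faulty only after the copy completes
	mdadm /dev/md0 --add /dev/sde1
	mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sde1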

You need to use tools like SMART to monitor disk health; read up on
smartctl. Okay, disks often fail unexpectedly even when SMART says
they're healthy, but if things like the reallocated sector count start
climbing, it's an indication of trouble ...

Some people are very aggressive and replace disks at the first hint of
trouble. Other people only replace disks when things start going badly
wrong. Your call. The whole point of raid is to enable recovery when
things have otherwise gone irretrievably wrong, but it's best not to
push your luck that far as many people have found out ...
> 
> Is mdadm able to warn of possible bad things before they happen?

You probably need to turn on kernel logging. And monitor the logs!

Also keep an eye on /proc/mdstat.

I don't know what state xosview is in at the moment but that's my
favourite monitoring tool. Run it on the server with the array, use X to
display it on your local desktop. Last I checked, the raid monitoring
stuff was broken, but the author knows and was fixing it.
> 
> Many times in the past, our RAID controllers forced a bad-sector
> reallocation during proactive tasks like patrol read. This has saved
> me many times. I once tried not replacing a disk after such a
> reallocation (it was a test server), and after some weeks the disk
> failed totally.

Read up on how disks fail. If you tell mdadm to do a "scrub", it will
read the array from end to end. This should cause any dodgy sectors to
be rewritten. Note that this doesn't mean anything is wrong - just as
RAM decays and needs to be refreshed every few nanoseconds, so disks
decay and need to be refreshed every few years. It's only when the
magnetic coating begins to physically decay that you need to worry
about the health of the disk on that score.

Cheers,
Wol



* Re: Disk Monitoring
  2017-06-28 10:25 Gandalf Corvotempesta
@ 2017-06-28 10:45 ` Johannes Truschnigg
  2017-07-06  3:31   ` NeilBrown
  2017-06-28 12:43 ` Wols Lists
  1 sibling, 1 reply; 23+ messages in thread
From: Johannes Truschnigg @ 2017-06-28 10:45 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1995 bytes --]

Hi Gandalf,

On Wed, Jun 28, 2017 at 12:25:55PM +0200, Gandalf Corvotempesta wrote:
> Hi to all,
> I have always used hardware RAID, but with my next server I would like to use mdadm.
> 
> Some questions:
> 
> 1) All RAID controllers have proactive monitoring features, like
> patrol read, consistency check, and (more or less) some SMART
> integration.
> Any counterpart in mdadm?

mdmon(8) is what you seek. Also, I can highly recommend monitoring
the kernel debug ringbuffer.


> 2) Thanks to these features, RAID controllers are usually able to detect
> disk issues before they cause data loss. What about mdadm?
> 
> How and when do you replace disks? Based on which parameters? Do you
> always wait for a total failure before replacing the disk?
> 
> Is mdadm able to warn of possible bad things before they happen?

md doesn't do low-level management of block devices/disks; that's the job of
other parts of the kernel. The block layer will report errors that you may
want to act upon before md itself complains and/or the disk gets kicked from
its array (which renders your array degraded, but otherwise operational), but
that's usually not necessary.

There's generally no need to replace a disk without any indication of serious
problems (like it getting booted from the array due to I/O timeouts, for
instance).


> Many times in the past, our RAID controllers forced a bad-sector
> reallocation during proactive tasks like patrol read. This has saved
> me many times. I once tried not replacing a disk after such a
> reallocation (it was a test server), and after some weeks the disk
> failed totally.

You can initiate the resilvering of array data via sysfs; check md(4) for
details.


-- 
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.



* Disk Monitoring
@ 2017-06-28 10:25 Gandalf Corvotempesta
  2017-06-28 10:45 ` Johannes Truschnigg
  2017-06-28 12:43 ` Wols Lists
  0 siblings, 2 replies; 23+ messages in thread
From: Gandalf Corvotempesta @ 2017-06-28 10:25 UTC (permalink / raw)
  To: linux-raid

Hi to all,
I have always used hardware RAID, but with my next server I would like to use mdadm.

Some questions:

1) All RAID controllers have proactive monitoring features, like
patrol read, consistency check, and (more or less) some SMART
integration.
Any counterpart in mdadm?

2) Thanks to these features, RAID controllers are usually able to detect
disk issues before they cause data loss. What about mdadm?

How and when do you replace disks? Based on which parameters? Do you
always wait for a total failure before replacing the disk?

Is mdadm able to warn of possible bad things before they happen?

Many times in the past, our RAID controllers forced a bad-sector
reallocation during proactive tasks like patrol read. This has saved
me many times. I once tried not replacing a disk after such a
reallocation (it was a test server), and after some weeks the disk
failed totally.

