From: Goswin von Brederlow <goswin-v-b@web.de>
To: Neil Brown <neilb@suse.de>
Cc: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
	Peter Rabbitson <rabbit+list@rabbit.us>,
	Goswin von Brederlow <goswin-v-b@web.de>,
	Doug Ledford <dledford@redhat.com>,
	Michael Evans <mjevans1983@gmail.com>,
	Eyal Lebedinsky <eyal@eyal.emu.id.au>,
	linux-raid list <linux-raid@vger.kernel.org>
Subject: Re: mismatch_cnt again
Date: Fri, 13 Nov 2009 06:30:43 +0100
Message-ID: <87k4xvxa7w.fsf@frosties.localdomain>
In-Reply-To: <19196.50782.113024.239657@notabene.brown> (Neil Brown's message of "Fri, 13 Nov 2009 13:37:18 +1100")

Neil Brown <neilb@suse.de> writes:

> On Tuesday November 10, piergiorgio.sartor@nexgo.de wrote:
>> Hi again,
>> 
>> > It seems we might have been talking at cross-purposes.
>> > 
>> > When I wrote about the need for a threat model, it was in the
>> > context of automatically determining which block was most
>> > likely to be in error (e.g. voting with a 3-drive RAID1 or
>> > fancy arithmetic with RAID6).  I do not believe there is any
>> > value in doing that.  At least not automatically in the kernel
>> > with the aim of just repairing which block was decided to be
>> > most wrong.
>> > 
>> > You now seem to be talking about the ability to find out which
>> > blocks are inconsistent.  That is very different.  I do agree there
>> > is value in that.  Maybe it should appear in the kernel logs,
>> > or maybe we could store the information and report it via sysfs
>> > (the former would certainly be easier).
>> 
>> maybe there is a misunderstanding between us! :-)
>> 
>> Automatic repair *might* be a far end target, but I do
>> agree, this needs to be clarified deeply.
>> 
>> I see the thing similarly to a previous comment from a
>> fellow poster.
>> To do:
>> 1) detect which MD block is inconsistent
>> 2) detect, when possible, which device component is responsible
>> 3) trigger a repair action
>> 
>> This would be done all under user control, i.e. the user
>> will get the mismatch count, maybe with some hint on which
>> device could be guilty (RAID-6 or RAID-1/10 with multiple
>> redundancy) and then he could decide what to do.
>> 
>> The user will have full control and full *responsibility*
>> for the action, but he will also be fully informed of what
>> the situation is.
>> 
>> The system will tell: block ABC is inconsistent, maybe
>> device /dev/sdX is guilty, you could: do nothing, resync
>> the parity, try to repair.
>
> I think just "block ABC is inconsistent" is sufficient.
> user-space can then quiesce that part of the array, read the relevant
> blocks, do any analysis that might be appropriate, and report to the
> admin. 

It is a beginning. Eventually I would like to see the guilty device in
the log, though. That way the log can be analysed quickly and, for
example, a bad cable or failing drive will show up as consistently
being the guilty device. That only makes sense for 3+ mirrors or RAID6,
though.
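
To illustrate why 3+ mirrors are needed, here is a rough sketch of
majority voting over the copies of one block. It is plain Python for
illustration only, not md code; the function name blame_mirror is made
up, and it assumes the copies have already been read into memory.

```python
from collections import Counter

def blame_mirror(copies):
    """Return the indices of mirrors that disagree with the majority.

    copies: list of bytes objects, one per mirror, for the same block.
    Returns None when there is no strict majority (for example a
    2-way mirror whose copies differ), because then no device can be
    singled out as guilty.
    """
    counts = Counter(copies)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(copies):
        return None  # no strict majority: blame cannot be assigned
    return [i for i, c in enumerate(copies) if c != winner]
```

With two mirrors a mismatch only yields None; with three, the odd one
out is reported.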

The repair should also determine the likely faulty block and rewrite
that instead of picking a random one. So you already need a "who is to
blame" function. The logging and repair can share the code.
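
For RAID6, such a "who is to blame" function could look roughly like
the sketch below. It is an illustration in Python, not md code, and all
names in it are made up. It assumes the usual P/Q definition over
GF(2^8) with polynomial 0x11d and generator 2, and it can only point at
a single bad device per stripe; anything else is reported as ambiguous.

```python
# GF(2^8) log/antilog tables, polynomial 0x11d, generator 2 (assumed).
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def syndromes(data):
    """Compute P (plain XOR) and Q (XOR of g^z * D_z) for one stripe.

    data: list of equal-length bytes objects, one per data disk.
    """
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for z, chunk in enumerate(data):
        g = EXP[z]
        for j, b in enumerate(chunk):
            p[j] ^= b
            q[j] ^= gf_mul(g, b)
    return bytes(p), bytes(q)

def blame_raid6(data, p, q):
    """Return "ok", "P", "Q", a data disk index, or None if ambiguous."""
    p2, q2 = syndromes(data)
    suspect = "ok"
    for j in range(len(p)):
        sp = p[j] ^ p2[j]  # equals the error byte E if a data disk is bad
        sq = q[j] ^ q2[j]  # equals g^z * E if data disk z is bad
        if sp == 0 and sq == 0:
            continue
        if sp == 0:
            cand = "Q"
        elif sq == 0:
            cand = "P"
        else:
            z = (LOG[sq] - LOG[sp]) % 255
            cand = z if z < len(data) else None
        if cand is None or (suspect != "ok" and suspect != cand):
            return None  # more than one device looks wrong
        suspect = cand
    return suspect
```

Logging would print the returned device, and a repair could rewrite
only that device's block, so both can use the same function.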

>> As I mentioned some time ago, I built a RAID-6, where
>> one disk, due to a strange cabling problem, was sometimes
>> returning wrong data (one bit flip, actually).
>> And this without any errors reported, i.e. a bit was
>> sometimes flipped, at the very end it seems, and it
>> was undetected by ECC/CRC/whatever.
>
> That is a very interesting threat scenario - occasional bit flip on
> read between media and memory.  I had a drive like that once.  One
> particular bit in the sector would fairly often return '1' no matter
> what had been written.  I had it in a RAID1 and it quickly made a mess
> of the filesystem.

I had an external RAID enclosure that would flip bits in the block
number that data was read from or written to. With the box alone, data
written to one file would suddenly appear in another file.

To make matters worse, 2 enclosures were combined in a software RAID1,
giving the strangest errors. The file contents would randomly change
depending on which enclosure was used to read the data.

Those errors do happen from time to time and will keep happening.

> As you say, there is nothing that md can or should do about this
> except report that something odd is happening, which it does, and
> report where it is happening, which it does not.
>
> NeilBrown

MfG
        Goswin