* Fw: Why does one get mismatches?
@ 2010-01-20 11:52 Jon Hardcastle
  2010-01-22 18:13 ` Goswin von Brederlow
  2010-02-01 21:18 ` Bill Davidsen
  0 siblings, 2 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-20 11:52 UTC (permalink / raw)
  To: linux-raid

--- On Tue, 19/1/10, Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:

> From: Jon Hardcastle <jd_hardcastle@yahoo.com>
> Subject: Why does one get mismatches?
> To: linux-raid@vger.kernel.org
> Date: Tuesday, 19 January, 2010, 10:04
> Hi,
> 
> I kicked off a check/repair cycle on my machine after I moved the
> physical ordering of my drives around, and I am now on my second
> check/repair cycle and it keeps finding mismatches.
> 
> Is it correct that the mismatch count after a repair should equal the
> count reported by the preceding check? What if it doesn't? What does
> it mean if another check STILL reveals mismatches?
> 
> I had something similar after I reshaped from RAID 5 to 6: I had to
> run check/repair/check/repair several times before I got my 0.
> 
> 

Guys,

Anyone got any suggestions here? I am now on roughly my 5th check/repair cycle, and after a reboot the first check is still returning 8.

All I have done is move the drives around; it is the same controllers/cables/etc.

I really don't like the seemingly random nature of whatever has caused the mismatches.
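
(For anyone following along: by check/repair I mean the usual md sysfs
sequence, roughly as below, with md0 standing in for the array.)

    # trigger a check pass, wait for it to finish, then read the counter
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat                       # shows check progress
    cat /sys/block/md0/md/mismatch_cnt     # non-zero = inconsistent sectors

    # a repair pass rewrites the inconsistent stripes
    echo repair > /sys/block/md0/md/sync_action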


      


* Re: Fw: Why does one get mismatches?
  2010-01-20 11:52 Fw: Why does one get mismatches? Jon Hardcastle
@ 2010-01-22 18:13 ` Goswin von Brederlow
  2010-01-24 17:40   ` Jon Hardcastle
  2010-02-01 21:18 ` Bill Davidsen
  1 sibling, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-22 18:13 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid

Jon Hardcastle <jd_hardcastle@yahoo.com> writes:

> --- On Tue, 19/1/10, Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:
>
> [snip...]
>
> Anyone got any suggestions here? I am now on roughly my 5th
> check/repair cycle, and after a reboot the first check is still
> returning 8.
>
> All I have done is move the drives around; it is the same
> controllers/cables/etc.
>
> I really don't like the seemingly random nature of whatever has caused
> the mismatches.

There is some as-yet-unexplained source of mismatches on raid1, but it
is believed never to affect any block that is actually in use. Swapping
is a likely cause.

Any swap device on the raid? Try turning that off.
If that doesn't help, try unmounting the filesystems or remounting them
read-only.
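
(Something along these lines; md0 and the mount point are placeholders:)

    swapoff -a                        # stop any swapping onto the array
    umount /mnt/data                  # or: mount -o remount,ro /mnt/data
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt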

MfG
        Goswin


* Re: Fw: Why does one get mismatches?
  2010-01-22 18:13 ` Goswin von Brederlow
@ 2010-01-24 17:40   ` Jon Hardcastle
  2010-01-24 21:52     ` Roger Heflin
  2010-01-24 23:13     ` Goswin von Brederlow
  0 siblings, 2 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-24 17:40 UTC (permalink / raw)
  To: Jon, Goswin von Brederlow; +Cc: linux-raid

--- On Fri, 22/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> [snip...]
>
> There is some as-yet-unexplained source of mismatches on raid1, but it
> is believed never to affect any block that is actually in use.
> Swapping is a likely cause.
>
> Any swap device on the raid? Try turning that off.
> If that doesn't help, try unmounting the filesystems or remounting
> them read-only.
>
> MfG
>         Goswin

Hello, my usual savior Goswin!

The deal is that it is a 7-drive RAID 6 array. It has LVM on it and is not used for swapping. I have unmounted all the LVs and still got mismatches; I ran smartctl --test=long on all drives - nothing. I have now dismantled the array and am 3/4 of the way through 'badblocks -svn' on each of the component drives. I have a hunch that it may be a dodgy SATA cable, but there is no evidence: no errors in the log, nothing in dmesg.
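
(Roughly what I am running per drive, with sdX standing in for each member:)

    smartctl --test=long /dev/sdX     # queue the long SMART self-test
    smartctl -l selftest /dev/sdX     # check the result once it completes
    badblocks -svn /dev/sdX           # non-destructive read-write scan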

Is there any way to get more information? I am starting to think this has only happened since I changed from RAID 5 to 6... which I did less than a month ago.

The only lead I have is that whilst doing the badblocks run, one drive ran at ~10-15MB/s whereas the rest are going at ~30. I have another drive of the same model coming, so I will see if that one is slow too. But the lack of logging info is unhelpful and worrying - and the prospect of silent corruption is a big worry!


      

* Re: Fw: Why does one get mismatches?
  2010-01-24 17:40   ` Jon Hardcastle
@ 2010-01-24 21:52     ` Roger Heflin
  2010-01-24 23:13     ` Goswin von Brederlow
  1 sibling, 0 replies; 104+ messages in thread
From: Roger Heflin @ 2010-01-24 21:52 UTC (permalink / raw)
  To: Jon; +Cc: Goswin von Brederlow, linux-raid

Jon Hardcastle wrote:
> [snip...]
>
> Is there any way to get more information? I am starting to think this
> has only happened since I changed from RAID 5 to 6... which I did less
> than a month ago.
>
> The only lead I have is that whilst doing the badblocks run, one drive
> ran at ~10-15MB/s whereas the rest are going at ~30. But the lack of
> logging info is unhelpful and worrying - and the prospect of silent
> corruption is a big worry!

It is possible that reads are sometimes being corrupted.

I have seen a couple of different controllers fail in ways that produce 
read corruption. The test: put 50 or so largish files with known 
checksums on the disk (50 x size needs to be at least 2x RAM, so reads 
cannot be served from cache), then checksum all of the files repeatedly 
and see whether any checksum changes. If one does, and the "bad" file 
moves around between runs, the corruption is happening on read and the 
data on disk should be OK. I have seen controllers from a couple of 
different companies fail this way; usually it is a bad PCI interface 
chip, or a bad configuration (bus too fast) causing PCI parity errors. 
One controller failed outright and caused errors (replacing it with a 
spare corrected them). In the second case I found the motherboard was 
running the PCI bus too fast for the number of cards - two different 
companies' FC cards failed, in slightly different ways: one silently 
corrupted data, the other crashed the machine around the time an error 
would have been expected - and the issue went away once I slowed the 
bus down one step (PCI-X 133 -> 100, or PCI-X 100 -> 66).
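
(A sketch of that test; the file count, sizes, and paths are arbitrary:)

    # create ~50 largish files (total > 2x RAM), then checksum repeatedly
    for i in $(seq 1 50); do
        dd if=/dev/urandom of=/data/f$i bs=1M count=512 2>/dev/null
    done
    cksum /data/f* > /tmp/run1
    cksum /data/f* > /tmp/run2
    diff /tmp/run1 /tmp/run2    # diffs that move around = read corruption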

In both cases I did not find any write corruption, but found read 
corruption often. If this is happening on a raid5 device it would be 
bad whenever parity has to be used: a corrupt read would mean the 
regenerated parity is wrong, and a later reconstruction from parity 
would then produce corrupted data.

I don't know how strong the internal SATA link protection is: if it 
uses CRCs, errors on the cable are almost impossible; if it only uses 
parity, errors are easy. The PCI bus uses parity, so it is fairly easy 
for errors to get through there, but I have seen them only very rarely 
- maybe 5 times in 10,000 machine-years of operation (2000+ machines 
over several years).


* Re: Fw: Why does one get mismatches?
  2010-01-24 17:40   ` Jon Hardcastle
  2010-01-24 21:52     ` Roger Heflin
@ 2010-01-24 23:13     ` Goswin von Brederlow
  2010-01-25 10:07       ` Jon Hardcastle
  1 sibling, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-24 23:13 UTC (permalink / raw)
  To: Jon; +Cc: Goswin von Brederlow, linux-raid

Jon Hardcastle <jd_hardcastle@yahoo.com> writes:

> [snip...]
>
> The deal is that it is a 7-drive RAID 6 array. It has LVM on it and is
> not used for swapping. I have unmounted all the LVs and still got
> mismatches; I ran smartctl --test=long on all drives - nothing. I have
> now dismantled the array and am 3/4 of the way through 'badblocks -svn'
> on each of the component drives. I have a hunch that it may be a dodgy
> SATA cable, but there is no evidence: no errors in the log, nothing in
> dmesg.

You did run a repair pass, and not just repeated check passes, right?
A check only counts the mismatches; it does not correct them.
If the raid is otherwise unused (vgchange -a n) and you first run a
repair and then a check, the check definitely should not find any
mismatches.
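
(I.e. something like the following, with md0 and the VG name as placeholders:)

    vgchange -a n myvg                # deactivate the LVs so nothing writes
    echo repair > /sys/block/md0/md/sync_action
    # wait for /proc/mdstat to show the array idle again, then:
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt    # should now read 0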

MfG

        Goswin

* Re: Fw: Why does one get mismatches?
  2010-01-24 23:13     ` Goswin von Brederlow
@ 2010-01-25 10:07       ` Jon Hardcastle
  2010-01-25 10:37         ` Goswin von Brederlow
  0 siblings, 1 reply; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-25 10:07 UTC (permalink / raw)
  To: Jon; +Cc: Goswin von Brederlow, linux-raid


--- On Sun, 24/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> [snip...]
>
> You did run a repair pass, and not just repeated check passes, right?
> A check only counts the mismatches; it does not correct them.
> If the raid is otherwise unused (vgchange -a n) and you first run a
> repair and then a check, the check definitely should not find any
> mismatches.
>
> MfG
>
>         Goswin

Hello!

Yes, I have a simple script that first does a check and then, if there are mismatches, does a repair. I have then been manually rerunning a check, and I keep getting mismatches. It goes like this: 232, 8, 24, 8, 8, 16, 16, 24, 24, 8, 16, 24. But I have also done this manually and run several repairs in a row (assuming a repair will return 0 if there is no work to be done).
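
(The script is essentially just this, with md0 as the array:)

    echo check > /sys/block/md0/md/sync_action
    while grep -q check /proc/mdstat; do sleep 60; done   # wait for idle
    if [ "$(cat /sys/block/md0/md/mismatch_cnt)" -ne 0 ]; then
        echo repair > /sys/block/md0/md/sync_action
    fi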

Now the array is completely dismantled and I am running badblocks on the drives. I am on the last 2 of the 7 drives and still have no leads: no bad blocks, no offline-uncorrectable sectors, no pending sectors, no dmesg errors, nothing. I have absolutely no leads whatsoever.

The only things I have left to try are a full memtest and disconnecting and reseating the additional SATA controllers - oh, and buying 7 new SATA cables in case one is bad.

But it would be REALLY helpful to know on which drive the mismatches have occurred.

Any help here would be gratefully received! I might even try converting the array back to RAID 5, as I remember I had mismatches immediately after I converted from 5 to 6.


      

* Re: Fw: Why does one get mismatches?
  2010-01-25 10:07       ` Jon Hardcastle
@ 2010-01-25 10:37         ` Goswin von Brederlow
  2010-01-25 10:52           ` Jon Hardcastle
  0 siblings, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-25 10:37 UTC (permalink / raw)
  To: Jon; +Cc: Goswin von Brederlow, linux-raid

Jon Hardcastle <jd_hardcastle@yahoo.com> writes:

> Now the array is completely dismantled and I am running badblocks on
> the drives. I am on the last 2 of the 7 drives and still have no
> leads [snip...]
>
> The only things I have left to try are a full memtest and
> disconnecting and reseating the additional SATA controllers - oh, and
> buying 7 new SATA cables in case one is bad.

The problem with badblocks is that it writes the same pattern
everywhere. If the problem is that data gets read from or written to
the wrong block, that will not show up.

Try formatting each drive and running fstest [1] on it - or some other
test that verifies data integrity using a different pattern per block.
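
(If fstest is not an option, any test that writes a unique,
position-dependent pattern and verifies it will do. A crude - and very
slow - sketch; DESTRUCTIVE, with sdX as a placeholder:)

    # write each block's own number into it, then read back and compare
    blocks=$(( $(blockdev --getsize64 /dev/sdX) / 4096 ))
    for i in $(seq 0 $((blocks - 1))); do
        printf '%016d' "$i" | dd of=/dev/sdX bs=4096 seek=$i count=1 \
            conv=sync 2>/dev/null
    done
    for i in $(seq 0 $((blocks - 1))); do
        got=$(dd if=/dev/sdX bs=4096 skip=$i count=1 2>/dev/null | head -c 16)
        [ "$got" = "$(printf '%016d' "$i")" ] || echo "mismatch at block $i"
    done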

MfG
        Goswin

[1] http://mrvn.homeip.net/fstest/


* Re: Fw: Why does one get mismatches?
  2010-01-25 10:37         ` Goswin von Brederlow
@ 2010-01-25 10:52           ` Jon Hardcastle
  2010-01-25 17:32             ` Goswin von Brederlow
  2010-01-25 19:32             ` Iustin Pop
  0 siblings, 2 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-25 10:52 UTC (permalink / raw)
  To: Jon; +Cc: Goswin von Brederlow, linux-raid

--- On Mon, 25/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> [snip...]
>
> The problem with badblocks is that it writes the same pattern
> everywhere. If the problem is that data gets read from or written to
> the wrong block, that will not show up.
>
> Try formatting each drive and running fstest [1] on it - or some other
> test that verifies data integrity using a different pattern per block.
>
> [1] http://mrvn.homeip.net/fstest/

This is going to be a time-consuming process, as I'll have to remove each drive from the array one at a time, test it, and then resync.

Thanks for the link, but could a similar result be achieved with the -w option of badblocks? Or perhaps a dd if=/dev/urandom? Hmm, scratch that - urandom won't work, since you need to know what was written in order to verify what is read back.

It is just a worry, as I clearly have mismatches and therefore corrupted data.



      

* Re: Fw: Why does one get mismatches?
  2010-01-25 10:52           ` Jon Hardcastle
@ 2010-01-25 17:32             ` Goswin von Brederlow
  2010-01-25 19:32             ` Iustin Pop
  1 sibling, 0 replies; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-25 17:32 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid

Jon Hardcastle <jd_hardcastle@yahoo.com> writes:

> [snip...]
>
> Thanks for the link, but could a similar result be achieved with the
> -w option of badblocks? Or perhaps a dd if=/dev/urandom? Hmm, scratch
> that - urandom won't work, since you need to know what was written in
> order to verify what is read back.
>
> It is just a worry, as I clearly have mismatches and therefore
> corrupted data.

No. You obviously should use the -w option of badblocks; a read-only
test is completely pointless, as the raid check has already read every
block without errors (I assume). But -w writes one pattern across the
whole disk, then reads it back and compares, and then repeats for the
next pattern. If the disk messes up the address of blocks, that won't
be detected. E.g. I had a raid enclosure that dropped a bit in the
block address every once in a while. You get really interesting
corruption with that.

MfG
        Goswin

* Re: Fw: Why does one get mismatches?
  2010-01-25 10:52           ` Jon Hardcastle
  2010-01-25 17:32             ` Goswin von Brederlow
@ 2010-01-25 19:32             ` Iustin Pop
  1 sibling, 0 replies; 104+ messages in thread
From: Iustin Pop @ 2010-01-25 19:32 UTC (permalink / raw)
  To: Jon; +Cc: Goswin von Brederlow, linux-raid

On Mon, Jan 25, 2010 at 02:52:58AM -0800, Jon Hardcastle wrote:
> This is going to be a time-consuming process, as I'll have to remove
> each drive from the array one at a time, test it, and then resync.
> 
> [snip...]
> 
> It is just a worry, as I clearly have mismatches and therefore
> corrupted data.

Just a comment from the 'benches' here: looking at all the tests you
have done, my personal opinion is that this is *not* a hardware problem
of any kind, but indeed some MD software issue. I've never seen such a
high rate of consistent, silent corruption from hardware; to me it
looks like corruption in the software, *if it is corruption at all*.

I would run a counter-test, to see at least if the 'check' test is
right:

- run your array until 'check' returns mismatches
- shutdown the array
- check that the contents of the drives are indeed different using
  something other than 'check' (e.g. checksum each 1MB block on the
  drives independently, and compare the checksum lists)
- if indeed there are diffs, start the array, run a repair (but no other
  traffic to the array)
- shutdown the array and re-run the external diff test

The above tests should tell you if: check is right, and if repair indeed
fixes the differences.
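
(A sketch of that external diff step for a RAID-1 pair; sdX and sdY are
placeholder member devices - note this applies to mirrors, whose
members should be byte-identical:)

    # checksum every 1MiB block of each member, then diff the lists
    for d in sdX sdY; do
        size=$(blockdev --getsize64 /dev/$d)
        i=0
        while [ $((i * 1048576)) -lt "$size" ]; do
            dd if=/dev/$d bs=1M skip=$i count=1 2>/dev/null | md5sum |
                sed "s/-$/block $i/"
            i=$((i + 1))
        done > /tmp/$d.sums
    done
    diff /tmp/sdX.sums /tmp/sdY.sums   # each differing line = a block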

And another side note: it would be really good if md had a debug option
to show the checksums and offsets of the differing blocks, to at least
see whether the same areas of the drive show differences (it would be
really funny if the diffs were, for example, in the MD metadata :).
Or does md already have something like this? I stopped using md a year
or so ago.

regards,
iustin


* Re: Fw: Why does one get mismatches?
  2010-01-20 11:52 Fw: Why does one get mismatches? Jon Hardcastle
  2010-01-22 18:13 ` Goswin von Brederlow
@ 2010-02-01 21:18 ` Bill Davidsen
  2010-02-01 22:37   ` Neil Brown
  1 sibling, 1 reply; 104+ messages in thread
From: Bill Davidsen @ 2010-02-01 21:18 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid

Jon Hardcastle wrote:
> [snip...]
>
> Anyone got any suggestions here? I am now on roughly my 5th
> check/repair cycle, and after a reboot the first check is still
> returning 8.
>
> All I have done is move the drives around; it is the same
> controllers/cables/etc.
>
> I really don't like the seemingly random nature of whatever has caused
> the mismatches.
>

If you have an ext[34] filesystem on this array, try mounting it with 
data=journal (yes, it will slow things down; this is a TEST). I did 
limited testing with this, and it appeared to solve the problem, at 
least for the eight hours I had to test.
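
(E.g., assuming the filesystem lives on an LV and is mounted at
/mnt/data - both names hypothetical:)

    umount /mnt/data
    mount -o data=journal /dev/vg/datalv /mnt/data

(data=journal generally has to be set at mount time rather than changed
via remount; for a root filesystem it would go in fstab or rootflags.)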

Comment: when there is a three-way RAID-1, why doesn't repair *vote* on 
the correct value instead of just making a guess?

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: Why does one get mismatches?
  2010-02-01 21:18 ` Bill Davidsen
@ 2010-02-01 22:37   ` Neil Brown
  2010-02-02 15:11     ` Bill Davidsen
  0 siblings, 1 reply; 104+ messages in thread
From: Neil Brown @ 2010-02-01 22:37 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Jon, linux-raid

On Mon, 01 Feb 2010 16:18:23 -0500
Bill Davidsen <davidsen@tmr.com> wrote:

> Comment: when there is a three-way RAID-1, why doesn't repair *vote* on 
> the correct value instead of just making a guess?
> 

Because truth is not democratic.

(and I defy you to define "correct" in any general way in this context).

NeilBrown


* Re: Why does one get mismatches?
  2010-02-01 22:37   ` Neil Brown
@ 2010-02-02 15:11     ` Bill Davidsen
  2010-02-03 11:17       ` Goswin von Brederlow
  2010-02-11  5:14       ` Neil Brown
  0 siblings, 2 replies; 104+ messages in thread
From: Bill Davidsen @ 2010-02-02 15:11 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jon, linux-raid

Neil Brown wrote:
> On Mon, 01 Feb 2010 16:18:23 -0500
> Bill Davidsen <davidsen@tmr.com> wrote:
>
>   
>> Comment: when there is a three-way RAID-1, why doesn't repair *vote* on 
>> the correct value instead of just making a guess?
>>
>>     
>
> Because truth is not democratic.
>
> (and I defy you to define "correct" in any general way in this context).
>   

If you are willing to accept that the reconstructed data from RAID-[56] 
is "correct", then the data from the RAID-1 majority opinion is 
"correct". If you say that such recovered data is "most likely to match 
what was written", then data consistent on (N+1)/2 drives of a RAID-1 
should be viewed in the same light. Call it "most likely to be correct" 
if you prefer, but a value picked from one drive at random is less 
likely to be.

This whole discussion simply shows that for RAID-1, software RAID is 
less reliable than hardware RAID (no, I don't mean fake-RAID), because 
it doesn't pin the data buffer until all copies are written.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: Why does one get mismatches?
  2010-02-02 15:11     ` Bill Davidsen
@ 2010-02-03 11:17       ` Goswin von Brederlow
  2010-02-11  5:14       ` Neil Brown
  1 sibling, 0 replies; 104+ messages in thread
From: Goswin von Brederlow @ 2010-02-03 11:17 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Neil Brown, Jon, linux-raid

Bill Davidsen <davidsen@tmr.com> writes:

> Neil Brown wrote:
>> [snip...]
>>
>> (and I defy you to define "correct" in any general way in this context).
>
> If you are willing to accept that the reconstructed data from RAID-[56]
> is "correct", then the data from the RAID-1 majority opinion is
> "correct". [snip...]
>
> This whole discussion simply shows that for RAID-1, software RAID is
> less reliable than hardware RAID (no, I don't mean fake-RAID), because
> it doesn't pin the data buffer until all copies are written.

Let's ignore for now the fact that software raid seems to write
mismatched data only to supposedly unused blocks. If a block is really
unused, it doesn't matter what is done with it; and if it is used, then
software raid has a big bug that needs to be fixed, not repaired after
the fact.

So let's assume there actually is a true mismatch because one of the
drives returns false data on read. Then in raid1/10 with >2 copies, and
in raid6, you have a way to detect the correct data - correct as in
most likely to be what was originally written. For raid6 that means
identifying the one drive whose block leaves the rest consistent with
the parity, and for raid1/10 it means finding the majority. Say you
have a 10-way raid1 with 9 blocks holding the same data and one that
differs. Picking a random block is wrong 10% of the time. Do you really
think that in 10% of the cases 9 disks will be corrupted in exactly the
same way?
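
(To illustrate with userspace checksums rather than kernel code: given
one checksum per member copy of the same block in a hypothetical
sums.txt, the "vote" is just the most common value:)

    # pick the majority checksum among the N copies of one block
    sort sums.txt | uniq -c | sort -rn | head -1
    # e.g. "9 3a7bd3e2..." means nine drives agree; the tenth is the outlier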

MfG
        Goswin


* Re: Why does one get mismatches?
  2010-02-02 15:11     ` Bill Davidsen
  2010-02-03 11:17       ` Goswin von Brederlow
@ 2010-02-11  5:14       ` Neil Brown
  2010-02-11 17:51         ` Bryan Mesich
  2010-02-11 18:12         ` Piergiorgio Sartor
  1 sibling, 2 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-11  5:14 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Jon, linux-raid

On Tue, 02 Feb 2010 10:11:03 -0500
Bill Davidsen <davidsen@tmr.com> wrote:

> Neil Brown wrote:
> > [snip...]
> > (and I defy you to define "correct" in any general way in this context).
> 
> If you are willing to accept that the reconstructed data from RAID-[56] 
> is "correct", then the data from the RAID-1 majority opinion is 
> "correct". [snip...]
> 
> This whole discussion simply shows that for RAID-1, software RAID is 
> less reliable than hardware RAID (no, I don't mean fake-RAID), because 
> it doesn't pin the data buffer until all copies are written.
> 

That doesn't make it less reliable.  It just makes it more confusing.

But for a more complete discussion on raid recovery and when it might be
sensible to "vote" among the blocks, see
   http://neil.brown.name/blog/20100211050355

NeilBrown



* Re: Why does one get mismatches?
  2010-02-11  5:14       ` Neil Brown
@ 2010-02-11 17:51         ` Bryan Mesich
  2010-02-16 21:25           ` Bill Davidsen
  2010-02-11 18:12         ` Piergiorgio Sartor
  1 sibling, 1 reply; 104+ messages in thread
From: Bryan Mesich @ 2010-02-11 17:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bill Davidsen, Jon, linux-raid


On Thu, Feb 11, 2010 at 04:14:44PM +1100, Neil Brown wrote:
> > This whole discussion simply shows that for RAID-1 software RAID is less 
> > reliable than hardware RAID (no, I don't mean fake-RAID), because it 
> > doesn't pin the data buffer until all copies are written.
> 
> That doesn't make it less reliable.  It just makes it more confusing.

I agree that linux software RAID is no less reliable than
hardware RAID with regard to the above conversation.  It is,
however, confusing to have a counter that indicates there are
problems with a RAID 1 array when in fact there are none.

I (and I'm sure others) value your expertise on this matter, but
it's hard to feel at ease when the car you're driving across the
country has its check-engine light on.  In this case I believe the
mechanic when he says the car is okay, but it might be difficult
for others to believe as I do.

I rely heavily on software RAID, as I'm sure many others do.  I
believe this is quite evident from the amount of email that has
circulated about the mismatch_cnt "problem".  IMO, a user's perception
of reliability is really the root of the problem in this case.  No
one who depends on this stuff wants to see weakness, and those who
do see it are going to be concerned - especially those running
distributions such as RedHat/Fedora that do weekly checks on the
arrays.

Neil, you had mentioned some time ago that you were going to
create a patch that would show where the mismatches were located
on disk.  Did you do this, and if so, where can I find the patch?

Bryan


* Re: Why does one get mismatches?
  2010-02-11  5:14       ` Neil Brown
  2010-02-11 17:51         ` Bryan Mesich
@ 2010-02-11 18:12         ` Piergiorgio Sartor
  1 sibling, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-11 18:12 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bill Davidsen, Jon, linux-raid

Hi all,

> > This whole discussion simply shows that for RAID-1 software RAID is less 
> > reliable than hardware RAID (no, I don't mean fake-RAID), because it 
> > doesn't pin the data buffer until all copies are written.
> > 
> 
> That doesn't make it less reliable.  It just makes it more confusing.

well, sorry to say, but it makes it useless.

The problem is: how can we be sure that the FS really
plays tricks only with blocks which will be unused?

In other words, either there should be an agreed and
confirmed interface between the caller (FS) and the called (MD),
handling the situation properly (i.e. the FS will not
play these pranks), or the called (MD) should be robust
against all the possible nasty things the caller (FS) can do.

Because what will happen if someone introduces a new
FS which works fine with everything but software RAID?

On a related note, I have several identical PCs with RAID-10 f2.

Starting with Fedora 12, there is a weekly check of
the RAID array (with email notification - BTW, without
the mismatch count...).

On these PCs I sometimes get mismatches.
Checking the mismatch count, I found that it keeps
changing: sometimes a bit more, sometimes a bit less (or zero).

Now, IMHO the check is completely useless and even annoying.

I've got mismatches - changing ones - but I do not know how
serious they are.

Not good... I could have lost data or not, and I do
not know...

> But for a more complete discussion on raid recovery and when it might be
> sensible to "vote" among the blocks, see
>    http://neil.brown.name/blog/20100211050355
> 

Nice discussion.
Especially the clarification about the unclean-shutdown case:
that could, in effect, be a killer for the majority-select
(or RAID-6 reconstruction) decision.

I personally agree with your conclusion.
However, there is one more point I am missing, or did not get.

Specifically, the "smart recovery" should be composed of
two steps. The first is detecting where the problems are.
This means not only the stripe but, in the case of RAID-6,
also the *potentially* faulty component (HDD) of the array.

The reason is that, as I wrote some time ago, there is a
*huge* difference between having all the mismatches
*potentially* on one single component and having them
spread across several.

The first case clearly gives more information and allows
a better judgment of the situation.

Thanks,

bye,

-- 

piergiorgio


* Re: Why does one get mismatches?
  2010-02-11 17:51         ` Bryan Mesich
@ 2010-02-16 21:25           ` Bill Davidsen
  2010-02-16 21:38             ` Steven Haigh
  0 siblings, 1 reply; 104+ messages in thread
From: Bill Davidsen @ 2010-02-16 21:25 UTC (permalink / raw)
  To: Bryan Mesich, Neil Brown, Jon, linux-raid

Bryan Mesich wrote:
> On Thu, Feb 11, 2010 at 04:14:44PM +1100, Neil Brown wrote:
>> [snip...]
>> That doesn't make it less reliable.  It just makes it more confusing.
>
> I agree that linux software RAID is no less reliable than
> hardware RAID with regard to the above conversation.  It is,
> however, confusing to have a counter that indicates there are
> problems with a RAID 1 array when in fact there are none.
>

Sorry, but real hardware raid is more reliable than software raid, and 
Neil's justification for not doing smart recovery mentions why. Note 
that this refers to real hardware raid, not fake-RAID, which is just 
some firmware in a BIOS driving the existing hardware.

The issue lies with the data changing between the writes to the 
individual drives. With hardware raid the data traverses the memory bus 
once, and only once, into the controller's cache, from which it is 
written to all mirrored drives. With software raid an individual write 
is done to each drive, and if the data in the buffer changes between 
the write to one drive and the write to another, the drives end up with 
different values. Neil may be convinced that the OS somehow "knows" 
which of the mirror copies is correct, i.e. most recent, and never uses 
the stale data; but if that information were really available, reads 
would always return the latest value, and it would not be possible to 
read the same file multiple times and get different MD5sums. It would 
also be possible to do a stable smart recovery by propagating the most 
recent copy to the other mirror drives.

I hoped that mounting with data=journal would lead to consistency; that 
seems not to be true either.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: Why does one get mismatches?
  2010-02-16 21:25           ` Bill Davidsen
@ 2010-02-16 21:38             ` Steven Haigh
  2010-02-17  3:19               ` Bryan Mesich
  2010-02-17 23:05               ` Neil Brown
  0 siblings, 2 replies; 104+ messages in thread
From: Steven Haigh @ 2010-02-16 21:38 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Bryan Mesich, Neil Brown, Jon, linux-raid

On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@tmr.com>
wrote:
> [snip...]
>
> The issue lies with the data changing between the writes to the
> individual drives. With hardware raid the data traverses the memory
> bus once, and only once, into the controller's cache, from which it is
> written to all mirrored drives. With software raid an individual write
> is done to each drive, and if the data in the buffer changes between
> the write to one drive and the write to another, the drives end up
> with different values. [snip...]
>
> I hoped that mounting with data=journal would lead to consistency;
> that seems not to be true either.

I agree, Bill - there is an issue with software RAID1 on some hardware.
I have one machine where the ONLY way to stop the root filesystem going
read-only due to journal issues is to remove RAID. With RAID1 enabled I
get silent corruption of both data and the journal at seemingly random
times.

I can see the data corruption by running an RPM verify against the data
on the drive. Reinstalling the affected packages fixes things - until
some other random files get corrupted next time.
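
(The verify in question is simply along the lines of:)

    rpm -Va          # re-verify every installed file against the rpm db
    # a '5' in the output flags means the file's digest no longer matches
    # what was recorded at install time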

The myth that data corruption in RAID1 ONLY happens to swap and/or unused
space on a drive is absolute rubbish.

-- 
Steven Haigh
 
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299


* Re: Why does one get mismatches?
  2010-02-16 21:38             ` Steven Haigh
@ 2010-02-17  3:19               ` Bryan Mesich
  2010-02-17 23:05               ` Neil Brown
  1 sibling, 0 replies; 104+ messages in thread
From: Bryan Mesich @ 2010-02-17  3:19 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Bill Davidsen, Neil Brown, Jon, linux-raid


On Wed, Feb 17, 2010 at 08:38:11AM +1100, Steven Haigh wrote:
> On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@tmr.com> wrote:
> > The issue lies with the data changing between the writes to the
> > individual drives. With hardware raid the data traverses the memory
> > bus once, and only once, into the controller's cache, from which it
> > is written to all mirrored drives. With software raid an individual
> > write is done to each drive, and if the data in the buffer changes
> > between the write to one drive and the write to another, the drives
> > end up with different values. [snip...]

[snip...]

> I agree Bill, there is an issue with the software RAID1 when it comes down
> to some hardware. I have one machine where the ONLY way to stop the root
> filesystem going readonly due to journal issues is to remove RAID. Having
> RAID1 enabled gives silent corruption of both data and the journal at
> seemingly random times.

Maybe I missed something earlier in this thread... and if so I apologize.
However, I was not aware of anyone reporting FS corruption due to
software RAID 1.  Needless to say, that is a serious problem if it is
occurring.

At work, we use software RAID 1 on the majority of our production
servers and have never seen the problems you describe.  I'm not trying
to discredit you - just noting that we have not seen similar results.

> I can see the data corruption by running an RPM verify against the
> data on the drive. Reinstalling the affected packages fixes things -
> until some other random files get corrupted next time.

For curiosity's sake, what kind of files did RPM report as corrupt
after running the verify?  The reason I ask is that I would expect user
data to become corrupt before system files, as the latter are typically
written to disk at install/update time and never written to again.  Or
maybe there is a reason... correct me if I'm wrong ;)

In my last post I asked Neil if he had a patch that would indicate
where the mismatches exist on disk.  Have you found a way to correlate
the mismatches with your FS corruption?

Bryan


* Re: Why does one get mismatches?
  2010-02-16 21:38             ` Steven Haigh
  2010-02-17  3:19               ` Bryan Mesich
@ 2010-02-17 23:05               ` Neil Brown
  2010-02-19 15:18                 ` Piergiorgio Sartor
  2010-02-24 14:46                 ` Bill Davidsen
  1 sibling, 2 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-17 23:05 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Bill Davidsen, Bryan Mesich, Jon, linux-raid

On Wed, 17 Feb 2010 08:38:11 +1100
Steven Haigh <netwiz@crc.id.au> wrote:

> [snip...]
>
> I agree, Bill - there is an issue with software RAID1 on some
> hardware. I have one machine where the ONLY way to stop the root
> filesystem going read-only due to journal issues is to remove RAID.
> With RAID1 enabled I get silent corruption of both data and the
> journal at seemingly random times.
> 
> I can see the data corruption by running an RPM verify against the
> data on the drive. Reinstalling the affected packages fixes things -
> until some other random files get corrupted next time.

Sounds very much like dodgy drives.

> 
> The myth that data corruption in RAID1 ONLY happens to swap and/or unused
> space on a drive is absolute rubbish.
> 

Absolute rubbish does seem to be a suitable phrase here.
There is no question of data corruption.
When memory changes between being written to one device and to another, this
does not cause corruption, only inconsistency.   Either the block will be
written again consistently soon, or it will never be read.
If the host crashes before the blocks are made consistent, then the
inconsistency will not be visible, as the resync will fix it.

If you are getting any corruption, then it is NOT due to this facet of the
RAID1 implementation - it due to something else.
My guess is bad hardware - anywhere from memory to hard drive.

NeilBrown


* Re: Why does one get mismatches?
  2010-02-17 23:05               ` Neil Brown
@ 2010-02-19 15:18                 ` Piergiorgio Sartor
  2010-02-19 22:02                   ` Neil Brown
  2010-02-24 14:46                 ` Bill Davidsen
  1 sibling, 1 reply; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-19 15:18 UTC (permalink / raw)
  To: Neil Brown; +Cc: Steven Haigh, Bill Davidsen, Bryan Mesich, Jon, linux-raid

Hi,

> When memory changes between being written to one device and to another, this
> does not cause corruption, only inconsistency.   Either the block will be
> written again consistently soon, or it will never be read.

well, is this for sure?
I mean, is it by design of the md subsystem?

Or is it like that because we trust the filesystem?

And why is it like that? Why not use the good old
readers-writer lock mechanism to make sure all the copies
are the same when they are written?

It seems to me - maybe I'm wrong - not such a safe design.

I assume it should not be possible to cause this
situation unless there is a crash or a bug in the
md layer.

What if a new filesystem writes a block, changing it
on the fly, i.e. during the RAID-1 writes, and then,
later, reads this block again?

It may not get back the correct data.

In other words, would it be better for the md layer
to be robust against these kinds of threats?

bye,

-- 

piergiorgio


* Re: Why does one get mismatches?
  2010-02-19 15:18                 ` Piergiorgio Sartor
@ 2010-02-19 22:02                   ` Neil Brown
  2010-02-19 22:37                     ` Piergiorgio Sartor
                                       ` (3 more replies)
  0 siblings, 4 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-19 22:02 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Steven Haigh, Bill Davidsen, Bryan Mesich, Jon, linux-raid

On Fri, 19 Feb 2010 16:18:09 +0100
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:

> Hi,
> 
> > When memory changes between being written to one device and to another, this
> > does not cause corruption, only inconsistency.   Either the block will be
> > written again consistently soon, or it will never be read.
> 
> well, is this for sure?
> I mean, by design of the md subsystem.
> 
> Or is it like that because we trust the filesystem?

It is because we trust the filesystem.

> 
> And why is it like that? Why not use the good old
> readers-writer mechanism to make sure all blocks are
> the same when they are written (namely a lock)?

md is not in a position to lock the page - there is simply no way it can stop
the filesystem from changing it.
The only thing it could do would be to make a copy, then write the copy out.
This would incur a performance cost.

> 
> It seems to me, though maybe I'm wrong, not such a safe design.

I think you are wrong.

> 
> I assume, it should not be possible to cause this
> situation, unless there is a crash or a bug in the
> md layer.

I'm not sure what situation you are referring to...

> 
> What if a new filesystem writes a block, changing it
> on the fly, i.e. during the RAID-1 writes, and then, later,
> reads this block again?
>
> It might then not get the correct data.

This is correct.  However it would be equally correct if you were talking
about a normal disk drive rather than a RAID1 pair.
If the filesystem changes the page (or allows it to change) while a write is
pending, then it cannot know what actual data was written.  So it must write
the block out again before it ever reads it in.
RAID1 is no different to any other device in this respect.
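
If you want to see the effect in miniature, here is a userspace sketch
(my own illustration, nothing md-specific): one buffer is written to two
"mirror" files in turn while another thread keeps changing it.

    /* race.c - build with: cc -pthread -o race race.c */
    #include <fcntl.h>
    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    static char buf[4096];
    static volatile int done;

    static void *scribble(void *arg)
    {
            while (!done)                           /* plays the role of the */
                    ((volatile char *)buf)[0]++;    /* FS redirtying the page */
            return NULL;
    }

    int main(void)
    {
            pthread_t t;
            int m0 = open("mirror0", O_CREAT | O_WRONLY | O_TRUNC, 0644);
            int m1 = open("mirror1", O_CREAT | O_WRONLY | O_TRUNC, 0644);

            memset(buf, 'A', sizeof(buf));
            pthread_create(&t, NULL, scribble, NULL);
            write(m0, buf, sizeof(buf));    /* first "mirror" */
            write(m1, buf, sizeof(buf));    /* second "mirror", same buffer */
            done = 1;
            pthread_join(t, NULL);
            return 0;
    }

Run it a few times and 'cmp mirror0 mirror1' will usually report a
difference: an inconsistency, but not corruption, because whoever owns
that buffer knows it must be written out again before it can be trusted.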


> 
> In other words, would it be better, for the md layer,
> to be robust against these kinds of threats?
>

Possibly, but at what cost?
There are two ways that I can imagine to 'solve' this issue.

1/ always copy the page before writing.  This would incur a significant
  overhead, both in the complexity of pre-allocation memory and in the
  delay taken to perform the copy.  And it would very rarely be actually
  needed.
2/ Have the filesystem protect the page from changes while it is being
   written.  This is quite possible for the filesystem to do (while it
   is impossible for md to do).  There could be some performance
   cost with memory-mapped pages as they would need to be unmapped,
   but there would be no significant cost for reads, writes, and filesystem
   metadata operations.
   Further, any filesystem that wants to make use of the integrity checks
   that newer drives provide (where the filesystem provides a 'checksum' for
   the block which gets passed all the way down and written to storage, and
   returned on a read) will need to do this anyway.  So it is likely that in
   the near future all significant filesystems will provide all the
   guarantees md needs in order to simply do nothing different.

So my feeling is that md is doing the best thing already.

I believe 'swap' will always be an issue as unmapping swap pages during write
could be a serious performance cost.  It might be that the best thing to do
with swap is to somehow mark the area of an array used for swap as "don't
care" so md never bothers to resync it, and never reports inconsistencies
there, as they really are not an issue.

NeilBrown


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-19 22:02                   ` Neil Brown
@ 2010-02-19 22:37                     ` Piergiorgio Sartor
  2010-02-19 23:34                     ` Asdo
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-19 22:37 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Steven Haigh, Bill Davidsen, Bryan Mesich,
	Jon, linux-raid

Hi,

> > Or it is like that because we trust the filesystem?
> 
> It is because we trust the filesystem.

well, I hope the trust is not misplaced... :-)

> md is not in a position to lock the page - there is simply no way it can stop
> the filesystem from changing it.

How can this be?

> The only thing it could do would be to make a copy, then write the copy out.

Even making a copy would not be safe, since during
the copy the data could still change, or not?

> This would incur a performance cost.

It's a matter of deciding what is more important.

> > It seems to me, maybe I'm wrong, not a so safe design.
> 
> I think you are wrong.

Could be, I never heard of situations like this.
 
> > I assume, it should not be possible to cause this
> > situation, unless there is a crash or a bug in the
> > md layer.
> 
> I'm not sure what situation you are referring to...

It should not be possible that different
mirrors of a RAID-1 end up with different data.

Otherwise, no point to have the mirroring.
 
> > What if a new filesystem will write a block, changing
> > on the fly, i.e. during RAID-1 writes, and then, later,
> > reading this block again?
> > 
> > It will get, maybe, not the correct data.
> 
> This is correct.  However it would be equally correct if you were talking
> about a normal disk drive rather than a RAID1 pair.

No no, there is a huge difference.
In the single drive case, the FS is responsible for writing
rubbish to a single block. The result would be that a
block has "strange" data, but *always* the same data.

Here the situation is that the data might be "strange",
but different accesses, to the same block of the RAID-1,
could potentially return different data.

As a byproduct of this effect, the "check" functionality
becomes much less useful.

> If the filesystem changes the page (or allows it to change) while a write is
> pending, then it cannot know what actual data was written.  So it must write
> the block out again before it ever reads it in.
> RAID1 is no different to any other device in this respect.

It is different, as mentioned above.

The FS could, intentionally, change the data during a write,
but later it could expect to always get the same data back.

In other words, the FS does not guarantee the "spatial"
consistency of the data (the bytes in a block), but the
"temporal" consistency (successive reads always return
the same data) could be expected. And this is what happens
with a normal HDD. It does not happen in RAID-1.

> Possibly, but at what cost?

As I wrote: it is a matter of deciding what is more important
and useful.

> There are two ways that I can imagine to 'solve' this issue.
> 
> 1/ always copy the page before writing.  This would incur a significant
>   overhead, both in the complexity of pre-allocation memory and in the
>   delay taken to perform the copy.  And it would very rarely be actually
>   needed.

Does a copy really solve the issue? Is the copy done
in an atomic way?
The pre-allocation does not seem to me to be a problem,
since it will be done once and for all (at device creation),
and not dynamically.
The copy *might* be an overhead, nevertheless I wonder if it
is really so much of a problem, especially considering that,
after the copy, the MD layer can optimize the transaction
to the HDDs as much as it likes.

> 2/ Have the filesystem protect the page from changes while it is being
>    written.  This is quite possible for the filesystem to do (while it
>    is impossible for md to do).  There could be some performance

I'm really curious to understand what kind of thinking
is behind a design allowing such a situation...
I mean *system* design, not md design.

>    cost with memory-mapped pages as they would need to be unmapped,
>    but there would be no significant cost for reads, writes, and filesystem
>    metadata operations.
>    Further, any filesystem that wants to make use of the integrity checks
>    that newer drives provide (where the filesystem provides a 'checksum' for
>    the block which gets passed all the way down and written to storage, and
>    returned on a read) will need to do this anyway.  So it is likely that in
>    the near future all significant filesystems will provide all the
>    guarantees md needs in order to simply do nothing different.

That's good to know.
 
> So my feeling is that md is doing the best thing already.

I do not think this is an md issue per se; it seems to me,
from the description, that this is an overall design issue.

Normally, also for performance reasons, one approach is
to allocate queue(s) of buffers between two modules (like
FS and MD) and each of the modules has always *exclusive*
access to its own buffer(s), i.e. the buffer(s) it holds
in a certain time frame.
Once a module releases the buffer(s), this/these can no
longer be touched (read or written) by the module itself.
Once the buffer(s) arrive(s) at the other module, this
can do whatever it wants with it/them, and it is sure
it has exclusive access to it/them.

Normally real-time systems use techniques like this to
guarantee consistency *and* performance.
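
A minimal single-slot sketch of what I mean (my own illustration of the
general technique, not kernel code):

    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    static char slot[4096];
    static int owner;       /* 0 = the "FS" owns the slot, 1 = the "MD" */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    /* "FS" side: fill the buffer, then give up all access to it. */
    void fs_submit(const char *data)
    {
            pthread_mutex_lock(&lock);
            while (owner != 0)
                    pthread_cond_wait(&cond, &lock);
            memcpy(slot, data, sizeof(slot));   /* last touch as owner */
            owner = 1;                          /* hand over ownership */
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
    }

    /* "MD" side: with exclusive ownership, the buffer cannot change
     * between the two mirror writes. */
    void md_write_mirrors(int m0, int m1)
    {
            pthread_mutex_lock(&lock);
            while (owner != 1)
                    pthread_cond_wait(&cond, &lock);
            pthread_mutex_unlock(&lock);
            write(m0, slot, sizeof(slot));
            write(m1, slot, sizeof(slot));
            pthread_mutex_lock(&lock);
            owner = 0;                          /* hand it back */
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
    }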

Anyway, thanks for the clarifications,

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-19 22:02                   ` Neil Brown
  2010-02-19 22:37                     ` Piergiorgio Sartor
@ 2010-02-19 23:34                     ` Asdo
  2010-02-20  4:27                       ` Goswin von Brederlow
  2010-02-20  4:23                     ` Goswin von Brederlow
  2010-02-24 14:54                     ` Bill Davidsen
  3 siblings, 1 reply; 104+ messages in thread
From: Asdo @ 2010-02-19 23:34 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Steven Haigh, Bill Davidsen, Bryan Mesich,
	Jon, linux-raid

Thank you for your explanation Neil,

Neil Brown wrote:
> When memory changes between being written to one device and to another, this
> does not cause corruption, only inconsistency. Either the block will be
> written again consistently soon, or it will never be read.

This is the crucial part...

Why would the filesystem reuse the same memory without rewriting the 
*same* block?

Can the same memory area be used for another block?
If yes, I understand. If not, I don't understand why the block is not
eventually rewritten to contain equal data on both disks.

Is this a power-fail-in-the-middle thing, or can it happen even when the
power is always on?

Do I understand correctly that raid-456 is instead safe ("no-mismatch") 
because it copies the memory region?

Thank you

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-19 22:02                   ` Neil Brown
  2010-02-19 22:37                     ` Piergiorgio Sartor
  2010-02-19 23:34                     ` Asdo
@ 2010-02-20  4:23                     ` Goswin von Brederlow
  2010-02-24 14:54                     ` Bill Davidsen
  3 siblings, 0 replies; 104+ messages in thread
From: Goswin von Brederlow @ 2010-02-20  4:23 UTC (permalink / raw)
  To: linux-raid

Neil Brown <neilb@suse.de> writes:

> On Fri, 19 Feb 2010 16:18:09 +0100
> Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:
>
>> Hi,
>> 
>> > When memory changes between being written to one device and to another, this
>> > does not cause corruption, only inconsistency.   Either the block will be
>> > written again consistently soon, or it will never be read.
>> 
>> well, is this for sure?
>> I mean, by design of the md subsystem.
>> 
>> Or is it like that because we trust the filesystem?
>
> It is because we trust the filesystem.
>
>> 
>> And why is it like that? Why not use the good old
>> readers-writer mechanism to make sure all blocks are
>> the same when they are written (namely a lock)?
>
> md is not in a position to lock the page - there is simply no way it can stop
> the filesystem from changing it.
> The only thing it could do would be to make a copy, then write the copy out.
> This would incur a performance cost.
>
>> 
>> It seems to me, though maybe I'm wrong, not such a safe design.
>
> I think you are wrong.

No, he is right. The safe design is to copy or at least copy-on-write
the page. Maybe this could be configurable so people can choose between
really safe and fast.
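
In miniature, the copy variant looks like this (userspace sketch only;
in the kernel, md would have to allocate a page and memcpy in the write
path instead):

    #include <string.h>
    #include <unistd.h>

    /* Write one block to both mirrors from a private snapshot.  The
     * snapshot itself may catch the page mid-update, but both mirrors
     * now receive identical bytes - which is all the mirroring promises.
     * (Assumes len <= 4096.)
     */
    void mirror_write(int m0, int m1, const char *buf, size_t len)
    {
            char snap[4096];

            memcpy(snap, buf, len);         /* snapshot once, up front */
            write(m0, snap, len);           /* first mirror  */
            write(m1, snap, len);           /* second mirror */
    }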

>> I assume, it should not be possible to cause this
>> situation, unless there is a crash or a bug in the
>> md layer.
>
> I'm not sure what situation you are referring to...
>
>> 
>> What if a new filesystem writes a block, changing it
>> on the fly, i.e. during the RAID-1 writes, and then, later,
>> reads this block again?
>>
>> It might then not get the correct data.
>
> This is correct.  However it would be equally correct if you were talking
> about a normal disk drive rather than a RAID1 pair.
> If the filesystem changes the page (or allows it to change) while a write is
> pending, then it cannot know what actual data was written.  So it must write
> the block out again before it ever reads it in.
> RAID1 is no different to any other device in this respect.
>
>
>> 
>> In other words, would it be better, for the md layer,
>> to be robust against these kinds of threats?
>>
>
> Possibly, but at what cost?
> There are two ways that I can imagine to 'solve' this issue.
>
> 1/ always copy the page before writing.  This would incur a significant
>   overhead, both in the complexity of pre-allocation memory and in the
>   delay taken to perform the copy.  And it would very rarely be actually
>   needed.
> 2/ Have the filesystem protect the page from changes while it is being
>    written.  This is quite possible for the filesystem to do (while it
>    is impossible for md to do).  There could be some performance
>    cost with memory-mapped pages as they would need to be unmapped,
>    but there would be no significant cost for reads, writes, and filesystem
>    metadata operations.
>    Further, any filesystem that wants to make use of the integrity checks
>    that newer drives provide (where the filesystem provides a 'checksum' for
>    the block which gets passed all the way down and written to storage, and
>    returned on a read) will need to do this anyway.  So it is likely that in
>    the near future all significant filesystems will provide all the
>    guarantees md needs in order to simply do nothing different.
>
> So my feeling is that md is doing the best thing already.
>
> I believe 'swap' will always be an issue as unmapping swap pages during write
> could be a serious performance cost.  It might be that the best thing to do
> with swap is to somehow mark the area of an array used for swap as "don't
> care" so md never bothers to resync it, and never reports inconsistencies
> there, as they really are not an issue.
>
> NeilBrown

Or one could turn on the copy/copy-on-write mode at least during the
test.

I'm also not convinced performance of swap is an issue. Swap speed is
already many orders of magnitude lower than real memory, making any relevant
use of swap prohibitive. I certainly would not care one bit or another if
swapping gets 50% slower. I do care about not having a mismatch count
though.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-19 23:34                     ` Asdo
@ 2010-02-20  4:27                       ` Goswin von Brederlow
  2010-02-20 11:12                         ` Asdo
  0 siblings, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-02-20  4:27 UTC (permalink / raw)
  To: linux-raid

Asdo <asdo@shiftmail.org> writes:

> Thank you for your explanation Neil,
>
> Neil Brown wrote:
>> When memory changes between being written to one device and to another, this
>> does not cause corruption, only inconsistency. Either the block will be
>> written again consistently soon, or it will never be read.
>
> This is the crucial part...
>
> Why would the filesystem reuse the same memory without rewriting the
> *same* block?
>
> Can the same memory area be used for another block?
> If yes, I understand. If not, I don't understand why the block is
> not eventually rewritten to contain equal data on both disks.
>
> Is this a power-fail-in-the-middle thing, or can it happen even when
> the power is always on?

The check is usually done with the filesystem mounted and in use. So one
case would be that the block got written, changed and then checked
before the FS decided to flush the dirty block again.

The other scenario suggested in the past is that the block was written,
changed and then the file deleted, making the block unused, before it
got flushed again. The filesystem then sees no need to write a dirty but
unused block so it never gets rewritten. It never gets read either so
that  is safe.
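
(For reference, such a check is driven through sysfs - "echo check >
/sys/block/md0/md/sync_action" from a shell, or programmatically
something like this sketch; counter semantics per the md documentation:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            char cnt[32];
            ssize_t n;
            int fd = open("/sys/block/md0/md/sync_action", O_WRONLY);

            write(fd, "check", 5);  /* starts the background check; poll
                                       sync_action for "idle" to know when
                                       it has finished */
            close(fd);
            fd = open("/sys/block/md0/md/mismatch_cnt", O_RDONLY);
            n = read(fd, cnt, sizeof(cnt) - 1);
            if (n > 0) {
                    cnt[n] = '\0';
                    printf("mismatch_cnt: %s", cnt);  /* sectors found to differ */
            }
            close(fd);
            return 0;
    }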

> Do I understand correctly that raid-456 is instead safe
> ("no-mismatch") because it copies the memory region?
>
> Thank you

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-20  4:27                       ` Goswin von Brederlow
@ 2010-02-20 11:12                         ` Asdo
  2010-02-21 11:13                           ` Goswin von Brederlow
  0 siblings, 1 reply; 104+ messages in thread
From: Asdo @ 2010-02-20 11:12 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: linux-raid

Goswin von Brederlow wrote:
> The check is usually done with the filesystem mounted and in use. So one
> case would be that the block got written, changed and then checked
> before the FS decided to flush the dirty block again.
>
> The other scenario suggested in the past is that the block was written,
> changed and then the file deleted, making the block unused, 
This is not enough to cause the problem if I understand correctly, it 
also needs to change value at this point, right?
So how can it change value... is the same buffer used for another block?
> before it
> got flushed again. The filesystem then sees no need to write a dirty but
> unused block so it never gets rewritten. It never gets read either so
> that  is safe.
>   


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-20 11:12                         ` Asdo
@ 2010-02-21 11:13                           ` Goswin von Brederlow
       [not found]                             ` <8754A21825504719B463AD9809E54349@m5>
  0 siblings, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-02-21 11:13 UTC (permalink / raw)
  To: Asdo; +Cc: Goswin von Brederlow, linux-raid

Asdo <asdo@shiftmail.org> writes:

> Goswin von Brederlow wrote:
>> The check is usually done with the filesystem mounted and in use. So one
>> case would be that the block got written, changed and then checked
>> before the FS decided to flush the dirty block again.
>>
>> The other scenario suggested in the past is that the block was written,
>> changed and then the file deleted, making the block unused,
> This is not enough to cause the problem if I understand correctly, it
> also needs to change value at this point, right?
> So how can it change value... is the same buffer used for another block?

open tempfile
write tempfile
raid1 starts to commit the block
write some more changing the block
 raid1 writes the 2nd copy of the block
delete file
fs never recommits the dirty page

Personally I don't really buy that scenario. At least not at the
frequency mismatches occur.
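
For what it's worth, the sequence can be approximated from userspace,
though the timing makes it hard to hit deterministically (sketch only;
sync_file_range() is Linux-specific and merely starts writeback without
waiting for it):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            char buf[4096];
            int fd = open("tempfile", O_CREAT | O_RDWR | O_TRUNC, 0644);

            memset(buf, 'A', sizeof(buf));
            write(fd, buf, sizeof(buf));            /* dirty the page */
            /* kick off asynchronous writeback: the mirrors may now be
               mid-commit */
            sync_file_range(fd, 0, sizeof(buf), SYNC_FILE_RANGE_WRITE);
            memset(buf, 'B', sizeof(buf));
            pwrite(fd, buf, sizeof(buf), 0);        /* change it again */
            close(fd);
            unlink("tempfile");                     /* gone before any reflush */
            return 0;
    }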

>> before it
>> got flushed again. The filesystem then sees no need to write a dirty but
>> unused block so it never gets rewritten. It never gets read either so
>> that  is safe.
>>

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
       [not found]                               ` <20100221194400.GA2570@lazy.lzy>
@ 2010-02-22 13:01                                 ` Asdo
  2010-02-22 13:30                                   ` Piergiorgio Sartor
  2010-02-22 13:44                                   ` Piergiorgio Sartor
  0 siblings, 2 replies; 104+ messages in thread
From: Asdo @ 2010-02-22 13:01 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Guy Watkins, 'Goswin von Brederlow', linux-raid

Piergiorgio Sartor wrote:
> Hi
>> If someone can map a mismatch to a file, the debate would be over.
>>     
> well, IMHO mismatches should not happen "by design",
> but only due to failures or bugs.
>
> For me, it is not so relevant whether there is a file (or
> metadata) or nothing under a mismatch; the whole idea
> of "mirroring" turns out to be unusable, still IMHO,
> if a mismatch can be caused intentionally.
>   
Even "nothing"?
Why?


Here we are talking about "nothing".

Or so it seems.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-22 13:01                                 ` Asdo
@ 2010-02-22 13:30                                   ` Piergiorgio Sartor
  2010-02-22 13:44                                   ` Piergiorgio Sartor
  1 sibling, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-22 13:30 UTC (permalink / raw)
  To: Asdo
  Cc: Piergiorgio Sartor, Guy Watkins, 'Goswin von Brederlow',
	linux-raid

Hi,

> Even "nothing"?
> Why?

for the following reasons:

1) the "check" command is useless if there are mismatches,
harmful or harmless they could be
2) the mirroring concept implies *identical* mirrors, not
identical only if the upper layer decides so
3) if a filesystem has a small bug, this will not be catched
later, that is, it could be the filesystem causes a *wrong*
mismatch (like there are correct ones)
4) in general, it is not safe to offer a mirroring which is
not always mirroring properly

> 
> Here we are talking about "nothing".
> 
> Or so it seems.

As I wrote, it does not matter, it is just not correct
to rely on the good will of other pieces of software to
guarantee the RAID-1 is working properly.
The RAID-1 should work properly because it does work
properly, not because the filesystem is kind enough to
allow it to work properly.

This could be a system design problem, not an MD one, of
course, so I'm not saying that Neil or others should fix
MD; what I'm writing is that it is astonishing to me
that things work this way (or "walk this way").

That's it, I'm just surprised to learn such situations
are present.

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-22 13:01                                 ` Asdo
  2010-02-22 13:30                                   ` Piergiorgio Sartor
@ 2010-02-22 13:44                                   ` Piergiorgio Sartor
  1 sibling, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-22 13:44 UTC (permalink / raw)
  To: Asdo
  Cc: Piergiorgio Sartor, Guy Watkins, 'Goswin von Brederlow',
	linux-raid

Hi again,

forgot one thing...

I have some PCs where those mismatches show up, sometimes
more, sometimes less, sometimes none at all.

All these PCs have the filesystem (ext3) directly on
the RAID drive.

I have one more PC, where there is an LVM layer in between.

In this PC, which also has different HW (the others are
all identical), I never saw mismatches.
I can imagine that LVM takes care of handling the
memory buffers properly.

Can anyone confirm this?

Thanks,

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-17 23:05               ` Neil Brown
  2010-02-19 15:18                 ` Piergiorgio Sartor
@ 2010-02-24 14:46                 ` Bill Davidsen
  2010-02-24 16:12                   ` Martin K. Petersen
  2010-02-24 21:32                   ` Neil Brown
  1 sibling, 2 replies; 104+ messages in thread
From: Bill Davidsen @ 2010-02-24 14:46 UTC (permalink / raw)
  To: Neil Brown; +Cc: Steven Haigh, Bryan Mesich, Jon, linux-raid

Neil Brown wrote:
> On Wed, 17 Feb 2010 08:38:11 +1100
> Steven Haigh <netwiz@crc.id.au> wrote:
>
>   
>> On Tue, 16 Feb 2010 16:25:25 -0500, Bill Davidsen <davidsen@tmr.com>
>> wrote:
>>     
>>> Bryan Mesich wrote:
>>>       
>>>> On Thu, Feb 11, 2010 at 04:14:44PM +1100, Neil Brown wrote:
>>>>   
>>>>         
>>>>>> This whole discussion simply shows that for RAID-1 software RAID is
>>>>>> less
>>>>>> reliable than hardware RAID (no, I don't mean fake-RAID), because it 
>>>>>> doesn't pin the data buffer until all copies are written.
>>>>>>       
>>>>>>             
>>>>> That doesn't make it less reliable.  It just makes it more confusing.
>>>>>     
>>>>>           
>>>> I agree that linux software RAID is no less reliable than
>>>> hardware RAID with regards to the above conversation.  It's
>>>> however confusing to have a counter that indicates there are
>>>> problems with a RAID 1 array when in fact there is not.
>>>>   
>>>>         
>>> Sorry, but real hardware raid is more reliable than software raid, and 
>>> Neil's justification for not doing smart recovery mentions it. Note this
>>>       
>>> refers to real hardware raid, not fakeraid which is just some firmware 
>>> in a BIOS to use the existing hardware.
>>>
>>> The issue lies with data changing between write to multiple drives. In 
>>> hardware raid the data traverses the memory bus once, only once, and 
>>> goes into cache in the controller, from which it is written to all 
>>> mirrored drives. With software raid an individual write is done to each 
>>> drive, and if the data in the buffer changes between writes to one drive
>>>       
>>> or the other you get different values. Neil may be convinced that the OS
>>>       
>>> somehow "knows" which of the mirror copies is correct, ie. most recent, 
>>> and never uses the stale data, but if that information was really 
>>> available reads would always return the latest value and it wouldn't be 
>>> possible to read the same file multiple times and get different MD5sums.
>>>       
>>> It would also be possible to do a stable smart recovery by propagating 
>>> the most recent copy to the other mirror drives.
>>>
>>> I hoped that mounting data=journal would lead to consistency, that seems
>>>       
>>> not to be true either.
>>>       
>> I agree Bill, there is an issue with the software RAID1 when it comes down
>> to some hardware. I have one machine where the ONLY way to stop the root
>> filesystem going readonly due to journal issues is to remove RAID. Having
>> RAID1 enabled gives silent corruption of both data and the journal at
>> seemingly random times.
>>
>> I can see the data corruption from running a verify between RPM and data
>> on the drive. Reinstalling these packages fixes things - until some
>> random things get corrupted next time.
>>     
>
> Sounds very much like dodgy drives.
>
>   
>> The myth that data corruption in RAID1 ONLY happens to swap and/or unused
>> space on a drive is absolute rubbish.
>>
>>     
>
> Absolute rubbish does seem to be a suitable phrase here.
> There is no question of data corruption.
> When memory changes between being written to one device and to another, this
> does not cause corruption, only inconsistency.   Either the block will be
> written again consistently soon, or it will never be read.
>   

Just what is it that rewrites the data block? The user program doesn't 
know it's needed, the filesystem, if any, doesn't know it's needed, and 
as far as I can tell md doesn't do checksum before issuing the write and 
after the last write is done. Doesn't make a copy and write from that. 
So what sees that the data has changed and rewrites it?

> If the host crashes before the blocks are made consistent, then the 
> inconsistency will not be visible as the resync will fix it.
>
> If you are getting any corruption, then it is NOT due to this facet of the
> RAID1 implementation - it is due to something else.
> My guess is bad hardware - anywhere from memory to hard drive.
>   

Having switched an array from three way raid-1 to raid-6, using the same 
kernel, utilities, and hardware, I can speak to that. When I first 
started to run checks, I took the array offline to do repair, and 
usually saw ~12k mismatches by the end of a week. After changing the 
array to raid-6 I never had a mismatch again. Therefore, while hardware 
clearly can be a factor, it is unlikely to be the cause of all mismatch 
events.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-19 22:02                   ` Neil Brown
                                       ` (2 preceding siblings ...)
  2010-02-20  4:23                     ` Goswin von Brederlow
@ 2010-02-24 14:54                     ` Bill Davidsen
  2010-02-24 21:37                       ` Neil Brown
  3 siblings, 1 reply; 104+ messages in thread
From: Bill Davidsen @ 2010-02-24 14:54 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Steven Haigh, Bryan Mesich, Jon, linux-raid

Neil Brown wrote:
> md is not in a position to lock the page - there is simply no way it can stop
> the filesystem from changing it.
> The only thing it could do would be to make a copy, then write the copy out.
> This would incur a performance cost.
>
>   
Two thoughts on that - one is that for critical data, give me the option 
at array start time to make the copy, slow the performance, and make it 
more consistent. My second thought is that a checksum of the page before 
initiating the write and after all writes are complete might be less of a 
performance hit, and still could detect that the buffer had changed.
>> It seems to me, though maybe I'm wrong, not such a safe design.
>>     
>
> I think you are wrong.
>
>   
> This is correct.  However it would be equally correct if you were talking
> about a normal disk drive rather than a RAID1 pair.
> If the filesystem changes the page (or allows it to change) while a write is
> pending, then it cannot know what actual data was written.  So it must write
> the block out again before it ever reads it in.
> RAID1 is no different to any other device in this respect.
>
>
>   
>> In other words, would it be better, for the md layer,
>> to be robust against these kinds of threats?
>>
>>     
>
> Possibly, but at what cost?
> There are two ways that I can imagine to 'solve' this issue.
>
> 1/ always copy the page before writing.  This would incur a significant
>   overhead, both in the complexity of pre-allocation memory and in the
>   delay taken to perform the copy.  And it would very rarely be actually
>   needed.
> 2/ Have the filesystem protect the page from changes while it is being
>    written.  This is quite possible for the filesystem to do (while it
>    is impossible for md to do).  There could be some performance
>    cost with memory-mapped pages as they would need to be unmapped,
>    but there would be no significant cost for reads, writes, and filesystem
>    metadata operations.
>   

Your next section somewhat mirrors my thought on md checking the data 
after write to be sure it didn't change.

>    Further, any filesystem that wants to make use of the integrity checks
>    that newer drives provide (where the filesystem provides a 'checksum' for
>    the block which gets passed all the way down and written to storage, and
>    returned on a read) will need to do this anyway.  So it is likely that in
>    the near future all significant filesystems will provide all the
>    guarantees md needs in order to simply do nothing different.
>
> So my feeling is that md is doing the best thing already.
>
> I believe 'swap' will always be an issue as unmapping swap pages during write
> could be a serious performance cost.  It might be that the best thing to do
> with swap is to somehow mark the area of an array used for swap as "don't
> care" so md never bothers to resync it, and never reports inconsistencies
> there, as they really are not an issue.
>
> NeilBrown
>


-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 14:46                 ` Bill Davidsen
@ 2010-02-24 16:12                   ` Martin K. Petersen
  2010-02-24 18:51                     ` Piergiorgio Sartor
  2010-02-24 21:39                     ` Neil Brown
  2010-02-24 21:32                   ` Neil Brown
  1 sibling, 2 replies; 104+ messages in thread
From: Martin K. Petersen @ 2010-02-24 16:12 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Neil Brown, Steven Haigh, Bryan Mesich, Jon, linux-raid

>>>>> "Bill" == Bill Davidsen <davidsen@tmr.com> writes:

>> Absolute rubbish does seem to be a suitable phrase here.  There is no
>> question of data corruption.  When memory changes between being
>> written to one device and to another, this does not cause corruption,
>> only inconsistency.  Either the block will be written again
>> consistently soon, or it will never be read.

Bill> Just what is it that rewrites the data block? The user program
Bill> doesn't know it's needed, the filesystem, if any, doesn't know
Bill> it's needed, and as far as I can tell md doesn't do checksum
Bill> before issuing the write and after the last write is done. Doesn't
Bill> make a copy and write from that. So what sees that the data has
Bill> changed and rewrites it?

The filesystem updates the page, causing it to be marked dirty again.
The VM will then eventually schedule the page to be written out.  The
"when" depends on filesystem type and whether there's metadata or data
in the page.

In this discussion there seems to be a focus on the case where one
mirror is correct and one is not.  However, that's usually not how it
works out.  A more realistic scenario is that both mirror copies are
incorrect because the page was continuously updated.  I.e. both mirrors
have various degrees of new and stale data inside a 4KB block.

So realistically both disk blocks are wrong and there's a window until
the new, correct block is written.  That window will only cause problems
if there is a crash and we'll need to recover.  My main concern here is
how big the discrepancy between the disks can get, and whether we'll end
up corrupting the filesystem during recovery because we could
potentially be matching metadata from one disk with journal entries from
another.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 16:12                   ` Martin K. Petersen
@ 2010-02-24 18:51                     ` Piergiorgio Sartor
  2010-02-24 22:21                       ` Neil Brown
  2010-02-24 21:39                     ` Neil Brown
  1 sibling, 1 reply; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-24 18:51 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bill Davidsen, Neil Brown, Steven Haigh, Bryan Mesich, Jon, linux-raid

Hi,

> So realistically both disk blocks are wrong and there's a window until
> the new, correct block is written.  That window will only cause problems
> if there is a crash and we'll need to recover.  My main concern here is
> how big the discrepancy between the disks can get, and whether we'll end
> up corrupting the filesystem during recovery because we could
> potentially be matching metadata from one disk with journal entries from
> another.

well, I know already people will not believe me, but
just this evening, one of the infamous PCs with mismatch
count going up and down, could not boot anymore.

Reason: you must specify the filesystem type

So, I started it with a live CD.

My first idea was a problem with the RAID (type is 10 f2).

This was assembled fine, so I tried to mount it, but mount
returned the same error as above.
So I tried to mount it specifying "-text3" and it was mounted.
Everything seemed to be fine; I backed up the data anyhow.

Some interesting discoveries:

tune2fs -l /dev/md/2_0 returns the FS data, no errors.
blkid /dev/md/2_0 does not return anything.

Running a fsck did not find anything wrong, but it did
not repair anything either.

Now, I do not know if this was caused by the situation
mentioned above, but for sure it is quite fishy...

BTW, unrelated to the topic, any idea on how to fix this?
Is there any tool that can restore the proper ID or else?

Thanks,

bye. 

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
       [not found]                             ` <8754A21825504719B463AD9809E54349@m5>
       [not found]                               ` <20100221194400.GA2570@lazy.lzy>
@ 2010-02-24 19:42                               ` Bill Davidsen
  1 sibling, 0 replies; 104+ messages in thread
From: Bill Davidsen @ 2010-02-24 19:42 UTC (permalink / raw)
  To: Guy Watkins; +Cc: 'Goswin von Brederlow', 'Asdo', linux-raid

Guy Watkins wrote:
> } open tempfile
> } write tempfile
> } raid1 starts to commit the block
> } write some more changing the block
> }  raid1 writes the 2nd copy of the block
> } delete file
> } fs never recommits the dirty page
> } 
} Personally I don't really buy that scenario. At least not at the
> } frequency mismatches occur.
> } 
> } 
> } MfG
> }         Goswin
>
> If someone can map a mismatch to a file, the debate would be over.
>   

Simple test: create a three way raid-1. Run it on a heavily used ext3 
file system for a few days, until it has 3-4k mismatch count. Shut it 
down gracefully. Now
- run e2fsck -n on each of the parts, to prove the f/s is not corrupt
- mount one partition, using ext2 type, ro,noatime[1]
- do an md5sum on every file and put the output in a file[2]
  (on another f/s, obviously)
- mount each of the mirrors the same way
- run md5sum -c {saved_file} to check file content
If you get files which don't compare copy to copy you can see that the 
issue is real.

[1] rather than explain to newbies why neither changes to atime nor any 
journal activity makes the file content change, I do it this way.
[2] MD5 is fine to detect file changes. You need sha1 or such only to 
detect malicious changes with intent to hide the change. Because it uses 
little CPU it's as good as any. Use sha256sum or similar if you doubt me.

Having had mismatches on raid-1 and not on raid-6 using the same three 
drives, I question the "hardware error" theory of mismatch origin.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 14:46                 ` Bill Davidsen
  2010-02-24 16:12                   ` Martin K. Petersen
@ 2010-02-24 21:32                   ` Neil Brown
  2010-02-25  7:22                     ` Goswin von Brederlow
  2010-02-25  8:47                     ` John Robinson
  1 sibling, 2 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-24 21:32 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Steven Haigh, Bryan Mesich, Jon, linux-raid

On Wed, 24 Feb 2010 09:46:23 -0500
Bill Davidsen <davidsen@tmr.com> wrote:

> > There is no question of data corruption.
> > When memory changes between being written to one device and to another, this
> > does not cause corruption, only inconsistency.   Either the block will be
> > written again consistently soon, or it will never be read.
> >     
> 
> Just what is it that rewrites the data block? The user program doesn't 
> know it's needed, the filesystem, if any, doesn't know it's needed, and 
> as far as I can tell md doesn't do checksum before issuing the write and 
> after the last write is done. Doesn't make a copy and write from that. 
> So what sees that the data has changed and rewrites it?
> 

The filesystem re-writes the block, though probably it is more accurate to
say 'the page cache' rewrites the block (the page cache is essentially just a
library of code that the filesystem uses).

When a page is changed, its 'Dirty' flag is set.
Before a page is written out, the Dirty flag is cleared.
So if a page is written differently to two devices, then it must have been
changed after the Dirty flag was clear, so the Dirty flag will be set, so the
page cache will try to write it out again (after about 30 seconds or at
unmount time).

When accessing a block device directly ( > /dev/md0 ) the page cache is still
used and will still write out any page that has the Dirty flag set.

If you open /dev/md0 with O_DIRECT there is no page cache involved and so no
setting of Dirty flags.  So you could engineer a situation with O_DIRECT
that writes different data to the two devices, but you would have to try
fairly hard.
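
Engineered deliberately it would look something like this (sketch only -
do not point it at a device holding data you care about; /dev/md0 here
is just a placeholder):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static char *buf;
    static volatile int done;

    static void *scribble(void *arg)
    {
            while (!done)
                    ((volatile char *)buf)[0]++;    /* mutate during the write */
            return NULL;
    }

    int main(void)
    {
            pthread_t t;
            /* O_DIRECT: no page cache, hence no Dirty flag and no re-write */
            int fd = open("/dev/md0", O_WRONLY | O_DIRECT);

            posix_memalign((void **)&buf, 4096, 4096);
            memset(buf, 'A', 4096);
            pthread_create(&t, NULL, scribble, NULL);
            write(fd, buf, 4096);   /* md sends this buffer to each mirror;
                                       it can change between the two */
            done = 1;
            pthread_join(t, NULL);
            close(fd);
            return 0;
    }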

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 14:54                     ` Bill Davidsen
@ 2010-02-24 21:37                       ` Neil Brown
  2010-02-26 20:48                         ` Bill Davidsen
  0 siblings, 1 reply; 104+ messages in thread
From: Neil Brown @ 2010-02-24 21:37 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Piergiorgio Sartor, Steven Haigh, Bryan Mesich, Jon, linux-raid

On Wed, 24 Feb 2010 09:54:17 -0500
Bill Davidsen <davidsen@tmr.com> wrote:

> Neil Brown wrote:
> > md is not in a position to lock the page - there is simply no way it can stop
> > the filesystem from changing it.
> > The only thing it could do would be to make a copy, then write the copy out.
> > This would incur a performance cost.
> >
> >     
> Two thoughts on that - one is that for critical data, give me the option 
> at array start time, make the copy, slow the performance and make it 
> more consistent. My second thought is that a checksum of the page before 
> initiating write and after all writes are complete might be less of a 
> performance hit, and still could detect that the buffer had changed.


The idea of calculating a checksum before and after certainly has some merit,
if we could choose a checksum algorithm which was sufficiently strong and
sufficiently fast, though in many cases a large part of the cost would just be
bringing the page contents into cache - twice.

It has the advantage over copying the page of not needing to allocate extra
memory.

If someone wanted to try and prototype this and see how it goes, I'd be happy
to advise....
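
In outline it would be something like this (userspace sketch using
zlib's crc32; a real prototype would live in raid1's write path and
would have to bound the retries):

    #include <unistd.h>
    #include <zlib.h>

    /* Checksum the page before issuing the writes and again after the
     * last one completes; if the two differ, the page changed mid-write
     * and the writes are redone.  Link with -lz. */
    void write_mirrors_checked(int m0, int m1, const unsigned char *page,
                               unsigned int len, off_t off)
    {
            unsigned long before, after;

            do {
                    before = crc32(0L, page, len);
                    pwrite(m0, page, len, off);
                    pwrite(m1, page, len, off);
                    after = crc32(0L, page, len);
            } while (before != after);      /* unbounded here; see caveat */
    }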

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 16:12                   ` Martin K. Petersen
  2010-02-24 18:51                     ` Piergiorgio Sartor
@ 2010-02-24 21:39                     ` Neil Brown
       [not found]                       ` <4B8640A2.4060307@shiftmail.org>
  2010-02-28  8:09                       ` Luca Berra
  1 sibling, 2 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-24 21:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bill Davidsen, Steven Haigh, Bryan Mesich, Jon, linux-raid

On Wed, 24 Feb 2010 11:12:09 -0500
"Martin K. Petersen" <martin.petersen@oracle.com> wrote:

> So realistically both disk blocks are wrong and there's a window until
> the new, correct block is written.  That window will only cause problems
> if there is a crash and we'll need to recover.  My main concern here is
> how big the discrepancy between the disks can get, and whether we'll end
> up corrupting the filesystem during recovery because we could
> potentially be matching metadata from one disk with journal entries from
> another.

After a crash, md will only read from one of the devices (the first) until a
resync has completed.  So there should be no room for more confusion than you
would expect on a single device.

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 18:51                     ` Piergiorgio Sartor
@ 2010-02-24 22:21                       ` Neil Brown
  2010-02-25  8:41                         ` Piergiorgio Sartor
  0 siblings, 1 reply; 104+ messages in thread
From: Neil Brown @ 2010-02-24 22:21 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Martin K. Petersen, Bill Davidsen, Steven Haigh, Bryan Mesich,
	Jon, linux-raid

On Wed, 24 Feb 2010 19:51:06 +0100
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:

> Hi,
> 
> > So realistically both disk blocks are wrong and there's a window until
> > the new, correct block is written.  That window will only cause problems
> > if there is a crash and we'll need to recover.  My main concern here is
> > how big the discrepancy between the disks can get, and whether we'll end
> > up corrupting the filesystem during recovery because we could
> > potentially be matching metadata from one disk with journal entries from
> > another.
> 
> well, I know already people will not believe me, but
> just this evening, one of the infamous PCs with mismatch
> count going up and down, could not boot anymore.

I certainly believe you.

> 
> Reason: you must specify the filesystem type

This suggests that the superblock which lives at an offset of 1K
into the filesystem was sufficiently corrupted that mount couldn't
recognise it.

> 
> So, I started it with a live CD.
> 
> My first idea was a problem with the RAID (type is 10 f2).
> 
> This was assembled fine, so I tried to mount it, but mount
> returned the same error as above.
> So I tried to mount it specifying "-text3" and it was mounted.

That is really odd!  Both the kernel ext3 module (triggered by '-text3')
and the 'mount' program use exactly the same test - look for the magic
number in the superblock at 1K into the device.

It is very hard to see how 'mount' would fail to find something that the ext3
module finds.

> Everything seemed to be fine, I backup the data anyhow.
> 
> Some interesting discoveries:
> 
> tune2fs -l /dev/md/2_0 returns the FS data, no errors.
> blkid /dev/md/2_0 does not return anything.

This sounds very much like tune2fs and blkid are reading two different
things, which is strange.

Would you be able to get the first 4K from each device in the raid10:
   dd if=/dev/whatever of=/tmp/whatever bs=1K count=4

and then tar/gz those up and send them to me.  That might give some clue.
Unless the raid metadata is 1.1 or 1.2 - then I would need blocks further into
the device, past the 'data offset'.
The --detail output of the array might help too.


> 
> Running a fsck did not find anything wrong, but it did
> not repair anything either.

Did you use "fsck -f" ??

> 
> Now, I do not know if this was caused by the situation
> mentioned above, but for sure is quite fishy...
> 
> BTW, unrelated to the topic, any idea on how to fix this?
> Is there any tool that can restore the proper ID or else?
> 

Until we know what is wrong, it is hard to suggest a fix.

NeilBrown


> Thanks,
> 
> bye. 
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 21:32                   ` Neil Brown
@ 2010-02-25  7:22                     ` Goswin von Brederlow
  2010-02-25  7:39                       ` Neil Brown
  2010-02-25  8:47                     ` John Robinson
  1 sibling, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-02-25  7:22 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bill Davidsen, Steven Haigh, Bryan Mesich, Jon, linux-raid

Neil Brown <neilb@suse.de> writes:

> On Wed, 24 Feb 2010 09:46:23 -0500
> Bill Davidsen <davidsen@tmr.com> wrote:
>
>> > There is no question of data corruption.
>> > When memory changes between being written to one device and to another, this
>> > does not cause corruption, only inconsistency.   Either the block will be
>> > written again consistently soon, or it will never be read.
>> >     
>> 
>> Just what is it that rewrites the data block? The user program doesn't 
>> know it's needed, the filesystem, if any, doesn't know it's needed, and 
>> as far as I can tell md doesn't do checksum before issuing the write and 
>> after the last write is done. Doesn't make a copy and write from that. 
>> So what sees that the data has changed and rewrites it?
>> 
>
> The filesystem re-writes the block, though probably it is more accurate to
> say 'the page cache' rewrites the block (the page cache is essentially just a
> library of code that the filesystem uses).
>
> When a page is changed, its 'Dirty' flag is set.
> Before a page is written out, the Dirty flag is cleared.
> So if a page is written differently to two devices, then it must have been
> changed after the Dirty flag was clear, so the Dirty flag will be set, so the
> page cache will try to write it out again (after about 30 seconds or at
> unmount time).

So maybe MD could check the dirty flag after write and then output a
warning so we can track down the issue. MD could also rewrite the page
prior to setting the disks in-sync until the dirty bit is clear after a
write.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-25  7:22                     ` Goswin von Brederlow
@ 2010-02-25  7:39                       ` Neil Brown
  0 siblings, 0 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-25  7:39 UTC (permalink / raw)
  To: Goswin von Brederlow
  Cc: Bill Davidsen, Steven Haigh, Bryan Mesich, Jon, linux-raid

On Thu, 25 Feb 2010 08:22:10 +0100
Goswin von Brederlow <goswin-v-b@web.de> wrote:

> Neil Brown <neilb@suse.de> writes:
> 
> > On Wed, 24 Feb 2010 09:46:23 -0500
> > Bill Davidsen <davidsen@tmr.com> wrote:
> >
> >> > There is no question of data corruption.
> >> > When memory changes between being written to one device and to another, this
> >> > does not cause corruption, only inconsistency.   Either the block will be
> >> > written again consistently soon, or it will never be read.
> >> >     
> >> 
> >> Just what is it that rewrites the data block? The user program doesn't 
> >> know it's needed, the filesystem, if any, doesn't know it's needed, and 
> >> as far as I can tell md doesn't do checksum before issuing the write and 
> >> after the last write is done. Doesn't make a copy and write from that. 
> >> So what sees that the data has changed and rewrites it?
> >> 
> >
> > The filesystem re-writes the block, though probably it is more accurate to
> > say 'the page cache' rewrites the block (the page cache is essentially just a
> > library of code that the filesystem uses).
> >
> > When a page is changed, its 'Dirty' flag is set.
> > Before a page is written out, the Dirty flag is cleared.
> > So if a page is written differently to two devices, then it must have been
> > changed after the Dirty flag was clear, so the Dirty flag will be set, so the
> > page cache will try to write it out again (after about 30 seconds or at
> > unmount time).
> 
> So maybe MD could check the dirty flag after write and then output a
> warning so we can track down the issue. MD could also rewrite the page
> prior to setting the disks in-sync until the dirty bit is clear after a
> write.

md isn't able to see the dirty bit.

It gets a 'bio', which has a 'biovec' which has a list of pages
with offset and size.
It does not know if the page is in the page cache or not so it cannot know if
the dirty flag on the page means anything or not.

Yes, it technically could check the dirty bit and if it sees any of them set
then it could reschedule the writes.  However,
 1- this is a layering violation - it is the wrong thing to do.
 2- it might not work.  The filesystem could keep the 'dirty' status elsewhere
    such as in a 'buffer_head', and only copy it through to the page
    occasionally.
 3- it could cause a live-lock.  If an application is changing a mapped page
    quite regularly, then the current pagecache will write it out every 30
    seconds or so.  Your proposed change would write it out again and again
    as soon as the previous write completes.

So, no:  we cannot do that.

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 22:21                       ` Neil Brown
@ 2010-02-25  8:41                         ` Piergiorgio Sartor
  2010-03-02  4:57                           ` Neil Brown
  0 siblings, 1 reply; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-25  8:41 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Martin K. Petersen, Bill Davidsen,
	Steven Haigh, Bryan Mesich, Jon, linux-raid

Hi,

> I certainly believe you.

thank you!

> That is really odd!  Both the kernel ext3 module (triggered by '-text3')
> and the 'mount' program use exactly the same test - look for the magic
> number in the superblock at 1K into the device.

Today I tried: blkid -p /dev/md1 (this time the live CD
autoassembled the md device) and it returned something
like: ambivalent result (probably more than one filesystem...)

The strange thing is that the HDDs were brand new; no older
partitions or filesystems were there.

Anyway, I have one small correction: the RAID is not 10 f2
on this PC, but (due to a different installation) a RAID-1
with a 0.9 superblock, and the device partitions are set
to 0xFD (RAID autodetect).
 
> Would you be able to get the first 4K from each device in the raid10:
>    dd if=/dev/whatever of=/tmp/whatever bs=1K count=4
> 
> and the tar/gz those up and send them to me.  That might give some clue.
> Unless the raid metadata is 1.1 or 1.2 - then I would need blocks further in
> the device, as the 'data offset'.
> The --detail output of the array might help too.

I dumped the first 4K of each device; they're identical
(so no mismatch there, at least). I'll send them to you,
together with the --detail output.
 
> > Running a fsck did not find anything wrong, but it did
> > not repair anything either.
> 
> Did you use "fsck -f" ??

Yep.
 
> Until we know what is wrong, it is hard to suggest a fix.

Thanks a lot (also because this could turn out to be unrelated
to this mailing list).

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 21:32                   ` Neil Brown
  2010-02-25  7:22                     ` Goswin von Brederlow
@ 2010-02-25  8:47                     ` John Robinson
  2010-02-25  9:07                       ` Neil Brown
  1 sibling, 1 reply; 104+ messages in thread
From: John Robinson @ 2010-02-25  8:47 UTC (permalink / raw)
  To: linux-raid

On 24/02/2010 21:32, Neil Brown wrote:
[...]
> If you open /dev/md0 with O_DIRECT there is no page cache involved and so no
> setting of Dirty flags.  So you could engineer a situation with O_DIRECT
> that writes different data to the two devices, but you would have to try
> fairly hard.

Hang on. O_DIRECT sets off all sorts of alarm bells for me, not that I 
understand it properly. Of course there's O_DIRECT on files too. Linus 
Torvalds is quite outspoken about it: http://kerneltrap.org/node/7563

Could we be seeing mismatches because applications are opening their 
files with O_DIRECT in a (perhaps misguided) attempt to get better 
performance?

Cheers,

John.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-25  8:47                     ` John Robinson
@ 2010-02-25  9:07                       ` Neil Brown
  0 siblings, 0 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-25  9:07 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

On Thu, 25 Feb 2010 08:47:59 +0000
John Robinson <john.robinson@anonymous.org.uk> wrote:

> On 24/02/2010 21:32, Neil Brown wrote:
> [...]
> > If you open /dev/md0 with O_DIRECT there is no page cache involved and so no
> > setting of Dirty flags.  So you could engineer a situation with O_DIRECT
> > that writes different data to the two devices, but you would have to try
> > fairly hard.
> 
> Hang on. O_DIRECT sets off all sorts of alarm bells for me, not that I 
> understand it properly. Of course there's O_DIRECT on files too. Linus 
> Torvalds is quite outspoken about it: http://kerneltrap.org/node/7563
> 
> Could we be seeing mismatches because applications are opening their 
> files with O_DIRECT in a (perhaps misguided) attempt to get better 
> performance?

Unlikely.
The app would need to be doing async direct-io, or it would need to have
multiple threads, and in either case it would need to change the buffer that
was being written while the write was happening.  And that would be a pretty
dumb thing to do unless it almost immediately wrote the same buffer out again.

So not exactly impossible, but probably the least likely of the various
possible explanations.
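
For what it's worth, a sketch of that "pretty dumb" pattern, just to make
the race concrete (the device path is made up; the buffer is deliberately
scribbled on while the write is in flight):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static unsigned char *buf;

/* Keeps rewriting the buffer while the write below is in flight. */
static void *scribble(void *unused)
{
    for (;;)
        memset(buf, rand() & 0xff, 4096);
    return NULL;
}

int main(void)
{
    int fd = open("/dev/md0", O_WRONLY | O_DIRECT);
    pthread_t t;

    if (fd < 0 || posix_memalign((void **)&buf, 4096, 4096))
        return 1;
    pthread_create(&t, NULL, scribble, NULL);

    /* md submits this one buffer to each mirror leg separately;
     * each leg can capture a different snapshot of it. */
    pwrite(fd, buf, 4096, 0);
    close(fd);
    return 0;
}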

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
       [not found]                       ` <4B8640A2.4060307@shiftmail.org>
@ 2010-02-25 10:41                         ` Neil Brown
  0 siblings, 0 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-25 10:41 UTC (permalink / raw)
  To: Asdo
  Cc: Martin K. Petersen, Bill Davidsen, Steven Haigh, Bryan Mesich,
	Jon, linux-raid

On Thu, 25 Feb 2010 10:19:30 +0100
Asdo <asdo@shiftmail.org> wrote:

> Neil Brown wrote:
> > On Wed, 24 Feb 2010 11:12:09 -0500
> > "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
> >
> >   
> >> So realistically both disk blocks are wrong and there's a window until
> >> the new, correct block is written.  That window will only cause problems
> >> if there is a crash and we'll need to recover.  My main concern here is
> >> how big the discrepancy between the disks can get, and whether we'll end
> >> up corrupting the filesystem during recovery because we could
> >> potentially be matching metadata from one disk with journal entries from
> >> another.
> >>     
> >
> > After a crash, md will only read from one of the devices (the first) until a
> > resync has completed.  So there should be no room for more confusion than you
> > would expect on a single device.
> Not enough, I'd say.
> The reads are from a single device, the first, but for the writes
> you don't know whether they go to the first device first or in the
> reverse order. So I'd still be concerned by what Martin says.

I'm getting bored of repeating myself, so I won't respond to this.

> 
> In addition, on this ML there are people reporting that the mismatches
> occur even when the system is always on, with no crashes. So I think there
> is another mechanism for mismatches (not sure whether it is an additional
> one or the only one).

Ditto

> 
> Besides, if the mechanism for mismatches is correct I'd go for the copy 
> (or page lock if possible). All raids have copy, except raid0 maybe, and 
> they are not slow. Here the copy would only occur on writes, and raid-1 
> is not targeted to be SO fast on writes... Also raid-1's are usually on 
> few disks, like no more than 3, so the copy is not likely to bottleneck 
> the speed of the writes.

I'm sure it would be a measurable slowdown, though < 20%.  Probably < 10%.  I
doubt everyone would be happy with that, though you might.

> 
> What about raid-10? Are there copies for the raid-1 part of raid-10?
> 

No.  Neither raid1 nor raid10 copies the data; only raid456 does.

NeilBrown



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 21:37                       ` Neil Brown
@ 2010-02-26 20:48                         ` Bill Davidsen
  2010-02-26 21:09                           ` Neil Brown
  0 siblings, 1 reply; 104+ messages in thread
From: Bill Davidsen @ 2010-02-26 20:48 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Steven Haigh, Bryan Mesich, Jon, linux-raid

Neil Brown wrote:
> On Wed, 24 Feb 2010 09:54:17 -0500
> Bill Davidsen <davidsen@tmr.com> wrote:
>
>   
>> Neil Brown wrote:
>>     
>>> md is not in a position to lock the page - there is simply no way it can stop
>>> the filesystem from changing it.
>>> The only thing it could do would be to make a copy, then write the copy out.
>>> This would incur a performance cost.
>>>
>>>     
>>>       
>> Two thoughts on that - one is that for critical data, give me the option 
>> at array start time, make the copy, slow the performance and make it 
>> more consistent. My second thought is that a checksum of the page before 
>> initiating write and after all writes are complete might be less of a 
>> performance hit, and still could detect that the buffer had changed.
>>     
>
>
> The idea of calculating a checksum before and after certainly has some merit,
> if we could choose a checksum algorithm which was sufficiently strong and
> sufficiently fast, though in many cases a large part of the cost would just be
> bringing the page contents into cache - twice.
>
> It has the advantage over copying the page of not needing to allocate extra
> memory.
>
>>> If someone wanted to try and prototype this and see how it goes, I'd be happy
> to advise....
>   

Disagree if you wish, but MD5 should be fine for this. While it is not 
cryptographically strong on files, where the size can be changed and 
evildoers can calculate values to append to the data, it should 
be adequate for data of unchanging size. It's cheap, fast, and readily 
available.
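
A minimal user-space sketch of the before/after idea (a toy 64-bit FNV-1a
stands in for MD5 just to keep it self-contained; the point is the
protocol, not the hash):

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy stand-in for MD5: 64-bit FNV-1a over the buffer. */
static uint64_t csum(const unsigned char *p, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    while (len--) {
        h ^= *p++;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Returns 1 if the buffer changed while the write was in flight,
 * i.e. the mirror legs may now disagree. */
static int write_was_racy(int fd, unsigned char *buf, size_t len, off_t off)
{
    uint64_t before = csum(buf, len);

    pwrite(fd, buf, len, off);        /* the vulnerable window */
    return csum(buf, len) != before;
}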

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-26 20:48                         ` Bill Davidsen
@ 2010-02-26 21:09                           ` Neil Brown
  2010-02-26 22:01                             ` Piergiorgio Sartor
                                               ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Neil Brown @ 2010-02-26 21:09 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Piergiorgio Sartor, Steven Haigh, Bryan Mesich, Jon, linux-raid

On Fri, 26 Feb 2010 15:48:58 -0500
Bill Davidsen <davidsen@tmr.com> wrote:

> >
> > The idea of calculating a checksum before and after certainly has some merit,
> > if we could choose a checksum algorithm which was sufficiently strong and
> > sufficiently fast, though in many cases a large part of the cost would just be
> > bringing the page contents into cache - twice.
> >
> > It has the advantage over copying the page of not needing to allocate extra
> > memory.
> >
> > If someone wanted to try and prototype this and see how it goes, I'd be happy
> > to advise....
> >     
> 
> Disagree if you wish, but MD5 should be fine for this. While it is not 
> cryptographically strong on files, where the size can be changed and 
> evil doers can calculate values to add at the end of the data, it should 
> be adequate on data of unchanging size. It's cheap, fast, and readily 
> available.
> 

Actually, I'm no longer convinced that the checksumming idea would work.
If a mem-mapped page were written that the app is updating every
millisecond (i.e. less than the write latency), then every time a write
completed the checksum would be different, so we would have to reschedule the
write, which would not be the correct behaviour at all.
So I think that the only way to address this in the md layer is to copy
the data and write the copy.  There is already code to copy the data for
write-behind that could possibly be leveraged to do a copy always.
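
In user-space terms the copy approach amounts to something like this
(a sketch only; the real change would of course live in the raid1
write path, not here):

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Snapshot the caller's buffer first, then write the private copy.
 * Whatever the filesystem does to the original page afterwards,
 * every mirror leg is fed identical bytes. */
static ssize_t write_stable_copy(int fd, const void *buf, size_t len,
                                 off_t off)
{
    void *copy = malloc(len);
    ssize_t ret;

    if (!copy)
        return -1;
    memcpy(copy, buf, len);
    ret = pwrite(fd, copy, len, off);
    free(copy);
    return ret;
}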

Or I could just stop setting mismatch_cnt for raid1 and raid10.  That would
also fix the problem :-)

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-26 21:09                           ` Neil Brown
@ 2010-02-26 22:01                             ` Piergiorgio Sartor
  2010-02-26 22:15                             ` Bill Davidsen
  2010-02-26 22:20                             ` Asdo
  2 siblings, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-26 22:01 UTC (permalink / raw)
  To: Neil Brown
  Cc: Bill Davidsen, Piergiorgio Sartor, Steven Haigh, Bryan Mesich,
	Jon, linux-raid

Hi,

> So I think that the only way to address this in the md layer is to copy
> the data and write the copy.  There is already code to copy the data for
> write-behind that could possibly be leveraged to do a copy always.

actually, I wanted to ask how the write-behind works,
because I was suspecting it copies the data.

BTW, is it possible to set both drives (of a pair) as
write-mostly, with some write-behind?

> Or I could just stop setting mismatch_cnt for raid1 and raid10.  That would
> also fix the problem :-)

Well, the "complaining" problem will be fixed... :-)

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-26 21:09                           ` Neil Brown
  2010-02-26 22:01                             ` Piergiorgio Sartor
@ 2010-02-26 22:15                             ` Bill Davidsen
  2010-02-26 22:21                               ` Piergiorgio Sartor
  2010-02-26 22:20                             ` Asdo
  2 siblings, 1 reply; 104+ messages in thread
From: Bill Davidsen @ 2010-02-26 22:15 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Steven Haigh, Bryan Mesich, Jon, linux-raid

Neil Brown wrote:
> On Fri, 26 Feb 2010 15:48:58 -0500
> Bill Davidsen <davidsen@tmr.com> wrote:
>
>   
>>> The idea of calculating a checksum before and after certainly has some merit,
>>> if we could choose a checksum algorithm which was sufficiently strong and
>>> sufficiently fast, though in many cases a large part of the cost would just be
>>> bringing the page contents into cache - twice.
>>>
>>> It has the advantage over copying the page of not needing to allocate extra
>>> memory.
>>>
>>> If someone wanted to try and prototype this and see how it goes, I'd be happy
>>> to advise....
>>>     
>>>       
>> Disagree if you wish, but MD5 should be fine for this. While it is not 
>> cryptographically strong on files, where the size can be changed and 
>> evil doers can calculate values to add at the end of the data, it should 
>> be adequate on data of unchanging size. It's cheap, fast, and readily 
>> available.
>>
>>     
>
> Actually, I'm no longer convinced that the checksumming idea would work.
> If a mem-mapped page were written that the app is updating every
> millisecond (i.e. less than the write latency), then every time a write
> completed the checksum would be different so we would have to reschedule the
> write, which would not be the correct behaviour at all.
> So I think that the only way to address this in the md layer is to copy
> the data and write the copy.  There is already code to copy the data for
> write-behind that could possibly be leveraged to do a copy always.
>
>   
Your point is valid about the possibility, but consider this: if the 
checksum fails, then at that point do the copy and write again.
> Or I could just stop setting mismatch_cnt for raid1 and raid10.  That would
> also fix the problem :-)
>
>   
s/fix/hide/  ;-)

My feeling is that we have many ways to change the data, O_DIRECT, aio, 
threads, mmap, and probably some I haven't found yet. Rather than thinking 
that you could prevent that without a flaming layer violation, consider 
my thought above: detect the fact that the data has changed, and at 
plays with O_DIRECT I can't say, but it sounds to me as if it should 
eliminate the mismatches without a huge performance impact. Let me know 
if this addresses your concern with writing forever without taking much 
overhead.
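
Combining the two sketches from the earlier messages (write_was_racy()
and write_stable_copy()), the optimistic scheme would look roughly like:

/* Write the live buffer optimistically; only if the checksum shows
 * it changed mid-flight, redo the write from a frozen private copy
 * so all mirror legs end up with identical bytes. */
static ssize_t write_retry_on_race(int fd, unsigned char *buf,
                                   size_t len, off_t off)
{
    if (!write_was_racy(fd, buf, len, off))
        return (ssize_t)len;    /* common case: no copy, no overhead */
    return write_stable_copy(fd, buf, len, off);
}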

The question is why this happens with raid-1 and doesn't seem to with 
raid-[56]. And I don't see mismatches on my raid-10, although I'm pretty 
sure that neither mmap nor O_DIRECT is used on those arrays.

What would seem to be optimal is some COW on the buffer to prevent the 
buffer from being modified while it's being used for actual i/o. It doesn't 
seem that hardware supports it; page size, buffer size and sector size all vary.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-26 21:09                           ` Neil Brown
  2010-02-26 22:01                             ` Piergiorgio Sartor
  2010-02-26 22:15                             ` Bill Davidsen
@ 2010-02-26 22:20                             ` Asdo
  2010-02-27  6:01                               ` Michael Evans
  2 siblings, 1 reply; 104+ messages in thread
From: Asdo @ 2010-02-26 22:20 UTC (permalink / raw)
  To: Neil Brown
  Cc: Bill Davidsen, Piergiorgio Sartor, Steven Haigh, Bryan Mesich,
	Jon, linux-raid

Neil Brown wrote:
> Actually, I'm no longer convinced that the checksumming idea would work.
> If a mem-mapped page were written that the app is updating every
> millisecond (i.e. less than the write latency), then every time a write
> completed the checksum would be different so we would have to reschedule the
> write, which would not be the correct behaviour at all.
> So I think that the only way to address this in the md layer is to copy
> the data and write the copy.  There is already code to copy the data for
> write-behind that could possibly be leveraged to do a copy always.
>   
The concerns about slowdowns with the copy could be addressed by making 
the copy a runtime choice: a file in /sys/block/mdX/md/ where one can 
echo "1" to enable copies for this type of raid. Or better, 1 could be 
the default (slower but safer; or if not safer, at least enough to avoid 
needless questions on mismatches on this ML by new users, and to allow 
detection of REAL mismatches, which can be due to cabling or defective 
disks), and echoing 0 would increase performance at the cost of seeing 
lots of false-positive mismatches.
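
To be clear, no such attribute exists today; the path below is purely the
knob proposed here, sketched as a tiny program:

#include <stdio.h>

int main(void)
{
    /* HYPOTHETICAL: this sysfs file is the proposal above, not a
     * real md attribute.  "1" = snapshot pages before writing. */
    FILE *f = fopen("/sys/block/md0/md/copy_writes", "w");

    if (!f)
        return 1;
    fputs("1\n", f);
    return fclose(f) ? 1 : 0;
}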

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-26 22:15                             ` Bill Davidsen
@ 2010-02-26 22:21                               ` Piergiorgio Sartor
  0 siblings, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-02-26 22:21 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Neil Brown, Piergiorgio Sartor, Steven Haigh, Bryan Mesich, Jon,
	linux-raid

Hi,

> The question is why this happens with raid-1 and doesn't seem to
> with raid-[56]. And I don't see mismatches on my raid-10, although
> I'm pretty sure that neither mmap nor O_DIRECT is used on those
> arrays.

I believe Neil mentioned that RAID-5/6 always
makes a copy, while only RAID-1/10 uses the same
page without copying.

I get mismatches on RAID-10, but not on the one
that has LVM on it, only on the one(s) where the
filesystem (ext3) is directly on the RAID volume.

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-26 22:20                             ` Asdo
@ 2010-02-27  6:01                               ` Michael Evans
  2010-02-28  0:01                                 ` Bill Davidsen
  0 siblings, 1 reply; 104+ messages in thread
From: Michael Evans @ 2010-02-27  6:01 UTC (permalink / raw)
  To: Asdo
  Cc: Neil Brown, Bill Davidsen, Piergiorgio Sartor, Steven Haigh,
	Bryan Mesich, Jon, linux-raid

On Fri, Feb 26, 2010 at 2:20 PM, Asdo <asdo@shiftmail.org> wrote:
> Neil Brown wrote:
>>
>> Actually, I'm no longer convinced that the checksumming idea would work.
>> If a mem-mapped page were written that the app is updating every
>> millisecond (i.e. less than the write latency), then every time a write
>> completed the checksum would be different so we would have to reschedule
>> the
>> write, which would not be the correct behaviour at all.
>> So I think that the only way to address this in the md layer is to copy
>> the data and write the copy.  There is already code to copy the data for
>> write-behind that could possibly be leveraged to do a copy always.
>>
>
> The concerns of slowdowns with copy could be addressed by making the copy a
> runtime choice triggered by a sysctl interface, a file in /sys/block/mdX/md/
> where one can echo "1" to enable copies for this type of raid. Or
> better 1 could be the default (slower but safer, or if not safer, at least
> to avoid needless questions on mismatches on this ML by new users, and to
> allow detection of REAL mismatches which can be due to cabling or defective
> disks) and echoing 0 would increase performances at the cost of seeing lots
> of false positive mismatches.

Isn't there some way of making the page copy-on-write using hardware
and/or an in-kernel structure?  Ideally copying could be avoided
/unless/ there is change.  That way each operation looks like an
atomic commit.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-27  6:01                               ` Michael Evans
@ 2010-02-28  0:01                                 ` Bill Davidsen
  0 siblings, 0 replies; 104+ messages in thread
From: Bill Davidsen @ 2010-02-28  0:01 UTC (permalink / raw)
  To: Michael Evans
  Cc: Asdo, Neil Brown, Piergiorgio Sartor, Steven Haigh, Bryan Mesich,
	Jon, linux-raid

Michael Evans wrote:
> On Fri, Feb 26, 2010 at 2:20 PM, Asdo <asdo@shiftmail.org> wrote:
>   
>> Neil Brown wrote:
>>     
>>> Actually, I'm no longer convinced that the checksumming idea would work.
>>> If a mem-mapped page were written that the app is updating every
>>> millisecond (i.e. less than the write latency), then every time a write
>>> completed the checksum would be different so we would have to reschedule
>>> the
>>> write, which would not be the correct behaviour at all.
>>> So I think that the only way to address this in the md layer is to copy
>>> the data and write the copy.  There is already code to copy the data for
>>> write-behind that could possibly be leveraged to do a copy always.
>>>
>>>       
>> The concerns of slowdowns with copy could be addressed by making the copy a
>> runtime choice triggered by a sysctl interface, a file in /sys/block/mdX/md/
>> where one can echo "1" to enable copies for this type of raid. Or
>> better 1 could be the default (slower but safer, or if not safer, at least
>> to avoid needless questions on mismatches on this ML by new users, and to
>> allow detection of REAL mismatches which can be due to cabling or defective
>> disks) and echoing 0 would increase performances at the cost of seeing lots
>> of false positive mismatches.
>
> Isn't there some way of making the page copy-on-write using hardware
> and/or an in-kernel structure?  Ideally copying could be avoided
> /unless/ there is change.  That way each operation looks like an
> atomic commit.
>   

As I think about this, one idea was to add a write-in-progress flag, so 
that the filesystem, or library, or whatever would know not to change 
the page. That would mean that every filesystem would need to be 
enhanced, or that the "safe write" would be optional on a per-filesystem 
level. Implementations of O_DIRECT could honour it or not, and there 
would still be a safe way to write.

However, it occurs to me that there are several other levels involved, 
and so it could be better but not perfect. While md could flag the start 
and finish of a write, you then need to have the next level, the device 
driver, do the same thing, so md knows when the data need not be frozen. 
"But wait, there's more," as they say, the device driver need to track 
when the data are transferred to the actual device, and the device needs 
to report when the data actually hit the platter, or you could still 
have possible mismatches.

All of that reminds us of the discussion of barriers, and flush cache 
commands, and other performance impacting practices. So in the long run 
I think the most effective solution, one which has the highest 
improvement at the lowest cost in performance, is a copy. Now if Neil 
liked my idea of doing a checksum before and after a write, and a copy 
only in the cases where the data had changed, the impact could be pretty 
small.

All that depends on two things, Neil thinking the whole thing is worth 
doing, and no one finding a flaw in my proposal to do a checksum rather 
than a copy each time.

And to return to your original question, no. Hardware COW works on 
memory pages; a buffer could span pages, and a write to a page might not 
be in the part of the page used for the i/o buffer. So as nice as that 
would be, I don't think the hardware supports it. And even if you could, 
the COW needs to be done in the layer which tries to change the buffer, 
so md would set COW and the filesystem would have to deal with it. I am 
pretty sure that's a layering violation, big time. The advisory "write 
in progress" flag might be acceptable, it's information the f/s can use 
or not.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-24 21:39                     ` Neil Brown
       [not found]                       ` <4B8640A2.4060307@shiftmail.org>
@ 2010-02-28  8:09                       ` Luca Berra
  2010-03-02  5:01                         ` Neil Brown
  1 sibling, 1 reply; 104+ messages in thread
From: Luca Berra @ 2010-02-28  8:09 UTC (permalink / raw)
  To: linux-raid

On Thu, Feb 25, 2010 at 08:39:36AM +1100, Neil Brown wrote:
>On Wed, 24 Feb 2010 11:12:09 -0500
>"Martin K. Petersen" <martin.petersen@oracle.com> wrote:
>
>> So realistically both disk blocks are wrong and there's a window until
>> the new, correct block is written.  That window will only cause problems
>> if there is a crash and we'll need to recover.  My main concern here is
>> how big the discrepancy between the disks can get, and whether we'll end
>> up corrupting the filesystem during recovery because we could
>> potentially be matching metadata from one disk with journal entries from
>> another.
>
>After a crash, md will only read from one of the devices (the first) until a
>resync has completed.  So there should be no room for more confusion than you
>would expect on a single device.

After thinking more about this, I came up with another concern
about write ordering.

example
app writes block A, B, C
md writes A on both disks
md writes B on disk1
app writes B again (B')
md writes B' on disk2
now md would write B' again on both disks, but the system crashes
(note, C is never written due to crash)

Disk 1 contains A and B in the correct order; it is missing C and B', but we
don't care, the app should be able to recover from a crash.

Disk 2 contains A and B', but they are wrongly ordered because C is
missing

If in the above case A and C are data blocks and B contains a journal
related to A and C, booting from disk 2 could result in inconsistent
data.

Can the above really happen?
Would using barriers remove the above concern?
Am I missing something else?

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-25  8:41                         ` Piergiorgio Sartor
@ 2010-03-02  4:57                           ` Neil Brown
  2010-03-02 18:49                             ` Piergiorgio Sartor
  0 siblings, 1 reply; 104+ messages in thread
From: Neil Brown @ 2010-03-02  4:57 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Martin K. Petersen, Bill Davidsen, Steven Haigh, Bryan Mesich,
	Jon, linux-raid

On Thu, 25 Feb 2010 09:41:41 +0100
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:

> Hi,
> 
> > I certainly believe you.
> 
> thank you!
> 
> > That is really odd!  Both the kernel ext3 module (triggered by '-text3')
> > and the 'mount' program use exactly the same test - look for the magic
> > number in the superblock at 1K into the device.
> 
> Today I tried: blkid -p /dev/md1 (this time the live CD
> autoassembled the md device) and it returned something
> like: ambivalent result (probably more than one filesystem...)
> 
> Strange thing is that, the HDDs were brand new, no older
> partitions or filesystem were there.
> 
> Anyway, I've one small correction, the RAID is not 10 f2,
> on this PC, but (due to different installation) a RAID-1
> with superblock 0.9 and the device partitions are set
> to 0xFD (RAID autoassemble).
>  
> > Would you be able to get the first 4K from each device in the raid10:
> >    dd if=/dev/whatever of=/tmp/whatever bs=1K count=4
> > 
> > and the tar/gz those up and send them to me.  That might give some clue.
> > Unless the raid metadata is 1.1 or 1.2 - then I would need blocks further in
> > the device, as the 'data offset'.
> > The --detail output of the array might help too.
> 
> I dumped the first 4K of each device, they're identical
> (so no mismatch there, at least), I'll send them to you,
> together with the detail output.

Thanks.  I finally had a look at these (sorry for delay).

If you run "file" on one of the dumps, it tells you:

$ file disk1.raw 
disk1.raw: Minix filesystem

Which isn't expected.  I would expect something like
$ file xxx
xxx: Linux rev 1.0 ext3 filesystem data, UUID=fe55fe6f-0412-4a0a-852d-a0e21767aa35 (needs journal recovery) (large files)

for an ext3 filesystem.

Looking at /usr/share/misc/magic, it seems that a Minix filesystem is defined
by:
0x410	leshort		0x137f		Minix filesystem

i.e. the 2 bytes at 0x410 into the device are 0x137f, which is exactly what we
find in your dump.

0x410 in an ext3 superblock holds the low bytes of "s_free_inodes_count", the
count of free inodes.
Your actual number is 14881663, which is 0x00E3137F.

So if you just add or remove a file, the number of free inodes should change,
and your filesystem will no longer look like a Minix filesystem and
your problems should go away.
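
For anyone who wants to re-run file(1)'s test by hand on one of those
dumps, the check is tiny (filename as in the earlier mail; illustrative
only):

#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned char b[2];
    FILE *f = fopen(argc > 1 ? argv[1] : "disk1.raw", "rb");

    if (!f || fseek(f, 0x410, SEEK_SET) || fread(b, 1, 2, f) != 2)
        return 1;

    /* "leshort" in the magic file: a little-endian 16-bit value.
     * In the ext3 superblock these same two bytes are the low half
     * of s_free_inodes_count (here 14881663 == 0x00E3137F). */
    uint16_t v = b[0] | (uint16_t)(b[1] << 8);
    printf("bytes at 0x410: 0x%04x%s\n", v,
           v == 0x137f ? " -> file(1) says Minix" : "");
    fclose(f);
    return 0;
}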

I guess libblkid et al. should do more sanity checks on the superblock before
deciding that it really belongs to some particular filesystem.

But I'm happy - this clearly isn't a raid problem.

NeilBrown


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-02-28  8:09                       ` Luca Berra
@ 2010-03-02  5:01                         ` Neil Brown
  2010-03-02  7:36                           ` Luca Berra
  0 siblings, 1 reply; 104+ messages in thread
From: Neil Brown @ 2010-03-02  5:01 UTC (permalink / raw)
  To: Luca Berra; +Cc: linux-raid

On Sun, 28 Feb 2010 09:09:49 +0100
Luca Berra <bluca@comedia.it> wrote:

> On Thu, Feb 25, 2010 at 08:39:36AM +1100, Neil Brown wrote:
> >On Wed, 24 Feb 2010 11:12:09 -0500
> >"Martin K. Petersen" <martin.petersen@oracle.com> wrote:
> >
> >> So realistically both disk blocks are wrong and there's a window until
> >> the new, correct block is written.  That window will only cause problems
> >> if there is a crash and we'll need to recover.  My main concern here is
> >> how big the discrepancy between the disks can get, and whether we'll end
> >> up corrupting the filesystem during recovery because we could
> >> potentially be matching metadata from one disk with journal entries from
> >> another.
> >
> >After a crash, md will only read from one of the devices (the first) until a
> >resync has completed.  So there should be no room for more confusion than you
> >would expect on a single device.
> 
> After thinking more about this i could come up with another concern
> about write ordering.
> 
> example
> app writes block A, B, C
> md writes A on both disks
> md writes B on disk1
> app writes B again (B')
> md writes B' on disk2
> now md would write B' again on both disks, but the system crashes
> (note, C is never written due to crash)
> 
> Disk 1 contains A and B in the correct order, it is missing C and B' but we
>> don't care, app should be able to recover from a crash
> 
> Disk 2 contains A and B', but they are wrongly ordered because C is
> missing
> 
> If in the above case A and C are data blocks and B contains a journal
> related to A and C, booting from disk 2 could result in inconsistent
> data.
> 
> can the above really happen?
> would using barriers remove the above concern?
> am i missing something else?

There is no inconsistency here that a filesystem would not equally expect
from a single device.
After the crash-while-writing B', it should expect to see either B or B',
and it does, depending on which device is primary.

Nothing to see here.

NeilBrown

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02  5:01                         ` Neil Brown
@ 2010-03-02  7:36                           ` Luca Berra
  2010-03-02 10:04                             ` Michael Evans
  0 siblings, 1 reply; 104+ messages in thread
From: Luca Berra @ 2010-03-02  7:36 UTC (permalink / raw)
  To: linux-raid

On Tue, Mar 02, 2010 at 04:01:00PM +1100, Neil Brown wrote:
>On Sun, 28 Feb 2010 09:09:49 +0100
>Luca Berra <bluca@comedia.it> wrote:
>
>> On Thu, Feb 25, 2010 at 08:39:36AM +1100, Neil Brown wrote:
>> >On Wed, 24 Feb 2010 11:12:09 -0500
>> >"Martin K. Petersen" <martin.petersen@oracle.com> wrote:
>> >
>> >> So realistically both disk blocks are wrong and there's a window until
>> >> the new, correct block is written.  That window will only cause problems
>> >> if there is a crash and we'll need to recover.  My main concern here is
>> >> how big the discrepancy between the disks can get, and whether we'll end
>> >> up corrupting the filesystem during recovery because we could
>> >> potentially be matching metadata from one disk with journal entries from
>> >> another.
>> >
>> >After a crash, md will only read from one of the devices (the first) until a
>> >resync has completed.  So there should be no room for more confusion than you
>> >would expect on a single device.
>> 
>> After thinking more about this i could come up with another concern
>> about write ordering.
>> 
>> example
>> app writes block A, B, C
>> md writes A on both disks
>> md writes B on disk1
>> app writes B again (B')
>> md writes B' on disk2
>> now md would write B' again on both disks, but the system crashes
>> (note, C is never written due to crash)
>> 
>> Disk 1 contains A and B in the correct order, it is missing C and B' but we
>> don't care, app should be able to recover from a crash
>> 
>> Disk 2 contains A and B', but they are wrongly ordered because C is
>> missing
>> 
>> If in the above case A and C are data blocks and B contains a journal
>> related to A and C, booting from disk 2 could result in inconsistent
>> data.
>> 
>> can the above really happen?
>> would using barriers remove the above concern?
>> am i missing something else?
>
>There is no inconsistency here that a filesystem would not equally expect
>from a single device.
>After the crash-while-writing B', it should expect to see either B or B',
>and it does, depending on which device is primary.
>
>Nothing to see here.
I will try to explain better:
the problem is not related to the confusion between B and B';

the problem is that on one disk we have B' _without_ C.

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02  7:36                           ` Luca Berra
@ 2010-03-02 10:04                             ` Michael Evans
  2010-03-02 11:02                               ` Luca Berra
  0 siblings, 1 reply; 104+ messages in thread
From: Michael Evans @ 2010-03-02 10:04 UTC (permalink / raw)
  To: linux-raid

On Mon, Mar 1, 2010 at 11:36 PM, Luca Berra <bluca@comedia.it> wrote:
> On Tue, Mar 02, 2010 at 04:01:00PM +1100, Neil Brown wrote:
>>
>> On Sun, 28 Feb 2010 09:09:49 +0100
>> Luca Berra <bluca@comedia.it> wrote:
>>
>>> On Thu, Feb 25, 2010 at 08:39:36AM +1100, Neil Brown wrote:
>>> >On Wed, 24 Feb 2010 11:12:09 -0500
>>> >"Martin K. Petersen" <martin.petersen@oracle.com> wrote:
>>> >
>>> >> So realistically both disk blocks are wrong and there's a window until
>>> >> the new, correct block is written.  That window will only cause
>>> >> problems
>>> >> if there is a crash and we'll need to recover.  My main concern here
>>> >> is
>>> >> how big the discrepancy between the disks can get, and whether we'll
>>> >> end
>>> >> up corrupting the filesystem during recovery because we could
>>> >> potentially be matching metadata from one disk with journal entries
>>> >> from
>>> >> another.
>>> >
>>> >After a crash, md will only read from one of the devices (the first)
>>> > until a
>>> >resync has completed.  So there should be no room for more confusion
>>> > than you
>>> >would expect on a single device.
>>>
>>> After thinking more about this i could come up with another concern
>>> about write ordering.
>>>
>>> example
>>> app writes block A, B, C
>>> md writes A on both disks
>>> md writes B on disk1
>>> app writes B again (B')
>>> md writes B' on disk2
>>> now md would write B' again on both disks, but the system crashes
>>> (note, C is never written due to crash)
>>>
>>> Disk 1 contains A and B in the correct order, it is missing C and B' but
>>> we
>>> don't care, app should be able to recover from a crash
>>>
>>> Disk 2 contains A and B', but they are wrongly ordered because C is
>>> missing
>>>
>>> If in the above case A and C are data blocks and B contains a journal
>>> related to A and C, booting from disk 2 could result in inconsistent
>>> data.
>>>
>>> can the above really happen?
>>> would using barriers remove the above concern?
>>> am i missing something else?
>>
>> There is no inconsistency here that a filesystem would not equally expect
>> from a single device.
>> After the crash-while-writing B', it should expect to see either B or B',
>> and it does, depending on which device is primary.
>>
>> Nothing to see here.
>
> I will try to explain better,
> the problem is not related to the confusion between B or B'
>
> the problem is that on one disk we have B' _without_ C.
>
> Regards,
> L.
>
> --
> Luca Berra -- bluca@comedia.it
>        Communication Media & Services S.r.l.
>  /"\
>  \ /     ASCII RIBBON CAMPAIGN
>  X        AGAINST HTML MAIL
>  / \

You're demanding full atomic commits; this is precisely what journals
and /barriers/ are for.

Are you bypassing them in a quest for performance and paying for
it on crashes?
Or is this a hardware bug?
Or is it some glitch in the block device layering leading to barrier
requests not being honored?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 10:04                             ` Michael Evans
@ 2010-03-02 11:02                               ` Luca Berra
  2010-03-02 12:13                                 ` Michael Evans
  2010-03-02 18:14                                 ` Asdo
  0 siblings, 2 replies; 104+ messages in thread
From: Luca Berra @ 2010-03-02 11:02 UTC (permalink / raw)
  To: linux-raid

On Tue, Mar 02, 2010 at 02:04:47AM -0800, Michael Evans wrote:
>On Mon, Mar 1, 2010 at 11:36 PM, Luca Berra <bluca@comedia.it> wrote:
>> On Tue, Mar 02, 2010 at 04:01:00PM +1100, Neil Brown wrote:
>>>> Disk 1 contains A and B in the correct order, it is missing C and B' but
>>>> we
>>>> don't care, app should be able to recover from a crash
>>>>
>>>> Disk 2 contains A and B', but they are wrongly ordered because C is
>>>> missing
>>>>
>>>> If in the above case A and C are data blocks and B contains a journal
>>>> related to A and C, booting from disk 2 could result in inconsistent
>>>> data.
>>>>
>>>> can the above really happen?
>>>> would using barriers remove the above concern?
>>>> am i missing something else?
>>>
>>> There is no inconsistency here that a filesystem would not equally expect
>>> from a single device.
>>> After the crash-while-writing B', it should expect to see either B or B',
>>> and it does, depending on which device is primary.
>>>
>>> Nothing to see here.
>>
>> I will try to explain better,
>> the problem is not related to the confusion between B or B'
>>
>> the problem is that on one disk we have B' _without_ C.
>>
>You're demanding full atomic commits; this is precisely what journals
>and /barriers/ are for.
>
>Are you bypassing them in a quest for performance and paying for
>it on crashes?
>Or is this a hardware bug?
>Or is it some glitch in the block device layering leading to barrier
>requests not being honored?
I just asked for confirmation that with /barriers/ the scenario above
would not happen.

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 11:02                               ` Luca Berra
@ 2010-03-02 12:13                                 ` Michael Evans
  2010-03-02 18:14                                 ` Asdo
  1 sibling, 0 replies; 104+ messages in thread
From: Michael Evans @ 2010-03-02 12:13 UTC (permalink / raw)
  To: linux-raid

On Tue, Mar 2, 2010 at 3:02 AM, Luca Berra <bluca@comedia.it> wrote:
> On Tue, Mar 02, 2010 at 02:04:47AM -0800, Michael Evans wrote:
>>
>> On Mon, Mar 1, 2010 at 11:36 PM, Luca Berra <bluca@comedia.it> wrote:
>>>
>>> On Tue, Mar 02, 2010 at 04:01:00PM +1100, Neil Brown wrote:
>>>>>
>>>>> Disk 1 contains A and B in the correct order, it is missing C and B'
>>>>> but
>>>>> we
>>>>> don't care, app should be able to recover from a crash
>>>>>
>>>>> Disk 2 contains A and B', but they are wrongly ordered because C is
>>>>> missing
>>>>>
>>>>> If in the above case A and C are data blocks and B contains a journal
>>>>> related to A and C, booting from disk 2 could result in inconsistent
>>>>> data.
>>>>>
>>>>> can the above really happen?
>>>>> would using barriers remove the above concern?
>>>>> am i missing something else?
>>>>
>>>> There is no inconsistency here that a filesystem would not equally
>>>> expect
>>>> from a single device.
>>>> After the crash-while-writing B', it should expect to see either B or
>>>> B',
>>>> and it does, depending on which device is primary.
>>>>
>>>> Nothing to see here.
>>>
>>> I will try to explain better,
>>> the problem is not related to the confusion between B or B'
>>>
>>> the problem is that on one disk we have B' _without_ C.
>>>
>> You're demanding full atomic commits; this is precisely what journals
>> and /barriers/ are for.
>>
>> Are you bypassing them in a quest for performance and paying for
>> it on crashes?
>> Or is this a hardware bug?
>> Or is it some glitch in the block device layering leading to barrier
>> requests not being honored?
>
> I just asked for confirmation that with /barriers/ the scenario above
> would not happen.
>
> L.
>
> --
> Luca Berra -- bluca@comedia.it
>        Communication Media & Services S.r.l.
>  /"\
>  \ /     ASCII RIBBON CAMPAIGN
>  X        AGAINST HTML MAIL
>  / \

Yes, obviously atomic commits require barriers.  Older hardware and
operating systems that didn't allow any form of buffering or out of
order operations (hardware can re-arrange commits internally now)
inherently had a barrier between every operation.  Modern devices and
systems have so many layers of interacting buffers with operation
re-ordering to optimize throughput that such predictability is lacking
unless explicitly requested, in the form of a barrier.
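
At the application level, the closest analogue is fsync()/fdatasync():
nothing issued after the flush returns can become durable "before" what
was flushed. A hedged sketch of how a journalling writer would avoid
Luca's scenario (offsets and sizes are made up):

#include <string.h>
#include <unistd.h>

int write_data_then_journal(int fd)
{
    char a[512], b[512], c[512];

    memset(a, 'A', sizeof(a));
    memset(c, 'C', sizeof(c));
    memset(b, 'B', sizeof(b));    /* journal record describing A and C */

    pwrite(fd, a, sizeof(a), 0);
    pwrite(fd, c, sizeof(c), 1024);

    /* The "barrier": A and C are durable on every mirror leg before
     * the journal record is even issued, so no leg can end up with
     * B (or B') but no C after a crash. */
    if (fdatasync(fd))
        return -1;

    return pwrite(fd, b, sizeof(b), 512) == sizeof(b) ? 0 : -1;
}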

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 11:02                               ` Luca Berra
  2010-03-02 12:13                                 ` Michael Evans
@ 2010-03-02 18:14                                 ` Asdo
  2010-03-02 18:52                                   ` Piergiorgio Sartor
  2010-03-02 20:17                                   ` Neil Brown
  1 sibling, 2 replies; 104+ messages in thread
From: Asdo @ 2010-03-02 18:14 UTC (permalink / raw)
  To: linux-raid

Luca Berra wrote:
>>>
>>> I will try to explain better,
>>> the problem is not related to the confusion between B or B'
>>>
>>> the problem is that on one disk we have B' _without_ C.
>>>
>> You're demanding full atomic commits; this is precisely what journals
>> and /barriers/ are for.
>>
>> Are you bypassing them in a quest for performance and paying for
>> it on crashes?
>> Or is this a hardware bug?
>> Or is it some glitch in the block device layering leading to barrier
>> requests not being honored?
> I just asked for confirmation that with /barriers/ the scenario above
> would not happen.
>

I think so: it would not happen; the filesystem would stay 
consistent (while the mismatches could still happen).

The problem is that barriers were introduced in all md raid levels only in 
2.6.33 (just released), and also I have read that XFS has a major performance 
drop with barriers activated. People will be tempted to disable barriers. 
AFAIR the performance drop was visible with 1 disk alone; imagine now with 
RAID. And I expect similar performance drops with other filesystems, correct 
me if I am wrong.

Now it would be interesting to understand why the mismatches don't 
happen when LVM is above MD-raid!?
The mechanisms presented up to now on this ML for mismatches don't 
explain why on LVM the same issue doesn't show up. I think.
So you might want to use raid-1 and raid-10 under LVM, just in case....

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02  4:57                           ` Neil Brown
@ 2010-03-02 18:49                             ` Piergiorgio Sartor
  0 siblings, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-03-02 18:49 UTC (permalink / raw)
  To: Neil Brown
  Cc: Piergiorgio Sartor, Martin K. Petersen, Bill Davidsen,
	Steven Haigh, Bryan Mesich, Jon, linux-raid

Hi,

> Thanks.  I finally had a look at these (sorry for delay).

well, thank you for having a look at the thing.

> If you run "file" on one of the dumps, it tells you:
> 
> $ file disk1.raw 
> disk1.raw: Minix filesystem
> 
> Which isn't expected.  I would expect something like
> $ file xxx
> xxx: Linux rev 1.0 ext3 filesystem data, UUID=fe55fe6f-0412-4a0a-852d-a0e21767aa35 (needs journal recovery) (large files)
> 
> for an ext3 filesystem.
> 
> Looking at /usr/share/misc/magic, it seems that a Minix filesystem is defined
> by:
> 0x410	leshort		0x137f		Minix filesystem
> 
> i.e. the 2 bytes at 0x410 into the device are 0x137f, which exactly what we
> find in your dump.
> 
> 0x410 in an ext3 superblock is the lower bytes of "s_free_inodes_count", the
> count of free inodes.
> Your actual number is 14881663, which is 0x00E3137F.

Ah! But this means there is a bug somewhere...
 
> So if you just add or remove a file, the number of free inodes should change,
> and your filesystem will no longer look like a Minix filesystem and
> your problems should go away.

Uhm, OK, I just re-created the MD and the FS, so I also took
the opportunity to increase the chunk size to 512K
and use RAID-10.
 
> I guess libblkid et-al should do more sanity checks on the superblock before
> deciding that it really belongs to some particular filesystem.

So, should one of us file a bug report somewhere?

I mean, it is not only (lib)blkid, but also "file"
which seems to be confused.
BTW, "file" does not seem to use libblkid.

> But I'm happy - this clearly isn't a raid problem.

That's certainly good news, thanks again for
the explanation, I learned something today!

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 18:14                                 ` Asdo
@ 2010-03-02 18:52                                   ` Piergiorgio Sartor
  2010-03-02 23:27                                     ` Asdo
  2010-03-02 20:17                                   ` Neil Brown
  1 sibling, 1 reply; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-03-02 18:52 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid

Hi,

> Now it would be interesting to understand why the mismatches don't
> happen when LVM is above MD-raid!?

well, maybe LVM copies the buffer, and after that it
plays nice with MD, i.e. no changes on the fly.

Or maybe, it is just one system that behaves
properly with LVM over MD.

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 18:14                                 ` Asdo
  2010-03-02 18:52                                   ` Piergiorgio Sartor
@ 2010-03-02 20:17                                   ` Neil Brown
  1 sibling, 0 replies; 104+ messages in thread
From: Neil Brown @ 2010-03-02 20:17 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid

On Tue, 02 Mar 2010 19:14:25 +0100
Asdo <asdo@shiftmail.org> wrote:

> Luca Berra wrote:
> >>>
> >>> I will try to explain better,
> >>> the problem is not related to the confusion between B or B'
> >>>
> >>> the problem is that on one disk we have B' _without_ C.
> >>>
> >> You're demanding full atomic commits; this is precisely what journals
> >> and /barriers/ are for.
> >>
> >> Are you are bypassing them in a quest for performance and paying for
> >> it on crashes?
> >> Or is this a hardware bug?
> >> Or is it some glitch in the block device layering leading to barrier
> >> requests not being honored?
> > I just asked for confirmation that with /barriers/ the scenario above
> > would not happen.
> >
> 
> I think so, that it would not happen: the filesystem would stay 
> consistent. (while the mismatches could still happen)
> 
> The problem is that the barriers were introduced in all md raids in the 
> 2.6.33 (just released), and also I have read XFS has a major performance 
> drop with barriers activated. People will be tempted to disable 
> barriers. AFAIR the performance drop was visible with 1 disk alone, 
> imagine now with RAID. And I expect similar performance drops with other 
> filesystems, correct me if I am wrong.

The barrier support added in 2.6.33 was for striped md arrays.
RAID1, which is not striped, has had barrier support since about 2.6.16,
as it is much easier to implement.

NeilBrown

> 
> Now it would be interesting to understand why the mismatches don't 
> happen when LVM is above MD-raid!?
> The mechanisms presented up to now on this ML for mismatches don't 
> explain why on LVM the same issue doesn't show up. I think.
> So you might want to use raid-1 and raid-10 under LVM, just in case....


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 18:52                                   ` Piergiorgio Sartor
@ 2010-03-02 23:27                                     ` Asdo
  2010-03-03  9:13                                       ` Piergiorgio Sartor
  0 siblings, 1 reply; 104+ messages in thread
From: Asdo @ 2010-03-02 23:27 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

Piergiorgio Sartor wrote:
> Hi,
>   
>> Now it would be interesting to understand why the mismatches don't
>> happen when LVM is above MD-raid!?
>>     
>
> well, maybe LVM copies the buffer, and after it
> plays it nice with MD, i.e. no changes on the fly.
>   

LVM copies the buffer!?
I don't think so...
LVM is near-zero overhead, so I would be surprised if it copied the buffer.
Also I don't think it was needed in their case, except maybe if there is 
an I/O at the boundary of a logical volume or LVM stripe, which would 
certainly be a mistake on the requestor's side.
LVM also does not merge requests AFAIR. (visible with mdstat -x 1)

> Or maybe, it is just one system that behaves
> properly with LVM over MD.
>   

hmm maybe...

But I also have never seen mismatches, and the only raid-1's I have 
are above LVM (except /boot, but that's almost never modified).


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-02 23:27                                     ` Asdo
@ 2010-03-03  9:13                                       ` Piergiorgio Sartor
  2010-03-03 11:42                                         ` Asdo
  0 siblings, 1 reply; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-03-03  9:13 UTC (permalink / raw)
  To: Asdo; +Cc: Piergiorgio Sartor, linux-raid

Hi,

> LVM copies the buffer!?
> I don't think so...
> LVM is near-zero overhead, so I would be surprised if it copied the buffer.

well, I'm not so sure it is near-zero overhead (I'll explain
below), and even if it is, making copies could still be "near-zero"
overhead; it depends on where the bottlenecks are.

I'm not an LVM insider, so these are just random thoughts.

About the near-zero overhead, maybe this could open a
different thread, but just to give some numbers...

I've a bunch of RAID-6 volumes, made of USB disks, i.e. using
PATA<->USB bridges.
These volumes are aggregated using LVM and, on top of that, there
is a LUKS container.

The raw read performance on the RAID-6 is, in the best case,
around 48MB/s, which is pretty good for USB; I guess it
will be difficult to get more.
The raw read performance of the LVM volume is ~38MB/s.
The raw read performance of the LUKS is ~28MB/s (actually
maybe a bit less).

Each further layer loses about 10MB/s.

I guess this is much more visible in USB than in SATA/SAS
situations, since going from 205 to 195 might go unnoticed.

This is not a CPU problem, since the PC is dual core; only one
core runs, and it never exceeds 30%. The USB is slow enough
to allow all the operations to be performed in real time.

Nevertheless, LVM is doing something there; in this setup
it has an overhead of about 20%, far from zero.
So the 10MB/s loss could be caused by copying (again, I've
no idea how LVM works).
It could also be something else, of course; it would be interesting
to have more information from some expert (also to optimize my
USB setup, if possible).

> Also I don't think it was needed in their case, except maybe if

Maybe, but if the filesystem can play with the buffer while
it is submitted, then I would rather copy the data.

Again, some expert opinion would be appreciated.

> LVM also does not merge requests AFAIR. (visible with mdstat -x 1)

BTW, what's that? I mean "mdstat -x 1"...
 
> But me also I have never seen mismatches and the only raid-1's I
> have are above LVM. (except /boot but that's almost never modified)

Well, that's good, you confirmed my experience.

I've also RAID-10 on LVM and never got mismatches, while the
plain RAID-10 got sometimes.

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-03  9:13                                       ` Piergiorgio Sartor
@ 2010-03-03 11:42                                         ` Asdo
  2010-03-03 12:03                                           ` Piergiorgio Sartor
  0 siblings, 1 reply; 104+ messages in thread
From: Asdo @ 2010-03-03 11:42 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

Piergiorgio Sartor wrote:
> I've a bunch of RAID-6 volumes, made of USB disks, i.e. using
> PATA<->USB bridges.
>   
Don't your bridges ever drop out or break?
What is the brand/model?
Years ago I broke a lot of those, just by using the disks intensively. 
They probably kinda overheated and then failed. They couldn't last 2 days of 
intense disk activity... They were Chinese stuff bought on eBay, though.

> This volumes are aggregated using LVM and, on top of that, there
> is a LUKS container.
>
> The raw read perfomance on the RAID-6 is, in the best case,
> around about 48MB/s, which is pretty good for USB, I guess it
> will be difficult to get more.
>   
48MB/sec can be good for 1 disk, but it's bad for many disks attached 
separately to USB ports...

> The raw read perfomance of the LVM volume is i~38MB/s.
> The raw read performance of the LUKS is ~28MB/s (actually
> maybe a bit less).
>   
Might your LVM, or a partition within it, not be aligned, or did you not 
set readahead?
http://www.beowulf.org/archive/2007-May/018359.html
http://www.mail-archive.com/linux-raid@vger.kernel.org/msg10804.html
People using LVM on arrays giving hundreds of MB/sec see slowdowns on 
the order of a few percent:
http://article.gmane.org/gmane.linux.raid/18302
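
Checking and setting readahead can be done with blockdev --getra/--setra,
or directly via the BLKRAGET/BLKRASET ioctls those wrap (the device path
and the new value below are only examples):

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/mapper/vg0-lv0", O_RDONLY);
    long ra;

    if (fd < 0)
        return 1;

    /* Values are in 512-byte sectors. */
    if (ioctl(fd, BLKRAGET, &ra) == 0)
        printf("readahead: %ld sectors\n", ra);

    /* BLKRASET takes the new value directly, not a pointer. */
    ioctl(fd, BLKRASET, 4096UL);
    close(fd);
    return 0;
}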

>> LVM also does not merge requests AFAIR. (visible with mdstat -x 1)    
> BTW, what's that? I mean "mdstat -x 1"...  

I'm sorry I meant " iostat -x 1"

>> But me also I have never seen mismatches and the only raid-1's I
>> have are above LVM. (except /boot but that's almost never modified)
>>     
>
> Well, that's good, you confirmed my experience.
>
> I've also RAID-10 on LVM and never got mismatches, while the
> plain RAID-10 got sometimes.
>
>   
This fact needs further investigation methinks...
We could ask the LVM people whether LVM really copies the buffer.

Regards
A.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-03-03 11:42                                         ` Asdo
@ 2010-03-03 12:03                                           ` Piergiorgio Sartor
  0 siblings, 0 replies; 104+ messages in thread
From: Piergiorgio Sartor @ 2010-03-03 12:03 UTC (permalink / raw)
  To: Asdo; +Cc: Piergiorgio Sartor, linux-raid

Hi,

> >I've a bunch of RAID-6 volumes, made of USB disks, i.e. using
> >PATA<->USB bridges.
> Don't your bridges ever drop out or break?

never had problems with broken bridges, but I had other problems,
like unreliable transfers under certain conditions.

> What is the brand/model?

The best I could find were from Digitus; they've a pretty
standard chipset, JMicron I guess, but they seem to be built
better than others, still with JMicron.
No problems, so far.

I've two other brands, same chipset, which seem reliable
for the SATA part, but the PATA does not work properly.

All the others, from different vendors with different
chipsets, never had problems.

I can imagine that the PSU that comes with them might be a
weak point; I saw really poor quality units.

On the other hand, it's a bunch of RAID-6 for a reason... :-)

> Years ago I broke alot of those, just by using the disks
> intensively. They probably kinda overheated then failed. They
> couldn't last 2 days on intense disk activity... They were chinese
> stuff bought on ebay though.

My use case is offline storage, so I won't use the
box for two days straight, but I have used it for several hours.
 
> 48MB/sec can be good for 1 disk, but it's bad for many disks
> attached separately to USB ports...

Well, maybe I forgot to mention that the HDDs are connected to
the PC through a USB hub (three 4-to-1 USB hubs, to be precise), i.e.
one single USB connection.
This can do, in theory, 60MB/s; in practice I never saw
more than 50MB/s, even in ideal conditions.
So, in my view, 48MB/s is pretty much the max you can get.

> Might your LVM or partition within it be not aligned, or you didn't
> set readahead?

LVM takes care to align itself (this is in the new version),
and the readahead also seems to be set automagically.

Nevertheless, the LUKS is aligned by hand.

> http://www.beowulf.org/archive/2007-May/018359.html
> http://www.mail-archive.com/linux-raid@vger.kernel.org/msg10804.html
> People using LVM on arrays giving hundreds of MB/sec see slowdowns
> of the order of percent
> http://article.gmane.org/gmane.linux.raid/18302

Thanks for the links, I'll have a look.
 
> >I've also RAID-10 on LVM and never got mismatches, while the
> >plain RAID-10 got sometimes.
> >
> This fact needs further investigation methinks...
> We could ask to the LVM people if LVM really copies the buffer.

Or, in general, if they have any explanation for this
observation of ours.

Thanks,

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: Why does one get mismatches?
@ 2010-02-01 23:14 Jon Hardcastle
  0 siblings, 0 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-02-01 23:14 UTC (permalink / raw)
  To: Neil Brown, Bill Davidsen; +Cc: Jon, linux-raid

But what if two of the three sources say xyz? Then you can make a guess that has a higher propensity to be right. I guess this would also work for RAID-6.
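As an illustration only - member names hypothetical, and ignoring the
md superblock/data offset - one could hand-vote on a suspect 4KiB block
of a 3-way mirror by hashing the same region of each member:

  for d in sda1 sdb1 sdc1; do
    printf '%s: ' "$d"
    dd if=/dev/$d bs=4k skip=12345 count=1 2>/dev/null | md5sum
  done

Whichever two hashes agree would be the 'majority' copy.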

Incidentally, the problem behind my mismatches was almost certainly badly seated RAM... but my tests are ongoing, to be sure.



-----Original Message-----
From: Neil Brown <neilb@suse.de>
Sent: 01 February 2010 22:37
To: Bill Davidsen <davidsen@tmr.com>
Cc: Jon@eHardcastle.com; linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?

On Mon, 01 Feb 2010 16:18:23 -0500
Bill Davidsen <davidsen@tmr.com> wrote:

> Comment: when there is a three way RAID-1, why doesn't repair *vote* on 
> the correct value instead of just making a guess?
> 

Because truth is not democratic.

(and I defy you to define "correct" in any general way in this context).

NeilBrown


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-29 15:37             ` Jon Hardcastle
  2010-01-29 23:52               ` Goswin von Brederlow
@ 2010-02-01 21:10               ` Bill Davidsen
  1 sibling, 0 replies; 104+ messages in thread
From: Bill Davidsen @ 2010-02-01 21:10 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid, Goswin von Brederlow

Jon Hardcastle wrote:
> --- On Thu, 28/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>
>   
>> From: Goswin von Brederlow <goswin-v-b@web.de>
>> Subject: Re: Why does one get mismatches?
>> To: linux-raid@vger.kernel.org
>> Date: Thursday, 28 January, 2010, 20:24
>> "Tirumala Reddy Marri" <tmarri@amcc.com> writes:
>>
>>> I just tried and the mismatch count is zero. Interesting, is the XOR
>>> engine doing something wrong? Then I ran a test where I raw-wrote a
>>> file to /dev/md0, then did a raw read of the same size. In this case
>>> the XOR matched as expected. Then I failed a drive using "mdadm -f
>>> /dev/md0 /dev/sda" and read the same data again from /dev/md0. The
>>> checksum matched too.
>>> What does this mean? That the XOR engine is doing the right thing,
>>> but the "check/repair" test is not functioning properly with the XOR
>>> engine?
>>>
>>> Or is this something to do with how the buffers are handled? Maybe
>>> they are cached?
>>>
>>> -Marri
>>
>> No idea. But if everything works without the XOR engine and gives
>> mismatches with it, then I would think there is a software or hardware
>> error there, and not in the cables or disks.
>>
>> MfG
>>         Goswin
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid"
>> in the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>     
>
> I think my RAM is fecked - does that sound like a possible cause? memtest86 gives PAGES of red errors when run, but the POST shows nothing and the machine boots... it has 512MB
>
> I have some more on order, as a speculative fix...
>   

I would bet that your RAM is broken. Any errors indicate bad RAM; no 
errors only indicate no persistent error. Scrap that RAM. More RAM will 
make the machine faster, too.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 14:46     ` Brett Russ
@ 2010-02-01 20:48       ` Bill Davidsen
  0 siblings, 0 replies; 104+ messages in thread
From: Bill Davidsen @ 2010-02-01 20:48 UTC (permalink / raw)
  To: Brett Russ; +Cc: linux-raid

Brett Russ wrote:
> On 01/20/2010 09:34 AM, Jon Hardcastle wrote:
>> I will gather the information you require, but so it is clear it is a
>> a echo 'check' that is kicking off the ultimate mismatch not from
>> boot.
>
> What do you mean by mismatches detected?  How is this observed?
> -BR

I see it as an error count > 0

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-29 23:52               ` Goswin von Brederlow
@ 2010-01-30 10:39                 ` Jon Hardcastle
  0 siblings, 0 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-30 10:39 UTC (permalink / raw)
  To: Jon, Goswin von Brederlow; +Cc: linux-raid


--- On Fri, 29/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> From: Goswin von Brederlow <goswin-v-b@web.de>
> Subject: Re: Why does one get mismatches?
> To: Jon@eHardcastle.com
> Cc: linux-raid@vger.kernel.org
> Date: Friday, 29 January, 2010, 23:52
> Jon Hardcastle <jd_hardcastle@yahoo.com> writes:
> 
> > --- On Thu, 28/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> >
> >> From: Goswin von Brederlow <goswin-v-b@web.de>
> >> Subject: Re: Why does one get mismatches?
> >> To: linux-raid@vger.kernel.org
> >> Date: Thursday, 28 January, 2010, 20:24
> >> "Tirumala Reddy Marri" <tmarri@amcc.com> writes:
> >> 
> >> > I just tried and the mismatch count is zero. Interesting, is the
> >> > XOR engine doing something wrong? Then I ran a test where I
> >> > raw-wrote a file to /dev/md0, then did a raw read of the same
> >> > size. In this case the XOR matched as expected. Then I failed a
> >> > drive using "mdadm -f /dev/md0 /dev/sda" and read the same data
> >> > again from /dev/md0. The checksum matched too.
> >> > What does this mean? That the XOR engine is doing the right
> >> > thing, but the "check/repair" test is not functioning properly
> >> > with the XOR engine?
> >> >
> >> > Or is this something to do with how the buffers are handled?
> >> > Maybe they are cached?
> >> >
> >> > -Marri
> >> 
> >> No idea. But if everything works without the XOR engine and gives
> >> mismatches with it, then I would think there is a software or
> >> hardware error there, and not in the cables or disks.
> >> 
> >> MfG
> >>         Goswin
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe
> >> linux-raid" in the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> 
> >
> > I think my RAM is fecked - does that sound like a possible cause?
> > memtest86 gives PAGES of red errors when run, but the POST shows
> > nothing and the machine boots... it has 512MB
> >
> > I have some more on order, as a speculative fix...
> 
> If memtest gives errors you certainly have errors. The reverse isn't
> always true.
> 
> MfG
>         Goswin
> 

Might have been a false alarm... both the new chip and the old one, in an assortment of slots, give the errors - upgrading from memtest86 3.3 to 3.5 makes the errors go away.


      
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-29 15:37             ` Jon Hardcastle
@ 2010-01-29 23:52               ` Goswin von Brederlow
  2010-01-30 10:39                 ` Jon Hardcastle
  2010-02-01 21:10               ` Bill Davidsen
  1 sibling, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-29 23:52 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid

Jon Hardcastle <jd_hardcastle@yahoo.com> writes:

> --- On Thu, 28/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>
>> From: Goswin von Brederlow <goswin-v-b@web.de>
>> Subject: Re: Why does one get mismatches?
>> To: linux-raid@vger.kernel.org
>> Date: Thursday, 28 January, 2010, 20:24
>> "Tirumala Reddy Marri" <tmarri@amcc.com> writes:
>> 
>> > I just tried and the mismatch count is zero. Interesting, is the XOR
>> > engine doing something wrong? Then I ran a test where I raw-wrote a
>> > file to /dev/md0, then did a raw read of the same size. In this case
>> > the XOR matched as expected. Then I failed a drive using "mdadm -f
>> > /dev/md0 /dev/sda" and read the same data again from /dev/md0. The
>> > checksum matched too.
>> > What does this mean? That the XOR engine is doing the right thing,
>> > but the "check/repair" test is not functioning properly with the XOR
>> > engine?
>> >
>> > Or is this something to do with how the buffers are handled? Maybe
>> > they are cached?
>> >
>> > -Marri
>> 
>> No idea. But if everything works without the XOR engine and gives
>> mismatches with it, then I would think there is a software or hardware
>> error there, and not in the cables or disks.
>> 
>> MfG
>>         Goswin
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid"
>> in the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>
> I think my RAM is fecked - does that sound like a possible cause? memtest86 gives PAGES of red errors when run, but the POST shows nothing and the machine boots... it has 512MB
>
> I have some more on order, as a speculative fix...

If memtest gives errors you certainly have errors. The reverse isn't
always true.

MfG
        Goswin
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-28 20:24           ` Goswin von Brederlow
@ 2010-01-29 15:37             ` Jon Hardcastle
  2010-01-29 23:52               ` Goswin von Brederlow
  2010-02-01 21:10               ` Bill Davidsen
  0 siblings, 2 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-29 15:37 UTC (permalink / raw)
  To: linux-raid, Goswin von Brederlow

--- On Thu, 28/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> From: Goswin von Brederlow <goswin-v-b@web.de>
> Subject: Re: Why does one get mismatches?
> To: linux-raid@vger.kernel.org
> Date: Thursday, 28 January, 2010, 20:24
> "Tirumala Reddy Marri" <tmarri@amcc.com> writes:
> 
> > I just tried and the mismatch count is zero. Interesting, is the XOR
> > engine doing something wrong? Then I ran a test where I raw-wrote a
> > file to /dev/md0, then did a raw read of the same size. In this case
> > the XOR matched as expected. Then I failed a drive using "mdadm -f
> > /dev/md0 /dev/sda" and read the same data again from /dev/md0. The
> > checksum matched too.
> > What does this mean? That the XOR engine is doing the right thing,
> > but the "check/repair" test is not functioning properly with the XOR
> > engine?
> >
> > Or is this something to do with how the buffers are handled? Maybe
> > they are cached?
> >
> > -Marri
> 
> No idea. But if everything works without the XOR engine and gives
> mismatches with it, then I would think there is a software or hardware
> error there, and not in the cables or disks.
> 
> MfG
>         Goswin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

I think my RAM is fecked - does that sound like a possible cause? memtest86 gives PAGES of red errors when run, but the POST shows nothing and the machine boots... it has 512MB

I have some more on order, as a speculative fix...


      
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-28 19:03         ` Tirumala Reddy Marri
@ 2010-01-28 20:24           ` Goswin von Brederlow
  2010-01-29 15:37             ` Jon Hardcastle
  0 siblings, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-28 20:24 UTC (permalink / raw)
  To: linux-raid

"Tirumala Reddy Marri" <tmarri@amcc.com> writes:

> I just tried and the mismatch count is zero. Interesting, is the XOR
> engine doing something wrong? Then I ran a test where I raw-wrote a
> file to /dev/md0, then did a raw read of the same size. In this case
> the XOR matched as expected. Then I failed a drive using "mdadm -f
> /dev/md0 /dev/sda" and read the same data again from /dev/md0. The
> checksum matched too.
> What does this mean? That the XOR engine is doing the right thing, but
> the "check/repair" test is not functioning properly with the XOR
> engine?
>
> Or is this something to do with how the buffers are handled? Maybe
> they are cached?
>
>
> -Marri

No idea. But if everything works without the XOR engine and gives
mismatches with it, then I would think there is a software or hardware
error there, and not in the cables or disks.
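As a first sanity check, the kernel logs which XOR routine it selected
at boot, so something like:

  dmesg | grep -i xor

should show an 'xor: using function: ...' line (exact wording varies by
kernel version).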

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: Why does one get mismatches?
  2010-01-28 18:23       ` Goswin von Brederlow
@ 2010-01-28 19:03         ` Tirumala Reddy Marri
  2010-01-28 20:24           ` Goswin von Brederlow
  0 siblings, 1 reply; 104+ messages in thread
From: Tirumala Reddy Marri @ 2010-01-28 19:03 UTC (permalink / raw)
  To: goswin-v-b; +Cc: linux-raid

I just tried and the mismatch count is zero. Interesting, is the XOR
engine doing something wrong? Then I ran a test where I raw-wrote a
file to /dev/md0, then did a raw read of the same size. In this case
the XOR matched as expected. Then I failed a drive using "mdadm -f
/dev/md0 /dev/sda" and read the same data again from /dev/md0. The
checksum matched too.
What does this mean? That the XOR engine is doing the right thing, but
the "check/repair" test is not functioning properly with the XOR
engine?

Or is this something to do with how the buffers are handled? Maybe they
are cached?


-Marri



-----Original Message-----
From: goswin-v-b@web.de [mailto:goswin-v-b@web.de] 
Sent: Thursday, January 28, 2010 10:24 AM
To: Tirumala Reddy Marri
Cc: linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?

"Tirumala Reddy Marri" <tmarri@amcc.com> writes:

> I have noticed that if I zero /dev/md0 using "dd if=/dev/zero
> of=/dev/md0 bs=4k count=64k" and then run "echo check >
> /sys/block/md0/md/sync_action", no mismatch_cnt is reported.  If I use
> "if=/some/file" then I see the mismatch count set to a huge number.
>
> I am testing with a small RAID-5 array for quick testing.  How reliable
> is this test?  I am using a HW-accelerated XOR engine for RAID-5.

Have you tried without?

Maybe the XOR engine creates garbage.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-28 17:20     ` Tirumala Reddy Marri
@ 2010-01-28 18:23       ` Goswin von Brederlow
  2010-01-28 19:03         ` Tirumala Reddy Marri
  0 siblings, 1 reply; 104+ messages in thread
From: Goswin von Brederlow @ 2010-01-28 18:23 UTC (permalink / raw)
  To: Tirumala Reddy Marri; +Cc: linux-raid

"Tirumala Reddy Marri" <tmarri@amcc.com> writes:

> I have noticed that if I zero /dev/md0 using "dd if=/dev/zero
> of=/dev/md0 bs=4k count=64k" and then run "echo check >
> /sys/block/md0/md/sync_action", no mismatch_cnt is reported.  If I use
> "if=/some/file" then I see the mismatch count set to a huge number.
>
> I am testing with a small RAID-5 array for quick testing.  How reliable
> is this test?  I am using a HW-accelerated XOR engine for RAID-5.

Have you tried without?

Maybe the XOR engine creates garbage.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: Why does one get mismatches?
  2010-01-27 21:54   ` Tirumala Reddy Marri
  2010-01-28  9:16     ` Jon Hardcastle
@ 2010-01-28 17:20     ` Tirumala Reddy Marri
  2010-01-28 18:23       ` Goswin von Brederlow
  1 sibling, 1 reply; 104+ messages in thread
From: Tirumala Reddy Marri @ 2010-01-28 17:20 UTC (permalink / raw)
  To: Tirumala Reddy Marri, Steven Haigh, linux-raid

I have noticed that if I zero /dev/md0 using "dd if=/dev/zero
of=/dev/md0 bs=4k count=64k" and then run "echo check >
/sys/block/md0/md/sync_action", no mismatch_cnt is reported.  If I use
"if=/some/file" then I see the mismatch count set to a huge number.

I am testing with a small RAID-5 array for quick testing.  How reliable
is this test?  I am using a HW-accelerated XOR engine for RAID-5.
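For reference, the test sequence is roughly (same device as above):

  dd if=/dev/zero of=/dev/md0 bs=4k count=64k
  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat                      # wait for the check to finish
  cat /sys/block/md0/md/mismatch_cnt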



-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Tirumala Reddy
Marri
Sent: Wednesday, January 27, 2010 1:54 PM
To: Steven Haigh; linux-raid@vger.kernel.org
Subject: RE: Why does one get mismatches?

I ran "echo check > /sys/block/md0/md/sync_action" after I ran "echo
repair > /sys/block/md0/md/sync_action".
I am seeing a whole bunch of mismatch errors, like 1233072.  I am using
a RAID-5 array though.



-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Haigh
Sent: Monday, January 25, 2010 2:49 PM
To: linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?


On 26/01/2010, at 7:43 AM, greg@enjellic.com wrote:

> On Jan 21, 12:48pm, Farkas Levente wrote:
> } Subject: Re: Why does one get mismatches?
> 
> Good afternoon to everyone, hope the week is starting well.
> 
>> On 01/21/2010 11:52 AM, Steven Haigh wrote:
>>> On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org>  wrote:
>>>> Steven Haigh wrote:
>>>>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
>>>>> wrote:
>>>>> 
>>>>> CUT!
>>>> Might that be a problem of the disks/controllers?
>>>> Jon and Steven, what hardware do you have?
>>> 
>>> I'm running some fairly old hardware on this particular server. It's
>>> a dual P3 1Ghz.
>>> 
>>> After running a repair on /dev/md2, I now see:
>>> # cat /sys/block/md2/md/mismatch_cnt
>>> 1536
>>> 
>>> Again, no smart errors, nothing to indicate a disk problem at all :(
>>> 
>>> As this really keeps killing the machine and it is a live system -
>>> the only thing I can really think of doing is to break the RAID and
>>> just rsync the drives twice daily :\
> 
>> The same happened to many people, and we all hate it since it causes
>> a huge load every weekend on most of our servers:-( According to
>> Red Hat it's not a bug:-(
> 
> The RAID check/mismatch_count is an example of well intentioned
> technology suffering from 'featuritis' by the distributions which is,
> as I predicted a couple of times in this forum, causing all sorts of
> angst and problems throughout the world.  I've had some posts on this
> subject but will summarize in the hopes of giving some background
> information which will be useful to people.
> 
> There is an issue in the kernel which causes these mismatches.  The
> problem seems to be particularly bad with RAID1 arrays.  The
> contention is that these mismatches are 'harmless' because they only
> occur in areas of the filesystems which are not being used.
> 
> The best description is that the buffers containing the data to be
> written are not 'pinned' all the way down the I/O stack.  This can
> cause the contents of a buffer to be changed while in transit through
> the I/O stack.  Thus one copy of a mirror gets a buffer written to it
> different than the other side of the mirror.
> 
> I've read reasoned discussions about why this occurs with swap over
> RAID1 and why it's harmless.  I've yet to see the same type of reasoned
> discussion as to why it is not problematic with a filesystem over
> RAID1.  There has been some discussion that it's due to high levels of
> MMAP activity on the filesystem.
> 
> We have confirmed, that at least with RAID1, this all occurs with no
> physical corruption on the 'disk drives'.  We implement geographically
> mirror storage with RAID1 against two separate data-centers.  At each
> data-center the RAID1 'block-device' are RAID5 volumes.  These latter
> volumes check out with no errors/mismatch counts etc.  So the issue is
> at the RAID1 data abstraction layer.
> 
> There do not appear to be any tools which allow one to determine
> 'where' the mismatches are.  Such a tool, or logging by the kernel,
> would be useful for people who want to verify what files, if any, are
> affected by the mismatch.  Otherwise running a 'repair' results in the
> RAID1 code arbitrarily deciding which of the two blocks is the
> 'correct' one.
> 
> So that's sort of a thumbnail sketch of what is going on.  The fact
> that the distributions chose to implement this without understanding
> the issues it presents is a bit problematic.
> 
>>   Levente                               "Si vis pacem para bellum!"
> 
> Hopefully this information is helpful.
> 
> Greg

Hi Greg and all,

The funny part is that I believe the mismatches aren't happening in the
empty space of the filesystem - as it seems that the errors are causing
the ext3 journal to abort and force the filesystem into readonly in my
particular situation.

It is interesting that I do not get any mismatches on md0, md1 or md3 -
only md2.

md0 = /boot
md1 = swap
md2 = /
md3 = /tmp

I ran weekly checks on all four RAID1 arrays and ONLY md2 had a
problem with mismatches, and it also had a habit of going readonly -
therefore I don't believe the common claim that this problem only
affects empty parts of the filesystem.

I have also done just about every test to the disks that I can think of
with no errors to be found - leaving only the md layer to be suspect.

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299






--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-28  9:16     ` Jon Hardcastle
@ 2010-01-28 10:29       ` Asdo
  0 siblings, 0 replies; 104+ messages in thread
From: Asdo @ 2010-01-28 10:29 UTC (permalink / raw)
  To: Jon; +Cc: Steven Haigh, linux-raid, Tirumala Reddy Marri

Jon Hardcastle wrote:
> Well, I finished running my non-destructive badblocks check and ran several smart long tests; I also did a forced fsck on the bad boy, and NOW the active md4 (with a DEACTIVATED VG on it) returns 0 mismatch_cnt. I haven't rebooted it in days though, so I just don't know what caused this. No errors in the log, and the pending/reallocated sector counts are still 0 on all drives.
>
> I have reactivated my VG and am running it again now; it is just bizarre.
>   


Very interesting!

I'm wondering what we can infer...
I'm thinking it was probably either a fault of the disks, fixed by the 
smart long tests, or a fault of the filesystem, fixed by the forced fsck.
What do you think?

Tell us if the mismatches start to happen again.
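(For reference, the long self-test mentioned above can be started and
inspected per disk with something like:

  smartctl -t long /dev/sda
  smartctl -l selftest /dev/sda   # view the result once it completes
)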

Thank you

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: Why does one get mismatches?
  2010-01-27 21:54   ` Tirumala Reddy Marri
@ 2010-01-28  9:16     ` Jon Hardcastle
  2010-01-28 10:29       ` Asdo
  2010-01-28 17:20     ` Tirumala Reddy Marri
  1 sibling, 1 reply; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-28  9:16 UTC (permalink / raw)
  To: Steven Haigh, linux-raid, Tirumala Reddy Marri

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Haigh
> Sent: Monday, January 25, 2010 2:49 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: Why does one get mismatches?
> 
> 
> On 26/01/2010, at 7:43 AM, greg@enjellic.com wrote:
> 
> > On Jan 21, 12:48pm, Farkas Levente wrote:
> > } Subject: Re: Why does one get mismatches?
> > 
> > Good afternoon to everyone, hope the week is starting well.
> > 
> >> On 01/21/2010 11:52 AM, Steven Haigh wrote:
> >>> On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org> wrote:
> >>>> Steven Haigh wrote:
> >>>>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
> >>>>> wrote:
> >>>>> 
> >>>>> CUT!
> >>>> Might that be a problem of the disks/controllers?
> >>>> Jon and Steven, what hardware do you have?
> >>> 
> >>> I'm running some fairly old hardware on this particular server.
> >>> It's a dual P3 1Ghz.
> >>> 
> >>> After running a repair on /dev/md2, I now see:
> >>> # cat /sys/block/md2/md/mismatch_cnt
> >>> 1536
> >>> 
> >>> Again, no smart errors, nothing to indicate a disk problem at all :(
> >>> 
> >>> As this really keeps killing the machine and it is a live system -
> >>> the only thing I can really think of doing is to break the RAID and
> >>> just rsync the drives twice daily :\
> > 
> >> The same happened to many people, and we all hate it since it causes
> >> a huge load every weekend on most of our servers:-( According to
> >> Red Hat it's not a bug:-(
> > 
> > The RAID check/mismatch_count is an example of well intentioned
> > technology suffering from 'featuritis' by the distributions which is,
> > as I predicted a couple of times in this forum, causing all sorts of
> > angst and problems throughout the world.  I've had some posts on this
> > subject but will summarize in the hopes of giving some background
> > information which will be useful to people.
> > 
> > There is an issue in the kernel which causes these mismatches.  The
> > problem seems to be particularly bad with RAID1 arrays.  The
> > contention is that these mismatches are 'harmless' because they only
> > occur in areas of the filesystems which are not being used.
> > 
> > The best description is that the buffers containing the data to be
> > written are not 'pinned' all the way down the I/O stack.  This can
> > cause the contents of a buffer to be changed while in transit through
> > the I/O stack.  Thus one copy of a mirror gets a buffer written to it
> > different than the other side of the mirror.
> > 
> > I've read reasoned discussions about why this occurs with swap over
> > RAID1 and why it's harmless.  I've yet to see the same type of
> > reasoned discussion as to why it is not problematic with a filesystem
> > over RAID1.  There has been some discussion that it's due to high
> > levels of MMAP activity on the filesystem.
> > 
> > We have confirmed, that at least with RAID1, this all occurs with no
> > physical corruption on the 'disk drives'.  We implement geographically
> > mirror storage with RAID1 against two separate data-centers.  At each
> > data-center the RAID1 'block-device' are RAID5 volumes.  These latter
> > volumes check out with no errors/mismatch counts etc.  So the issue is
> > at the RAID1 data abstraction layer.
> > 
> > There do not appear to be any tools which allow one to determine
> > 'where' the mismatches are.  Such a tool, or logging by the kernel,
> > would be useful for people who want to verify what files, if any, are
> > affected by the mismatch.  Otherwise running a 'repair' results in the
> > RAID1 code arbitrarily deciding which of the two blocks is the
> > 'correct' one.
> > 
> > So that's sort of a thumbnail sketch of what is going on.  The fact
> > that the distributions chose to implement this without understanding
> > the issues it presents is a bit problematic.
> > 
> >>   Levente                               "Si vis pacem para bellum!"
> > 
> > Hopefully this information is helpful.
> > 
> > Greg
> 
> Hi Greg and all,
> 
> The funny part is that I believe the mismatches aren't happening in the
> empty space of the filesystem - as it seems that the errors are causing
> the ext3 journal to abort and force the filesystem into readonly in my
> particular situation.
> 
> It is interesting that I do not get any mismatches on md0, md1 or md3 -
> only md2.
> 
> md0 = /boot
> md1 = swap
> md2 = /
> md3 = /tmp
> 
> I ran weekly checks on all four RAID1 arrays and ONLY md2 had a
> problem with mismatches, and it also had a habit of going readonly -
> therefore I don't believe the common claim that this problem only
> affects empty parts of the filesystem.
> 
> I have also done just about every test to the disks that I can think of
> with no errors to be found - leaving only the md layer to be suspect.
> 
> --
> Steven Haigh
> 
> Email: netwiz@crc.id.au
> Web: http://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897
> Fax: (03) 8338 0299

Well, I finished running my non-destructive badblocks check and ran several smart long tests; I also did a forced fsck on the bad boy, and NOW the active md4 (with a DEACTIVATED VG on it) returns 0 mismatch_cnt. I haven't rebooted it in days though, so I just don't know what caused this. No errors in the log, and the pending/reallocated sector counts are still 0 on all drives.

I have reactivated my VG and am running it again now; it is just bizarre.



      
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: Why does one get mismatches?
  2010-01-25 22:49 ` Steven Haigh
@ 2010-01-27 21:54   ` Tirumala Reddy Marri
  2010-01-28  9:16     ` Jon Hardcastle
  2010-01-28 17:20     ` Tirumala Reddy Marri
  0 siblings, 2 replies; 104+ messages in thread
From: Tirumala Reddy Marri @ 2010-01-27 21:54 UTC (permalink / raw)
  To: Steven Haigh, linux-raid

I ran "echo check > /sys/block/md0/md/sync_action" after I ran "echo
repair > /sys/block/md0/md/sync_action".
I am seeing a whole bunch of mismatch errors, like 1233072.  I am using
a RAID-5 array though.
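(Note the counts only mean something if each pass finishes before the
next one starts; one way to wait, sketched against the same device:

  echo repair > /sys/block/md0/md/sync_action
  while [ "$(cat /sys/block/md0/md/sync_action)" != "idle" ]; do
    sleep 30
  done
  echo check > /sys/block/md0/md/sync_action
)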



-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Haigh
Sent: Monday, January 25, 2010 2:49 PM
To: linux-raid@vger.kernel.org
Subject: Re: Why does one get mismatches?


On 26/01/2010, at 7:43 AM, greg@enjellic.com wrote:

> On Jan 21, 12:48pm, Farkas Levente wrote:
> } Subject: Re: Why does one get mismatches?
> 
> Good afternoon to everyone, hope the week is starting well.
> 
>> On 01/21/2010 11:52 AM, Steven Haigh wrote:
>>> On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org>  wrote:
>>>> Steven Haigh wrote:
>>>>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
>>>>> wrote:
>>>>> 
>>>>> CUT!
>>>> Might that be a problem of the disks/controllers?
>>>> Jon and Steven, what hardware do you have?
>>> 
>>> I'm running some fairly old hardware on this particular server. It's
>>> a dual P3 1Ghz.
>>> 
>>> After running a repair on /dev/md2, I now see:
>>> # cat /sys/block/md2/md/mismatch_cnt
>>> 1536
>>> 
>>> Again, no smart errors, nothing to indicate a disk problem at all :(
>>> 
>>> As this really keeps killing the machine and it is a live system -
>>> the only thing I can really think of doing is to break the RAID and
>>> just rsync the drives twice daily :\
> 
>> The same happened to many people, and we all hate it since it causes
>> a huge load every weekend on most of our servers:-( According to
>> Red Hat it's not a bug:-(
> 
> The RAID check/mismatch_count is an example of well intentioned
> technology suffering from 'featuritis' by the distributions which is,
> as I predicted a couple of times in this forum, causing all sorts of
> angst and problems throughout the world.  I've had some posts on this
> subject but will summarize in the hopes of giving some background
> information which will be useful to people.
> 
> There is an issue in the kernel which causes these mismatches.  The
> problem seems to be particularly bad with RAID1 arrays.  The
> contention is that these mismatches are 'harmless' because they only
> occur in areas of the filesystems which are not being used.
> 
> The best description is that the buffers containing the data to be
> written are not 'pinned' all the way down the I/O stack.  This can
> cause the contents of a buffer to be changed while in transit through
> the I/O stack.  Thus one copy of a mirror gets a buffer written to it
> different than the other side of the mirror.
> 
> I've read reasoned discussions about why this occurs with swap over
> RAID1 and why it's harmless.  I've yet to see the same type of reasoned
> discussion as to why it is not problematic with a filesystem over
> RAID1.  There has been some discussion that it's due to high levels of
> MMAP activity on the filesystem.
> 
> We have confirmed, that at least with RAID1, this all occurs with no
> physical corruption on the 'disk drives'.  We implement geographically
> mirror storage with RAID1 against two separate data-centers.  At each
> data-center the RAID1 'block-device' are RAID5 volumes.  These latter
> volumes check out with no errors/mismatch counts etc.  So the issue is
> at the RAID1 data abstraction layer.
> 
> There do not appear to be any tools which allow one to determine
> 'where' the mismatches are.  Such a tool, or logging by the kernel,
> would be useful for people who want to verify what files, if any, are
> affected by the mismatch.  Otherwise running a 'repair' results in the
> RAID1 code arbitrarily deciding which of the two blocks is the
> 'correct' one.
> 
> So that's sort of a thumbnail sketch of what is going on.  The fact
> that the distributions chose to implement this without understanding
> the issues it presents is a bit problematic.
> 
>>   Levente                               "Si vis pacem para bellum!"
> 
> Hopefully this information is helpful.
> 
> Greg

Hi Greg and all,

The funny part is that I believe the mismatches aren't happening in the
empty space of the filesystem - as it seems that the errors are causing
the ext3 journal to abort and force the filesystem into readonly in my
particular situation.

It is interesting that I do not get any mismatches on md0, md1 or md3 -
only md2.

md0 = /boot
md1 = swap
md2 = /
md3 = /tmp

I ran weekly checks on all four RAID1 arrays and ONLY md2 had a
problem with mismatches, and it also had a habit of going readonly -
therefore I don't believe the common claim that this problem only
affects empty parts of the filesystem.

I have also done just about every test to the disks that I can think of
with no errors to be found - leaving only the md layer to be suspect.

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299






--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-25 20:43 greg
@ 2010-01-25 22:49 ` Steven Haigh
  2010-01-27 21:54   ` Tirumala Reddy Marri
  0 siblings, 1 reply; 104+ messages in thread
From: Steven Haigh @ 2010-01-25 22:49 UTC (permalink / raw)
  To: linux-raid


On 26/01/2010, at 7:43 AM, greg@enjellic.com wrote:

> On Jan 21, 12:48pm, Farkas Levente wrote:
> } Subject: Re: Why does one get mismatches?
> 
> Good afternoon to everyone, hope the week is starting well.
> 
>> On 01/21/2010 11:52 AM, Steven Haigh wrote:
>>> On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org>  wrote:
>>>> Steven Haigh wrote:
>>>>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
>>>>> wrote:
>>>>> 
>>>>> CUT!
>>>> Might that be a problem of the disks/controllers?
>>>> Jon and Steven, what hardware do you have?
>>> 
>>> I'm running some fairly old hardware on this particular server. It's a
>>> dual P3 1Ghz.
>>> 
>>> After running a repair on /dev/md2, I now see:
>>> # cat /sys/block/md2/md/mismatch_cnt
>>> 1536
>>> 
>>> Again, no smart errors, nothing to indicate a disk problem at all :(
>>> 
>>> As this really keeps killing the machine and it is a live system - the
>>> only thing I can really think of doing is to break the RAID and just rsync
>>> the drives twice daily :\
> 
>> The same happened to many people, and we all hate it since it causes
>> a huge load every weekend on most of our servers:-( According to
>> Red Hat it's not a bug:-(
> 
> The RAID check/mismatch_count is an example of well intentioned
> technology suffering from 'featuritis' by the distributions which is,
> as I predicted a couple of times in this forum, causing all sorts of
> angst and problems throughout the world.  I've had some posts on this
> subject but will summarize in the hopes of giving some background
> information which will be useful to people.
> 
> There is an issue in the kernel which causes these mismatches.  The
> problem seems to be particularly bad with RAID1 arrays.  The
> contention is that these mismatches are 'harmless' because they only
> occur in areas of the filesystems which are not being used.
> 
> The best description is that the buffers containing the data to be
> written are not 'pinned' all the way down the I/O stack.  This can
> cause the contents of a buffer to be changed while in transit through
> the I/O stack.  Thus one copy of a mirror gets a buffer written to it
> different than the other side of the mirror.
> 
> I've read reasoned discussions about why this occurs with swap over
> RAID1 and why it's harmless.  I've yet to see the same type of reasoned
> discussion as to why it is not problematic with a filesystem over
> RAID1.  There has been some discussion that it's due to high levels of
> MMAP activity on the filesystem.
> 
> We have confirmed, that at least with RAID1, this all occurs with no
> physical corruption on the 'disk drives'.  We implement geographically
> mirror storage with RAID1 against two separate data-centers.  At each
> data-center the RAID1 'block-device' are RAID5 volumes.  These latter
> volumes check out with no errors/mismatch counts etc.  So the issue is
> at the RAID1 data abstraction layer.
> 
> There do not appear to be any tools which allow one to determine
> 'where' the mismatches are.  Such a tool, or logging by the kernel,
> would be useful for people who want to verify what files, if any, are
> affected by the mismatch.  Otherwise running a 'repair' results in the
> RAID1 code arbitrarily deciding which of the two blocks is the
> 'correct' one.
> 
> So that's sort of a thumbnail sketch of what is going on.  The fact
> that the distributions chose to implement this without understanding
> the issues it presents is a bit problematic.
> 
>>   Levente                               "Si vis pacem para bellum!"
> 
> Hopefully this information is helpful.
> 
> Greg

Hi Greg and all,

The funny part is that I believe the mismatches aren't happening in the empty space of the filesystem - as it seems that the errors are causing the ext3 journal to abort and force the filesystem into readonly in my particular situation.

It is interesting that I do not get any mismatches on md0, md1 or md3 - only md2.

md0 = /boot
md1 = swap
md2 = /
md3 = /tmp

I ran weekly checks on all four RAID1 arrays and ONLY md2 had a problem with mismatches, and it also had a habit of going readonly - therefore I don't believe the common claim that this problem only affects empty parts of the filesystem.
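(For reference, a weekly scrub of this sort can be as simple as a cron
job along these lines, array names as above:

  for md in md0 md1 md2 md3; do
    echo check > /sys/block/$md/md/sync_action
  done
)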

I have also done just about every test to the disks that I can think of with no errors to be found - leaving only the md layer to be suspect.

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299







^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
@ 2010-01-25 20:43 greg
  2010-01-25 22:49 ` Steven Haigh
  0 siblings, 1 reply; 104+ messages in thread
From: greg @ 2010-01-25 20:43 UTC (permalink / raw)
  To: Farkas Levente, Steven Haigh; +Cc: Asdo, linux-raid

On Jan 21, 12:48pm, Farkas Levente wrote:
} Subject: Re: Why does one get mismatches?

Good afternoon to everyone, hope the week is starting well.

> On 01/21/2010 11:52 AM, Steven Haigh wrote:
> > On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org>  wrote:
> >> Steven Haigh wrote:
> >>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
> >>> wrote:
> >>>
> >>> CUT!
> >> Might that be a problem of the disks/controllers?
> >> Jon and Steven, what hardware do you have?
> >
> > I'm running some fairly old hardware on this particular server. It's a
> > dual P3 1Ghz.
> >
> > After running a repair on /dev/md2, I now see:
> > # cat /sys/block/md2/md/mismatch_cnt
> > 1536
> >
> > Again, no smart errors, nothing to indicate a disk problem at all :(
> >
> > As this really keeps killing the machine and it is a live system - the
> > only thing I can really think of doing is to break the RAID and just rsync
> > the drives twice daily :\

> The same happened to many people, and we all hate it since it causes
> a huge load every weekend on most of our servers:-( According to
> Red Hat it's not a bug:-(

The RAID check/mismatch_count is an example of well intentioned
technology suffering from 'featuritis' by the distributions which is,
as I predicted a couple of times in this forum, causing all sorts of
angst and problems throughout the world.  I've had some posts on this
subject but will summarize in the hopes of giving some background
information which will be useful to people.

There is an issue in the kernel which causes these mismatches.  The
problem seems to be particularly bad with RAID1 arrays.  The
contention is that these mismatches are 'harmless' because they only
occur in areas of the filesystems which are not being used.

The best description is that the buffers containing the data to be
written are not 'pinned' all the way down the I/O stack.  This can
cause the contents of a buffer to be changed while in transit through
the I/O stack.  Thus one copy of a mirror gets a buffer written to it
different than the other side of the mirror.

I've read reasoned discussions about why this occurs with swap over
RAID1 and why it's harmless.  I've yet to see the same type of reasoned
discussion as to why it is not problematic with a filesystem over
RAID1.  There has been some discussion that it's due to high levels of
MMAP activity on the filesystem.

We have confirmed, that at least with RAID1, this all occurs with no
physical corruption on the 'disk drives'.  We implement geographically
mirror storage with RAID1 against two separate data-centers.  At each
data-center the RAID1 'block-device' are RAID5 volumes.  These latter
volumes check out with no errors/mismatch counts etc.  So the issue is
at the RAID1 data abstraction layer.

There do not appear to be any tools which allow one to determine
'where' the mismatches are.  Such a tool, or logging by the kernel,
would be useful for people who want to verify what files, if any, are
affected by the mismatch.  Otherwise running a 'repair' results in the
RAID1 code arbitrarily deciding which of the two blocks is the
'correct' one.
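(A crude way to at least locate differing regions on a two-member
RAID1 - member names hypothetical, and ignoring the md superblock
area - is a byte-wise compare of the components:

  cmp -l /dev/sda2 /dev/sdb2 | head

though, as said above, that still tells you nothing about which side is
'correct'.)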

So that's sort of a thumbnail sketch of what is going on.  The fact
that the distributions chose to implement this without understanding
the issues it presents is a bit problematic.

>    Levente                               "Si vis pacem para bellum!"

Hopefully this information is helpful.

Greg

}-- End of excerpt from Farkas Levente

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"I am returning this otherwise good typing paper to you because
 someone has printed gibberish all over it and put your name at the
 top.
                                -- English Professor, Ohio University

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-22 16:22   ` Jon Hardcastle
  2010-01-22 16:34     ` Asdo
@ 2010-01-22 17:41     ` Brett Russ
  1 sibling, 0 replies; 104+ messages in thread
From: Brett Russ @ 2010-01-22 17:41 UTC (permalink / raw)
  To: linux-raid

On 01/22/2010 11:22 AM, Jon Hardcastle wrote:
> <SNIP>
>>
>> Note that if your md device is not in a read-only mode that the
>> member states may be changing underneath you as you run the above
>> command. Therefore, you should either stop the device then run the
>> commands, or at least have the device in a read-only mode first.
>>
>> -BR
>>
>
> I have just tried this - I unmounted all LVs and then deactivated the
> VG. I set it to read-only but now any attempt to echo check >
> sync_action results in

Sorry for the misunderstanding, I was suggesting putting the array in 
read only mode only for the purposes of doing the 'mdadm --examine' to 
detect if member devices were out of sync with each other.
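e.g. something like (member names hypothetical):

  mdadm --examine /dev/sd[ab]1 | grep -E 'Update Time|Events'

to see whether the members' event counts and update times still agree.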

But, it turns out that the mismatches you're seeing are not a result of 
the member devices being out of sync with each other but rather member 
devices throwing errors.  Sounds like other people see this same 
behavior and it's not necessarily tied to any disk sector read errors. 
If there are also no I/O errors in the kernel log during your 'check' 
operation, you'll need either more verbose md logging during the check 
or a look at the code to see what other kinds of errors bump the 
mismatch counter.

-BR


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-22 16:22   ` Jon Hardcastle
@ 2010-01-22 16:34     ` Asdo
  2010-01-22 17:41     ` Brett Russ
  1 sibling, 0 replies; 104+ messages in thread
From: Asdo @ 2010-01-22 16:34 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid, Brett Russ

Jon Hardcastle wrote:
> <SNIP>
>   
>> Note that if your md device is not in a read-only mode that
>> the member states may be changing underneath you as you run
>> the above command. Therefore, you should either stop the
>> device then run the commands, or at least have the device in
>> a read-only mode first.
>>
>> -BR
>>
>>     
>
> I have just tried this - I unmounted all LVs and then deactivated the VG. I set it to read-only, but now any attempt to echo check > sync_action results in
>
> 'write error: device or resource busy'
>
> Any clues?
>   

I think running check or repair is not supported with the MD array set 
read-only.
I remember Neil Brown himself saying that, and I seem to recall I have 
also seen this in the source code.
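So the array would have to be switched back to read-write first,
something like (device name just an example):

  mdadm --readwrite /dev/md4
  echo check > /sys/block/md4/md/sync_action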

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 14:19 ` Brett Russ
  2010-01-20 14:34   ` Jon Hardcastle
@ 2010-01-22 16:22   ` Jon Hardcastle
  2010-01-22 16:34     ` Asdo
  2010-01-22 17:41     ` Brett Russ
  1 sibling, 2 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-22 16:22 UTC (permalink / raw)
  To: linux-raid, Brett Russ

<SNIP>
> 
> Note that if your md device is not in a read-only mode that
> the member states may be changing underneath you as you run
> the above command. Therefore, you should either stop the
> device then run the commands, or at least have the device in
> a read-only mode first.
> 
> -BR
> 

I have just tried this - I unmounted all LVs and then deactivated the VG. I set the array to read-only, but now any attempt to echo check > sync_action results in

'write error: device or resource busy'

Any clues?


      

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-21 11:48                 ` Farkas Levente
@ 2010-01-21 12:15                   ` Jon Hardcastle
  0 siblings, 0 replies; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-21 12:15 UTC (permalink / raw)
  To: Steven Haigh, Farkas Levente; +Cc: Asdo, linux-raid


--- On Thu, 21/1/10, Farkas Levente <lfarkas@lfarkas.org> wrote:

> From: Farkas Levente <lfarkas@lfarkas.org>
> Subject: Re: Why does one get mismatches?
> To: "Steven Haigh" <netwiz@crc.id.au>
> Cc: "Asdo" <asdo@shiftmail.org>, linux-raid@vger.kernel.org
> Date: Thursday, 21 January, 2010, 11:48
> On 01/21/2010 11:52 AM, Steven Haigh wrote:
> > On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org> wrote:
> >> Steven Haigh wrote:
> >>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
> >>> wrote:
> >>>
> >>> CUT!
> >> Might that be a problem of the disks/controllers?
> >> Jon and Steven, what hardware do you have?
> >
> > I'm running some fairly old hardware on this particular server. It's
> > a dual P3 1Ghz.
> >
> > After running a repair on /dev/md2, I now see:
> > # cat /sys/block/md2/md/mismatch_cnt
> > 1536
> >
> > Again, no smart errors, nothing to indicate a disk problem at all :(
> >
> > As this really keeps killing the machine and it is a live system -
> > the only thing I can really think of doing is to break the RAID and
> > just rsync the drives twice daily :\
> 
> The same happened to many people, and we all hate it since it causes a
> huge load every weekend on most of our servers:-(
> According to Red Hat it's not a bug:-(
> 
> -- 
>    Levente                               "Si vis pacem para bellum!"
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Well, I am running a Sempron-based desktop system that has 4 built-in SATA ports and 2 IDE, plus 2 PCI-E controller cards exposing 2 SATA ports each.

Off the IDE I have 2 320GB HDDs split across 3 MDs: boot/swap/main. Only the main one is checked/repaired. I very rarely have a problem here!

On the SATA ports I have 7 HDDs of varying sizes (4x500GB, 2x750GB, 1x1TB) and makes (Samsung, Hitachi, Seagate) strung together to form what is now a RAID-6 (RAID-5 until a couple of weeks ago). On top of that I have a VG split into ~6 LVs, and in some of those I have mounted SquashFS filesystems. Until I moved the drive order around at the weekend for access reasons, I didn't really have any problems - except the occasional issue. I scrub it weekly at the moment.

BUT I have only just converted from RAID-5 to RAID-6 and have probably not run that many checks since, so it could be related to that!




      
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-21 10:52               ` Steven Haigh
@ 2010-01-21 11:48                 ` Farkas Levente
  2010-01-21 12:15                   ` Jon Hardcastle
  0 siblings, 1 reply; 104+ messages in thread
From: Farkas Levente @ 2010-01-21 11:48 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Asdo, linux-raid

On 01/21/2010 11:52 AM, Steven Haigh wrote:
> On Thu, 21 Jan 2010 09:08:42 +0100, Asdo<asdo@shiftmail.org>  wrote:
>> Steven Haigh wrote:
>>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ<bruss@netezza.com>
>>> wrote:
>>>
>>> CUT!
>> Might that be a problem of the disks/controllers?
>> Jon and Steven, what hardware do you have?
>
> I'm running some fairly old hardware on this particular server. It's a
> dual P3 1Ghz.
>
> After running a repair on /dev/md2, I now see:
> # cat /sys/block/md2/md/mismatch_cnt
> 1536
>
> Again, no smart errors, nothing to indicate a disk problem at all :(
>
> As this really keeps killing the machine and it is a live system - the
> only thing I can really think of doing is to break the RAID and just rsync
> the drives twice daily :\

The same happened to many people, and we all hate it since it causes a 
huge load every weekend on most of our servers:-(
According to Red Hat it's not a bug:-(

-- 
   Levente                               "Si vis pacem para bellum!"

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-21  8:08             ` Asdo
@ 2010-01-21 10:52               ` Steven Haigh
  2010-01-21 11:48                 ` Farkas Levente
  0 siblings, 1 reply; 104+ messages in thread
From: Steven Haigh @ 2010-01-21 10:52 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid

On Thu, 21 Jan 2010 09:08:42 +0100, Asdo <asdo@shiftmail.org> wrote:
> Steven Haigh wrote:
>> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ <bruss@netezza.com>
>> wrote:
>>
>> CUT!
> Might that be a problem of the disks/controllers?
> Jon and Steven, what hardware do you have?

I'm running some fairly old hardware on this particular server. It's a
dual P3 1Ghz.

After running a repair on /dev/md2, I now see:
# cat /sys/block/md2/md/mismatch_cnt
1536

Again, no smart errors, nothing to indicate a disk problem at all :(

As this really keeps killing the machine and it is a live system - the
only thing I can really think of doing is to break the RAID and just rsync
the drives twice daily :\
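(A minimal sketch of that fallback - mount point hypothetical - would
be a cron job like:

  rsync -aHx --delete / /mnt/mirror/

with the second disk mounted at /mnt/mirror.)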

-- 
Steven Haigh
 
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-21  4:17           ` Steven Haigh
@ 2010-01-21  8:08             ` Asdo
  2010-01-21 10:52               ` Steven Haigh
  0 siblings, 1 reply; 104+ messages in thread
From: Asdo @ 2010-01-21  8:08 UTC (permalink / raw)
  To: Steven Haigh; +Cc: linux-raid

Steven Haigh wrote:
> On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ <bruss@netezza.com> wrote:
>
> CUT!
Might that be a problem of the disks/controllers?
Jon and Steven, what hardware do you have?


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 22:43         ` Brett Russ
  2010-01-20 23:01           ` Christopher Chen
@ 2010-01-21  4:17           ` Steven Haigh
  2010-01-21  8:08             ` Asdo
  1 sibling, 1 reply; 104+ messages in thread
From: Steven Haigh @ 2010-01-21  4:17 UTC (permalink / raw)
  To: Brett Russ; +Cc: linux-raid

On Wed, 20 Jan 2010 17:43:45 -0500, Brett Russ <bruss@netezza.com> wrote:
> On 01/20/2010 05:30 PM, Majed B. wrote:
>> He needs to run a full offline or long test before checking with
>> smartctl -a -- since it won't show any sector errors if those tests
>> weren't run at least once.
> 
> Not sure I agree with that.  The md checks he's been doing will cause a 
> read of all data regions of the relevant partition and if the disk is 
> throwing errors, those sectors should be marked probational.  Then, if a 
> subsequent repair ends up remapping them, those sectors will show up as 
> remapped.
> 
> The grep will show both probational and remapped sector counts for each 
> drive.
> 
> BTW, the cmd should also include an echo so it's easy to tell which 
> drive is being reported:
> 
> for di in a b c d e f g; do echo $di; smartctl -a /dev/sd$di | grep -i 
> _sect; done

Interestingly enough, I'm struggling with a system on this matter too... I
can never seem to get rid of mismatches.

# for di in a b c d e f g; do echo $di; smartctl -a /dev/hd$di | grep -i
sect; done
a
=== START OF INFORMATION SECTION ===
=== START OF READ SMART DATA SECTION ===
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always  
   -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always  
   -       0
b
c
=== START OF INFORMATION SECTION ===
=== START OF READ SMART DATA SECTION ===
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always  
   -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always  
   -       0

Full offline tests of both drives less than 400 power on hours ago all
came up clean. No read errors. Just mismatches.

I can run a repair on them and STILL have mismatches again after a check.
At the moment:

# cat /sys/block/md2/md/mismatch_cnt
1024

It's in the middle of a repair now - as quite often the filesystem on
/dev/md2 will go read-only due to a journal error. I've tried everything
except replacing hardware to figure out what's going on here - but it will
do this like clockwork every month. A reboot later and it'll run an fsck,
find no errors, then between 21 and 30 days later it will go read-only
again.
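
When it goes read-only, the journal abort should at least show up in the
kernel log - e.g. something as simple as (assuming an ext3/ext4
filesystem on /dev/md2):

dmesg | grep -i journal    # look for the journal abort / remount-ro messages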

It's annoying as hell and I wish I could get to the bottom of it!

-- 
Steven Haigh
 
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 22:43         ` Brett Russ
@ 2010-01-20 23:01           ` Christopher Chen
  2010-01-21  4:17           ` Steven Haigh
  1 sibling, 0 replies; 104+ messages in thread
From: Christopher Chen @ 2010-01-20 23:01 UTC (permalink / raw)
  To: Brett Russ; +Cc: linux-raid

I keep misreading the subject of this email thread as:

"Why does one get mustaches?"

cc

On Wed, Jan 20, 2010 at 2:43 PM, Brett Russ <bruss@netezza.com> wrote:
> On 01/20/2010 05:30 PM, Majed B. wrote:
>>
>> He needs to run a full offline or long test before checking with
>> smartctl -a -- since it won't show any sector errors if those tests
>> weren't run at least once.
>
> Not sure I agree with that.  The md checks he's been doing will cause a read
> of all data regions of the relevant partition and if the disk is throwing
> errors, those sectors should be marked probational.  Then, if a subsequent
> repair ends up remapping them, those sectors will show up as remapped.
>
> The grep will show both probational and remapped sector counts for each
> drive.
>
> BTW, the cmd should also include an echo so it's easy to tell which drive is
> being reported:
>
> for di in a b c d e f g; do echo $di; smartctl -a /dev/sd$di | grep -i _sect; done
>
> -BR



-- 
Chris Chen <muffaleta@gmail.com>
"The fact that yours is better than anyone else's
is not a guarantee that it's any good."
-- Motivational Poster

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 22:30       ` Majed B.
@ 2010-01-20 22:43         ` Brett Russ
  2010-01-20 23:01           ` Christopher Chen
  2010-01-21  4:17           ` Steven Haigh
  0 siblings, 2 replies; 104+ messages in thread
From: Brett Russ @ 2010-01-20 22:43 UTC (permalink / raw)
  To: linux-raid

On 01/20/2010 05:30 PM, Majed B. wrote:
> He needs to run a full offline or long test before checking with
> smartctl -a -- since it won't show any sector errors if those tests
> weren't run at least once.

Not sure I agree with that.  The md checks he's been doing will cause a 
read of all data regions of the relevant partition and if the disk is 
throwing errors, those sectors should be marked probational.  Then, if a 
subsequent repair ends up remapping them, those sectors will show up as 
remapped.

The grep will show both probational and remapped sector counts for each 
drive.

BTW, the cmd should also include an echo so it's easy to tell which 
drive is being reported:

for di in a b c d e f g; do echo $di; smartctl -a /dev/sd$di | grep -i _sect; done

-BR


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 22:25     ` Brett Russ
@ 2010-01-20 22:30       ` Majed B.
  2010-01-20 22:43         ` Brett Russ
  0 siblings, 1 reply; 104+ messages in thread
From: Majed B. @ 2010-01-20 22:30 UTC (permalink / raw)
  To: Brett Russ; +Cc: linux-raid

He needs to run a full offline or long test before checking with
smartctl -a -- since it won't show any sector errors if those tests
weren't run at least once.
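
For what it's worth, the long test can be kicked off and its outcome read
back with something like this (device name illustrative):

smartctl -t long /dev/sda      # start the extended (long) self-test
smartctl -l selftest /dev/sda  # read the self-test log once it finishes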

On Thu, Jan 21, 2010 at 1:25 AM, Brett Russ <bruss@netezza.com> wrote:
> On 01/20/2010 03:44 PM, Majed B. wrote:
>>
>> You can find details on these parameters and their values in md's
>> documentation: http://www.mjmwired.net/kernel/Documentation/md.txt
>>
>> A mismatch count can be checked by writing "check" to the proper file
>> as stated in line 376:
>> http://www.mjmwired.net/kernel/Documentation/md.txt#376
>
> Sounds like Jon may have a flaky HDD if certain members continually throw
> errors.  Jon, can you check SMART stats on your drives?
>
> for di in a b c d e f g; do smartctl -a /dev/sd$di | grep -i _sect; done
>
> you may have to add another option to smartctl, I forget.
>
> -BR



-- 
       Majed B.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 20:44   ` Majed B.
@ 2010-01-20 22:25     ` Brett Russ
  2010-01-20 22:30       ` Majed B.
  0 siblings, 1 reply; 104+ messages in thread
From: Brett Russ @ 2010-01-20 22:25 UTC (permalink / raw)
  To: linux-raid

On 01/20/2010 03:44 PM, Majed B. wrote:
> You can find details on these parameters and their values in md's
> documentation: http://www.mjmwired.net/kernel/Documentation/md.txt
>
> A mismatch count can be checked by writing "check" to the proper file
> as stated in line 376:
> http://www.mjmwired.net/kernel/Documentation/md.txt#376

Sounds like Jon may have a flaky HDD if certain members continually 
throw errors.  Jon, can you check SMART stats on your drives?

for di in a b c d e f g; do smartctl -a /dev/sd$di | grep -i _sect; done

you may have to add another option to smartctl, I forget.

-BR


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 15:34 ` Brett Russ
@ 2010-01-20 20:44   ` Majed B.
  2010-01-20 22:25     ` Brett Russ
  0 siblings, 1 reply; 104+ messages in thread
From: Majed B. @ 2010-01-20 20:44 UTC (permalink / raw)
  To: Brett Russ; +Cc: linux-raid

You can find details on these parameters and their values in md's
documentation: http://www.mjmwired.net/kernel/Documentation/md.txt

A mismatch count can be checked by writing "check" to the proper file
as stated in line 376:
http://www.mjmwired.net/kernel/Documentation/md.txt#376
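
In practice that amounts to something like this (the array name md0 is
illustrative):

echo check > /sys/block/md0/md/sync_action   # start a check pass
cat /sys/block/md0/md/mismatch_cnt           # inspect the count afterwards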

On Wed, Jan 20, 2010 at 6:34 PM, Brett Russ <bruss@netezza.com> wrote:
> On 01/20/2010 10:03 AM, Jon Hardcastle wrote:
>>
>> --- On Wed, 20/1/10, Brett Russ <bruss@netezza.com> wrote:
>>>
>>> What do you mean by mismatches detected?  How is this observed?
>>
>> cat /sys/block/md4/md/mismatch_cnt
>
> Someone else will need to comment on what this value pertains to and how it
> should behave.
>
>> I have been able to get the info you asked for using mdadm -E but not
>> with the array stopped. I cant stop it just yet. What would you be
>> looking for in this data?
>
> Simply that the "Update Time" and "Events" values matched across all
> members, which they do.
>
> -BR



-- 
       Majed B.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 15:03 Jon Hardcastle
@ 2010-01-20 15:34 ` Brett Russ
  2010-01-20 20:44   ` Majed B.
  0 siblings, 1 reply; 104+ messages in thread
From: Brett Russ @ 2010-01-20 15:34 UTC (permalink / raw)
  To: linux-raid

On 01/20/2010 10:03 AM, Jon Hardcastle wrote:
> --- On Wed, 20/1/10, Brett Russ <bruss@netezza.com> wrote:
>> What do you mean by mismatches detected?  How is this observed?
>
> cat /sys/block/md4/md/mismatch_cnt

Someone else will need to comment on what this value pertains to and how 
it should behave.

> I have been able to get the info you asked for using mdadm -E but not
> with the array stopped. I cant stop it just yet. What would you be
> looking for in this data?

Simply that the "Update Time" and "Events" values matched across all 
members, which they do.

-BR


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
@ 2010-01-20 15:03 Jon Hardcastle
  2010-01-20 15:34 ` Brett Russ
  0 siblings, 1 reply; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-20 15:03 UTC (permalink / raw)
  To: linux-raid, Brett Russ

[-- Attachment #1: Type: text/plain, Size: 1512 bytes --]

--- On Wed, 20/1/10, Brett Russ <bruss@netezza.com> wrote:

> From: Brett Russ <bruss@netezza.com>
> Subject: Re: Why does one get mismatches?
> To: linux-raid@vger.kernel.org
> Date: Wednesday, 20 January, 2010, 14:46
> On 01/20/2010 09:34 AM, Jon Hardcastle wrote:
> > I will gather the information you require, but so it is clear it is
> > an echo 'check' that is kicking off the ultimate mismatch, not from
> > boot.
> 
> What do you mean by mismatches detected?  How is this observed?
> -BR
> 


cat /sys/block/md4/md/mismatch_cnt

I have a script that runs a 'check' and then looks at this value once the check is complete. If it is > 0, it reports the number to me via email and then starts a repair. I am under the impression that the repair, when complete, should (in an ideal world) report an identical number, indicating that it did indeed find x errors and repaired them. But I am getting differing amounts: a check shows 8, the repair shows 12; next run it'll be 24 and 6, so something is up.
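
Roughly, the script amounts to this (a sketch only - the array name,
recipient and polling interval are illustrative, and the real script
differs in detail):

#!/bin/sh
# Run a check, mail the mismatch count if non-zero, then start a repair.
MD=md4
echo check > /sys/block/$MD/md/sync_action
# wait until the check pass has finished
while [ "$(cat /sys/block/$MD/md/sync_action)" != "idle" ]; do
    sleep 60
done
COUNT=$(cat /sys/block/$MD/md/mismatch_cnt)
if [ "$COUNT" -gt 0 ]; then
    echo "$MD: $COUNT mismatches found" | mail -s "md mismatch report" root
    echo repair > /sys/block/$MD/md/sync_action
fi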

I have been able to get the info you asked for using mdadm -E, but not with the array stopped - I can't stop it just yet. What would you be looking for in this data?

-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'Do not worry about tomorrow, for tomorrow will bring worries of its own.'

***********
Please note, I am phasing out jd_hardcastle AT yahoo.com and replacing it with jon AT eHardcastle.com
***********

-----------------------


      

[-- Attachment #2: mdadm-EsdX1.txt --]
[-- Type: text/plain, Size: 8603 bytes --]

/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468ec2 - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8        1        5      active sync   /dev/sda1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468ece - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       17        3      active sync   /dev/sdb1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468eda - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468ef0 - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       49        4      active sync   /dev/sdd1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1
/dev/sde1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468ef8 - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468f14 - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       81        6      active sync   /dev/sdf1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1
/dev/sdg1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Wed Jan 20 12:54:16 2010
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b2468f1c - correct
         Events : 1834225

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       97        2      active sync   /dev/sdg1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       97        2      active sync   /dev/sdg1
   3     3       8       17        3      active sync   /dev/sdb1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8        1        5      active sync   /dev/sda1
   6     6       8       81        6      active sync   /dev/sdf1

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 14:34   ` Jon Hardcastle
@ 2010-01-20 14:46     ` Brett Russ
  2010-02-01 20:48       ` Bill Davidsen
  0 siblings, 1 reply; 104+ messages in thread
From: Brett Russ @ 2010-01-20 14:46 UTC (permalink / raw)
  To: linux-raid

On 01/20/2010 09:34 AM, Jon Hardcastle wrote:
> I will gather the information you require, but so it is clear it is
> an echo 'check' that is kicking off the ultimate mismatch, not from
> boot.

What do you mean by mismatches detected?  How is this observed?
-BR


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-20 14:19 ` Brett Russ
@ 2010-01-20 14:34   ` Jon Hardcastle
  2010-01-20 14:46     ` Brett Russ
  2010-01-22 16:22   ` Jon Hardcastle
  1 sibling, 1 reply; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-20 14:34 UTC (permalink / raw)
  To: linux-raid, Brett Russ


--- On Wed, 20/1/10, Brett Russ <bruss@netezza.com> wrote:

> From: Brett Russ <bruss@netezza.com>
> Subject: Re: Why does one get mismatches?
> To: linux-raid@vger.kernel.org
> Date: Wednesday, 20 January, 2010, 14:19
> On 01/19/2010 05:04 AM, Jon Hardcastle wrote:
> > I kicked off a check/repair cycle on my machine after I moved the
> > physical ordering of my drives around and I am now on my second
> > check/repair cycle and it has kept finding mismatches.
> > 
> > Is it correct that the mismatch value after a repair was needed
> > should equal the value present after a check? What if it doesn't?
> > What does it mean if another check STILL reveals mismatches?
> > 
> > I had something similar after I reshaped from RAID 5 to 6: I had to
> > run check/repair/check/repair several times before I got my 0.
> 
> I think to diagnose this you'll need to show us the results
> of running 'mdadm -E /dev/[hs]dX#' (e.g. /dev/sda2) for each
> member device in the md device you're trying to assemble
> *before* attempting to start the md device.  This will
> report on the state of that specific member device
> (partition) and will show why a resync/repair would/would
> not be needed.
> 
> Note that if your md device is not in a read-only mode,
> the member states may be changing underneath you as you run
> the above command. Therefore, you should either stop the
> device then run the commands, or at least have the device in
> a read-only mode first.
> 
> -BR

I will gather the information you require, but so it is clear it is an echo 'check' that is kicking off the ultimate mismatch, not from boot.

Also, I have never marked the array as read-only while doing this historically - I can, but never have. This is a data storage array and isn't actually (shouldn't be) in use while I am not there (can that be tested?!); the main OS array, md3, runs and completes without a problem.
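
As far as testing that goes, something like the following should list any
processes that still have files open on it (the mount point is
illustrative):

fuser -vm /mnt/array    # show processes using the mounted filesystem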

-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'Do not worry about tomorrow, for tomorrow will bring worries of its own.'

***********
Please note, I am phasing out jd_hardcastle AT yahoo.com and replacing it with jon AT eHardcastle.com
***********

-----------------------


      

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Why does one get mismatches?
  2010-01-19 10:04 Jon Hardcastle
@ 2010-01-20 14:19 ` Brett Russ
  2010-01-20 14:34   ` Jon Hardcastle
  2010-01-22 16:22   ` Jon Hardcastle
  0 siblings, 2 replies; 104+ messages in thread
From: Brett Russ @ 2010-01-20 14:19 UTC (permalink / raw)
  To: linux-raid

On 01/19/2010 05:04 AM, Jon Hardcastle wrote:
> I kicked off a check/repair cycle on my machine after I moved the
> physical ordering of my drives around and I am now on my second
> check/repair cycle and it has kept finding mismatches.
>
> Is it correct that the mismatch value after a repair was needed
> should equal the value present after a check? What if it doesn't?
> What does it mean if another check STILL reveals mismatches?
>
> I had something similar after I reshaped from RAID 5 to 6: I had to
> run check/repair/check/repair several times before I got my 0.

I think to diagnose this you'll need to show us the results of running 
'mdadm -E /dev/[hs]dX#' (e.g. /dev/sda2) for each member device in the 
md device you're trying to assemble *before* attempting to start the md 
device.  This will report on the state of that specific member device 
(partition) and will show why a resync/repair would/would not be needed.

Note that if your md device is not in a read-only mode, the member 
states may be changing underneath you as you run the above command. 
Therefore, you should either stop the device then run the commands, or 
at least have the device in a read-only mode first.
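
For example, something along these lines (device names illustrative):

mdadm --readonly /dev/md4    # or stop it entirely: mdadm --stop /dev/md4
for d in /dev/sd[a-g]1; do
    echo $d
    mdadm -E $d | egrep 'Update Time|Events'
done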

-BR


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Why does one get mismatches?
@ 2010-01-19 10:04 Jon Hardcastle
  2010-01-20 14:19 ` Brett Russ
  0 siblings, 1 reply; 104+ messages in thread
From: Jon Hardcastle @ 2010-01-19 10:04 UTC (permalink / raw)
  To: linux-raid

Hi,

I kicked off a check/repair cycle on my machine after I moved the physical ordering of my drives around, and I am now on my second check/repair cycle; it has kept finding mismatches.

Is it correct that the mismatch value after a repair was needed should equal the value present after a check? What if it doesn't? What does it mean if another check STILL reveals mismatches?

I had something similar after I reshaped from RAID 5 to 6: I had to run check/repair/check/repair several times before I got my 0.


-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'Do not worry about tomorrow, for tomorrow will bring worries of its own.'

***********
Please note, I am phasing out jd_hardcastle AT yahoo.com and replacing it with jon AT eHardcastle.com
***********

-----------------------


      

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2010-03-03 12:03 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-20 11:52 Fw: Why does one get mismatches? Jon Hardcastle
2010-01-22 18:13 ` Goswin von Brederlow
2010-01-24 17:40   ` Jon Hardcastle
2010-01-24 21:52     ` Roger Heflin
2010-01-24 23:13     ` Goswin von Brederlow
2010-01-25 10:07       ` Jon Hardcastle
2010-01-25 10:37         ` Goswin von Brederlow
2010-01-25 10:52           ` Jon Hardcastle
2010-01-25 17:32             ` Goswin von Brederlow
2010-01-25 19:32             ` Iustin Pop
2010-02-01 21:18 ` Bill Davidsen
2010-02-01 22:37   ` Neil Brown
2010-02-02 15:11     ` Bill Davidsen
2010-02-03 11:17       ` Goswin von Brederlow
2010-02-11  5:14       ` Neil Brown
2010-02-11 17:51         ` Bryan Mesich
2010-02-16 21:25           ` Bill Davidsen
2010-02-16 21:38             ` Steven Haigh
2010-02-17  3:19               ` Bryan Mesich
2010-02-17 23:05               ` Neil Brown
2010-02-19 15:18                 ` Piergiorgio Sartor
2010-02-19 22:02                   ` Neil Brown
2010-02-19 22:37                     ` Piergiorgio Sartor
2010-02-19 23:34                     ` Asdo
2010-02-20  4:27                       ` Goswin von Brederlow
2010-02-20 11:12                         ` Asdo
2010-02-21 11:13                           ` Goswin von Brederlow
     [not found]                             ` <8754A21825504719B463AD9809E54349@m5>
     [not found]                               ` <20100221194400.GA2570@lazy.lzy>
2010-02-22 13:01                                 ` Asdo
2010-02-22 13:30                                   ` Piergiorgio Sartor
2010-02-22 13:44                                   ` Piergiorgio Sartor
2010-02-24 19:42                               ` Bill Davidsen
2010-02-20  4:23                     ` Goswin von Brederlow
2010-02-24 14:54                     ` Bill Davidsen
2010-02-24 21:37                       ` Neil Brown
2010-02-26 20:48                         ` Bill Davidsen
2010-02-26 21:09                           ` Neil Brown
2010-02-26 22:01                             ` Piergiorgio Sartor
2010-02-26 22:15                             ` Bill Davidsen
2010-02-26 22:21                               ` Piergiorgio Sartor
2010-02-26 22:20                             ` Asdo
2010-02-27  6:01                               ` Michael Evans
2010-02-28  0:01                                 ` Bill Davidsen
2010-02-24 14:46                 ` Bill Davidsen
2010-02-24 16:12                   ` Martin K. Petersen
2010-02-24 18:51                     ` Piergiorgio Sartor
2010-02-24 22:21                       ` Neil Brown
2010-02-25  8:41                         ` Piergiorgio Sartor
2010-03-02  4:57                           ` Neil Brown
2010-03-02 18:49                             ` Piergiorgio Sartor
2010-02-24 21:39                     ` Neil Brown
     [not found]                       ` <4B8640A2.4060307@shiftmail.org>
2010-02-25 10:41                         ` Neil Brown
2010-02-28  8:09                       ` Luca Berra
2010-03-02  5:01                         ` Neil Brown
2010-03-02  7:36                           ` Luca Berra
2010-03-02 10:04                             ` Michael Evans
2010-03-02 11:02                               ` Luca Berra
2010-03-02 12:13                                 ` Michael Evans
2010-03-02 18:14                                 ` Asdo
2010-03-02 18:52                                   ` Piergiorgio Sartor
2010-03-02 23:27                                     ` Asdo
2010-03-03  9:13                                       ` Piergiorgio Sartor
2010-03-03 11:42                                         ` Asdo
2010-03-03 12:03                                           ` Piergiorgio Sartor
2010-03-02 20:17                                   ` Neil Brown
2010-02-24 21:32                   ` Neil Brown
2010-02-25  7:22                     ` Goswin von Brederlow
2010-02-25  7:39                       ` Neil Brown
2010-02-25  8:47                     ` John Robinson
2010-02-25  9:07                       ` Neil Brown
2010-02-11 18:12         ` Piergiorgio Sartor
  -- strict thread matches above, loose matches on Subject: below --
2010-02-01 23:14 Jon Hardcastle
2010-01-25 20:43 greg
2010-01-25 22:49 ` Steven Haigh
2010-01-27 21:54   ` Tirumala Reddy Marri
2010-01-28  9:16     ` Jon Hardcastle
2010-01-28 10:29       ` Asdo
2010-01-28 17:20     ` Tirumala Reddy Marri
2010-01-28 18:23       ` Goswin von Brederlow
2010-01-28 19:03         ` Tirumala Reddy Marri
2010-01-28 20:24           ` Goswin von Brederlow
2010-01-29 15:37             ` Jon Hardcastle
2010-01-29 23:52               ` Goswin von Brederlow
2010-01-30 10:39                 ` Jon Hardcastle
2010-02-01 21:10               ` Bill Davidsen
2010-01-20 15:03 Jon Hardcastle
2010-01-20 15:34 ` Brett Russ
2010-01-20 20:44   ` Majed B.
2010-01-20 22:25     ` Brett Russ
2010-01-20 22:30       ` Majed B.
2010-01-20 22:43         ` Brett Russ
2010-01-20 23:01           ` Christopher Chen
2010-01-21  4:17           ` Steven Haigh
2010-01-21  8:08             ` Asdo
2010-01-21 10:52               ` Steven Haigh
2010-01-21 11:48                 ` Farkas Levente
2010-01-21 12:15                   ` Jon Hardcastle
2010-01-19 10:04 Jon Hardcastle
2010-01-20 14:19 ` Brett Russ
2010-01-20 14:34   ` Jon Hardcastle
2010-01-20 14:46     ` Brett Russ
2010-02-01 20:48       ` Bill Davidsen
2010-01-22 16:22   ` Jon Hardcastle
2010-01-22 16:34     ` Asdo
2010-01-22 17:41     ` Brett Russ
