* Suggestion needed for fixing RAID6
@ 2010-04-22 10:09 Janos Haar
2010-04-22 15:00 ` Mikael Abrahamsson
` (2 more replies)
0 siblings, 3 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-22 10:09 UTC (permalink / raw)
To: linux-raid
Hello Neil, list,
I am trying to fix a RAID6 array which has 12x 1.5TB (Samsung) drives.
Currently the array has 1 missing drive, and 3 which have some bad sectors!
Generally, because it is RAID6, there is no data loss, since the bad sectors
are not in the same address line, but I can't rebuild the missing drive, because
the kernel drops the bad-sector drives one by one during the rebuild
process.
My question is: is there any way to force the array to keep the members in,
even if they have some read errors?
Or is there a way to re-add the bad-sector drives after the kernel has dropped
them, without stopping the rebuild process?
Normally, after an 18-hour sync, at 97.9% the 3rd drive is always dropped
and the rebuild stops.
Thanks,
Janos Haar
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar
@ 2010-04-22 15:00 ` Mikael Abrahamsson
2010-04-22 15:12 ` Janos Haar
[not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
2010-04-23 6:51 ` Luca Berra
2 siblings, 1 reply; 48+ messages in thread
From: Mikael Abrahamsson @ 2010-04-22 15:00 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On Thu, 22 Apr 2010, Janos Haar wrote:
> My question is: is there any way to force the array to keep the members in,
> even if they have some read errors?
What version of the kernel are you running? If it's anywhere near a
recent kernel, it shouldn't kick drives on a read error but instead
recreate the data from parity. You should probably send "repair" to the md
device (echo repair > /sys/block/mdX/md/sync_action) and see if that fixes
the bad blocks. I believe this came in 2.6.15 or something like that (google
if you're in that neighbourhood; if you're on 2.6.26 or later then you
should be fine).
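Concretely, that scrub procedure looks something like this (a sketch only; "md3" is an assumed device name, matching the array discussed later in this thread):

```shell
# tell md to scrub: read every stripe and rewrite anything it can
# reconstruct from parity ("md3" is an assumed device name)
echo repair > /sys/block/md3/md/sync_action

# watch progress of the scrub
cat /proc/mdstat

# after it finishes, the number of mismatched stripes that were found
cat /sys/block/md3/md/mismatch_cnt
```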
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Suggestion needed for fixing RAID6
2010-04-22 15:00 ` Mikael Abrahamsson
@ 2010-04-22 15:12 ` Janos Haar
2010-04-22 15:18 ` Mikael Abrahamsson
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-22 15:12 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid
Hi,
----- Original Message -----
From: "Mikael Abrahamsson" <swmike@swm.pp.se>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, April 22, 2010 5:00 PM
Subject: Re: Suggestion needed for fixing RAID6
> On Thu, 22 Apr 2010, Janos Haar wrote:
>
>> My question is: is there any way to force the array to keep the members
>> in, even if they have some read errors?
>
> What version of the kernel are you running? If it's anywhere near a
> recent kernel, it shouldn't kick drives on a read error but instead
> recreate the data from parity. You should probably send "repair" to the md
> device (echo repair > /sys/block/mdX/md/sync_action) and see if that fixes
> the bad blocks. I believe this came in 2.6.15 or something like that (google
> if you're in that neighbourhood; if you're on 2.6.26 or later then you
> should be fine).
The kernel is 2.6.28.10.
I have just tested one of the bad-block HDDs, and the bad sectors come
periodically, like a small, short scratch, and the drive can't correct
these by writing.
Maybe this is why the kernel kicks it out...
But anyway, the problem is still here: I want to rebuild the missing disk
(prior to replacing the bad-block drives one by one), but the kernel kicks
out 2 more drives during the rebuild.
Thanks for the idea,
Janos
>
> --
> Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Suggestion needed for fixing RAID6
2010-04-22 15:12 ` Janos Haar
@ 2010-04-22 15:18 ` Mikael Abrahamsson
2010-04-22 16:25 ` Janos Haar
2010-04-22 16:32 ` Peter Rabbitson
0 siblings, 2 replies; 48+ messages in thread
From: Mikael Abrahamsson @ 2010-04-22 15:18 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On Thu, 22 Apr 2010, Janos Haar wrote:
> I have just tested one of the bad-block HDDs, and the bad sectors come
> periodically, like a small, short scratch, and the drive can't correct
> these by writing.
Oh, if you get write errors on the drive then you're in bigger trouble.
> Maybe this is why the kernel kicsk it out...
Yes, a write error to the drive is a kick:able offence. What does smartctl
say about the drives?
> But anyway, the problem is still here: I want to rebuild the missing disk
> (prior to replacing the bad-block drives one by one), but the kernel kicks
> out 2 more drives during the rebuild.
I don't have a good idea that preserves your data, unfortunately. One way
would be to dd the defective drives to working ones, but that will most
likely cause data loss on the defective sectors (since md has
no idea that those sectors should be re-created from parity).
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Suggestion needed for fixing RAID6
2010-04-22 15:18 ` Mikael Abrahamsson
@ 2010-04-22 16:25 ` Janos Haar
2010-04-22 16:32 ` Peter Rabbitson
1 sibling, 0 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-22 16:25 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid
----- Original Message -----
From: "Mikael Abrahamsson" <swmike@swm.pp.se>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, April 22, 2010 5:18 PM
Subject: Re: Suggestion needed for fixing RAID6
> On Thu, 22 Apr 2010, Janos Haar wrote:
>
>> I have just tested one of the bad-block HDDs, and the bad sectors come
>> periodically, like a small, short scratch, and the drive can't correct
>> these by writing.
>
> Oh, if you get write errors on the drive then you're in bigger trouble.
I am planning to replace all the defective drives, but first I need to
rebuild the missing part.
I don't care which one is the problem; the first drive has 123 unreadable
sectors, and I have tried to rewrite one, but it doesn't work.
These will go back for RMA, but first I need to solve the problem.
>
>> Maybe this is why the kernel kicks it out...
>
> Yes, a write error to the drive is a kick:able offence. What does smartctl
> say about the drives?
The SMART health status is good (no surprise...).
But the drive has some offline uncorrectable sectors and some pending ones.
>
>> But anyway, the problem is still here: I want to rebuild the missing disk
>> (prior to replacing the bad-block drives one by one), but the kernel kicks
>> out 2 more drives during the rebuild.
>
> I don't have a good idea that preserves your data, unfortunately. One way
> would be to dd the defective drives to working ones, but that will most
> likely cause data loss on the defective sectors (since md has
> no idea that those sectors should be re-created from parity).
Exactly.
This is why I ask here. :-)
I don't want to introduce a few KB of errors on an array which holds all
the needed information.
Any good ideas?
Thanks a lot,
Janos
>
> --
> Mikael Abrahamsson email: swmike@swm.pp.se
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Suggestion needed for fixing RAID6
2010-04-22 15:18 ` Mikael Abrahamsson
2010-04-22 16:25 ` Janos Haar
@ 2010-04-22 16:32 ` Peter Rabbitson
1 sibling, 0 replies; 48+ messages in thread
From: Peter Rabbitson @ 2010-04-22 16:32 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: Janos Haar, linux-raid
Mikael Abrahamsson wrote:
> On Thu, 22 Apr 2010, Janos Haar wrote:
>
> I don't have a good idea that preserves your data, unfortunately. One way
> would be to dd the defective drives to working ones, but that will most
> likely cause data loss on the defective sectors (since md
> has no idea that those sectors should be re-created from parity).
>
There was a thread[1] some time ago where HPA confirmed that the RAID6
data is sufficient to write an algorithm able to determine
which sector is in fact the offending one. There wasn't any interest in
incorporating this into the sync_action/repair function, though :(
[1] http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07327.html
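The principle HPA confirmed can be shown with a toy calculation: with both P and Q intact, a single silently-corrupt data block can be *located*, not just repaired. The sketch below uses the same GF(2^8) polynomial (0x11d) and generator (g = 2) as the Linux md RAID6 code, with one byte standing in for each disk's block; it illustrates the math, not the md implementation.

```python
# Toy demonstration that RAID6 P+Q can locate a single silently
# corrupt data block. GF(2^8) with polynomial 0x11d, generator 2,
# as used by Linux md RAID6.

def gf_mul(a, b):
    """Multiply two field elements (Russian-peasant style)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

# log/antilog tables for the generator g = 2
EXP, LOG = [0] * 255, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = gf_mul(x, 2)

def syndromes(blocks):
    """P = xor of all blocks; Q = sum over GF of g^i * block_i."""
    p = q = 0
    for i, d in enumerate(blocks):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

stripe = [7, 42, 99, 3, 250, 17, 0, 128, 64, 33]  # one byte per data disk
P, Q = syndromes(stripe)                          # parity as written to disk

corrupt = list(stripe)
corrupt[4] ^= 0x5A                                # disk 4 silently flips bits

p2, q2 = syndromes(corrupt)
dP, dQ = P ^ p2, Q ^ q2                  # error e on disk i: dP = e, dQ = g^i * e
bad_disk = LOG[EXP[(LOG[dQ] - LOG[dP]) % 255]]    # dQ/dP = g^i, so its log = i

corrupt[bad_disk] ^= dP                           # repair: xor the error back out
print(bad_disk)            # → 4
print(corrupt == stripe)   # → True
```

The key step is that an error e on data disk i shifts P by e and Q by g^i·e, so the ratio of the two syndrome deltas pinpoints i.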
* Re: Suggestion needed for fixing RAID6
[not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
@ 2010-04-22 20:48 ` Janos Haar
0 siblings, 0 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-22 20:48 UTC (permalink / raw)
To: st0ff; +Cc: linux-raid
Hi,
----- Original Message -----
From: "Stefan /*St0fF*/ Hübner" <stefan.huebner@stud.tu-ilmenau.de>
To: "Janos Haar" <janos.haar@netcenter.hu>
Sent: Thursday, April 22, 2010 10:18 PM
Subject: Re: Suggestion needed for fixing RAID6
> Hi Janos,
>
> I'd ddrescue the failing drives one by one to replacement drives. Set a
> very high retry-count for this action.
I know what I am doing, trust me. ;-)
I have much more professional tools for this than ddrescue, and I have
the list of defective sectors as well.
Now I am imaging the second of the failing drives, and this one has >1800
failing sectors.
>
> The logfile ddrescue creates shows the unreadable sectors afterwards.
> The hard part would now be to incorporate the raid-algorithm into some
> tool to just restore the missing sectors...
I could do that, but it is not a good game for a 15TB array, or even for a
few hundred sectors to fix by hand....
Linux md knows how to recalculate these errors; I want to find that
way... somehow...
I am thinking of making a RAID1 from each defective drive, so that if the
kernel re-writes the sectors, the copy will get them.
But I don't know how to prevent reads from going to the copy. :-/
Thanks for your suggestions,
Janos
>
> I hope this helps a bit.
> Stefan
>
> Am 22.04.2010 12:09, schrieb Janos Haar:
>> Hello Neil, list,
>>
>> I am trying to fix a RAID6 array which has 12x 1.5TB (Samsung) drives.
>> Currently the array has 1 missing drive, and 3 which have some bad
>> sectors!
>> Generally, because it is RAID6, there is no data loss, since the bad
>> sectors are not in the same address line, but I can't rebuild the missing
>> drive, because the kernel drops the bad-sector drives one by one
>> during the rebuild process.
>>
>> My question is: is there any way to force the array to keep the members
>> in, even if they have some read errors?
>> Or is there a way to re-add the bad-sector drives after the kernel
>> has dropped them, without stopping the rebuild process?
>> Normally, after an 18-hour sync, at 97.9% the 3rd drive is always
>> dropped and the rebuild stops.
>>
>> Thanks,
>> Janos Haar
* Re: Suggestion needed for fixing RAID6
2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar
2010-04-22 15:00 ` Mikael Abrahamsson
[not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
@ 2010-04-23 6:51 ` Luca Berra
2010-04-23 8:47 ` Janos Haar
2 siblings, 1 reply; 48+ messages in thread
From: Luca Berra @ 2010-04-23 6:51 UTC (permalink / raw)
To: linux-raid
On Thu, Apr 22, 2010 at 12:09:08PM +0200, Janos Haar wrote:
> Hello Neil, list,
>
> I am trying to fix a RAID6 array which has 12x 1.5TB (Samsung) drives.
> Currently the array has 1 missing drive, and 3 which have some bad sectors!
> Generally, because it is RAID6, there is no data loss, since the bad
> sectors are not in the same address line, but I can't rebuild the missing
> drive, because the kernel drops the bad-sector drives one by one during the
> rebuild process.
I would seriously consider moving the data off that array and dumping
all drives from that batch. This is going to be painful, because you
must watch drives being dropped and add them back, and yes, you need the
resources to store the data.
ddrescue obviously won't work, since it will mask read errors and turn
them into data corruption.
the raid 1 trick won't work, as you noted.
another option could be using the device-mapper snapshot-merge target
(writable snapshot), which iirc is a 2.6.33+ feature.
look at
http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/
for hints.
btw i have no clue how the scsi error will travel through the dm layer.
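Such a writable-snapshot wrapper could be set up roughly like this (a sketch only; the device names, file path, and sizes are placeholders):

```shell
# sparse backing file for the copy-on-write store (path and size are placeholders)
truncate -s 2000G /snapshot.bin
losetup /dev/loop3 /snapshot.bin

# wrap the flaky member: reads fall through to /dev/sde4, while writes
# (e.g. md's attempts to rewrite bad sectors) land in the COW store
# table format: start length snapshot <origin> <cow-dev> p <chunksize>
echo "0 $(blockdev --getsize /dev/sde4) snapshot /dev/sde4 /dev/loop3 p 8" \
    | dmsetup create cow

# the array would then be assembled with /dev/mapper/cow in place of /dev/sde4
```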
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: Suggestion needed for fixing RAID6
2010-04-23 6:51 ` Luca Berra
@ 2010-04-23 8:47 ` Janos Haar
2010-04-23 12:34 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-23 8:47 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid
----- Original Message -----
From: "Luca Berra" <bluca@comedia.it>
To: <linux-raid@vger.kernel.org>
Sent: Friday, April 23, 2010 8:51 AM
Subject: Re: Suggestion needed for fixing RAID6
> On Thu, Apr 22, 2010 at 12:09:08PM +0200, Janos Haar wrote:
>> Hello Neil, list,
>>
>> I am trying to fix a RAID6 array which has 12x 1.5TB (Samsung) drives.
>> Currently the array has 1 missing drive, and 3 which have some bad
>> sectors!
>> Generally, because it is RAID6, there is no data loss, since the bad
>> sectors are not in the same address line, but I can't rebuild the missing
>> drive, because the kernel drops the bad-sector drives one by one
>> during the rebuild process.
>
> I would seriously consider moving the data off that array and dumping
> all drives from that batch. This is going to be painful, because you
> must watch drives being dropped and add them back, and yes, you need the
> resources to store the data.
>
> ddrescue obviously won't work, since it will mask read errors and turn
> them into data corruption.
>
> the raid 1 trick won't work, as you noted.
>
> another option could be using the device mapper snapshot-merge target
> (writable snapshot), which iirc is a 2.6.33+ feature
> look at
> http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/
> for hints.
> btw i have no clue how the scsi error will travel through the dm layer.
> L.
...or cowloop! :-)
This is a good idea! :-)
Thank you.
I have another one:
re-create the array (--assume-clean) with an external bitmap, then drop the
missing drive.
Then manually manipulate the bitmap file to re-sync only the last 10%, which
is good enough for me...
Thanks again,
Janos
>
> --
> Luca Berra -- bluca@comedia.it
> Communication Media & Services S.r.l.
> /"\
> \ / ASCII RIBBON CAMPAIGN
> X AGAINST HTML MAIL
> / \
* Re: Suggestion needed for fixing RAID6
2010-04-23 8:47 ` Janos Haar
@ 2010-04-23 12:34 ` MRK
2010-04-24 19:36 ` Janos Haar
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-23 12:34 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/23/2010 10:47 AM, Janos Haar wrote:
>
> ----- Original Message ----- From: "Luca Berra" <bluca@comedia.it>
> To: <linux-raid@vger.kernel.org>
> Sent: Friday, April 23, 2010 8:51 AM
> Subject: Re: Suggestion needed for fixing RAID6
>
>
>> another option could be using the device mapper snapshot-merge target
>> (writable snapshot), which iirc is a 2.6.33+ feature
>> look at
>> http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/
>>
>> for hints.
>> btw i have no clue how the scsi error will travel thru the dm layer.
>> L.
>
> ...or cowloop! :-)
> This is a good idea! :-)
> Thank you.
>
> I have another one:
> re-create the array (--assume-clean) with an external bitmap, then drop
> the missing drive.
> Then manually manipulate the bitmap file to re-sync only the last 10%,
> which is good enough for me...
Cowloop is kinda deprecated in favour of DM, says Wikipedia, and messing
with the bitmap looks complicated to me.
I think Luca's is a great suggestion. You can use 3 loop-device-backed
files to store the COW data for the 3 disks which are
faulty, so that writes go there and you can complete the resync.
Then you would fail the COW devices one by one from mdadm and replicate
to spares.
But this will work ONLY if read errors are still reported across the
DM-snapshot thingo. Otherwise (if it e.g. returns a block of zeroes
without an error) you are eventually going to get data corruption when
replacing drives.
You can check whether read errors are reported by looking at the dmesg
during the resync. If you see many "read error corrected..." messages, it
works; if it's silent, it hasn't received read errors, which means
it doesn't work. If it doesn't work, DO NOT go ahead replacing
drives, or you will get data corruption.
So you need an initial test which just performs a resync but *without*
replicating to a spare. So I suggest you first remove all the spares
from the array, then create the COW snapshots, then assemble the array,
perform a resync, and look at the dmesg. If it works: add the spares back,
fail one drive, etc.
If this technique works it would be useful for everybody, so please keep
us informed!!
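End to end, that test could look something like this (a sketch only; every device name and path here is an assumption, and the snapshot setup must be repeated for each faulty member):

```shell
# 1. remove the spares so a rebuild cannot start yet ("sdh4" is hypothetical)
mdadm /dev/md3 --remove /dev/sdh4

# 2. wrap each faulty disk in a loop-backed COW snapshot
truncate -s 2000G /snapshot-sde4.bin
losetup /dev/loop3 /snapshot-sde4.bin
echo "0 $(blockdev --getsize /dev/sde4) snapshot /dev/sde4 /dev/loop3 p 8" \
    | dmsetup create cow-sde4

# 3. assemble the array with the snapshot devices standing in for the raw disks
mdadm --assemble /dev/md3 /dev/sda4 /dev/sdb4 /dev/mapper/cow-sde4 # ...etc.

# 4. resync and check that read errors are being corrected, not fatal
echo repair > /sys/block/md3/md/sync_action
dmesg | grep -i "read error corrected"

# 5. only if step 4 shows corrections: re-add the spares and fail/replace
#    the COW devices one at a time
```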
Thank you
* Re: Suggestion needed for fixing RAID6
2010-04-23 12:34 ` MRK
@ 2010-04-24 19:36 ` Janos Haar
2010-04-24 22:47 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-24 19:36 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: "linux-raid" <linux-raid@vger.kernel.org>
Sent: Friday, April 23, 2010 2:34 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/23/2010 10:47 AM, Janos Haar wrote:
>>
>> ----- Original Message ----- From: "Luca Berra" <bluca@comedia.it>
>> To: <linux-raid@vger.kernel.org>
>> Sent: Friday, April 23, 2010 8:51 AM
>> Subject: Re: Suggestion needed for fixing RAID6
>>
>>
>>> another option could be using the device mapper snapshot-merge target
>>> (writable snapshot), which iirc is a 2.6.33+ feature
>>> look at
>>> http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/
>>> for hints.
>>> btw i have no clue how the scsi error will travel through the dm layer.
>>> L.
>>
>> ...or cowloop! :-)
>> This is a good idea! :-)
>> Thank you.
>>
>> I have another one:
>> re-create the array (--assume-clean) with an external bitmap, then drop the
>> missing drive.
>> Then manually manipulate the bitmap file to re-sync only the last 10%,
>> which is good enough for me...
>
>
> Cowloop is kinda deprecated in favour of DM, says wikipedia, and messing
> with the bitmap looks complicated to me.
Hi,
I think I will come to like this idea again... :-D
> I think Luca's is a great suggestion. You can use 3 loop-device-backed
> files to store the COW data for the 3 disks which are faulty, so that
> writes go there and you can complete the resync.
> Then you would fail the COW devices one by one from mdadm and replicate to
> spares.
>
> But this will work ONLY if read errors are still reported across the
> DM-snapshot thingo. Otherwise (if it e.g. returns a block of zeroes
> without an error) you are eventually going to get data corruption when
> replacing drives.
>
> You can check whether read errors are reported by looking at the dmesg
> during the resync. If you see many "read error corrected..." messages, it
> works; if it's silent, it hasn't received read errors, which means that it
> doesn't work. If it doesn't work, DO NOT go ahead replacing drives, or you
> will get data corruption.
>
> So you need an initial test which just performs a resync but *without*
> replicating to a spare. So I suggest you first remove all the spares from
> the array, then create the COW snapshots, then assemble the array, perform
> a resync, and look at the dmesg. If it works: add the spares back, fail one
> drive, etc.
>
> If this technique works it would be useful for everybody, so please keep us
> informed!!
OK, I am doing it.
I think I have found something interesting and unexpected:
after 99.9% (and another 1800 minutes), the array dropped the dm-snapshot
device!
ata5.00: exception Emask 0x0 SAct 0x7fa1 SErr 0x0 action 0x0
ata5.00: irq_stat 0x40000008
ata5.00: cmd 60/d8:38:1d:e7:90/00:00:ae:00:00/40 tag 7 ncq 110592 in
res 41/40:7a:7b:e7:90/6c:00:ae:00:00/40 Emask 0x409 (media error)
<F>
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/133
ata5: EH complete
...
sd 4:0:0:0: [sde] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
sd 4:0:0:0: [sde] Write Protect is off
sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
ata5.00: irq_stat 0x40000008
ata5.00: cmd 60/d8:38:1d:e7:90/00:00:ae:00:00/40 tag 7 ncq 110592 in
res 41/40:7a:7b:e7:90/6c:00:ae:00:00/40 Emask 0x409 (media error)
<F>
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/133
sd 4:0:0:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 4:0:0:0: [sde] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
ae 90 e7 7b
sd 4:0:0:0: [sde] Add. Sense: Unrecovered read error - auto reallocate
failed
end_request: I/O error, dev sde, sector 2928732027
__ratelimit: 16 callbacks suppressed
raid5:md3: read error not correctable (sector 2923767936 on dm-0).
raid5: Disk failure on dm-0, disabling device.
raid5: Operation continuing on 9 devices.
md: md3: recovery done.
raid5:md3: read error not correctable (sector 2923767944 on dm-0).
raid5:md3: read error not correctable (sector 2923767952 on dm-0).
raid5:md3: read error not correctable (sector 2923767960 on dm-0).
raid5:md3: read error not correctable (sector 2923767968 on dm-0).
raid5:md3: read error not correctable (sector 2923767976 on dm-0).
raid5:md3: read error not correctable (sector 2923767984 on dm-0).
raid5:md3: read error not correctable (sector 2923767992 on dm-0).
raid5:md3: read error not correctable (sector 2923768000 on dm-0).
raid5:md3: read error not correctable (sector 2923768008 on dm-0).
ata5: EH complete
sd 4:0:0:0: [sde] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
sd 4:0:0:0: [sde] Write Protect is off
sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
ata5.00: exception Emask 0x0 SAct 0x1e1 SErr 0x0 action 0x0
ata5.00: irq_stat 0x40000008
ata5.00: cmd 60/00:28:f5:e8:90/01:00:ae:00:00/40 tag 5 ncq 131072 in
res 41/40:27:ce:e9:90/6c:00:ae:00:00/40 Emask 0x409 (media error)
<F>
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/133
ata5: EH complete
sd 4:0:0:0: [sde] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
sd 4:0:0:0: [sde] Write Protect is off
sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
RAID5 conf printout:
--- rd:12 wd:9
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 3, o:1, dev:sdd4
disk 4, o:0, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
RAID5 conf printout:
--- rd:12 wd:9
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 4, o:0, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
RAID5 conf printout:
--- rd:12 wd:9
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 4, o:0, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
RAID5 conf printout:
--- rd:12 wd:9
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
So, dm-0 was dropped only for a _READ_ error!
Kernel 2.6.28.10.
Now I am trying a repair-resync solution before rebuilding the missing
drive...
Cheers,
Janos
> Thank you
* Re: Suggestion needed for fixing RAID6
2010-04-24 19:36 ` Janos Haar
@ 2010-04-24 22:47 ` MRK
2010-04-25 10:00 ` Janos Haar
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-24 22:47 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/24/2010 09:36 PM, Janos Haar wrote:
>
> OK, I am doing it.
>
> I think I have found something interesting and unexpected:
> After 99.9% (and another 1800 minutes), the array dropped the
> dm-snapshot device!
>
> ...[CUT]...
>
> raid5:md3: read error not correctable (sector 2923767944 on dm-0).
> raid5:md3: read error not correctable (sector 2923767952 on dm-0).
> raid5:md3: read error not correctable (sector 2923767960 on dm-0).
> raid5:md3: read error not correctable (sector 2923767968 on dm-0).
> raid5:md3: read error not correctable (sector 2923767976 on dm-0).
> raid5:md3: read error not correctable (sector 2923767984 on dm-0).
> raid5:md3: read error not correctable (sector 2923767992 on dm-0).
> raid5:md3: read error not correctable (sector 2923768000 on dm-0).
>
> ...[CUT]...
>
> So, dm-0 was dropped only for a _READ_ error!
Actually no, it is being dropped for an "uncorrectable read error", which
means, AFAIK, that the read error was received, then the block was
recomputed from the other disks, then a rewrite of the damaged block was
attempted, and that *write* failed. So it is being dropped for a *write*
error. People correct me if I'm wrong.
This is strange, because the write should have gone to the COW device.
Are you sure you did everything correctly with DM? Could you post here
how you created the dm-0 device?
We might ask the DM people why it's not working, maybe. Anyway, there
is one piece of good news: the read error apparently does travel
through the DM stack.
Thanks for your work
* Re: Suggestion needed for fixing RAID6
2010-04-24 22:47 ` MRK
@ 2010-04-25 10:00 ` Janos Haar
2010-04-26 10:24 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-25 10:00 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Sunday, April 25, 2010 12:47 AM
Subject: Re: Suggestion needed for fixing RAID6
Just a little note:
The repair-sync action failed in a similar way too. :-(
> On 04/24/2010 09:36 PM, Janos Haar wrote:
>>
>> Ok, i am doing it.
>>
>> I think I have found something interesting and unexpected:
>> After 99.9% (and another 1800 minutes), the array dropped the dm-snapshot
>> device!
>>
>> ...[CUT]...
>>
>> raid5:md3: read error not correctable (sector 2923767944 on dm-0).
>> raid5:md3: read error not correctable (sector 2923767952 on dm-0).
>> raid5:md3: read error not correctable (sector 2923767960 on dm-0).
>> raid5:md3: read error not correctable (sector 2923767968 on dm-0).
>> raid5:md3: read error not correctable (sector 2923767976 on dm-0).
>> raid5:md3: read error not correctable (sector 2923767984 on dm-0).
>> raid5:md3: read error not correctable (sector 2923767992 on dm-0).
>> raid5:md3: read error not correctable (sector 2923768000 on dm-0).
>>
>> ...[CUT]...
>>
>> So, dm-0 was dropped only for a _READ_ error!
>
> Actually no, it is being dropped for "uncorrectable read error" which
> means, AFAIK, that the read error was received, then the block was
> recomputed from the other disks, then a rewrite of the damaged block was
> attempted, and such *write* failed. So it is being dropped for a *write*
> error. People correct me if I'm wrong.
I think I can try:
# dd_rescue -v /dev/zero -S $((2923767944 / 2))k /dev/mapper/cow -m 4k
dd_rescue: (info): about to transfer 4.0 kBytes from /dev/zero to
/dev/mapper/cow
dd_rescue: (info): blocksizes: soft 65536, hard 512
dd_rescue: (info): starting positions: in 0.0k, out 1461883972.0k
dd_rescue: (info): Logfile: (none), Maxerr: 0
dd_rescue: (info): Reverse: no , Trunc: no , interactive: no
dd_rescue: (info): abort on Write errs: no , spArse write: if err
dd_rescue: (info): ipos: 0.0k, opos: 1461883972.0k, xferd: 0.0k
errs: 0, errxfer: 0.0k, succxfer: 0.0k
+curr.rate: 0kB/s, avg.rate: 0kB/s, avg.load: 0.0%
Summary for /dev/zero -> /dev/mapper/cow:
dd_rescue: (info): ipos: 4.0k, opos: 1461883976.0k, xferd: 4.0k
errs: 0, errxfer: 0.0k, succxfer: 4.0k
+curr.rate: 203kB/s, avg.rate: 203kB/s, avg.load: 0.0%
>
> This is strange because the write should have gone to the cow device. Are
> you sure you did everything correctly with DM? Could you post here how you
> created the dm-0 device?
echo 0 $(blockdev --getsize /dev/sde4) \
snapshot /dev/sde4 /dev/loop3 p 8 | \
dmsetup create cow
# losetup /dev/loop3
/dev/loop3: [0901]:55091517 (/snapshot.bin)
/snapshot.bin is a sparse file seeked out to a size of 2000G.
I have 3.6GB of free space in /, so running out of space is not the issue. :-)
I think this is correct. :-)
Anyway, I have pre-tested it with fdisk and it works.
>
> We might ask to the DM people why it's not working maybe. Anyway there is
> one good news, and it's that the read error apparently does travel through
> the DM stack.
To me, this looks like an md bug, not a dm problem.
The "uncorrectable read error" means exactly that the drive can't correct the
damaged sector with ECC; it is an unreadable sector (pending in the SMART
table).
"Auto read reallocation failed" does not mean the sector cannot be
reallocated by rewriting it!
Most drives don't do read-reallocation, only write-reallocation.
Drives which do read-reallocation do it because the sector was
hard to recover (maybe it needed more rotations, more repositioning, too
much time) and is moved automatically, BUT such sectors ARE NOT reported to
the PC as a read error (UNC), so they must NOT appear in the log...
I am glad to help fix this bug, but please keep in mind that this
RAID array is a production system, and my customer gets more and more
nervous day by day...
I need a good solution for fixing this array so I can safely replace the bad
drives without any data loss!
Does somebody have a good idea which does not involve copying the entire
(15TB) array?
Thanks a lot,
Janos Haar
>
> Thanks for your work
* Re: Suggestion needed for fixing RAID6
2010-04-25 10:00 ` Janos Haar
@ 2010-04-26 10:24 ` MRK
2010-04-26 12:52 ` Janos Haar
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-26 10:24 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/25/2010 12:00 PM, Janos Haar wrote:
>
> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
> To: "Janos Haar" <janos.haar@netcenter.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Sunday, April 25, 2010 12:47 AM
> Subject: Re: Suggestion needed for fixing RAID6
>
> Just a little note:
>
> The repair-sync action failed similar way too. :-(
>
>
>> On 04/24/2010 09:36 PM, Janos Haar wrote:
>>>
>>> Ok, i am doing it.
>>>
>>> I think i have found some interesting, what is unexpected:
>>> After 99.9% (and another 1800minute) the array is dropped the
>>> dm-snapshot structure!
>>>
>>> ...[CUT]...
>>>
>>> raid5:md3: read error not correctable (sector 2923767944 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923767952 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923767960 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923767968 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923767976 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923767984 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923767992 on dm-0).
>>> raid5:md3: read error not correctable (sector 2923768000 on dm-0).
>>>
>>> ...[CUT]...
>>>
>
Remember this exact error message: "read error not correctable"
>
>>
>> This is strange because the write should have gone to the cow device.
>> Are you sure you did everything correctly with DM? Could you post
>> here how you created the dm-0 device?
>
> echo 0 $(blockdev --getsize /dev/sde4) \
> snapshot /dev/sde4 /dev/loop3 p 8 | \
> dmsetup create cow
>
Seems correct to me...
> ]# losetup /dev/loop3
> /dev/loop3: [0901]:55091517 (/snapshot.bin)
>
This line comes BEFORE the other one, right?
> /snapshot.bin is a sparse file with 2000G seeked size.
> I have 3.6GB free space in / so the out of space is not an option. :-)
>
>
[...]
>
>>
>> We might ask to the DM people why it's not working maybe. Anyway
>> there is one good news, and it's that the read error apparently does
>> travel through the DM stack.
>
> For me, this looks like md's bug not dm's problem.
> The "uncorrectable read error" means exactly the drive can't correct
> the damaged sector with ECC, and this is an unreadable sector.
> (pending in smart table)
> The auto read reallocation failed not meas the sector is not
> re-allocatable by rewriting it!
> The most of the drives doesn't do read-reallocation only
> write-reallocation.
>
> These drives wich does read reallocation, does it because the sector
> was hard to re-calculate (maybe needed more rotation, more
> repositioning, too much time) and moved automatically, BUT those
> sectors ARE NOT reported to the pc as read-error (UNC), so must NOT
> appear in the log...
>
No, the error message really comes from MD. Can you read C code? Go into
the kernel source and look at this file:
linux_source_dir/drivers/md/raid5.c
(raid5.c also implements raid6) and search for "read error not correctable".
What you see there is the reason for the failure. Do you see the line "if
(conf->mddev->degraded)" just above it? I think your mistake was that you
did the DM COW trick only on the last device, or at any rate on one device
only; you should have done it on all 3 devices which were failing.
It did not work for you because by the time you got the read error on
the last disk, two disks had already been dropped from the array: the
array was doubly degraded, and it's not possible to correct a read error
then because you don't have enough parity information left to recover
the data for that sector.
You should also have prevented the first two disks from dropping. Do the
DM trick on all of them simultaneously, or at least on 2 of them (if you
are sure only 3 disks have problems), start the array making sure it
comes up with all devices online, i.e. non-degraded, then start the
resync, and I think it will work.
> I am glad if i can help to fix this but, but please keep this in mind,
> this raid array is a productive system, and my customer gets more and
> more nervous day by day...
> I need a good solution for fixing this array to safely replace the bad
> drives without any data lost!
>
> Somebody have any good idea wich is not copy the entire (15TB) array?
I don't think there is another way. You need to make this work.
Good luck
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-26 10:24 ` MRK
@ 2010-04-26 12:52 ` Janos Haar
2010-04-26 16:53 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-26 12:52 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, April 26, 2010 12:24 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/25/2010 12:00 PM, Janos Haar wrote:
>>
>> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
>> To: "Janos Haar" <janos.haar@netcenter.hu>
>> Cc: <linux-raid@vger.kernel.org>
>> Sent: Sunday, April 25, 2010 12:47 AM
>> Subject: Re: Suggestion needed for fixing RAID6
>>
>> Just a little note:
>>
>> The repair-sync action failed similar way too. :-(
>>
>>
>>> On 04/24/2010 09:36 PM, Janos Haar wrote:
>>>>
>>>> Ok, i am doing it.
>>>>
>>>> I think i have found some interesting, what is unexpected:
>>>> After 99.9% (and another 1800minute) the array is dropped the
>>>> dm-snapshot structure!
>>>>
>>>> ...[CUT]...
>>>>
>>>> raid5:md3: read error not correctable (sector 2923767944 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767952 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767960 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767968 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767976 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767984 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767992 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923768000 on dm-0).
>>>>
>>>> ...[CUT]...
>>>>
>>
>
> Remember this exact error message: "read error not correctable"
>
>>
>>>
>>> This is strange because the write should have gone to the cow device.
>>> Are you sure you did everything correctly with DM? Could you post
>>> here how you created the dm-0 device?
>>
>> echo 0 $(blockdev --getsize /dev/sde4) \
>> snapshot /dev/sde4 /dev/loop3 p 8 | \
>> dmsetup create cow
>>
>
> Seems correct to me...
>
>> ]# losetup /dev/loop3
>> /dev/loop3: [0901]:55091517 (/snapshot.bin)
>>
> This line comes BEFORE the other one, right?
>
>> /snapshot.bin is a sparse file with 2000G seeked size.
>> I have 3.6GB free space in / so the out of space is not an option. :-)
>>
>>
> [...]
>>
>>>
>>> We might ask to the DM people why it's not working maybe. Anyway
>>> there is one good news, and it's that the read error apparently does
>>> travel through the DM stack.
>>
>> For me, this looks like md's bug not dm's problem.
>> The "uncorrectable read error" means exactly the drive can't correct
>> the damaged sector with ECC, and this is an unreadable sector.
>> (pending in smart table)
>> The auto read reallocation failed not meas the sector is not
>> re-allocatable by rewriting it!
>> The most of the drives doesn't do read-reallocation only
>> write-reallocation.
>>
>> These drives wich does read reallocation, does it because the sector
>> was hard to re-calculate (maybe needed more rotation, more
>> repositioning, too much time) and moved automatically, BUT those
>> sectors ARE NOT reported to the pc as read-error (UNC), so must NOT
>> appear in the log...
>>
>
> No the error message really comes from MD. Can you read C code? Go into
> the kernel source and look this file:
>
> linux_source_dir/drivers/md/raid5.c
>
> (file raid5.c is also for raid6) search for "read error not correctable"
>
> What you see there is the reason for failure. You see the line "if
> (conf->mddev->degraded)" just above? I think your mistake was that you
> did the DM COW trick only on the last device, or anyway one device only,
> instead you should have done it on all 3 devices which were failing.
>
> It did not work for you because at the moment you got the read error on
> the last disk, two disks were already dropped from the array, the array
> was doubly degraded, and it's not possible to correct a read error if
> the array is degraded because you don't have enough parity information
> to recover the data for that sector.
Oops, you are right!
It was my mistake.
Sorry, I will try it again, this time supporting 2 drives with dm-cow.
Thanks again.
Janos
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-26 12:52 ` Janos Haar
@ 2010-04-26 16:53 ` MRK
2010-04-26 22:39 ` Janos Haar
2010-04-27 15:50 ` Janos Haar
0 siblings, 2 replies; 48+ messages in thread
From: MRK @ 2010-04-26 16:53 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/26/2010 02:52 PM, Janos Haar wrote:
>
> Oops, you are right!
> It was my mistake.
> Sorry, i will try it again, to support 2 drives with dm-cow.
> I will try it.
Great! Post the results here... the dmesg in particular.
The dmesg should contain multiple lines like "raid5:md3: read error
corrected ....."; then you know it worked.
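A quick way to tally the two outcomes in a captured log might look like this (the sample lines below are illustrative, not from a real run):

```shell
#!/bin/sh
# Count corrected vs. uncorrectable read errors in a saved dmesg.
log=$(mktemp)
cat > "$log" <<'EOF'
raid5:md3: read error corrected (8 sectors at 2923767944 on dm-0)
raid5:md3: read error not correctable (sector 2923767952 on dm-1).
raid5:md3: read error corrected (8 sectors at 2923767960 on dm-0)
EOF
# "corrected" and "not correctable" never match each other's pattern.
corrected=$(grep -c 'read error corrected' "$log")
bad=$(grep -c 'read error not correctable' "$log")
echo "$corrected corrected, $bad uncorrectable"
rm -f "$log"
```

In a healthy run you would hope to see only the "corrected" count grow.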
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-26 16:53 ` MRK
@ 2010-04-26 22:39 ` Janos Haar
2010-04-26 23:06 ` Michael Evans
2010-04-27 15:50 ` Janos Haar
1 sibling, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-26 22:39 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, April 26, 2010 6:53 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/26/2010 02:52 PM, Janos Haar wrote:
>>
>> Oops, you are right!
>> It was my mistake.
>> Sorry, i will try it again, to support 2 drives with dm-cow.
>> I will try it.
>
> Great! post here the results... the dmesg in particular.
> The dmesg should contain multiple lines like this "raid5:md3: read error
> corrected ....."
> then you know it worked.
md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) sdg4[6] sdf4[5] dm-0[14](F) sdc4[2] sdb4[1] sda4[0]
      14626538880 blocks level 6, 16k chunk, algorithm 2 [12/9] [UUU__UU_UUUU]
      [>....................]  recovery =  1.5% (22903832/1462653888) finish=3188383.4min speed=7K/sec
Khm.... :-D
Is it working on something, or has it stopped with 3 missing drives? : ^ )
(I have found the cause of the 2 dm failures.
Now a retry is running...)
Cheers,
Janos
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-26 22:39 ` Janos Haar
@ 2010-04-26 23:06 ` Michael Evans
[not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs>
0 siblings, 1 reply; 48+ messages in thread
From: Michael Evans @ 2010-04-26 23:06 UTC (permalink / raw)
To: Janos Haar; +Cc: MRK, linux-raid
On Mon, Apr 26, 2010 at 3:39 PM, Janos Haar <janos.haar@netcenter.hu> wrote:
>
> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
> To: "Janos Haar" <janos.haar@netcenter.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Monday, April 26, 2010 6:53 PM
> Subject: Re: Suggestion needed for fixing RAID6
>
>
>> On 04/26/2010 02:52 PM, Janos Haar wrote:
>>>
>>> Oops, you are right!
>>> It was my mistake.
>>> Sorry, i will try it again, to support 2 drives with dm-cow.
>>> I will try it.
>>
>> Great! post here the results... the dmesg in particular.
>> The dmesg should contain multiple lines like this "raid5:md3: read error
>> corrected ....."
>> then you know it worked.
>
> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F)
> sdg4[6] sdf4[5] dm-0[14](F) sdc4[2] sdb4[1] sda4[0]
> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/9] [UUU__UU_UUUU]
> [>....................] recovery = 1.5% (22903832/1462653888)
> finish=3188383.4min speed=7K/sec
>
> Khm.... :-D
> It is working on something or stopped with 3 missing drive? : ^ )
>
> (I have found the cause of the 2 dm's failure.
> Now retry runs...)
>
> Cheers,
> Janos
>
>
>
>
What is displayed there seems like it can't be correct. Please run
mdadm -Evvs
mdadm -Dvvs
and provide the results for us.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
[not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs>
@ 2010-04-27 0:04 ` Michael Evans
0 siblings, 0 replies; 48+ messages in thread
From: Michael Evans @ 2010-04-27 0:04 UTC (permalink / raw)
To: linux-raid
On Mon, Apr 26, 2010 at 4:29 PM, Janos Haar <janos.haar@netcenter.hu> wrote:
>
> ----- Original Message ----- From: "Michael Evans" <mjevans1983@gmail.com>
> To: "Janos Haar" <janos.haar@netcenter.hu>
> Cc: "MRK" <mrk@shiftmail.org>; <linux-raid@vger.kernel.org>
> Sent: Tuesday, April 27, 2010 1:06 AM
> Subject: Re: Suggestion needed for fixing RAID6
>
>
>> On Mon, Apr 26, 2010 at 3:39 PM, Janos Haar <janos.haar@netcenter.hu>
>> wrote:
>>>
>>> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
>>> To: "Janos Haar" <janos.haar@netcenter.hu>
>>> Cc: <linux-raid@vger.kernel.org>
>>> Sent: Monday, April 26, 2010 6:53 PM
>>> Subject: Re: Suggestion needed for fixing RAID6
>>>
>>>
>>>> On 04/26/2010 02:52 PM, Janos Haar wrote:
>>>>>
>>>>> Oops, you are right!
>>>>> It was my mistake.
>>>>> Sorry, i will try it again, to support 2 drives with dm-cow.
>>>>> I will try it.
>>>>
>>>> Great! post here the results... the dmesg in particular.
>>>> The dmesg should contain multiple lines like this "raid5:md3: read error
>>>> corrected ....."
>>>> then you know it worked.
>>>
>>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F)
>>> sdg4[6] sdf4[5] dm-0[14](F) sdc4[2] sdb4[1] sda4[0]
>>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/9] [UUU__UU_UUUU]
>>> [>....................] recovery = 1.5% (22903832/1462653888)
>>> finish=3188383.4min speed=7K/sec
>>>
>>> Khm.... :-D
>>> It is working on something or stopped with 3 missing drive? : ^ )
>>>
>>> (I have found the cause of the 2 dm's failure.
>>> Now retry runs...)
>>>
>>> Cheers,
>>> Janos
>>>
>>>
>>>
>>>
>>>
>>
>> What is displayed there seems like it can't be correct. Please run
>>
>> mdadm -Evvs
>>
>> mdadm -Dvvs
>>
>> and provide the results for us.
>
> I wrongly assigned the dm devices (cross-linked) and the sync process
> froze.
> The snapshots grew to the maximum available space, then both failed with a
> write error at the same time (out of space).
> The md_sync process is frozen.
> (I had to push reset.)
>
> I think what we see is correct, because the process froze before it could
> exit, so it can't change the state to failed.
>
> Cheers,
> Janos
>
>
>
Please reply to all.
It sounds like you need a LOT more space. Please carefully try again.
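One way to avoid a repeat is to watch how full each snapshot is while the resync runs; `dmsetup status` on a snapshot target reports allocated/total sectors. A sketch (the `cow` device name and the sample status line are assumptions for illustration):

```shell
#!/bin/sh
# Percentage of a dm-snapshot COW store in use, from a status line like
#   "0 2930272065 snapshot 1048576/4194304 16"   (allocated/total in field 4)
snap_pct() {
    echo "$1" | awk '{ split($4, a, "/"); printf "%d\n", 100 * a[1] / a[2] }'
}

snap_pct "0 2930272065 snapshot 1048576/4194304 16"

# Against a live device (needs root and an existing "cow" mapping):
if [ "$(id -u)" -eq 0 ] && dmsetup info cow >/dev/null 2>&1; then
    snap_pct "$(dmsetup status cow)"
fi
```

Polling this periodically during the resync would show whether the COW file is about to fill up before the snapshot fails.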
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-26 16:53 ` MRK
2010-04-26 22:39 ` Janos Haar
@ 2010-04-27 15:50 ` Janos Haar
2010-04-27 23:02 ` MRK
1 sibling, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-27 15:50 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, April 26, 2010 6:53 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/26/2010 02:52 PM, Janos Haar wrote:
>>
>> Oops, you are right!
>> It was my mistake.
>> Sorry, i will try it again, to support 2 drives with dm-cow.
>> I will try it.
>
> Great! post here the results... the dmesg in particular.
> The dmesg should contain multiple lines like this "raid5:md3: read error
> corrected ....."
> then you know it worked.
I am afraid I am still right about that....
...
end_request: I/O error, dev sdh, sector 1667152256
raid5:md3: read error not correctable (sector 1662188168 on dm-1).
raid5: Disk failure on dm-1, disabling device.
raid5: Operation continuing on 10 devices.
raid5:md3: read error not correctable (sector 1662188176 on dm-1).
raid5:md3: read error not correctable (sector 1662188184 on dm-1).
raid5:md3: read error not correctable (sector 1662188192 on dm-1).
raid5:md3: read error not correctable (sector 1662188200 on dm-1).
raid5:md3: read error not correctable (sector 1662188208 on dm-1).
raid5:md3: read error not correctable (sector 1662188216 on dm-1).
raid5:md3: read error not correctable (sector 1662188224 on dm-1).
raid5:md3: read error not correctable (sector 1662188232 on dm-1).
raid5:md3: read error not correctable (sector 1662188240 on dm-1).
ata8: EH complete
sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata8.00: port_status 0x20200000
ata8.00: cmd 25/00:f8:f5:ba:5e/00:03:63:00:00/e0 tag 0 dma 520192 in
res 51/40:00:ef:bb:5e/40:00:63:00:00/e0 Emask 0x9 (media error)
ata8.00: status: { DRDY ERR }
ata8.00: error: { UNC }
ata8.00: configured for UDMA/133
ata8: EH complete
....
....
sd 7:0:0:0: [sdh] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdh, sector 1667152879
__ratelimit: 36 callbacks suppressed
raid5:md3: read error not correctable (sector 1662188792 on dm-1).
raid5:md3: read error not correctable (sector 1662188800 on dm-1).
md: md3: recovery done.
raid5:md3: read error not correctable (sector 1662188808 on dm-1).
raid5:md3: read error not correctable (sector 1662188816 on dm-1).
raid5:md3: read error not correctable (sector 1662188824 on dm-1).
raid5:md3: read error not correctable (sector 1662188832 on dm-1).
raid5:md3: read error not correctable (sector 1662188840 on dm-1).
raid5:md3: read error not correctable (sector 1662188848 on dm-1).
raid5:md3: read error not correctable (sector 1662188856 on dm-1).
raid5:md3: read error not correctable (sector 1662188864 on dm-1).
ata8: EH complete
sd 7:0:0:0: [sdh] Write Protect is off
sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata8.00: port_status 0x20200000
....
....
res 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error)
ata8.00: status: { DRDY ERR }
ata8.00: error: { UNC }
ata8.00: configured for UDMA/133
sd 7:0:0:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 7:0:0:0: [sdh] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
63 5e c0 27
sd 7:0:0:0: [sdh] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdh, sector 1667153959
__ratelimit: 86 callbacks suppressed
raid5:md3: read error not correctable (sector 1662189872 on dm-1).
raid5:md3: read error not correctable (sector 1662189880 on dm-1).
raid5:md3: read error not correctable (sector 1662189888 on dm-1).
raid5:md3: read error not correctable (sector 1662189896 on dm-1).
raid5:md3: read error not correctable (sector 1662189904 on dm-1).
raid5:md3: read error not correctable (sector 1662189912 on dm-1).
raid5:md3: read error not correctable (sector 1662189920 on dm-1).
raid5:md3: read error not correctable (sector 1662189928 on dm-1).
raid5:md3: read error not correctable (sector 1662189936 on dm-1).
raid5:md3: read error not correctable (sector 1662189944 on dm-1).
ata8: EH complete
sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
sd 7:0:0:0: [sdh] Write Protect is off
sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
sd 7:0:0:0: [sdh] Write Protect is off
sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
RAID5 conf printout:
--- rd:12 wd:10
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 3, o:1, dev:sdd4
disk 4, o:1, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 7, o:0, dev:dm-1
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
RAID5 conf printout:
--- rd:12 wd:10
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 3, o:1, dev:sdd4
disk 4, o:1, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 7, o:0, dev:dm-1
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
RAID5 conf printout:
--- rd:12 wd:10
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 3, o:1, dev:sdd4
disk 4, o:1, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 7, o:0, dev:dm-1
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
RAID5 conf printout:
--- rd:12 wd:10
disk 0, o:1, dev:sda4
disk 1, o:1, dev:sdb4
disk 2, o:1, dev:sdc4
disk 3, o:1, dev:sdd4
disk 4, o:1, dev:dm-0
disk 5, o:1, dev:sdf4
disk 6, o:1, dev:sdg4
disk 8, o:1, dev:sdi4
disk 9, o:1, dev:sdj4
disk 10, o:1, dev:sdk4
disk 11, o:1, dev:sdl4
md: recovery of RAID array md3
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 1462653888 blocks.
md: resuming recovery of md3 from checkpoint.
md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) sdg4[6] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
      14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU]
      [===============>.....]  recovery = 75.3% (1101853312/1462653888) finish=292.3min speed=20565K/sec
du -h /sna*
1.1M /snapshot2.bin
1.1M /snapshot.bin
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md1 19G 16G 3.5G 82% /
/dev/md0 99M 34M 60M 36% /boot
tmpfs 2.0G 0 2.0G 0% /dev/shm
This is the current state. :-(
This way, the sync will stop again at 97.9%.
Any other idea?
Or how do I solve this dm-snapshot thing?
I think I know how this can happen:
If I am right, the sync uses the usual block size, which is 4KB in Linux.
But the bad blocks are 512 bytes.
Let's look at one 4K window, for example:
[BGBGBBGG] B: bad sector, G: good sector
The sync reads the block; the reported state is UNC because the drive
reported UNC for some sector in this area.
md recalculates the first bad 512-byte sector, because its address is the
same as the 4K block's, then rewrites it.
Then it re-reads the 4K block, which is still UNC because the 3rd sector
is bad.
Can this be the issue?
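The 4K-window hypothesis above can be reduced to a toy model (this sketches the hypothesis only, not what md actually does):

```shell
#!/bin/sh
# Eight 512-byte sectors in one 4K block: B = bad, G = good.
win=BGBGBBGG
# Rewrite only the first bad sector, as the hypothesis assumes.
fixed=$(echo "$win" | sed 's/B/G/')   # sed replaces the first match only
echo "$fixed"
# A re-read of the whole 4K block would still hit a bad sector:
case $fixed in
    *B*) echo "4K block still returns UNC" ;;
    *)   echo "4K block reads clean" ;;
esac
```

If the hypothesis were right, each pass could fix at most one sector per 4K block, so the window would keep failing until every bad sector had been rewritten.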
Thanks,
Janos
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-27 15:50 ` Janos Haar
@ 2010-04-27 23:02 ` MRK
2010-04-28 1:37 ` Neil Brown
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-27 23:02 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid, Neil Brown
On 04/27/2010 05:50 PM, Janos Haar wrote:
>
> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
> To: "Janos Haar" <janos.haar@netcenter.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Monday, April 26, 2010 6:53 PM
> Subject: Re: Suggestion needed for fixing RAID6
>
>
>> On 04/26/2010 02:52 PM, Janos Haar wrote:
>>>
>>> Oops, you are right!
>>> It was my mistake.
>>> Sorry, i will try it again, to support 2 drives with dm-cow.
>>> I will try it.
>>
>> Great! post here the results... the dmesg in particular.
>> The dmesg should contain multiple lines like this "raid5:md3: read
>> error corrected ....."
>> then you know it worked.
>
> I am affraid i am still right about that....
>
> ...
> end_request: I/O error, dev sdh, sector 1667152256
> raid5:md3: read error not correctable (sector 1662188168 on dm-1).
> raid5: Disk failure on dm-1, disabling device.
> raid5: Operation continuing on 10 devices.
I think I can see a problem here:
You had 11 of 12 devices active when you received the read error.
With 11 of 12 devices the array is singly degraded, and that should still
be enough for raid6 to recompute the block from parity and perform the
rewrite, correcting the read error; but instead MD declared that it's
impossible to correct the error, and dropped one more device (going to
doubly degraded).
I think this is an MD bug, and I think I know where it is:
--- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 19:52:17.000000000 +0100
+++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200
@@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
clear_bit(R5_UPTODATE, &sh->dev[i].flags);
atomic_inc(&rdev->read_errors);
- if (conf->mddev->degraded)
+ if (conf->mddev->degraded == conf->max_degraded)
printk_rl(KERN_WARNING
"raid5:%s: read error not correctable "
"(sector %llu on %s).\n",
------------------------------------------------------
(This is only compile-tested, so try it at your own risk.)
I'd like to hear what Neil thinks of this...
The problem here (apart from the erroneous error message) is that if
execution enters that "if" clause, it eventually reaches the md_error()
call some 30 lines below, which drops one further device, worsening the
situation instead of recovering it; as far as I understand, that is not
the correct behaviour in this case.
As it stands, raid6 behaves as if it were raid5, effectively tolerating
only one failed disk.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-27 23:02 ` MRK
@ 2010-04-28 1:37 ` Neil Brown
2010-04-28 2:02 ` Mikael Abrahamsson
2010-04-28 12:57 ` MRK
0 siblings, 2 replies; 48+ messages in thread
From: Neil Brown @ 2010-04-28 1:37 UTC (permalink / raw)
To: MRK; +Cc: Janos Haar, linux-raid
On Wed, 28 Apr 2010 01:02:14 +0200
MRK <mrk@shiftmail.org> wrote:
> On 04/27/2010 05:50 PM, Janos Haar wrote:
> >
> > ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
> > To: "Janos Haar" <janos.haar@netcenter.hu>
> > Cc: <linux-raid@vger.kernel.org>
> > Sent: Monday, April 26, 2010 6:53 PM
> > Subject: Re: Suggestion needed for fixing RAID6
> >
> >
> >> On 04/26/2010 02:52 PM, Janos Haar wrote:
> >>>
> >>> Oops, you are right!
> >>> It was my mistake.
> >>> Sorry, i will try it again, to support 2 drives with dm-cow.
> >>> I will try it.
> >>
> >> Great! post here the results... the dmesg in particular.
> >> The dmesg should contain multiple lines like this "raid5:md3: read
> >> error corrected ....."
> >> then you know it worked.
> >
> > I am affraid i am still right about that....
> >
> > ...
> > end_request: I/O error, dev sdh, sector 1667152256
> > raid5:md3: read error not correctable (sector 1662188168 on dm-1).
> > raid5: Disk failure on dm-1, disabling device.
> > raid5: Operation continuing on 10 devices.
>
> I think I can see a problem here:
> You had 11 active devices over 12 when you received the read error.
> At 11 devices over 12 your array is singly-degraded and this should be
> enough for raid6 to recompute the block from parity and perform the
> rewrite, correcting the read-error, but instead MD declared that it's
> impossible to correct the error, and dropped one more device (going to
> doubly-degraded).
>
> I think this is an MD bug, and I think I know where it is:
>
>
> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24
> 19:52:17.000000000 +0100
> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200
> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
>
> clear_bit(R5_UPTODATE, &sh->dev[i].flags);
> atomic_inc(&rdev->read_errors);
> - if (conf->mddev->degraded)
> + if (conf->mddev->degraded == conf->max_degraded)
> printk_rl(KERN_WARNING
> "raid5:%s: read error not correctable "
> "(sector %llu on %s).\n",
>
> ------------------------------------------------------
> (This is just compile-tested so try at your risk)
>
> I'd like to hear what Neil thinks of this...
I think you've found a real bug - thanks.
I would make the test '>=' rather than '==' as that is safer; otherwise I
agree.
> - if (conf->mddev->degraded)
> + if (conf->mddev->degraded >= conf->max_degraded)
Thanks,
NeilBrown
>
> The problem here (apart from the erroneous error message) is that if
> execution goes inside that "if" clause, it will eventually reach the
> md_error() statement some 30 lines below there, which will have the
> effect of dropping one further device further worsening the situation
> instead of recovering it, and this is not the correct behaviour in this
> case as far as I understand.
> At the current state raid6 behaves like if it was a raid5, effectively
> supporting only one failed disk.
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-28 1:37 ` Neil Brown
@ 2010-04-28 2:02 ` Mikael Abrahamsson
2010-04-28 2:12 ` Neil Brown
2010-04-28 12:57 ` MRK
1 sibling, 1 reply; 48+ messages in thread
From: Mikael Abrahamsson @ 2010-04-28 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: MRK, Janos Haar, linux-raid
On Wed, 28 Apr 2010, Neil Brown wrote:
>> I think I can see a problem here:
>> You had 11 active devices over 12 when you received the read error.
>> At 11 devices over 12 your array is singly-degraded and this should be
>> enough for raid6 to recompute the block from parity and perform the
>> rewrite, correcting the read-error, but instead MD declared that it's
>> impossible to correct the error, and dropped one more device (going to
>> doubly-degraded).
>>
>> I think this is an MD bug, and I think I know where it is:
>>
>>
>> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24
>> 19:52:17.000000000 +0100
>> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200
>> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
>>
>> clear_bit(R5_UPTODATE, &sh->dev[i].flags);
>> atomic_inc(&rdev->read_errors);
>> - if (conf->mddev->degraded)
>> + if (conf->mddev->degraded == conf->max_degraded)
>> printk_rl(KERN_WARNING
>> "raid5:%s: read error not correctable "
>> "(sector %llu on %s).\n",
>>
>> ------------------------------------------------------
>> (This is just compile-tested so try at your risk)
>>
>> I'd like to hear what Neil thinks of this...
>
> I think you've found a real bug - thanks.
>
> It would make the test '>=' rather than '==' as that is safer, otherwise I
> agree.
>
>> - if (conf->mddev->degraded)
>> + if (conf->mddev->degraded >= conf->max_degraded)
If a raid6 device handling can reach this code path, could I also point
out that the message says "raid5" and that this is confusing if it's
referring to a degraded raid6?
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Suggestion needed for fixing RAID6
2010-04-28 2:02 ` Mikael Abrahamsson
@ 2010-04-28 2:12 ` Neil Brown
2010-04-28 2:30 ` Mikael Abrahamsson
0 siblings, 1 reply; 48+ messages in thread
From: Neil Brown @ 2010-04-28 2:12 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: MRK, Janos Haar, linux-raid
On Wed, 28 Apr 2010 04:02:39 +0200 (CEST)
Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Wed, 28 Apr 2010, Neil Brown wrote:
>
> >> I think I can see a problem here:
> >> You had 11 active devices over 12 when you received the read error.
> >> At 11 devices over 12 your array is singly-degraded and this should be
> >> enough for raid6 to recompute the block from parity and perform the
> >> rewrite, correcting the read-error, but instead MD declared that it's
> >> impossible to correct the error, and dropped one more device (going to
> >> doubly-degraded).
> >>
> >> I think this is an MD bug, and I think I know where it is:
> >>
> >>
> >> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24
> >> 19:52:17.000000000 +0100
> >> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200
> >> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
> >>
> >> clear_bit(R5_UPTODATE, &sh->dev[i].flags);
> >> atomic_inc(&rdev->read_errors);
> >> - if (conf->mddev->degraded)
> >> + if (conf->mddev->degraded == conf->max_degraded)
> >> printk_rl(KERN_WARNING
> >> "raid5:%s: read error not correctable "
> >> "(sector %llu on %s).\n",
> >>
> >> ------------------------------------------------------
> >> (This is just compile-tested so try at your risk)
> >>
> >> I'd like to hear what Neil thinks of this...
> >
> > I think you've found a real bug - thanks.
> >
> > It would make the test '>=' rather than '==' as that is safer, otherwise I
> > agree.
> >
> >> - if (conf->mddev->degraded)
> >> + if (conf->mddev->degraded >= conf->max_degraded)
>
> If a raid6 device handling can reach this code path, could I also point
> out that the message says "raid5" and that this is confusing if it's
> referring to a degraded raid6?
>
You could....
There are lots of places that say "raid5" where it could apply to raid4
or raid6 as well. Maybe I should change them all to 'raid456'...
NeilBrown
* Re: Suggestion needed for fixing RAID6
2010-04-28 2:12 ` Neil Brown
@ 2010-04-28 2:30 ` Mikael Abrahamsson
2010-05-03 2:29 ` Neil Brown
0 siblings, 1 reply; 48+ messages in thread
From: Mikael Abrahamsson @ 2010-04-28 2:30 UTC (permalink / raw)
To: Neil Brown; +Cc: MRK, Janos Haar, linux-raid
On Wed, 28 Apr 2010, Neil Brown wrote:
> There are lots of places that say "raid5" where it could apply to raid4
> or raid6 as well. Maybe I should change them all to 'raid456'...
That sounds like a good idea, or just call it "raid:" or "raid4/5/6".
I don't know where we are in the stable kernel release cycle, but it would
be super if this could make it in by the next cycle; this code handles the
fault scenario that made me go from raid5 to raid6 :)
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Suggestion needed for fixing RAID6
2010-04-28 1:37 ` Neil Brown
2010-04-28 2:02 ` Mikael Abrahamsson
@ 2010-04-28 12:57 ` MRK
2010-04-28 13:32 ` Janos Haar
1 sibling, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-28 12:57 UTC (permalink / raw)
To: Neil Brown; +Cc: Janos Haar, linux-raid
On 04/28/2010 03:37 AM, Neil Brown wrote:
>> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24
>> 19:52:17.000000000 +0100
>> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200
>> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
>>
>> clear_bit(R5_UPTODATE,&sh->dev[i].flags);
>> atomic_inc(&rdev->read_errors);
>> - if (conf->mddev->degraded)
>> + if (conf->mddev->degraded == conf->max_degraded)
>> printk_rl(KERN_WARNING
>> "raid5:%s: read error not correctable "
>> "(sector %llu on %s).\n",
>>
>> ------------------------------------------------------
>> (This is just compile-tested so try at your risk)
>>
>> I'd like to hear what Neil thinks of this...
>>
> I think you've found a real bug - thanks.
>
> It would make the test '>=' rather than '==' as that is safer, otherwise I
> agree.
>
>
>> - if (conf->mddev->degraded)
>> + if (conf->mddev->degraded>= conf->max_degraded)
>>
Right, agreed...
> Thanks,
> NeilBrown
>
Ok then I'll post a more official patch in a separate email shortly, thanks
* Re: Suggestion needed for fixing RAID6
2010-04-28 12:57 ` MRK
@ 2010-04-28 13:32 ` Janos Haar
2010-04-28 14:19 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-28 13:32 UTC (permalink / raw)
To: MRK; +Cc: linux-raid, Neil Brown
MRK, Neil,
Please grant me one wish:
Please record my name in the kernel tree with a note that I was the one who
reported this and helped track it down. :-)
Thanks.
Janos Haar
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Neil Brown" <neilb@suse.de>
Cc: "Janos Haar" <janos.haar@netcenter.hu>; <linux-raid@vger.kernel.org>
Sent: Wednesday, April 28, 2010 2:57 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/28/2010 03:37 AM, Neil Brown wrote:
>>> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24
>>> 19:52:17.000000000 +0100
>>> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000
>>> +0200
>>> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
>>>
>>> clear_bit(R5_UPTODATE,&sh->dev[i].flags);
>>> atomic_inc(&rdev->read_errors);
>>> - if (conf->mddev->degraded)
>>> + if (conf->mddev->degraded == conf->max_degraded)
>>> printk_rl(KERN_WARNING
>>> "raid5:%s: read error not
>>> correctable "
>>> "(sector %llu on %s).\n",
>>>
>>> ------------------------------------------------------
>>> (This is just compile-tested so try at your risk)
>>>
>>> I'd like to hear what Neil thinks of this...
>>>
>> I think you've found a real bug - thanks.
>>
>> It would make the test '>=' rather than '==' as that is safer, otherwise
>> I
>> agree.
>>
>>
>>> - if (conf->mddev->degraded)
>>> + if (conf->mddev->degraded>= conf->max_degraded)
>>>
>
> Right, agreed...
>
>> Thanks,
>> NeilBrown
>>
>
> Ok then I'll post a more official patch in a separate email shortly,
> thanks
* Re: Suggestion needed for fixing RAID6
2010-04-28 13:32 ` Janos Haar
@ 2010-04-28 14:19 ` MRK
2010-04-28 14:51 ` Janos Haar
2010-04-29 7:55 ` Janos Haar
0 siblings, 2 replies; 48+ messages in thread
From: MRK @ 2010-04-28 14:19 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid, Neil Brown
On 04/28/2010 03:32 PM, Janos Haar wrote:
> MRK, Neil,
>
> Please let me have one wish:
> Please write down my name to the kernel tree with a note i was who
> reported and helped to track down this. :-)
>
> Thanks.
> Janos Haar
Ok I did
However it would be nice if you can actually test the patch and confirm
that it solves your problem, starting with the raid6 array in
singly-degraded mode like you did yesterday. Then I think we can add one
further line on top:
Tested-by: Janos Haar <janos.haar@netcenter.hu>
before Neil (hopefully) acks it. Testing is needed anyway before pushing
it to mainline, I think...
* Re: Suggestion needed for fixing RAID6
2010-04-28 14:19 ` MRK
@ 2010-04-28 14:51 ` Janos Haar
2010-04-29 7:55 ` Janos Haar
1 sibling, 0 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-28 14:51 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <gabriele.trombetti@gmail.com>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>; "Neil Brown" <neilb@suse.de>
Sent: Wednesday, April 28, 2010 4:19 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/28/2010 03:32 PM, Janos Haar wrote:
>> MRK, Neil,
>>
>> Please let me have one wish:
>> Please write down my name to the kernel tree with a note i was who
>> reported and helped to track down this. :-)
>>
>> Thanks.
>> Janos Haar
>
> Ok I did
> However it would be nice if you can actually test the patch and confirm
> that it solves your problem, starting with the raid6 array in
> singly-degraded mode like you did yesterday. Then I think we can add one
> further line on top:
>
> Tested-by: Janos Haar <janos.haar@netcenter.hu>
>
> before Neil (hopefully) acks it. Testing is needed anyway before pushing
> it to mainline, I think...
I am already working on it...
Please give me some time...
Janos
* Re: Suggestion needed for fixing RAID6
2010-04-28 14:19 ` MRK
2010-04-28 14:51 ` Janos Haar
@ 2010-04-29 7:55 ` Janos Haar
2010-04-29 15:22 ` MRK
1 sibling, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-29 7:55 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) sdg4[6] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
      14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU]
      [===========>.........] recovery = 56.8% (831095108/1462653888) finish=5019.8min speed=2096K/sec
Drive dropped again with this patch!
+ the kernel froze.
(I will try to get more info...)
Janos
----- Original Message -----
From: "MRK" <**************>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>; "Neil Brown" <neilb@suse.de>
Sent: Wednesday, April 28, 2010 4:19 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/28/2010 03:32 PM, Janos Haar wrote:
>> MRK, Neil,
>>
>> Please let me have one wish:
>> Please write down my name to the kernel tree with a note i was who
>> reported and helped to track down this. :-)
>>
>> Thanks.
>> Janos Haar
>
> Ok I did
> However it would be nice if you can actually test the patch and confirm
> that it solves your problem, starting with the raid6 array in
> singly-degraded mode like you did yesterday. Then I think we can add one
> further line on top:
>
> Tested-by: Janos Haar <janos.haar@netcenter.hu>
>
> before Neil (hopefully) acks it. Testing is needed anyway before pushing
> it to mainline, I think...
* Re: Suggestion needed for fixing RAID6
2010-04-29 7:55 ` Janos Haar
@ 2010-04-29 15:22 ` MRK
2010-04-29 21:07 ` Janos Haar
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-29 15:22 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/29/2010 09:55 AM, Janos Haar wrote:
>
> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8]
> dm-1[13](F) sdg4[6
> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10]
> [UUU_UUU_UUUU]
> [===========>.........] recovery = 56.8% (831095108/1462653888)
> finish=50
> 19.8min speed=2096K/sec
>
> Drive dropped again with this patch!
> + the kernel freezed.
> (I will try to get more info...)
>
> Janos
Hmm too bad :-( it seems it still doesn't work, sorry for that
I suppose the kernel didn't freeze immediately after disabling the drive
or you wouldn't have had the chance to cat /proc/mdstat...
Hence dmesg messages might have gone to /var/log/messages or something.
Can you look there to see if there is any interesting message to post here?
Did the COW device fill up at least a bit?
Also: you know that if you disable graphics on the server
("/etc/init.d/gdm stop" or something like that) you usually can see the
stack trace of the kernel panic on screen when it hangs (unless the
terminal was blanked for powersaving, which you can disable too). You can
take a photo of that one (or write it down, but it will be long) so maybe
somebody can understand why it hung. You might even be able to obtain the
stack trace through a serial port, but that will take more effort.
* Re: Suggestion needed for fixing RAID6
2010-04-29 15:22 ` MRK
@ 2010-04-29 21:07 ` Janos Haar
2010-04-29 23:00 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-29 21:07 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, April 29, 2010 5:22 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/29/2010 09:55 AM, Janos Haar wrote:
>>
>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F)
>> sdg4[6
>> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10]
>> [UUU_UUU_UUUU]
>> [===========>.........] recovery = 56.8% (831095108/1462653888)
>> finish=50
>> 19.8min speed=2096K/sec
>>
>> Drive dropped again with this patch!
>> + the kernel freezed.
>> (I will try to get more info...)
>>
>> Janos
>
> Hmm too bad :-( it seems it still doesn't work, sorry for that
>
> I suppose the kernel didn't freeze immediately after disabling the drive
> or you wouldn't have had the chance to cat /proc/mdstat...
This was the command running in the putty.exe window:
watch "cat /proc/mdstat ; du -h /snap*"
I think it crashed soon after.
I had no time to recognize what happened and exit from watch.
>
> Hence dmesg messages might have gone to /var/log/messages or something.
> Can you look there to see if there is any interesting message to post
> here?
Yes, I know that.
Unfortunately, the crash itself was not logged.
But there is some info:
(some UNC reported from sdh)
....
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: res
51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error)
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: status: { DRDY ERR }
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: error: { UNC }
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: configured for UDMA/133
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Sense Key : Medium
Error [current] [descriptor]
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: Descriptor sense data with sense
descriptors (in hex):
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 72 03 11 04 00 00 00 0c 00
0a 80 00 00 00 00 00
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 63 5e c0 27
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Add. Sense:
Unrecovered read error - auto reallocate failed
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: end_request: I/O error, dev sdh,
sector 1667153959
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189872 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189880 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189888 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189896 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189904 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189912 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189920 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189928 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189936 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189944 on dm-1).
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect is
off
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] 2930277168
512-byte hardware sectors: (1.50 TB/1.36 TiB)
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect is
off
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Apr 29 13:07:39 Clarus-gl2k10-2 syslogd 1.4.1: restart.
> Did the COW device fill up at least a bit?
The initial size is 1.1MB, and what we want to see is only some kbytes...
I don't know exactly.
Next time I will try to reduce the initial size to 16KByte.
>
> Also: you know that if you disable graphics on the server
> ("/etc/init.d/gdm stop" or something like that) you usually can see the
> stack trace of the kernel panic on screen when it hangs (unless terminal
> was blank for powersaving, which you can disable too). You can take a
> photo of that one (or write it down but it will be long) to so maybe
> somebody can understand why it hanged. You might be even obtain the stack
> trace through a serial port but that will take more effort.
This PC-based server has no graphics card at all. :-) (this is one of my
freak ideas)
And the terminal is redirected to com1.
If I really want, I can catch this with a serial cable, but I think the log
from the messages file should be enough.
Thanks,
Janos
* Re: Suggestion needed for fixing RAID6
2010-04-29 21:07 ` Janos Haar
@ 2010-04-29 23:00 ` MRK
2010-04-30 6:17 ` Janos Haar
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-04-29 23:00 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/29/2010 11:07 PM, Janos Haar wrote:
>
> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
> To: "Janos Haar" <janos.haar@netcenter.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Thursday, April 29, 2010 5:22 PM
> Subject: Re: Suggestion needed for fixing RAID6
>
>
>> On 04/29/2010 09:55 AM, Janos Haar wrote:
>>>
>>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8]
>>> dm-1[13](F) sdg4[6
>>> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
>>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10]
>>> [UUU_UUU_UUUU]
>>> [===========>.........] recovery = 56.8%
>>> (831095108/1462653888) finish=50
>>> 19.8min speed=2096K/sec
>>>
>>> Drive dropped again with this patch!
>>> + the kernel freezed.
>>> (I will try to get more info...)
>>>
>>> Janos
>>
>> Hmm too bad :-( it seems it still doesn't work, sorry for that
>>
>> I suppose the kernel didn't freeze immediately after disabling the
>> drive or you wouldn't have had the chance to cat /proc/mdstat...
>
> this was this command in putty.exe window:
> watch "cat /proc/mdstat ; du -h /snap*"
>
good idea...
> I think it have crashed soon.
> I had no time to recognize what happened and exit from the watch.
>
>>
>> Hence dmesg messages might have gone to /var/log/messages or
>> something. Can you look there to see if there is any interesting
>> message to post here?
>
> Yes, i know that.
> The crash was not written up unfortunately.
> But there is some info:
>
> (some UNC reported from sdh)
> ....
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: res
> 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error)
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: status: { DRDY ERR }
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: error: { UNC }
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: configured for UDMA/133
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Sense Key :
> Medium Error [current] [descriptor]
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: Descriptor sense data with
> sense descriptors (in hex):
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 72 03 11 04 00 00 00
> 0c 00 0a 80 00 00 00 00 00
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 63 5e c0 27
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Add. Sense:
> Unrecovered read error - auto reallocate failed
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: end_request: I/O error, dev
> sdh, sector 1667153959
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189872 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189880 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189888 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189896 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189904 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189912 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189920 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189928 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189936 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189944 on dm-1).
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write
> Protect is off
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache:
> enabled, read cache: enabled, doesn't support DPO or FUA
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] 2930277168
> 512-byte hardware sectors: (1.50 TB/1.36 TiB)
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write
> Protect is off
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache:
> enabled, read cache: enabled, doesn't support DPO or FUA
> Apr 29 13:07:39 Clarus-gl2k10-2 syslogd 1.4.1: restart.
Hmm, how strange...
I don't see the message "Disk failure on %s, disabling device" \n
"Operation continuing on %d devices" in your log.
In MD raid456 the ONLY place where a disk is set faulty is this (file
raid5.c):
----------------------
set_bit(Faulty, &rdev->flags);
printk(KERN_ALERT
"raid5: Disk failure on %s, disabling device.\n"
"raid5: Operation continuing on %d devices.\n",
bdevname(rdev->bdev,b), conf->raid_disks -
mddev->degraded);
----------------------
( which is called by md_error() )
As you can see, just after disabling the device it prints the dmesg message.
I don't understand how you could catch a cat /proc/mdstat already
reporting the disk as failed and still not see the message in
/var/log/messages.
But you do see messages that should come chronologically after that one.
The errors like:
"Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189872 on dm-1)."
can now (after the patch) be generated only after raid-6 is in
doubly-degraded state. I don't understand how those errors could become
visible before the message telling that MD is disabling the device.
To make the thing more strange, if raid-6 is in doubly-degraded state it
means dm-1/sdh is disabled, but if dm-1/sdh is disabled MD should not
have read anything from there. I mean there shouldn't have been any read
error because there shouldn't have been any read.
You are sure that
a) this dmesg you reported really is from your last run of the resync
b) above or below the messages you report there is no "Disk failure on
..., disabling device" string?
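To check (b) quickly, both messages can be grepped for in one pass; a sketch with sample lines inlined (on the real system, point grep at /var/log/messages instead of the here-string):

```shell
# Sample lines only (the "Disk failure" one is what we'd *expect* to see
# but is missing from Janos's log); replace with the real messages file.
log='raid5:md3: read error not correctable (sector 1662189872 on dm-1).
raid5: Disk failure on dm-1, disabling device.'
echo "$log" | grep -n -e 'read error not correctable' -e 'Disk failure on'
```

The line numbers in the output show the relative order of the two events, which is the suspicious part here.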
Last thing, your system might have crashed because of the sd / SATA
driver (instead of that being a direct bug of MD). You see, those are
the last messages before the reboot, and the message about write cache
is repeated. The driver might have tried to reset the drive, maybe
quickly more than once. I'm not sure... but that could be a reason.
Exactly what kernel version are you running now, after applying my patch?
At the moment I don't have more ideas, sorry. I hope somebody else replies.
In the meanwhile you might run it through the serial cable if you have
some time. Maybe you can get more dmesg stuff that couldn't make it
through /var/log/messages. And you would also get the kernel panic.
Actually for the dmesg I think you can try with a "watch dmesg -c" via
putty.
Good luck
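As an alternative to the serial cable, netconsole can stream kernel messages (including a panic trace) to another machine over UDP. A hedged sketch: the option syntax is from Documentation/networking/netconsole.txt, and the interface name and receiver IP below are placeholders to adapt.

```shell
# On the failing server (placeholders: eth0, 192.168.1.5; run as root):
#   modprobe netconsole netconsole=@/eth0,6666@192.168.1.5/
# On the receiving machine, capture the UDP stream:
#   nc -l -u 6666
```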
* Re: Suggestion needed for fixing RAID6
2010-04-29 23:00 ` MRK
@ 2010-04-30 6:17 ` Janos Haar
2010-04-30 23:54 ` MRK
[not found] ` <4BDB6DB6.5020306@shiftmail.org>
0 siblings, 2 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-30 6:17 UTC (permalink / raw)
To: MRK; +Cc: Neil Brown, linux-raid
Hello,
OK, MRK, you are right (again).
There were some lines in the messages which escaped my attention.
The entire log is here:
http://download.netcenter.hu/bughunt/20100430/messages
The dm found my cow devices invalid, but I don't know why at this time.
My setup script looks like this: "create-cow":
rm -f /snapshot.bin
rm -f /snapshot2.bin
dd_rescue -v /dev/zero /snapshot.bin -m 4k -S 2000G
dd_rescue -v /dev/zero /snapshot2.bin -m 4k -S 2000G
losetup /dev/loop3 /snapshot.bin
losetup /dev/loop4 /snapshot2.bin
dd if=/dev/zero of=/dev/loop3 bs=1M count=1
dd if=/dev/zero of=/dev/loop4 bs=1M count=1
echo 0 $(blockdev --getsize /dev/sde4) \
snapshot /dev/sde4 /dev/loop3 p 8 | \
dmsetup create cow
echo 0 $(blockdev --getsize /dev/sdh4) \
snapshot /dev/sdh4 /dev/loop4 p 8 | \
dmsetup create cow2
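The snapshot usage that the du above approximates can also be read straight from device-mapper: for snapshot targets, dmsetup status reports `<used>/<total>` in 512-byte sectors (the exact format varies by kernel version, and the status line below is a made-up sample). A sketch that parses such a line:

```shell
# Sample output of "dmsetup status cow" (made-up numbers; on the real
# system use: status=$(dmsetup status cow) instead):
status='0 2930272065 snapshot 2216/4096000000 16'
used_sectors=$(echo "$status" | awk '{split($4, a, "/"); print a[1]}')
echo "COW store used: $(( used_sectors * 512 / 1024 )) KiB"
```

If dmsetup prints "Invalid" in place of the numbers, the snapshot has been marked invalid (e.g. COW store error or overflow), which would match the message in the log above.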
Now I have the last state; there is more space left on the disk, and the
snapshots are small:
du -h /snapshot*
1.1M /snapshot2.bin
1.1M /snapshot.bin
My new kernel is the same as the old one; the only diff is the md patch.
Additionally, I should note that my kernel has only one other patch which
differs from the normal tree: the pdflush patch.
(I can set the number of pdflush daemons via proc.)
I can try again if there is any new idea, but it would be really good to do
some trick with bitmaps, or set the recovery's start point, or something
similar, because every time I need >16 hours to reach the first point where
the raid does something interesting...
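On the wish to set the recovery's start point: md exposes sync_min/sync_max (in sectors) under sysfs, which can restrict the range a resync/check walks; whether they apply to this particular rebuild case is an assumption to verify against Documentation/md.txt for the running kernel. A sketch computing a starting sector just below the ~97.9% failure point (one /proc/mdstat "block" is 1 KiB, i.e. 2 sectors):

```shell
# 1462653888 blocks total (from the mdstat output above); start at ~97%.
start_sector=$(( 1462653888 * 2 * 97 / 100 ))
echo "start at sector $start_sector"
# Then (as root; md3 path assumed, check sync_min semantics on your kernel):
#   echo $start_sector > /sys/block/md3/md/sync_min
#   echo max           > /sys/block/md3/md/sync_max
#   echo check         > /sys/block/md3/md/sync_action
```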
Neil,
Can you say something useful about this?
Thanks again,
Janos
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Friday, April 30, 2010 1:00 AM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/29/2010 11:07 PM, Janos Haar wrote:
>>
>> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
>> To: "Janos Haar" <janos.haar@netcenter.hu>
>> Cc: <linux-raid@vger.kernel.org>
>> Sent: Thursday, April 29, 2010 5:22 PM
>> Subject: Re: Suggestion needed for fixing RAID6
>>
>>
>>> On 04/29/2010 09:55 AM, Janos Haar wrote:
>>>>
>>>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8]
>>>> dm-1[13](F) sdg4[6
>>>> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
>>>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10]
>>>> [UUU_UUU_UUUU]
>>>> [===========>.........] recovery = 56.8% (831095108/1462653888)
>>>> finish=50
>>>> 19.8min speed=2096K/sec
>>>>
>>>> Drive dropped again with this patch!
>>>> + the kernel freezed.
>>>> (I will try to get more info...)
>>>>
>>>> Janos
>>>
>>> Hmm too bad :-( it seems it still doesn't work, sorry for that
>>>
>>> I suppose the kernel didn't freeze immediately after disabling the drive
>>> or you wouldn't have had the chance to cat /proc/mdstat...
>>
>> this was this command in putty.exe window:
>> watch "cat /proc/mdstat ; du -h /snap*"
>>
>
> good idea...
>
>> I think it have crashed soon.
>> I had no time to recognize what happened and exit from the watch.
>>
>>>
>>> Hence dmesg messages might have gone to /var/log/messages or something.
>>> Can you look there to see if there is any interesting message to post
>>> here?
>>
>> Yes, i know that.
>> The crash was not written up unfortunately.
>> But there is some info:
>>
>> (some UNC reported from sdh)
>> ....
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: res
>> 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error)
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: status: { DRDY ERR }
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: error: { UNC }
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: configured for UDMA/133
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Result:
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Sense Key :
>> Medium Error [current] [descriptor]
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: Descriptor sense data with sense
>> descriptors (in hex):
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 72 03 11 04 00 00 00 0c
>> 00 0a 80 00 00 00 00 00
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 63 5e c0 27
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Add. Sense:
>> Unrecovered read error - auto reallocate failed
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: end_request: I/O error, dev sdh,
>> sector 1667153959
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189872 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189880 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189888 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189896 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189904 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189912 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189920 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189928 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189936 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
>> correctable (sector 1662189944 on dm-1).
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect
>> is off
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache:
>> enabled, read cache: enabled, doesn't support DPO or FUA
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] 2930277168
>> 512-byte hardware sectors: (1.50 TB/1.36 TiB)
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect
>> is off
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache:
>> enabled, read cache: enabled, doesn't support DPO or FUA
>> Apr 29 13:07:39 Clarus-gl2k10-2 syslogd 1.4.1: restart.
>
> Hmm what strange...
> I don't see the message "Disk failure on %s, disabling device" \n
> "Operation continuing on %d devices" in your log.
>
> In MD raid456 the ONLY place where a disk is set faulty is this (file
> raid5.c):
>
> ----------------------
> set_bit(Faulty, &rdev->flags);
> printk(KERN_ALERT
> "raid5: Disk failure on %s, disabling device.\n"
> "raid5: Operation continuing on %d devices.\n",
> bdevname(rdev->bdev,b), conf->raid_disks -
> mddev->degraded);
> ----------------------
> ( which is called by md_error() )
>
> As you can see, just after disabling the device it prints the dmesg
> message.
> I don't understand how you could catch a cat /proc/mdstat already
> reporting the disk as failed, and still not see the message in
> /var/log/messages.
>
> But you do see messages that should come chronologically after that one.
> The errors like:
> "Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189872 on dm-1)."
> can now (after the patch) be generated only after raid-6 is in
> doubly-degraded state. I don't understand how those errors could become
> visible before the message telling that MD is disabling the device.
>
> To make things stranger, if raid-6 is in doubly-degraded state it
> means dm-1/sdh is disabled, but if dm-1/sdh is disabled MD should not have
> read anything from there. I mean there shouldn't have been any read error
> because there shouldn't have been any read.
>
> You are sure that
> a) this dmesg you reported really is from your last run of the resync
> b) above or below the messages you report there is no "Disk failure on
> ..., disabling device" string?
>
> Last thing, your system might have crashed because of the sd / SATA driver
> (instead of that being a direct bug of MD). You see, those are the last
> messages before the reboot, and the message about write cache is repeated.
> The driver might have tried to reset the drive, maybe quickly more than
> once. I'm not sure... but that could be a reason.
>
> Exactly what kernel version are you running now, after applying my patch?
>
> At the moment I don't have more ideas, sorry. I hope somebody else
> replies.
> In the meanwhile you might run it through the serial cable if you have
> some time. Maybe you can get more dmesg stuff that couldn't make it
> through /var/log/messages. And you would also get the kernel panic.
> Actually for the dmesg I think you can try with a "watch dmesg -c" via
> putty.
>
> Good luck
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-30 6:17 ` Janos Haar
@ 2010-04-30 23:54 ` MRK
[not found] ` <4BDB6DB6.5020306@shiftmail.org>
1 sibling, 0 replies; 48+ messages in thread
From: MRK @ 2010-04-30 23:54 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 04/30/2010 08:17 AM, Janos Haar wrote:
> Hello,
>
> OK, MRK you are right (again).
> There were some lines in the messages which escaped my attention.
> The entire log is here:
> http://download.netcenter.hu/bughunt/20100430/messages
>
Ah here we go:
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Operation continuing on 10 devices.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: md: md3: recovery done.
Firstly I'm not totally sure of how DM passed the information of the
device failing to MD. There is no error message about this on MD. If it
was a read error, MD should have performed the rewrite but this
apparently did not happen (the error message for a failed rewrite by MD
I think is "read error NOT corrected!!"). But anyway...
> The dm found my cow devices invalid, but I don't know why at this time.
>
I have just had a brief look at the DM code. I understand like 1% of it
right now, however I am thinking that in a not-perfectly-optimized way
of doing things, if you specified 8 sectors (8x512b = 4k, which you did)
granularity during the creation of your cow and cow2 devices, whenever
you write to the COW device, DM might do the thing in 2 steps:
1- copy 8 (or multiple of 8) sectors from the HD to the cow device,
enough to cover the area to which you are writing
2- overwrite such 8 sectors with the data coming from MD.
Of course this is not optimal in case you are writing exactly 8 sectors
with MD, and these are aligned to the ones that DM uses (both things I
think are true in your case) because DM could have skipped #1 in this case.
However supposing DM is not so smart and it indeed does not skip step
#1, then I think I understand why it disables the device: it's because
#1 fails with read error and DM does not know how to handle the
situation in that case in general. If you had written a smaller amount
with MD such as 512 bytes, if step #1 fails, what do you write in the
other 7 sectors around it? The right semantics is not obvious so they
disable the device.
Firstly you could try with 1 sector granularity instead of 8, during the
creation of dm cow devices. This MIGHT work around the issue if DM is at
least a bit smart. Right now it's not obvious to me where in the code
the logic for the COW copying lives. Maybe tomorrow I will understand this.
If this doesn't work, the best thing is probably if you can write to the
DM mailing list asking why it behaves like this and if they can guess a
workaround. You can keep me in cc, I'm interested.
> [CUT]
>
> echo 0 $(blockdev --getsize /dev/sde4) \
> snapshot /dev/sde4 /dev/loop3 p 8 | \
> dmsetup create cow
>
> echo 0 $(blockdev --getsize /dev/sdh4) \
> snapshot /dev/sdh4 /dev/loop4 p 8 | \
> dmsetup create cow2
See, you are creating it with 8 sectors granularity... try with 1.
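A sketch of the same snapshot setup with the chunk-size field changed from 8 to 1 (device and loop paths are the ones from the quoted commands; whether a 1-sector chunk actually avoids the problem is the open assumption here):

```shell
# Build the device-mapper snapshot table line; the last field is the COW
# chunk size in sectors (1 here instead of 8). Running dmsetup needs root.
make_snapshot_table() {
    origin=$1; cowdev=$2; size_sectors=$3
    echo "0 $size_sectors snapshot $origin $cowdev p 1"
}
# Usage (as root):
#   make_snapshot_table /dev/sde4 /dev/loop3 "$(blockdev --getsize /dev/sde4)" \
#       | dmsetup create cow
```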
> I can try again, if there is any new idea, but it would be really good
> to do some trick with bitmaps or set the recovery's start point or
> something similar, because every time I need >16 hours to get to the first
> point where the raid does something interesting....
>
> Neil,
> Can you say something useful about this?
>
I just looked into this and it seems this feature is already there.
See if you have these files:
/sys/block/md3/md/sync_min and sync_max
Those are the starting and ending sector.
But keep in mind you have to enter them in multiples of the chunk size
so if your chunk is e.g. 1024k then you need to enter multiples of 2048
(sectors).
Enter the value before starting the sync. Or stop the sync by entering
"idle" in sync_action, then change the sync_min value, then restart the
sync entering "check" in sync_action. It should work, I just tried it on
my comp.
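The chunk-alignment rule above can be sketched as a small helper (a hypothetical calculation, not part of mdadm):

```shell
# Round a sector range outward to md chunk boundaries: sync_min/sync_max
# expect values that are multiples of the chunk size in 512-byte sectors.
aligned_sync_range() {
    start=$1; end=$2; chunk_kib=$3
    cs=$((chunk_kib * 1024 / 512))        # chunk size in sectors
    lo=$((start / cs * cs))               # round down
    hi=$(( (end + cs - 1) / cs * cs ))    # round up
    echo "$lo $hi"
}
# e.g. for a 1024k chunk any range widens to multiples of 2048 sectors:
#   aligned_sync_range 3000 3001 1024   ->  "2048 4096"
```

The resulting values are what you would echo into sync_min and sync_max before writing "check" to sync_action.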
Good luck
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
[not found] ` <4BDB6DB6.5020306@shiftmail.org>
@ 2010-05-01 9:37 ` Janos Haar
2010-05-01 17:17 ` MRK
0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-05-01 9:37 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
Hello,
Now I have tried with a 1-sector snapshot chunk size.
The result was the same:
first the snapshot was invalidated, then DM was dropped from the raid.
The next was this:
md3 : active raid6 sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[12](F) sdg4[6] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
      14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU]
      [===================>.]  resync = 99.9% (1462653628/1462653888) finish=0.0min speed=2512K/sec
The sync progress bar jumped from 58.8% to 99.9%, the speed fell, and the
counter froze at 1462653628/1462653888.
I could run dmesg once by hand and save the output to a file, but the
system crashed after this.
The entire story took about 1 minute.
However, the sync_min option generally solves my problem, because I can
build up the missing disk from the 90% that is good, which is enough for
me. :-)
If somebody is interested in playing more with this system, I still have it
for some days, but I am not interested anymore in tracing the md-dm
behavior in this situation....
Additionally, I don't want to put the data at risk if not really needed....
Thanks a lot,
Janos Haar
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Saturday, May 01, 2010 1:54 AM
Subject: Re: Suggestion needed for fixing RAID6
> [CUT]
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-01 9:37 ` Janos Haar
@ 2010-05-01 17:17 ` MRK
2010-05-01 21:44 ` Janos Haar
0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-05-01 17:17 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 05/01/2010 11:37 AM, Janos Haar wrote:
> However, the sync_min option generally solves my problem, because I
> can build up the missing disk from the 90% that is good, which is enough for me. :-)
Are you sure? How do you do that?
Resyncing a specific part is easy, replicating to a spare a specific
part is not. If the disk you want to replace was 100% made of parity
data that would be easy, you do that with a resync after replacing the
disk, maybe multiple resyncs region by region, but in your case it is
not made of only parity data. Only raid3 and 4 separate parity data from
actual data, raid6 instead finely interleaves them.
If you are thinking about replacing a disk with a new one (full of
zeroes) and then resyncing manually region by region, you will destroy
your data. Because in those chunks where the new disk acts as "actual
data" the parity will be recomputed based on your newly introduced
zeroes, and it will overwrite the parity data you had on the good disks,
making recovery impossible from that point on.
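The interleaving can be illustrated for the left-symmetric layout ("algorithm 2" in the mdstat output earlier); treat the exact rotation formula as an assumption of this sketch, the point is only that P and Q rotate through every member disk:

```shell
# Sketch: for a left-symmetric raid6 layout, which disks hold the P and Q
# blocks of a given stripe. Because P and Q rotate, every member disk
# carries real data on most stripes -- there is no parity-only disk.
raid6_parity_disks() {
    stripe=$1; ndisks=$2
    p=$((ndisks - 1 - stripe % ndisks))
    q=$(( (p + 1) % ndisks ))
    echo "$p $q"
}
# stripe 0 of a 12-disk array: P on disk 11, Q on disk 0;
# stripe 1: P on disk 10, Q on disk 11; and so on around the array.
```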
You really need to do the replication to a spare as a single step, from
the beginning to the end. You cannot use sync_min and sync_max for that
purpose.
I think... unless bitmaps really do some magic in this, flagging the
newly introduced disk as more recent than parity data... but do they
really do this? people correct me if I'm wrong.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-01 17:17 ` MRK
@ 2010-05-01 21:44 ` Janos Haar
2010-05-02 23:05 ` MRK
2010-05-03 2:17 ` Neil Brown
0 siblings, 2 replies; 48+ messages in thread
From: Janos Haar @ 2010-05-01 21:44 UTC (permalink / raw)
To: MRK; +Cc: linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Saturday, May 01, 2010 7:17 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 05/01/2010 11:37 AM, Janos Haar wrote:
>> However, the sync_min option generally solves my problem, because I can
>> build up the missing disk from the 90% that is good, which is enough for me. :-)
>
> Are you sure? How do you do that?
> Resyncing a specific part is easy, replicating to a spare a specific part
> is not. If the disk you want to replace was 100% made of parity data that
> would be easy, you do that with a resync after replacing the disk, maybe
> multiple resyncs region by region, but in your case it is not made of only
> parity data. Only raid3 and 4 separate parity data from actual data, raid6
> instead finely interleaves them.
> If you are thinking about replacing a disk with a new one (full of zeroes)
> and then resyncing manually region by region, you will destroy your data.
> Because in those chunks where the new disk acts as "actual data" the
> parity will be recomputed based on your newly introduced zeroes, and it
> will overwrite the parity data you had on the good disks, making recovery
> impossible from that point on.
> You really need to do the replication to a spare as a single step, from
> the beginning to the end. You cannot use sync_min and sync_max for that
> purpose.
You are right again, or at least close. :-)
I have the missing sdd4 which is already 98% correctly rebuilt.
But you are right, because the sync_min option does not work for rebuilding
disks, only for resyncing. (it is too smart to do the trick for me)
> I think... unless bitmaps really do some magic in this, flagging the newly
> introduced disk as more recent than parity data... but do they really do
> this? people correct me if I'm wrong.
Bitmap manipulation should work.
I think I know how to do that, but the data is more important than trying it
on my own.
I want to wait until somebody supports this.
... or does somebody have another good idea?
The general problem is: I have one single-degraded RAID6 plus 2 bad-block
disks inside, which have bad sectors in different locations.
The big question is how to keep the integrity, or how to do the rebuild in 2
steps instead of one continuous pass?
Thanks again
Janos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-01 21:44 ` Janos Haar
@ 2010-05-02 23:05 ` MRK
2010-05-03 2:17 ` Neil Brown
1 sibling, 0 replies; 48+ messages in thread
From: MRK @ 2010-05-02 23:05 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid
On 05/01/2010 11:44 PM, Janos Haar wrote:
>
> But you are right, because the sync_min option does not work for
> rebuilding disks, only for resyncing. (it is too smart to do the trick
> for me)
>
>> I think... unless bitmaps really do some magic in this, flagging the
>> newly introduced disk as more recent than parity data... but do they
>> really do this? people correct me if I'm wrong.
>
> Bitmap manipulation should work.
> I think I know how to do that, but the data is more important than trying
> it on my own.
> I want to wait until somebody supports this.
> ... or does somebody have another good idea?
Firstly: do you have any backup of your data? If not, before doing any
experiment I suggest that you back up important stuff. This can be done
with rsync, reassembling the array every time it goes down. I suggest
putting the array in readonly mode (mdadm --readonly /dev/md3):
this should prevent resyncs from starting automatically, and AFAIR even
prevent drives being dropped because of read errors (but you can't use
it during resyncs or rebuilds). Resyncs are bad because they will
eventually bring down your array. Don't use DM when doing this.
Now, for the real thing, instead of experimenting with bitmaps, I
suggest you try and see if the normal MD resync works now. If that works
then you can do the normal rebuild.
*Pls note that: DM should not be needed!* - I know that you have tried
resyncing with DM COW under MD and that one doesn't work well in this
case, but in fact DM should not be needed.
We pointed you to DM around Apr 23rd because at that time we thought
that your drives were dropping for uncorrectable read error, but we had
guessed wrong.
The general MD philosophy is that if there is enough parity
information, drives are not dropped just for a read error. Upon read
error MD recomputes the value of the sector from the parity information,
and then it attempts rewriting the block in place. During this rewrite
the drive performs a reallocation, moving the block to a hidden spare
region. If this rewrite fails it means that the drive is out of spare
sectors and this is considered to be a major failure for MD, and only at
that point the drive is dropped.
So we thought this was the reason in your case too, but we were wrong:
in your case it was because of an MD bug, which is the one for which I
submitted the patch.
So it should work now (without DM). And I think this is the safest thing
you can try. Having a backup is always better though.
So start the resync without DM and see if it goes through to the end
without dropping drives. You can use sync_min to cut the dead times.
For max safety you could first try resyncing only one chunk from the
region of the damaged sectors, so to provoke only a minimum amount of
rewrites. Set the sync_min to the location of the errors, and sync_max
to just one chunk above. See what happens...
If it rewrites correctly and the drive is not dropped, then run "check"
again on the same region and see if "cat /sys/block/md3/md/mismatch_cnt"
still returns zero (or the value it was before the rewrite). If it is
zero (or anyway has not changed value) it means the block was really
rewritten with the correct value: recovery of one sector really works
for raid6 in singly-degraded state. Then the procedure is safe, as far
as I understand, and you can go ahead on the other chunks.
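The one-chunk scrub described above can be sketched as a shell function (the sysfs path is parameterized here so the sketch can be dry-run safely; the 16k chunk from the earlier mdstat output means 32 sectors per chunk):

```shell
# Scrub exactly one chunk: stop any running sync, narrow the window to one
# chunk starting at $1 (which must be chunk-aligned), then start "check".
# MD_SYSFS defaults to md3's sysfs dir; override it to dry-run elsewhere.
scrub_one_chunk() {
    start=$1
    md=${MD_SYSFS:-/sys/block/md3/md}
    echo idle > "$md/sync_action"
    echo "$start" > "$md/sync_min"
    echo "$((start + 32))" > "$md/sync_max"   # 16k chunk = 32 sectors
    echo check > "$md/sync_action"
}
# afterwards, inspect:  cat /sys/block/md3/md/mismatch_cnt
```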
When all damaged sectors are reallocated, there are no more read errors,
and the mismatch_cnt is still at zero, you can go ahead replacing the
defective drive.
There are a few reasons that can still make the resync fail if we are
really unlucky, but dmesg should point us in the right direction in that
case.
Also remember that the patch still needs testing... currently it is not
really tested because DM drops the drive before MD. We would need to
know if raid6 is behaving like a raid6 now or it's still behaving like a
raid5...
Thank you
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-01 21:44 ` Janos Haar
2010-05-02 23:05 ` MRK
@ 2010-05-03 2:17 ` Neil Brown
2010-05-03 10:04 ` MRK
[not found] ` <4BDE9FB6.80309@shiftmail.org>
1 sibling, 2 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 2:17 UTC (permalink / raw)
To: Janos Haar; +Cc: MRK, linux-raid
On Sat, 1 May 2010 23:44:04 +0200
"Janos Haar" <janos.haar@netcenter.hu> wrote:
> The general problem is, I have one single-degraded RAID6 + 2 bad-block disks
> inside which have bad sectors in different locations.
> The big question is how to keep the integrity or how to do the rebuild in 2
> steps instead of one continuous pass?
Once you have the fix that has already been discussed in this thread, the
only other problem I can see with this situation is if attempts to write good
data over the read-errors results in a write-error which causes the device to
be evicted from the array. And I think you have reported getting write
errors.
The following patch should address this issue for you. It is *not* a
general-purpose fix, but a specific fix to address an issue you are having.
It might be appropriate to make this configurable via sysfs, or possibly even
to try to auto-detect the situation and don't bother writing.
Longer term I want to add support for storing a bad-block-list per device
so that a write error just fails that block, not the whole device. I just
need to organise my time so that I make progress on that project.
NeilBrown
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c181438..fd73929 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3427,6 +3427,12 @@ static void handle_stripe6(struct stripe_head *sh)
&& !test_bit(R5_LOCKED, &dev->flags)
&& test_bit(R5_UPTODATE, &dev->flags)
) {
+#if 1
+ /* We have recovered the data, but don't
+ * trust the device enough to write back
+ */
+ clear_bit(R5_ReadError, &dev->flags);
+#else
if (!test_bit(R5_ReWrite, &dev->flags)) {
set_bit(R5_Wantwrite, &dev->flags);
set_bit(R5_ReWrite, &dev->flags);
@@ -3438,6 +3444,7 @@ static void handle_stripe6(struct stripe_head *sh)
set_bit(R5_LOCKED, &dev->flags);
s.locked++;
}
+#endif
}
}
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-04-28 2:30 ` Mikael Abrahamsson
@ 2010-05-03 2:29 ` Neil Brown
0 siblings, 0 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 2:29 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: MRK, Janos Haar, linux-raid
On Wed, 28 Apr 2010 04:30:05 +0200 (CEST)
Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Wed, 28 Apr 2010, Neil Brown wrote:
>
> > There are lots of places that say "raid5" where it could apply to raid4
> > or raid6 as well. Maybe I should change them all to 'raid456'...
>
> That sounds like a good idea, or just call it "raid:" or "raid4/5/6".
>
> Don't know where we are in the stable kernel release cycle, but it would
> be super if this could make it in by next cycle, this code is handling the
> fault scenario that made me go from raid5 to raid6 :)
>
We are very close to release of 2.6.34. I won't submit this before 2.6.34 is
released as it is not a regression and not technically a data-corruption
bug. However it will go into 2.6.35-rc1 and but submitted to -stable for
2.6.34.1 and probably other -stable kernels.
NeilBrown
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-03 2:17 ` Neil Brown
@ 2010-05-03 10:04 ` MRK
2010-05-03 10:21 ` MRK
2010-05-03 21:02 ` Neil Brown
[not found] ` <4BDE9FB6.80309@shiftmail.org>
1 sibling, 2 replies; 48+ messages in thread
From: MRK @ 2010-05-03 10:04 UTC (permalink / raw)
To: Neil Brown; +Cc: Janos Haar, linux-raid
On 05/03/2010 04:17 AM, Neil Brown wrote:
> On Sat, 1 May 2010 23:44:04 +0200
> "Janos Haar"<janos.haar@netcenter.hu> wrote:
>
>
>> The general problem is, I have one single-degraded RAID6 + 2 bad-block disks
>> inside which have bad sectors in different locations.
>> The big question is how to keep the integrity or how to do the rebuild in 2
>> steps instead of one continuous pass?
>>
> Once you have the fix that has already been discussed in this thread, the
> only other problem I can see with this situation is if attempts to write good
> data over the read-errors results in a write-error which causes the device to
> be evicted from the array.
>
> And I think you have reported getting write
> errors.
>
His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
error NOT corrected!! " (the error message you get on failed rewrite AFAIU)
Up to now (after my patch) he only tried with MD above DM-COW and DM was
dropping the drive on read error so I think MD didn't get any
opportunity to rewrite.
It is not clear to me what kind of error MD got from DM:
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
I don't understand from what place the md_error() is called...
but also in this case it doesn't look like a rewrite error...
I think without DM COW it should probably work in his case.
Your new patch skips the rewriting and keeps the unreadable sectors,
right? So that the drive isn't dropped on rewrite...
> The following patch should address this issue for you.
> It is*not* a general-purpose fix, but a specific fix
[CUT]
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
[not found] ` <4BDE9FB6.80309@shiftmail.org>
@ 2010-05-03 10:20 ` Janos Haar
2010-05-05 15:24 ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
1 sibling, 0 replies; 48+ messages in thread
From: Janos Haar @ 2010-05-03 10:20 UTC (permalink / raw)
To: MRK; +Cc: Neil Brown, linux-raid
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Neil Brown" <neilb@suse.de>
Cc: "Janos Haar" <janos.haar@netcenter.hu>; <linux-raid@vger.kernel.org>
Sent: Monday, May 03, 2010 12:04 PM
Subject: Re: Suggestion needed for fixing RAID6
> [CUT]
Just a little note:
I have 2 bad drives. One has bad sectors at 54% and >2500 UNC sectors, which
is too much to try to repair; this drive is really failing....
The other has only 123 bad sectors at 99%, which is a very small scratch on
the platter, so now I am trying to fix this drive instead.
The repair-check sync process is running now, I will reply again soon...
Thanks,
Janos
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-03 10:04 ` MRK
@ 2010-05-03 10:21 ` MRK
2010-05-03 21:04 ` Neil Brown
2010-05-03 21:02 ` Neil Brown
1 sibling, 1 reply; 48+ messages in thread
From: MRK @ 2010-05-03 10:21 UTC (permalink / raw)
To: MRK, Neil Brown; +Cc: Janos Haar, linux-raid
On 05/03/2010 12:04 PM, MRK wrote:
> [CUT]
Oh and there is another issue I wanted to expose:
His last dmesg:
http://download.netcenter.hu/bughunt/20100430/messages
Much after the line:
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
disabling device.
there are many lines like this:
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
correctable (sector 1662189872 on dm-1).
How come MD still wants to read from a device it has disabled?
That looks like a problem to me...
Does MD also scrub failed devices during check?
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-03 10:04 ` MRK
2010-05-03 10:21 ` MRK
@ 2010-05-03 21:02 ` Neil Brown
1 sibling, 0 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 21:02 UTC (permalink / raw)
To: MRK; +Cc: Janos Haar, linux-raid
On Mon, 03 May 2010 12:04:38 +0200
MRK <mrk@shiftmail.org> wrote:
> On 05/03/2010 04:17 AM, Neil Brown wrote:
> > On Sat, 1 May 2010 23:44:04 +0200
> > "Janos Haar"<janos.haar@netcenter.hu> wrote:
> >
> >
> >> The general problem is, I have one single-degraded RAID6 + 2 bad-block disks
> >> inside which have bad sectors in different locations.
> >> The big question is how to keep integrity, or how to do the rebuild in 2
> >> steps instead of one continuous pass?
> >>
> > Once you have the fix that has already been discussed in this thread, the
> > only other problem I can see with this situation is if attempts to write good
> > data over the read-errors results in a write-error which causes the device to
> > be evicted from the array.
> >
> > And I think you have reported getting write
> > errors.
> >
>
> His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
> error NOT corrected!! " (the error message you get on failed rewrite AFAIU)
> Up to now (after my patch) he only tried with MD above DM-COW and DM was
> dropping the drive on read error so I think MD didn't get any
> opportunity to rewrite.
Hmmm... fair enough.
>
> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> I don't understand from what place the md_error() is called...
I suspect it is from raid5_end_write_request. It looks like we don't print
any message when the re-write fails. Only if the read after the rewrite
fails.
> but also in this case it doesn't look like a rewrite error...
>
... so I suspect it is a rewrite error. Unless I missed something. What
message did you expect to see in the case of a re-write error?
> I think without DM COW it should probably work in his case.
>
> Your new patch skips the rewriting and keeps the unreadable sectors,
> right? So that the drive isn't dropped on rewrite...
Correct.
>
> > The following patch should address this issue for you.
> > It is*not* a general-purpose fix, but a specific fix
> [CUT]
NeilBrown
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
2010-05-03 10:21 ` MRK
@ 2010-05-03 21:04 ` Neil Brown
0 siblings, 0 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 21:04 UTC (permalink / raw)
To: MRK; +Cc: Janos Haar, linux-raid
On Mon, 03 May 2010 12:21:08 +0200
MRK <mrk@shiftmail.org> wrote:
> On 05/03/2010 12:04 PM, MRK wrote:
> > On 05/03/2010 04:17 AM, Neil Brown wrote:
> >> On Sat, 1 May 2010 23:44:04 +0200
> >> "Janos Haar"<janos.haar@netcenter.hu> wrote:
> >>
> >>> The general problem is, I have one single-degraded RAID6 + 2 bad-block
> >>> disks inside which have bad sectors in different locations.
> >>> The big question is how to keep integrity, or how to do the rebuild
> >>> in 2 steps instead of one continuous pass?
> >> Once you have the fix that has already been discussed in this thread,
> >> the
> >> only other problem I can see with this situation is if attempts to
> >> write good
> >> data over the read-errors results in a write-error which causes the
> >> device to
> >> be evicted from the array.
> >>
> >> And I think you have reported getting write
> >> errors.
> >
> > His dmesg AFAIR has never reported any error of the kind "raid5:%s:
> > read error NOT corrected!! " (the error message you get on failed
> > rewrite AFAIU)
> > Up to now (after my patch) he only tried with MD above DM-COW and DM
> > was dropping the drive on read error so I think MD didn't get any
> > opportunity to rewrite.
> >
> > It is not clear to me what kind of error MD got from DM:
> >
> > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots:
> > Invalidating snapshot: Error reading/writing.
> > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> > disabling device.
> >
> > I don't understand from what place the md_error() is called...
> > [CUT]
>
> Oh, and there is another issue I wanted to raise:
>
> His last dmesg:
> http://download.netcenter.hu/bughunt/20100430/messages
>
> Much after the line:
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> disabling device.
>
> there are many lines like this:
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189872 on dm-1).
>
> How come MD still wants to read from a device it has disabled?
> Looks like a problem to me...
There are often many IO requests in flight at the same time. When one
returns with an error we might fail the device but there are still lots more
that have not yet completed. As they complete we might write messages about
them - even after we have reported the device as 'failed'. But we never
initiate an IO after the device has been marked 'faulty'.
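Neil's point can be illustrated with a toy shell sketch (purely hypothetical, not actual md code): several reads are already in flight when the first error fails the device, so the remaining completions still arrive and log errors afterwards, even though nothing new is issued once the device is faulty.

```shell
# Toy model of in-flight IO completions after a device failure (not md code).
FAULTY=0
LATE_MSGS=0                       # completions reported after the device failed
for req in 1 2 3 4 5; do          # five reads were in flight together
    if [ "$FAULTY" -eq 0 ]; then
        FAULTY=1                  # first error completion: device marked failed
    else
        LATE_MSGS=$((LATE_MSGS + 1))  # already-queued IO still completes,
                                      # logging "read error not correctable"
    fi
done
echo "completions logged after failure: $LATE_MSGS"
```

With five requests in flight, one failure still leaves four completions to be reported after the device is already marked faulty, which matches the flood of "read error not correctable" lines in the dmesg above.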
NeilBrown
> Does MD also scrub failed devices during check?
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 [SOLVED]
[not found] ` <4BDE9FB6.80309@shiftmail.org>
2010-05-03 10:20 ` Janos Haar
@ 2010-05-05 15:24 ` Janos Haar
2010-05-05 19:27 ` MRK
1 sibling, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-05-05 15:24 UTC (permalink / raw)
To: MRK; +Cc: linux-raid, Neil Brown
>
> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots:
> Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> disabling device.
>
> I don't understand from what place the md_error() is called...
> but also in this case it doesn't look like a rewrite error...
>
> I think without DM COW it should probably work in his case.
First, sorry for the delay.
Without DM, the original behavior-fix patch worked very well.
Neil is generally right that the drive should reallocate the bad sectors on
rewrite, but that is the ideal scenario, which unfortunately is far from the
real world...
I needed to repeat the "repair" sync method 4 times on the better HDD (which
has only 123 bad sectors) before it became readable again.
The other HDD has >2500 bad sectors and looks like it has no chance of being
fixed this way.
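The repeated "repair" passes described above can be driven from a small loop. This is only a sketch: the device name md3 is taken from the logs earlier in this thread, the RUN_REPAIR guard is an added safety switch, and the interface used is the standard md sysfs sync_action/mismatch_cnt one.

```shell
# Sketch: run repeated md "repair" passes and report mismatch_cnt after each.
# md3 comes from the logs in this thread; adjust for your system.
MD=md3
SYSDIR=/sys/block/$MD/md

repair_pass() {
    # Ask md to rewrite unreadable sectors from parity/redundancy...
    echo repair > "$SYSDIR/sync_action"
    # ...then wait until the pass reports idle.
    while [ "$(cat "$SYSDIR/sync_action")" != "idle" ]; do
        sleep 30
    done
}

# Guarded so nothing runs unless explicitly requested on a real array.
# (Janos needed 4 passes on the drive with 123 bad sectors.)
if [ "${RUN_REPAIR:-0}" = "1" ] && [ -d "$SYSDIR" ]; then
    for i in 1 2 3 4; do
        repair_pass
        echo "pass $i done, mismatch_cnt=$(cat "$SYSDIR/mismatch_cnt")"
    done
fi
```

Run as root with RUN_REPAIR=1; each pass is a full-array scrub, so on large arrays expect many hours per iteration.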
>
> Your new patch skips the rewriting and keeps the unreadable sectors,
> right? So that the drive isn't dropped on rewrite...
>
>> The following patch should address this issue for you.
>> It is*not* a general-purpose fix, but a specific fix
> [CUT]
Neil, I think this patch should be controlled via sysfs or procfs and be
inactive by default; of course it would be good for recovering bad cases
like mine.
There are a lot of HDD problems which can create truly uncorrectable sectors
that can't become good again even on rewrite...
Thanks a lot to all who helped me solve this...
And MRK, please don't forget to write in my name. :-)
Cheers,
Janos
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 [SOLVED]
2010-05-05 15:24 ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
@ 2010-05-05 19:27 ` MRK
0 siblings, 0 replies; 48+ messages in thread
From: MRK @ 2010-05-05 19:27 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid, Neil Brown
On 05/05/2010 05:24 PM, Janos Haar wrote:
>> I think without DM COW it should probably work in his case.
>
> First sorry for delay.
> Without DM, the original behavior-fix patch worked very well.
Great!
Ok I have just resubmitted the patch (v2) which includes a "Tested-by:
Janos Haar <janos.haar@netcenter.hu>" line and a few fixes on the
description.
> [CUT]
> Thanks a lot to all who helped me solve this...
>
> And MRK, please don't forget to write in my name. :-)
I did it. Now it's in Neil's hands, hopefully he acks it and pushes it
to mainline.
Thanks everybody,
GAT
^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread, other threads:[~2010-05-05 19:27 UTC | newest]
Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar
2010-04-22 15:00 ` Mikael Abrahamsson
2010-04-22 15:12 ` Janos Haar
2010-04-22 15:18 ` Mikael Abrahamsson
2010-04-22 16:25 ` Janos Haar
2010-04-22 16:32 ` Peter Rabbitson
[not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
2010-04-22 20:48 ` Janos Haar
2010-04-23 6:51 ` Luca Berra
2010-04-23 8:47 ` Janos Haar
2010-04-23 12:34 ` MRK
2010-04-24 19:36 ` Janos Haar
2010-04-24 22:47 ` MRK
2010-04-25 10:00 ` Janos Haar
2010-04-26 10:24 ` MRK
2010-04-26 12:52 ` Janos Haar
2010-04-26 16:53 ` MRK
2010-04-26 22:39 ` Janos Haar
2010-04-26 23:06 ` Michael Evans
[not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs>
2010-04-27 0:04 ` Michael Evans
2010-04-27 15:50 ` Janos Haar
2010-04-27 23:02 ` MRK
2010-04-28 1:37 ` Neil Brown
2010-04-28 2:02 ` Mikael Abrahamsson
2010-04-28 2:12 ` Neil Brown
2010-04-28 2:30 ` Mikael Abrahamsson
2010-05-03 2:29 ` Neil Brown
2010-04-28 12:57 ` MRK
2010-04-28 13:32 ` Janos Haar
2010-04-28 14:19 ` MRK
2010-04-28 14:51 ` Janos Haar
2010-04-29 7:55 ` Janos Haar
2010-04-29 15:22 ` MRK
2010-04-29 21:07 ` Janos Haar
2010-04-29 23:00 ` MRK
2010-04-30 6:17 ` Janos Haar
2010-04-30 23:54 ` MRK
[not found] ` <4BDB6DB6.5020306@shiftmail.org>
2010-05-01 9:37 ` Janos Haar
2010-05-01 17:17 ` MRK
2010-05-01 21:44 ` Janos Haar
2010-05-02 23:05 ` MRK
2010-05-03 2:17 ` Neil Brown
2010-05-03 10:04 ` MRK
2010-05-03 10:21 ` MRK
2010-05-03 21:04 ` Neil Brown
2010-05-03 21:02 ` Neil Brown
[not found] ` <4BDE9FB6.80309@shiftmail.org>
2010-05-03 10:20 ` Janos Haar
2010-05-05 15:24 ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
2010-05-05 19:27 ` MRK