* feature re-quest for "re-write"
@ 2014-02-21 18:09 Mikael Abrahamsson
  2014-02-24  1:30 ` Brad Campbell
  2014-02-24  2:24 ` Brad Campbell
  0 siblings, 2 replies; 22+ messages in thread
From: Mikael Abrahamsson @ 2014-02-21 18:09 UTC (permalink / raw)
  To: linux-raid


Hi,

we have "check", "repair", "replacement" and other operations on raid 
volumes.
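
For reference, these are the operations driven through the md sysfs
interface, e.g. something like the following (md0 here is just an example
device name):

   echo check  > /sys/block/md0/md/sync_action
   echo repair > /sys/block/md0/md/sync_action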

I am not a programmer, but I was wondering how much work it would require 
to take the current code and implement "rewrite", basically re-writing every 
block at the md raid level. Since "repair" and "check" don't seem to 
properly detect a few errors, wouldn't it make sense to take the path of 
least resistance / easiest implementation route and just re-write all data 
on the entire array? If reads fail, re-calculate from parity; if reads work, 
just write the data back again.

The goal of this new mode would be to eradicate pending sectors by 
re-writing everything on the drive.

If this doesn't seem like a sensible approach, what would be a sensible 
approach to avoid having pending sectors keep being "pending" even after 
"check" and "repair"?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: feature re-quest for "re-write"
  2014-02-21 18:09 feature re-quest for "re-write" Mikael Abrahamsson
@ 2014-02-24  1:30 ` Brad Campbell
  2014-02-24  1:46   ` Eyal Lebedinsky
  2014-02-24  2:42   ` Mikael Abrahamsson
  2014-02-24  2:24 ` Brad Campbell
  1 sibling, 2 replies; 22+ messages in thread
From: Brad Campbell @ 2014-02-24  1:30 UTC (permalink / raw)
  To: Mikael Abrahamsson, linux-raid

On 22/02/14 02:09, Mikael Abrahamsson wrote:
>

> If this doesn't seem like a sensible approach, what would be a sensible
> approach to avoid having pending sectors keep being "pending" even after
> "check" and "repair"?
>

The only reason I've ever seen this personally was when the pending 
sectors were on non-data parts of the drive, like some of the space 
around the superblock. Have you verified that these issues are really on 
sectors in the data area? SMART should tell you the LBA of the first 
error in a read test.
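
Something along these lines should show it (untested here, adjust the 
device name):

   smartctl -t long /dev/sdX        # kick off an extended self-test
   smartctl -l selftest /dev/sdX    # the log has an LBA_of_first_error column
   smartctl -A /dev/sdX             # Current_Pending_Sector count

That gives you an LBA to compare against the partition layout.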

Brad

* Re: feature re-quest for "re-write"
  2014-02-24  1:30 ` Brad Campbell
@ 2014-02-24  1:46   ` Eyal Lebedinsky
  2014-02-24  2:11     ` Brad Campbell
  2014-02-24  2:42   ` Mikael Abrahamsson
  1 sibling, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-24  1:46 UTC (permalink / raw)
  To: list linux-raid

In my case (see earlier thread "raid check does not...") the pending sector is
early in the device, in sector 261696 of a 4TB component (the whole space is in
one partition on each component). So yes, inside the data area.

I still have it reported in my daily logwatch; any idea what to try?

   Currently unreadable (pending) sectors detected:
	/dev/sdi [SAT] - 48 Time(s)
	1 unreadable sectors detected

Eyal

On 02/24/14 12:30, Brad Campbell wrote:
> On 22/02/14 02:09, Mikael Abrahamsson wrote:
>>
>
>> If this doesn't seem like a sensible approach, what would be a sensible
>> approach to avoid having pending sectors keep being "pending" even after
>> "check" and "repair"?
>>
>
> The only reason I've ever seen this personally was when the pending sectors were on non-data parts of the drive, like some of the space around the superblock. Have you verified that these issues are really on sectors in the data area? SMART should tell you the LBA of the first error in a read test.
>
> Brad

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-24  1:46   ` Eyal Lebedinsky
@ 2014-02-24  2:11     ` Brad Campbell
  2014-02-24  3:40       ` Eyal Lebedinsky
  0 siblings, 1 reply; 22+ messages in thread
From: Brad Campbell @ 2014-02-24  2:11 UTC (permalink / raw)
  To: Eyal Lebedinsky, list linux-raid

On 24/02/14 09:46, Eyal Lebedinsky wrote:
> In my case (see earlier thread "raid check does not...") the pending
> sector is early
> in the device, in sector 261696 of a 4TB component (whole space in one
> partition of
> each component). So yes, inside the data area.
>
> I still have it reported in my daily logwatch, any idea what to try?
>

Yes, can you run a dd of the md device from well before to well after 
the theoretical position of the error?

If the dd passes cleanly, it indicates the bad block is a parity block 
rather than a data block. That hopefully will help narrow down the scope 
of the search.

Brad


* Re: feature re-quest for "re-write"
  2014-02-21 18:09 feature re-quest for "re-write" Mikael Abrahamsson
  2014-02-24  1:30 ` Brad Campbell
@ 2014-02-24  2:24 ` Brad Campbell
  2014-02-25  2:10   ` NeilBrown
  1 sibling, 1 reply; 22+ messages in thread
From: Brad Campbell @ 2014-02-24  2:24 UTC (permalink / raw)
  To: Mikael Abrahamsson, linux-raid

On 22/02/14 02:09, Mikael Abrahamsson wrote:
>
> Hi,
>
> we have "check", "repair", "replacement" and other operations on raid
> volumes.
>
> I am not a programmer, but I was wondering how much work it would
> require to take current code and implement "rewrite", basically
> re-writing every block in the md raid level. Since "repair" and "check"
> don't seem to properly detect a few errors, wouldn't it make sense to
> take the path of least resistance / easiest implementation route to just re-write all
> data on the entire array? If reads fail, re-calculate from parity, if
> reads work, just write again.

Now, this is after 3 minutes of looking at raid5.c, so if I've missed 
something obvious please feel free to yell at me. I'm not much of a 
programmer. Having said that -

Can someone check my understanding of this bit of code?

static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
                                  struct stripe_head_state *s,
                                  int disks)
<....>

        switch (sh->check_state) {
        case check_state_idle:
                /* start a new check operation if there are < 2 failures */
                if (s->failed == s->q_failed) {
                        /* The only possible failed device holds Q, so it
                         * makes sense to check P (If anything else were failed,
                         * we would have used P to recreate it).
                         */
                        sh->check_state = check_state_run;
                }
                if (!s->q_failed && s->failed < 2) {
                        /* Q is not failed, and we didn't use it to generate
                         * anything, so it makes sense to check it
                         */
                        if (sh->check_state == check_state_run)
                                sh->check_state = check_state_run_pq;
                        else
                                sh->check_state = check_state_run_q;
                }


So we get passed a stripe. If it's not being checked we:

- If Q has failed we initiate check_state_run (which checks only P).

- If we have less than 2 failed drives (let's say we have none), and we 
are already checking P (check_state_run), we upgrade that to 
check_state_run_pq (and therefore check both).

However

- If we were check_state_idle, because we had 0 failed drives, then we 
only mark check_state_run_q and therefore skip checking P??

Regards,
Brad

* Re: feature re-quest for "re-write"
  2014-02-24  1:30 ` Brad Campbell
  2014-02-24  1:46   ` Eyal Lebedinsky
@ 2014-02-24  2:42   ` Mikael Abrahamsson
  1 sibling, 0 replies; 22+ messages in thread
From: Mikael Abrahamsson @ 2014-02-24  2:42 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-raid

On Mon, 24 Feb 2014, Brad Campbell wrote:

> The only reason I've ever seen this personally was when the pending 
> sectors were on non-data parts of the drive, like some of the space 
> around the superblock. Have you verified that these issues are really on 
> sectors in the data area? SMART should tell you the LBA of the first 
> error in a read test.

I even received UNC errors in the log when doing "repair", but the sector 
still wasn't re-written. So yes, they were on the data part.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: feature re-quest for "re-write"
  2014-02-24  2:11     ` Brad Campbell
@ 2014-02-24  3:40       ` Eyal Lebedinsky
  2014-02-24 14:14         ` Wilson Jonathan
  0 siblings, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-24  3:40 UTC (permalink / raw)
  To: list linux-raid

I know that the i/o error is in /dev/sdi sector 261696 (the kernel and smart reports agree)
- /dev/sdi1 starts 2048 sectors later
- /dev/md127 is a 7-device raid6, so there is 5 times as much data in the array until we hit
   the bad sector

# dd if=/dev/sdi1 of=/dev/null skip=$((1*(261696-2048))) count=1
dd: error reading '/dev/sdi1': Input/output error
0+0 records in
0+0 records out

The error is in one sector but 8 sectors will be read (a 4k buffer) so to get a clean read:

# dd if=/dev/sdi1 of=/dev/null skip=$((1*(261696-2048)+8)) count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 5.5852e-05 s, 9.2 MB/s

# echo 3 >'/proc/sys/vm/drop_caches'
# dd if=/dev/md127 of=/dev/null skip=$((5*(261696-2048))) count=5
5+0 records in
5+0 records out
2560 bytes (2.6 kB) copied, 0.00380436 s, 673 kB/s

Now reading *much* more than necessary (first 10GB of the array):

# echo 3 >'/proc/sys/vm/drop_caches'
# dd if=/dev/md127 of=/dev/null count=$((20*1024*1024))
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 13.5717 s, 791 MB/s

Note that I do not expect to get an error because reading the array will not read the P/Q checksums
(it assumes good data and avoids the calculation overhead of verifying P/Q).

BTW, due to the use of a buffer layer I could have done the whole test using 4k blocks rather than
sectors, but it makes no difference in this case.

Eyal

On 02/24/14 13:11, Brad Campbell wrote:
> On 24/02/14 09:46, Eyal Lebedinsky wrote:
>> In my case (see earlier thread "raid check does not...") the pending
>> sector is early
>> in the device, in sector 261696 of a 4TB component (whole space in one
>> partition of
>> each component). So yes, inside the data area.
>>
>> I still have it reported in my daily logwatch, any idea what to try?
>>
>
> Yes, can you run a dd of the md device from well before to well after the theoretical position of the error?
>
> If the dd passes cleanly, it indicates the bad block is a parity block rather than a data block. That hopefully will help narrow down the scope of the search.
>
> Brad

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-24  3:40       ` Eyal Lebedinsky
@ 2014-02-24 14:14         ` Wilson Jonathan
  2014-02-24 20:39           ` Eyal Lebedinsky
  0 siblings, 1 reply; 22+ messages in thread
From: Wilson Jonathan @ 2014-02-24 14:14 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid


> # echo 3 >'/proc/sys/vm/drop_caches'
> # dd if=/dev/md127 of=/dev/null count=$((20*1024*1024))
> 20971520+0 records in
> 20971520+0 records out
> 10737418240 bytes (11 GB) copied, 13.5717 s, 791 MB/s
> 
> Note that I do not expect to get an error because reading the array will not read the P/Q checksums
> (it assumes good data and avoids the calculations overhead of verifying P/Q).
> 
> BTW, due to the use of a buffer layer I could have done the whole test using 4k blocks rather than
> sectors, but it makes no difference in this case.
> 
> Eyal
> 

I wonder, could you not use dd to perform a "refresh"?

dd if=/dev/md127 of=/dev/md127 count=... bs=....

As that would force a re-calc and write of P & Q and data.
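
For example, limited to just the region around the suspect stripe, re-using
the offsets from your earlier dd (numbers purely illustrative, untested):

dd if=/dev/md127 of=/dev/md127 bs=512 \
   skip=$((5*(261696-2048))) seek=$((5*(261696-2048))) count=1024

so that only a small, known region gets read back and re-written.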

That said, it's just a "could you/would it" suggestion, not a "do it", due
to the inherent dangers of dd.



* Re: feature re-quest for "re-write"
  2014-02-24 14:14         ` Wilson Jonathan
@ 2014-02-24 20:39           ` Eyal Lebedinsky
  2014-02-25  3:16             ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-24 20:39 UTC (permalink / raw)
  To: list linux-raid

My main interest is to understand why 'check' does not actually check.
I already know how to fix the problem: by writing to the location I
can force the pending reallocation to happen, but then I will not have
the test case anymore.

The OP asks for a specific solution, but I think that the 'check' action
should already correctly rewrite failed (i/o error) sectors. It does not
always know which sector to rewrite when it finds a raid6 mismatch
without an i/o error (with raid5 it never knows).

Eyal

On 02/25/14 01:14, Wilson Jonathan wrote:
>
>> # echo 3 >'/proc/sys/vm/drop_caches'
>> # dd if=/dev/md127 of=/dev/null count=$((20*1024*1024))
>> 20971520+0 records in
>> 20971520+0 records out
>> 10737418240 bytes (11 GB) copied, 13.5717 s, 791 MB/s
>>
>> Note that I do not expect to get an error because reading the array will not read the P/Q checksums
>> (it assumes good data and avoids the calculations overhead of verifying P/Q).
>>
>> BTW, due to the use of a buffer layer I could have done the whole test using 4k blocks rather than
>> sectors, but it makes no difference in this case.
>>
>> Eyal
>>
>
> I wonder, could you not use dd to perform a "refresh"?
>
> dd if=/dev/md127 of=/dev/md127 count=... bs=....
>
> As that would force a re-calc and write of P & Q and data.
>
> That said, its just a "could you/would it" suggestion not a "do it" due
> to the inherent dangers of dd.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-24  2:24 ` Brad Campbell
@ 2014-02-25  2:10   ` NeilBrown
  2014-02-25  2:26     ` Brad Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2014-02-25  2:10 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Mikael Abrahamsson, linux-raid

On Mon, 24 Feb 2014 10:24:36 +0800 Brad Campbell <lists2009@fnarfbargle.com>
wrote:

> On 22/02/14 02:09, Mikael Abrahamsson wrote:
> >
> > Hi,
> >
> > we have "check", "repair", "replacement" and other operations on raid
> > volumes.
> >
> > I am not a programmer, but I was wondering how much work it would
> > require to take current code and implement "rewrite", basically
> > re-writing every block in the md raid level. Since "repair" and "check"
> > doesn't seem to properly detect a few errors, wouldn't it make sense to
> > try least existance / easiest implementation route to just re-write all
> > data on the entire array? If reads fail, re-calculate from parity, if
> > reads work, just write again.
> 
> Now, this is after 3 minutes of looking at raid5.c, so if I've missed 
> something obvious please feel free to yell at me. I'm not much of a 
> programmer. Having said that -
> 
> Can someone check my understanding of this bit of code?
> 
> static void handle_parity_checks6(struct r5conf *conf, struct 
> stripe_head *sh,
>                                    struct stripe_head_state *s,
>                                    int disks)
> <....>
> 
>          switch (sh->check_state) {
>          case check_state_idle:
>                  /* start a new check operation if there are < 2 failures */
>                  if (s->failed == s->q_failed) {
>                          /* The only possible failed device holds Q, so it
>                           * makes sense to check P (If anything else 
> were failed,
>                           * we would have used P to recreate it).
>                           */
>                          sh->check_state = check_state_run;
>                  }
>                  if (!s->q_failed && s->failed < 2) {
>                          /* Q is not failed, and we didn't use it to 
> generate
>                           * anything, so it makes sense to check it
>                           */
>                          if (sh->check_state == check_state_run)
>                                  sh->check_state = check_state_run_pq;
>                          else
>                                  sh->check_state = check_state_run_q;
>                  }
> 
>
> So we get passed a stripe. If it's not being checked we :
> 
> - If Q has failed we initiate check_state_run (which checks only P)
> 
> - If we have less than 2 failed drives (lets say we have none), if we 
> are already checking P (check_state_run) we upgrade that to 
> check_state_run_pq (and therefore check both).
> 
> However
> 
> - If we were check_state_idle, because we had 0 failed drives, then we 
> only mark check_state_run_q and therefore skip checking P ??

This code is obviously too subtle.

If 0 drives have failed, then 's->failed' is 0 (it is the count of failed
drives), and  's->q_failed' is also 0 (it is a boolean flag, and q clearly
hasn't failed as nothing has).
So the first 'if' branch will be followed (as "0 == 0") and check_state set to
check_state_run.
Then as q_failed is still 0 and failed < 2, check_state gets set to
check_state_run_pq.

So it does check both p and q.
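
Spelling the cases out as the code above plays them:

  s->failed  s->q_failed  resulting check_state
      0          0        check_state_run_pq (check both P and Q)
      1          1        check_state_run    (only Q failed, so check P)
      1          0        check_state_run_q  (P was used for recovery, check Q)
      2          -        check_state_idle   (too many failures, nothing checked)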

NeilBrown

* Re: feature re-quest for "re-write"
  2014-02-25  2:10   ` NeilBrown
@ 2014-02-25  2:26     ` Brad Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Brad Campbell @ 2014-02-25  2:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mikael Abrahamsson, linux-raid

On 25/02/14 10:10, NeilBrown wrote:

> This code is obviously too subtle.

Not at all, it's my understanding that is under-developed. I was just 
looking for something obvious to explain the behaviour others have been 
reporting where a check won't trigger a re-write of a pending sector if 
the sector is a p or q rather than data.

> If 0 drives have failed, then 's->failed' is 0 (it is the count of failed
> drives), and  's->q_failed' is also 0 (it is a boolean flag, and q clearly
> hasn't failed as nothing has).
> So the first 'if' branch will be followed (as "0 == 0") and check_state set to
> check_state_run.
> Then as q_failed is still 0 and failed < 2, check_state gets set to
> check_state_run_pq.
>

Got it, thanks for taking the time to set me straight.

Regards,
Brad

* Re: feature re-quest for "re-write"
  2014-02-24 20:39           ` Eyal Lebedinsky
@ 2014-02-25  3:16             ` NeilBrown
  2014-02-25  5:58               ` Eyal Lebedinsky
  2014-02-25  7:58               ` Eyal Lebedinsky
  0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2014-02-25  3:16 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid, Mikael Abrahamsson

On Tue, 25 Feb 2014 07:39:14 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
wrote:

> My main interest is to understand why 'check' does not actually check.
> I already know how to fix the problem, by writing to the location I
> can force the pending reallocation to happen, but then I will not have
> the test case anymore.
> 
> The OP asks for a specific solution, but I think that the 'check' action
> should already correctly rewrite failed (i/o error) sectors. It does not
> always know which sector to rewrite when it finds a raid6 mismatch
> without an i/o error (with raid5 it never knows).
> 

I cannot reproduce the problem.  In my testing a read error is fixed by
'check'.  For you it clearly isn't.  I wonder what is different.

During normal 'check' or 'repair' etc the read requests are allowed to be
combined by the io scheduler so when we get a read error, it could be one
error for a megabyte or more of the address space.
So the first thing raid5.c does is arrange to read all the blocks again but
to prohibit the merging of requests.  This time any read error will be for a
single 4K block.

Once we have that reliable read error the data is constructed from the other
blocks and the new block is written out.

This suggests that when there is a read error you should see e.g.

[  714.808494] end_request: I/O error, dev sds, sector 8141872

then shortly after that another similar error, possibly with a slightly
different sector number (at most a few thousand sectors later).

Then something like

md/raid:md0: read error corrected (8 sectors at 8141872 on sds)


However in the log Mikael Abrahamsson posted on 16 Jan 2014 
(Subject: Re: read errors not corrected when doing check on RAID6)

we only see that first 'end_request' message.  No second one and no "read
error corrected".

This seems to suggest that the second read succeeded, which is odd (to say
the least).

In your log posted 21 Feb 2014
(Subject: raid 'check' does not provoke expected i/o error)
there aren't even any read errors during 'check'.
The drive sometimes reports a read error and sometimes doesn't?
Does reading the drive with 'dd' reliably report an error, while 'check'
never reports one?



So I'm a bit stumped.  It looks like md is doing the right thing, but maybe
the drive is getting confused.
Are all the people who report this using the same sort of drive??

NeilBrown

* Re: feature re-quest for "re-write"
  2014-02-25  3:16             ` NeilBrown
@ 2014-02-25  5:58               ` Eyal Lebedinsky
  2014-02-25  7:05                 ` Stan Hoeppner
  2014-02-25  7:58               ` Eyal Lebedinsky
  1 sibling, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-25  5:58 UTC (permalink / raw)
  To: list linux-raid

My case is consistent.

Reading /dev/sdi1 provokes the end_request error. This is 100% reproducible.
Reading /dev/md127 runs clean (no error message).
Doing a 'check' completes clean (no error message).(*)
smartctl shows one pending sector.

Some details listed below.

Eyal

(*) I run a check action by setting sync_min/sync_max/sync_action
to cover the bad sector. However, just to be sure, I allowed an overnight
full check which also ran clean. The bad sector is still pending.

This is how I run the short tests:

# parted -l
Model: ATA WDC WD4001FAEX-0 (scsi)
Disk /dev/sdi: 4001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
  1      1049kB  4001GB  4001GB

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdf1[7] sdd1[1] sdc1[0] sdg1[4] sdh1[5] sde1[2] sdi1[6]
       19534425600 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
       bitmap: 0/30 pages [0KB], 65536KB chunk

# cat /sys/block/md127/md/chunk_size
524288

# sys="/sys/block/md127/md"
# echo       '0' >$sys/sync_min	# check first
# echo '1000384' >$sys/sync_max	#   1m sectors, 0.5GB
# echo 'check'   >$sys/sync_action

Examining /proc/mdstat every second:
16:30:46       [>....................]  check =  0.0% (131360/3906885120) finish=495.6min speed=131360K/sec
16:30:47       [>....................]  check =  0.0% (273696/3906885120) finish=237.8min speed=273696K/sec
16:30:48       [>....................]  check =  0.0% (416032/3906885120) finish=312.9min speed=208016K/sec
16:30:49       [>....................]  check =  0.0% (561952/3906885120) finish=347.5min speed=187317K/sec
16:30:50       [>....................]  check =  0.0% (707872/3906885120) finish=367.8min speed=176968K/sec
16:30:51       [>....................]  check =  0.0% (844072/3906885120) finish=385.6min speed=168814K/sec
16:30:52       [>....................]  check =  0.0% (991016/3906885120) finish=394.1min speed=165169K/sec
16:30:53       [>....................]  check =  0.0% (1136936/3906885120) finish=400.7min speed=162419K/sec
16:30:54       [>....................]  check =  0.0% (1283368/3906885120) finish=405.7min speed=160421K/sec
16:30:55       [>....................]  check =  0.0% (1427752/3906885120) finish=410.3min speed=158639K/sec
16:30:56       [>....................]  check =  0.0% (1544492/3906885120) finish=463.5min speed=140408K/sec
16:30:57       [>....................]  check =  0.0% (1726304/3906885120) finish=452.4min speed=143858K/sec
16:30:58       [>....................]  check =  0.0% (1866592/3906885120) finish=453.2min speed=143584K/sec
16:30:59       [>....................]  check =  0.0% (2012000/3906885120) finish=452.8min speed=143714K/sec
16:31:00       [>....................]  check =  0.0% (2154336/3906885120) finish=453.1min speed=143622K/sec
16:31:01       [>....................]  check =  0.0% (2226336/3906885120) finish=467.6min speed=139146K/sec
16:31:02       [>....................]  check =  0.0% (2401636/3906885120) finish=460.6min speed=141272K/sec
16:31:03       [>....................]  check =  0.0% (2549592/3906885120) finish=459.4min speed=141644K/sec
16:31:04       [>....................]  check =  0.0% (2690864/3906885120) finish=459.4min speed=141625K/sec
16:31:05       [>....................]  check =  0.0% (2834776/3906885120) finish=459.0min speed=141738K/sec
16:31:06       [>....................]  check =  0.0% (2928880/3906885120) finish=466.5min speed=139470K/sec
16:31:07       [>....................]  check =  0.0% (3029760/3906885120) finish=472.4min speed=137716K/sec
16:31:08       [>....................]  check =  0.0% (3111680/3906885120) finish=480.9min speed=135290K/sec
16:31:09       [>....................]  check =  0.0% (3258624/3906885120) finish=479.1min speed=135776K/sec
16:31:10       [>....................]  check =  0.0% (3401472/3906885120) finish=478.1min speed=136058K/sec
16:31:11       [>....................]  check =  0.0% (3544832/3906885120) finish=477.1min speed=136339K/sec
16:31:12       [>....................]  check =  0.0% (3657476/3906885120) finish=480.2min speed=135462K/sec
16:31:13       [>....................]  check =  0.0% (3797764/3906885120) finish=479.6min speed=135634K/sec
16:31:14       [>....................]  check =  0.1% (3941636/3906885120) finish=478.5min speed=135918K/sec
16:31:15       [>....................]  check =  0.1% (4076292/3906885120) finish=478.7min speed=135876K/sec
16:31:16       [>....................]  check =  0.1% (4221188/3906885120) finish=477.6min speed=136167K/sec
16:31:17       [>....................]  check =  0.1% (4325252/3906885120) finish=481.2min speed=135164K/sec
16:31:18       [>....................]  check =  0.1% (4497992/3906885120) finish=477.1min speed=136302K/sec
16:31:19       [>....................]  check =  0.1% (4644936/3906885120) finish=477.1min speed=136300K/sec
16:31:20       [>....................]  check =  0.1% (4779088/3906885120) finish=477.3min speed=136233K/sec
16:31:21       [>....................]  check =  0.1% (4914888/3906885120) finish=477.4min speed=136220K/sec
16:31:22       [>....................]  check =  0.1% (4990720/3906885120) finish=487.6min speed=133366K/sec
16:31:23       [>....................]  check =  0.1% (4999896/3906885120) finish=502.2min speed=129485K/sec

# cat /sys/block/md127/md/mismatch_cnt
0

# echo 'idle'   >$sys/sync_action

# dmesg|tail
[ 4134.750324] md: data-check of RAID array md127
[ 4134.756992] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 4134.764956] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[ 4134.776816] md: using 128k window, over a total of 3906885120k.
[ 4174.065003] md: md_do_sync() got signal ... exiting

On 02/25/14 14:16, NeilBrown wrote:
> On Tue, 25 Feb 2014 07:39:14 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
> wrote:
>
>> My main interest is to understand why 'check' does not actually check.
>> I already know how to fix the problem, by writing to the location I
>> can force the pending reallocation to happen, but then I will not have
>> the test case anymore.
>>
>> The OP asks for a specific solution, but I think that the 'check' action
>> should already correctly rewrite failed (i/o error) sectors. It does not
>> always know which sector to rewrite when it finds a raid6 mismatch
>> without an i/o error (with raid5 it never knows).
>>
>
> I cannot reproduce the problem.  In my testing a read error is fixed by
> 'check'.  For you it clearly isn't.  I wonder what is different.
>
> During normal 'check' or 'repair' etc the read requests are allowed to be
> combined by the io scheduler so when we get a read error, it could be one
> error for a megabyte of more of the address space.
> So the first thing raid5.c does is arrange to read all the blocks again but
> to prohibit the merging of requests.  This time any read error will be for a
> single 4K block.
>
> Once we have that reliable read error the data is constructed from the other
> blocks and the new block is written out.
>
> This suggests that when there is a read error you should see e.g.
>
> [  714.808494] end_request: I/O error, dev sds, sector 8141872
>
> then shortly after that another similar error, possibly with a slightly
> different sector number (at most a few thousand sectors later).
>
> Then something like
>
> md/raid:md0: read error corrected (8 sectors at 8141872 on sds)
>
>
> However in the log Mikael Abrahamsson posted on 16 Jan 2014
> (Subject: Re: read errors not corrected when doing check on RAID6)
>
> we only see that first 'end_request' message.  No second one and no "read
> error corrected".
>
> This seems to suggest that the second read succeeded, which is odd (to say
> the least).
>
> In your log posted 21 Feb 2014
> (Subject: raid 'check' does not provoke expected i/o error)
> there aren't even any read errors during 'check'.
> The drive sometimes reports a read error and something doesn't?
> Does reading the drive with 'dd' already report an error, and with 'check'
> never report an error?
>
>
>
> So I'm a bit stumped.  It looks like md is doing the right thing, but maybe
> the drive is getting confused.
> Are all the people who report this using the same sort of drive??
>
> NeilBrown
>

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-25  5:58               ` Eyal Lebedinsky
@ 2014-02-25  7:05                 ` Stan Hoeppner
  2014-02-25  7:45                   ` Eyal Lebedinsky
  0 siblings, 1 reply; 22+ messages in thread
From: Stan Hoeppner @ 2014-02-25  7:05 UTC (permalink / raw)
  To: Eyal Lebedinsky, list linux-raid, NeilBrown

On 2/24/2014 11:58 PM, Eyal Lebedinsky wrote:
...
> (*) I run a check action by setting sync_min/sync_max/sync_action
> to cover the bad sector. However, just to be sure, I allowed an overnight
> full check which also ran clean. The bad sector is still pending.

What is the expected behavior when the drive's spare sector pool has
been exhausted, and thus the sector cannot be remapped by the drive
firmware?

Unless md now keeps a spare sector pool of its own and remaps bad
sectors, the only way to fix this situation is to replace the drive.
And if indeed drive spare pool exhaustion is the cause of the sector not
being remapped, the drive needs to be replaced.
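
SMART should show whether that is what is happening, e.g. (device name
assumed from earlier in the thread):

   smartctl -A /dev/sdi | egrep -i 'realloc|pending|uncorrect'

A large or growing Reallocated_Sector_Ct, or a normalized value down at its
threshold, would point at spare pool exhaustion.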

-- 
Stan

* Re: feature re-quest for "re-write"
  2014-02-25  7:05                 ` Stan Hoeppner
@ 2014-02-25  7:45                   ` Eyal Lebedinsky
  0 siblings, 0 replies; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-25  7:45 UTC (permalink / raw)
  To: list linux-raid

The disk (actually the whole array) is relatively new and so far has no reallocated sectors.

Eyal

On 02/25/14 18:05, Stan Hoeppner wrote:
> On 2/24/2014 11:58 PM, Eyal Lebedinsky wrote:
> ...
>> (*) I run a check action by setting sync_min/sync_max/sync_action
>> to cover the bad sector. However, just to be sure, I allowed an overnight
>> full check which also ran clean. The bad sector is still pending.
>
> What is the expected behavior when the drive's spare sector pool has
> been exhausted, and thus the sector cannot be remapped by the drive
> firmware?
>
> Unless md now keeps a spare sector pool of its own and remaps bad
> sectors, the only way to fix this situation is to replace the drive.
> And if indeed drive spare pool exhaustion is the cause of the sector not
> being remapped, the drive needs to be replaced.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-25  3:16             ` NeilBrown
  2014-02-25  5:58               ` Eyal Lebedinsky
@ 2014-02-25  7:58               ` Eyal Lebedinsky
  2014-02-25  8:35                 ` NeilBrown
  1 sibling, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-25  7:58 UTC (permalink / raw)
  Cc: list linux-raid

BTW, is there a monitoring tool to trace all i/o to a device? I could then
log activity to /dev/sd[c-i]1 during a (short) 'check' and see if all sectors
are really read. Or does md have a debug facility for this?

Eyal

On 02/25/14 14:16, NeilBrown wrote:
> On Tue, 25 Feb 2014 07:39:14 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
> wrote:
>
>> My main interest is to understand why 'check' does not actually check.
>> I already know how to fix the problem, by writing to the location I
>> can force the pending reallocation to happen, but then I will not have
>> the test case anymore.
>>
>> The OP asks for a specific solution, but I think that the 'check' action
>> should already correctly rewrite failed (i/o error) sectors. It does not
>> always know which sector to rewrite when it finds a raid6 mismatch
>> without an i/o error (with raid5 it never knows).
>>
>
> I cannot reproduce the problem.  In my testing a read error is fixed by
> 'check'.  For you it clearly isn't.  I wonder what is different.
>
> During normal 'check' or 'repair' etc the read requests are allowed to be
> combined by the io scheduler so when we get a read error, it could be one
> error for a megabyte of more of the address space.
> So the first thing raid5.c does is arrange to read all the blocks again but
> to prohibit the merging of requests.  This time any read error will be for a
> single 4K block.
>
> Once we have that reliable read error the data is constructed from the other
> blocks and the new block is written out.
>
> This suggests that when there is a read error you should see e.g.
>
> [  714.808494] end_request: I/O error, dev sds, sector 8141872
>
> then shortly after that another similar error, possibly with a slightly
> different sector number (at most a few thousand sectors later).
>
> Then something like
>
> md/raid:md0: read error corrected (8 sectors at 8141872 on sds)
>
>
> However in the log Mikael Abrahamsson posted on 16 Jan 2014
> (Subject: Re: read errors not corrected when doing check on RAID6)
>
> we only see that first 'end_request' message.  No second one and no "read
> error corrected".
>
> This seems to suggest that the second read succeeded, which is odd (to say
> the least).
>
> In your log posted 21 Feb 2014
> (Subject: raid 'check' does not provoke expected i/o error)
> there aren't even any read errors during 'check'.
> The drive sometimes reports a read error and something doesn't?
> Does reading the drive with 'dd' already report an error, and with 'check'
> never report an error?
>
>
>
> So I'm a bit stumped.  It looks like md is doing the right thing, but maybe
> the drive is getting confused.
> Are all the people who report this using the same sort of drive??
>
> NeilBrown
>

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-25  7:58               ` Eyal Lebedinsky
@ 2014-02-25  8:35                 ` NeilBrown
  2014-02-25 11:08                   ` Eyal Lebedinsky
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2014-02-25  8:35 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Tue, 25 Feb 2014 18:58:16 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
wrote:

> BTW, Is there a monitoring tool to trace all i/o to a device? I could then
> log activity to /dev/sd[c-i]1 during a (short) 'check' and see if all sectors
> are really read. Or does md have a debug facility for this?

blktrace will collect a trace, blkparse will print it out for you.
You need to trace the 'whole' device.

So something like

  blktrace /dev/sd[c-i]
  # run the test
  ctrl-C
  blkparse sd[c-i]*

blktrace creates several files, I think one for each device on each CPU.
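
If the full trace is too much to wade through, the parsed output can be
filtered down to just the resync thread, e.g. (untested):

  blkparse sd[c-i]* | grep md127_resync

since blkparse prints the name of the process that issued each request.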


NeilBrown

> 
> Eyal
> 
> On 02/25/14 14:16, NeilBrown wrote:
> > On Tue, 25 Feb 2014 07:39:14 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
> > wrote:
> >
> >> My main interest is to understand why 'check' does not actually check.
> >> I already know how to fix the problem, by writing to the location I
> >> can force the pending reallocation to happen, but then I will not have
> >> the test case anymore.
> >>
> >> The OP asks for a specific solution, but I think that the 'check' action
> >> should already correctly rewrite failed (i/o error) sectors. It does not
> >> always know which sector to rewrite when it finds a raid6 mismatch
> >> without an i/o error (with raid5 it never knows).
> >>
> >
> > I cannot reproduce the problem.  In my testing a read error is fixed by
> > 'check'.  For you it clearly isn't.  I wonder what is different.
> >
> > During normal 'check' or 'repair' etc the read requests are allowed to be
> > combined by the io scheduler so when we get a read error, it could be one
> > error for a megabyte of more of the address space.
> > So the first thing raid5.c does is arrange to read all the blocks again but
> > to prohibit the merging of requests.  This time any read error will be for a
> > single 4K block.
> >
> > Once we have that reliable read error the data is constructed from the other
> > blocks and the new block is written out.
> >
> > This suggests that when there is a read error you should see e.g.
> >
> > [  714.808494] end_request: I/O error, dev sds, sector 8141872
> >
> > then shortly after that another similar error, possibly with a slightly
> > different sector number (at most a few thousand sectors later).
> >
> > Then something like
> >
> > md/raid:md0: read error corrected (8 sectors at 8141872 on sds)
> >
> >
> > However in the log Mikael Abrahamsson posted on 16 Jan 2014
> > (Subject: Re: read errors not corrected when doing check on RAID6)
> >
> > we only see that first 'end_request' message.  No second one and no "read
> > error corrected".
> >
> > This seems to suggest that the second read succeeded, which is odd (to say
> > the least).
> >
> > In your log posted 21 Feb 2014
> > (Subject: raid 'check' does not provoke expected i/o error)
> > there aren't even any read errors during 'check'.
> > The drive sometimes reports a read error and something doesn't?
> > Does reading the drive with 'dd' already report an error, and with 'check'
> > never report an error?
> >
> >
> >
> > So I'm a bit stumped.  It looks like md is doing the right thing, but maybe
> > the drive is getting confused.
> > Are all the people who report this using the same sort of drive??
> >
> > NeilBrown
> >
> 


* Re: feature re-quest for "re-write"
  2014-02-25  8:35                 ` NeilBrown
@ 2014-02-25 11:08                   ` Eyal Lebedinsky
  2014-02-25 11:28                     ` Mikael Abrahamsson
  0 siblings, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-25 11:08 UTC (permalink / raw)
  Cc: list linux-raid

This is helpful Neil.

I am running blktrace/blkparse and trying to understand what it is telling
me.

If I got it right then I see that doing a check of md127 (from the start)
starts reading with this entry

8,129  6      327     0.992307218 20259  D   R 264200 + 504 [md127_resync]

which means that the real data starts rather further into the stripes.
Actually, further than the bad block: sector 259648 of sdi1 is before the
first read operation. Though I am not even sure whether the blkparse 264200
is in sectors and not 1KB blocks or 4KB blocks.

Following is some speculation.

Does md127 store a header before it starts striping the data? Might this
be why it rarely actually needs to read parts of this header?
(I thought that superblocks and what not are stored at the far end).

If so, then the content of this sector is not part of the redundant data and may
not be trivial to recover. Then again, I expect important data is recorded more
than once.

If this is the case then the calculation to correlate the bad sector to the fs
block (which I need to do whenever I find a bad sector in order to investigate
my data loss) is more complicated than I assumed.

Final thought: if this sector is in an important header, when it *does* need
to be read (and fail), how bad a reaction should I expect?

Eyal

On 02/25/14 19:35, NeilBrown wrote:
> On Tue, 25 Feb 2014 18:58:16 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
> wrote:
>
>> BTW, Is there a monitoring tool to trace all i/o to a device? I could then
>> log activity to /dev/sd[c-i]1 during a (short) 'check' and see if all sectors
>> are really read. Or does md have a debug facility for this?
>
> blktrace will collect a trace, blkparse will print it out for you.
> You need to trace the 'whole' device.
>
> So something like
>
>    blktrace /dev/sd[c-i]
>    # run the test
>    ctrl-C
>    blkparse sd[c-i]*
>
> blktrace creates several files, I think one for each device on each CPU.
>
>
> NeilBrown
>
>>
>> Eyal
>>
>> On 02/25/14 14:16, NeilBrown wrote:
>>> On Tue, 25 Feb 2014 07:39:14 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
>>> wrote:
>>>
>>>> My main interest is to understand why 'check' does not actually check.
>>>> I already know how to fix the problem, by writing to the location I
>>>> can force the pending reallocation to happen, but then I will not have
>>>> the test case anymore.
>>>>
>>>> The OP asks for a specific solution, but I think that the 'check' action
>>>> should already correctly rewrite failed (i/o error) sectors. It does not
>>>> always know which sector to rewrite when it finds a raid6 mismatch
>>>> without an i/o error (with raid5 it never knows).
>>>>
>>>
>>> I cannot reproduce the problem.  In my testing a read error is fixed by
>>> 'check'.  For you it clearly isn't.  I wonder what is different.
>>>
>>> During normal 'check' or 'repair' etc the read requests are allowed to be
>>> combined by the io scheduler so when we get a read error, it could be one
>>> error for a megabyte of more of the address space.
>>> So the first thing raid5.c does is arrange to read all the blocks again but
>>> to prohibit the merging of requests.  This time any read error will be for a
>>> single 4K block.
>>>
>>> Once we have that reliable read error the data is constructed from the other
>>> blocks and the new block is written out.
>>>
>>> This suggests that when there is a read error you should see e.g.
>>>
>>> [  714.808494] end_request: I/O error, dev sds, sector 8141872
>>>
>>> then shortly after that another similar error, possibly with a slightly
>>> different sector number (at most a few thousand sectors later).
>>>
>>> Then something like
>>>
>>> md/raid:md0: read error corrected (8 sectors at 8141872 on sds)
>>>
>>>
>>> However in the log Mikael Abrahamsson posted on 16 Jan 2014
>>> (Subject: Re: read errors not corrected when doing check on RAID6)
>>>
>>> we only see that first 'end_request' message.  No second one and no "read
>>> error corrected".
>>>
>>> This seems to suggest that the second read succeeded, which is odd (to say
>>> the least).
>>>
>>> In your log posted 21 Feb 2014
>>> (Subject: raid 'check' does not provoke expected i/o error)
>>> there aren't even any read errors during 'check'.
>>> The drive sometimes reports a read error and something doesn't?
>>> Does reading the drive with 'dd' already report an error, and with 'check'
>>> never report an error?
>>>
>>>
>>>
>>> So I'm a bit stumped.  It looks like md is doing the right thing, but maybe
>>> the drive is getting confused.
>>> Are all the people who report this using the same sort of drive??
>>>
>>> NeilBrown
>>>
>>
>

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-25 11:08                   ` Eyal Lebedinsky
@ 2014-02-25 11:28                     ` Mikael Abrahamsson
  2014-02-25 12:05                       ` Eyal Lebedinsky
  0 siblings, 1 reply; 22+ messages in thread
From: Mikael Abrahamsson @ 2014-02-25 11:28 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Tue, 25 Feb 2014, Eyal Lebedinsky wrote:

> Final thought: if this sector is in an important header, when it *does* 
> need to be read (and fail), how bad a reaction should I expect?

I have two thoughts here:

Check data offset when doing mdadm -E. There you will see how much unused 
space is allocated between the superblock and the start of the actual array
data contents. This might be where your pending block is.

Regarding re-write: I have had it happen that a drive had bad blocks which 
"check" didn't find errors on; when I rebooted, that drive had read errors 
on its superblock, was not assembled into the array, and md instead started 
rebuilding to a spare since the array was degraded. So my question is: when 
issuing "check" or "repair", does md actually check that the superblocks are 
readable? If not, perhaps it should? Should it also check that the contents 
of the superblocks are consistent with the data that the kernel has in its 
data structures?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: feature re-quest for "re-write"
  2014-02-25 11:28                     ` Mikael Abrahamsson
@ 2014-02-25 12:05                       ` Eyal Lebedinsky
  2014-02-25 12:17                         ` Mikael Abrahamsson
  0 siblings, 1 reply; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-25 12:05 UTC (permalink / raw)
  Cc: list linux-raid

Yes, this matches what I saw. From "mdadm -E /dev/sdi1":

  Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
      Array Size : 19534425600 (18629.48 GiB 20003.25 GB)
   Used Dev Size : 7813770240 (3725.90 GiB 4000.65 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors

Is the layout of this 128MB area documented? I expect basic superblock data, log, bitmap,
what else? Are there copies elsewhere on the disk that can be used? I wonder what the
bad sector 259648 (close to the end of the header) covers.
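
Putting the numbers together (assuming I have the arithmetic right):

   error reported at drive sector:  261696   (kernel and smart agree)
   /dev/sdi1 starts at sector:        2048
   => offset within sdi1:           259648   (261696 - 2048)
   Data Offset:                     262144   sectors (128 MiB)

So 259648 < 262144, i.e. the bad sector sits in the gap between the
superblock and the start of the array data, which would explain why neither
a 'check' nor a read of /dev/md127 ever touches it.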

Can it be rebuilt from the other members of the array (once the array is stopped one can
expect the log and bitmap to be clearable)? I don't know.

Maybe build a new array with the --assume-clean option that will rewrite the header but
leave the data alone? Doco says "not  recommended".

Or just give up: fail and remove the disk, clear the superblock then add it and go through
a full resync. This way feels safer as I do not touch the other members.

cheers
	Eyal

On 02/25/14 22:28, Mikael Abrahamsson wrote:
> On Tue, 25 Feb 2014, Eyal Lebedinsky wrote:
>
>> Final thought: if this sector is in an important header, when it *does* need to be read (and fail), how bad a reaction should I expect?
>
> I have two thoughts here:
>
> Check data offset when doing mdadm -E. There you will see how much unused data is allocated between the superblock and start of the actual array data contents. This might be where your pending block is.
>
> Regarding re-write. I have had happen to me that one drive that had bad blocks that "check" didn't find errors on, when I rebooted that drive had read errors on the superblock, was not assembled into the array, and instead md started rebuilding to a spare since the array was degraded. So my wonder is, when issuing "check" or "repair", does md actually check if the superblocks are readable? If not, perhaps it should? Should it check the contents of the superblocks are consistent with the data that the kernel has in its data structures?

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

* Re: feature re-quest for "re-write"
  2014-02-25 12:05                       ` Eyal Lebedinsky
@ 2014-02-25 12:17                         ` Mikael Abrahamsson
  2014-02-25 12:32                           ` Eyal Lebedinsky
  0 siblings, 1 reply; 22+ messages in thread
From: Mikael Abrahamsson @ 2014-02-25 12:17 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Tue, 25 Feb 2014, Eyal Lebedinsky wrote:

> Or just give up: fail and remove the disk, clear the superblock then add 
> it and go through a full resync. This way feels safer as I do not touch 
> the other members.

I am not sure this will work either. I would expect that the empty space in 
the "data offset" area is never touched even when doing a rebuild. Perhaps 
that is also something that should be done?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: feature re-quest for "re-write"
  2014-02-25 12:17                         ` Mikael Abrahamsson
@ 2014-02-25 12:32                           ` Eyal Lebedinsky
  0 siblings, 0 replies; 22+ messages in thread
From: Eyal Lebedinsky @ 2014-02-25 12:32 UTC (permalink / raw)
  Cc: list linux-raid

I expect that zeroing (and then recreating the header) will surely write the whole area.
Or I can play really safe and write to the bad sector myself to force the reallocation
while the disk is out.
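
Something like this once the disk is out of the array, after triple-checking
the sector number (untested, treat it as a sketch only):

   dd if=/dev/zero of=/dev/sdi bs=512 seek=261696 count=1 oflag=direct

which should make the drive remap the pending sector on write.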

Eyal


On 02/25/14 23:17, Mikael Abrahamsson wrote:
> On Tue, 25 Feb 2014, Eyal Lebedinsky wrote:
>
>> Or just give up: fail and remove the disk, clear the superblock then add it and go through a full resync. This way feels safer as I do not touch the other members.
>
> I am not sure this will work either. I would expect that the empty data in "data offset" is never touched even when doing rebuild. Perhaps also something that should be done?
>

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)
