* 9 second recovery when re-adding a drive that got kicked out?
@ 2017-06-04 22:38 Marc MERLIN
  2017-06-06  2:58 ` Phil Turmel
  2017-06-06  3:57 ` NeilBrown
  0 siblings, 2 replies; 10+ messages in thread
From: Marc MERLIN @ 2017-06-04 22:38 UTC (permalink / raw)
  To: linux-raid

Howdy,

Can you confirm that I understand how the write-intent bitmap works:
that it doesn't cover the entire array, only a part of it, and that once
you overflow it, syncing reverts to syncing the entire array?

I have a raid5 array of five 6TB drives.

/dev/sdl1 got kicked out due to some kind of bus/disk error.
The drive itself is fine; it was a cabling issue, so I fixed the
cabling and re-added it:

gargamel:~# mdadm -a /dev/md6 /dev/sdl1

Then I saw this:
[ 1001.728134] md: recovery of RAID array md6
[ 1010.975255] md: md6: recovery done.

Before the re-add:
md6 : active raid5 sdk1[5] sdb1[3] sdm1[2] sdj1[1]
      23441555456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [_UUUU]
      bitmap: 3/44 pages [12KB], 65536KB chunk

After the re-add (running a check now just to be safe):
md6 : active raid5 sdl1[0] sdj1[1] sdk1[5] sdf1[3] sdm1[2]
      23441555456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [>....................]  check =  0.8% (49258960/5860388864) finish=569.3min speed=170093K/sec
      bitmap: 0/44 pages [0KB], 65536KB chunk

https://raid.wiki.kernel.org/index.php/Mdstat
explains a bit; I don't think it says how big a page is, but it seems to
be 4KB.

So let's say I have 64MB chunks, each taking 16 bits.
The whole array is 22,892,144 MiB.
That's 357,689 chunks, or about 700KB (16 bits per chunk) to keep all the
state, but there are 44 pages of 4KB, or 176KB of write-intent
state.

The first bitmap line shows 3 pages totaling 12KB, so each page
contains 4KB, or 2048 chunks per page.
Did the above say that I had 6144 chunks that needed to be synced?

If so, that would be 6144 * 65536KB = 393,216 MB to write.
Written in 9 seconds, that implies a sync rate of ~43GB/s, which isn't
believable, so presumably far fewer chunks were actually dirty.
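
To double-check myself, here's the same arithmetic as a quick Python
sketch (just my assumptions above, nothing taken from the md code):

    # Back-of-the-envelope bitmap math, using the numbers from this mail.
    array_mib  = 22_892_144      # whole array, in MiB
    bchunk_mib = 64              # bitmap chunk size (65536KB in mdstat)

    chunks = array_mib // bchunk_mib
    print(chunks)                          # ~357,689 chunks
    print(chunks * 2 // 1024)              # ~698 KiB at 16 bits (2 bytes) each
    print(44 * 4)                          # 176 KiB actually shown in mdstat

    # Implied rate if 6144 chunks really had been rewritten in 9 seconds:
    print(6144 * bchunk_mib / 9 / 1024)    # ~42.7 GiB/s, clearly impossible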

The part I'm not too clear about is that 44 pages of intent bitmap
doesn't seem to be enough to cover all my data.
Is the idea that once I overflow that write-intent bitmap, it reverts
to resyncing the entire array?

I looked at https://raid.wiki.kernel.org/index.php/Write-intent_bitmap
but didn't see anything about that specific bit.


Array details if that helps:
gargamel:~# mdadm --examine /dev/sdl1
/dev/sdl1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 66bccdfb:afbf9683:fcf1f12e:f2af2dcb
           Name : gargamel.svh.merlins.org:6  (local to host gargamel.svh.merlins.org)
  Creation Time : Thu Jan 28 14:38:40 2016
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 11720777728 (5588.90 GiB 6001.04 GB)
     Array Size : 23441555456 (22355.61 GiB 24004.15 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : ca4598ba:de585baa:b9935222:e06ac97d

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Jun  4 15:08:45 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : d645f600 - correct
         Events : 84917

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-04 22:38 9 second recovery when re-adding a drive that got kicked out? Marc MERLIN
@ 2017-06-06  2:58 ` Phil Turmel
  2017-06-06  3:57 ` NeilBrown
  1 sibling, 0 replies; 10+ messages in thread
From: Phil Turmel @ 2017-06-06  2:58 UTC (permalink / raw)
  To: Marc MERLIN, linux-raid

On 06/04/2017 06:38 PM, Marc MERLIN wrote:
> Howdy,
> 
> Can you confirm that I understand how the write-intent bitmap works:
> that it doesn't cover the entire array, only a part of it, and that once
> you overflow it, syncing reverts to syncing the entire array?

There's no overflow.  The correspondence between bits in the bitmap and
regions of the array is adjusted so that the whole array can always be
represented.  One bit can cover many pages' worth of data, IIRC.
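
Roughly this idea, as a Python sketch (illustrative only; this is not
mdadm's actual sizing code):

    # Given a fixed amount of on-disk bitmap space, keep doubling the
    # bitmap chunk size until every chunk's 16-bit (2-byte) counter fits.
    def pick_bitmap_chunk_mib(device_mib, bitmap_bytes):
        chunk_mib = 1
        while (device_mib // chunk_mib + 1) * 2 > bitmap_bytes:
            chunk_mib *= 2
        return chunk_mib

    # Your md6 members are ~5,723,036 MiB; 44 pages = 180,224 bytes of space:
    print(pick_bitmap_chunk_mib(5_723_036, 44 * 4096))   # -> 64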

> I have a raid5 array of five 6TB drives.
> 
> /dev/sdl1 got kicked out due to some kind of bus/disk error.
> The drive itself is fine; it was a cabling issue, so I fixed the
> cabling and re-added it:
> 
> gargamel:~# mdadm -a /dev/md6 /dev/sdl1
> 
> Then I saw this:
> [ 1001.728134] md: recovery of RAID array md6
> [ 1010.975255] md: md6: recovery done.

> So let's say I have 64MB chunks, each taking 16 bits.
> The whole array is 22,892,144 MiB.
> That's 357,689 chunks, or about 700KB (16 bits per chunk) to keep all the
> state, but there are 44 pages of 4KB, or 176KB of write-intent
> state.

So each bit in the bitmap represents two 4k pages.

Nine seconds to re-add after a short disconnect is perfectly normal.
For lightly loaded arrays, it can be virtually instant.

Phil

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-04 22:38 9 second recovery when re-adding a drive that got kicked out? Marc MERLIN
  2017-06-06  2:58 ` Phil Turmel
@ 2017-06-06  3:57 ` NeilBrown
  2017-06-07  3:03   ` Marc MERLIN
  2017-06-20 18:27   ` Marc MERLIN
  1 sibling, 2 replies; 10+ messages in thread
From: NeilBrown @ 2017-06-06  3:57 UTC (permalink / raw)
  To: Marc MERLIN, linux-raid

On Sun, Jun 04 2017, Marc MERLIN wrote:

> Howdy,
>
> Can you confirm that I understand how the write-intent bitmap works:
> that it doesn't cover the entire array, only a part of it, and that once
> you overflow it, syncing reverts to syncing the entire array?
>
> I have a raid5 array of five 6TB drives.
>
> /dev/sdl1 got kicked out due to some kind of bus/disk error.
> The drive itself is fine; it was a cabling issue, so I fixed the
> cabling and re-added it:
>
> gargamel:~# mdadm -a /dev/md6 /dev/sdl1
>
> Then I saw this:
> [ 1001.728134] md: recovery of RAID array md6
> [ 1010.975255] md: md6: recovery done.
>
> Before the re-add:
> md6 : active raid5 sdk1[5] sdb1[3] sdm1[2] sdj1[1]
>       23441555456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [_UUUU]
>       bitmap: 3/44 pages [12KB], 65536KB chunk
>
> After the re-add (running a check now just to be safe):
> md6 : active raid5 sdl1[0] sdj1[1] sdk1[5] sdf1[3] sdm1[2]
>       23441555456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  check =  0.8% (49258960/5860388864) finish=569.3min speed=170093K/sec
>       bitmap: 0/44 pages [0KB], 65536KB chunk
>
> https://raid.wiki.kernel.org/index.php/Mdstat
> explains a bit; I don't think it says how big a page is, but it seems to
> be 4KB.
>
> So let's say I have 64MB chunks, each taking 16 bits.
> The whole array is 22,892,144 MiB.
> That's 357,689 chunks, or about 700KB (16 bits per chunk) to keep all the
> state, but there are 44 pages of 4KB, or 176KB of write-intent
> state.
>
> The first bitmap line shows 3 pages totaling 12KB, so each page
> contains 4KB, or 2048 chunks per page.
> Did the above say that I had 6144 chunks that needed to be synced?

No.  It said that of the 44 pages of space that might be needed to store
16-bit counters that each represent 1 bitmap-chunk, only 3 of those
pages would contain non-zero counters, so only 3 had been allocated.

There could be as few as 3 chunks that need to be recovered, or there
could be as many as 3*2048 chunks, or any number in between.
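
If it helps to picture it, here is a much-simplified Python model of that
allocation (the real kernel bitmap code differs in plenty of detail):

    # 16-bit counters packed into 4KiB pages; a page is only allocated
    # once one of its counters goes non-zero.
    COUNTERS_PER_PAGE = 4096 // 2            # 2048 counters per page

    class IntentBitmap:
        def __init__(self, n_chunks):
            self.n_pages = -(-n_chunks // COUNTERS_PER_PAGE)   # ceil
            self.pages = {}                  # page index -> counter list

        def dirty(self, chunk):              # a write touches this chunk
            page, slot = divmod(chunk, COUNTERS_PER_PAGE)
            counters = self.pages.setdefault(page, [0] * COUNTERS_PER_PAGE)
            counters[slot] += 1

    bm = IntentBitmap(90112)                 # 44 pages' worth of chunks
    for c in (5, 6, 70000):                  # three scattered dirty chunks
        bm.dirty(c)
    print(f"{len(bm.pages)}/{bm.n_pages} pages")   # -> "2/44 pages"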

Had you run "mdadm --examine-bitmap /dev/sdk1" before the re-add, it
would have told you how many bits were set at that time.

That "x/y pages" information never should have appeared in /proc/mdstat
- it is really just of interest to developers.  But it is there now, so
removing it is awkward.

>
> If so, that would be 6144 * 65536KB = 393,216 MB to write.
> Written in 9 seconds, that implies a sync rate of ~43GB/s, which isn't
> believable, so presumably far fewer chunks were actually dirty.
>
> The part I'm not too clear about is that 44 pages of intent bitmap
> doesn't seem to be enough to cover all my data.

44 pages means 90112 16-bit counters, one for each 64MiB on each device.
90112 * 64MiB = 5632 GiB (about 6047 GB).
That is just over the size of each device (5588.90 GiB).

One bit in the bitmap (one counter in the internal bitmap) corresponds
to "a set of data that might be out of sync", which, in your case, is a
64MB-wide stripe across all devices.

So the numbers do add up.
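
(Or, as one line of Python you can check yourself:

    print(44 * 2048 * 64 / 1024)   # 44 pages * 2048 counters * 64MiB -> 5632.0 GiB

which is just over the 5588.90 GiB "Avail Dev Size" that mdadm reported.)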

NeilBrown

> Is the idea that once I overflow that write-intent bitmap, it reverts
> to resyncing the entire array?
>
> I looked at https://raid.wiki.kernel.org/index.php/Write-intent_bitmap
> but didn't see anything about that specific bit.
>
>
> Array details if that helps:
> gargamel:~# mdadm --examine /dev/sdl1
> /dev/sdl1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 66bccdfb:afbf9683:fcf1f12e:f2af2dcb
>            Name : gargamel.svh.merlins.org:6  (local to host gargamel.svh.merlins.org)
>   Creation Time : Thu Jan 28 14:38:40 2016
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 11720777728 (5588.90 GiB 6001.04 GB)
>      Array Size : 23441555456 (22355.61 GiB 24004.15 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>    Unused Space : before=262056 sectors, after=0 sectors
>           State : clean
>     Device UUID : ca4598ba:de585baa:b9935222:e06ac97d
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Sun Jun  4 15:08:45 2017
>   Bad Block Log : 512 entries available at offset 72 sectors
>        Checksum : d645f600 - correct
>          Events : 84917
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 0
>    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
>
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-06  3:57 ` NeilBrown
@ 2017-06-07  3:03   ` Marc MERLIN
  2017-06-20 18:27   ` Marc MERLIN
  1 sibling, 0 replies; 10+ messages in thread
From: Marc MERLIN @ 2017-06-07  3:03 UTC (permalink / raw)
  To: NeilBrown, Phil Turmel; +Cc: linux-raid

On Tue, Jun 06, 2017 at 01:57:27PM +1000, NeilBrown wrote:
> > The first bitmap line shows 3 pages totaling 12KB, so each page
> > contains 4KB, or 2048 chunks per page.
> > Did the above say that I had 6144 chunks that needed to be synced?
> 
> No.  It said that of the 44 pages of space that might be needed to store
> 16-bit counters that each represent 1 bitmap-chunk, only 3 of those
> pages would contain non-zero counters, so only 3 had been allocated.
> 
> There could be as few as 3 chunks that need to be recovered, or there
> could be as many as 3*2048 chunks, or any number in between.
 
Ah, I see. I wasn't clear about that part, thanks.

> Had you run "mdadm --examine-bitmap /dev/sdk1" before the re-add, it
> would have told you how many bits were set at that time.

Noted for next time, thanks.

> > The part I'm not too clear about is that 44 pages of intent bitmap
> > doesn't seem to be enough to cover all my data.
> 
> 44 pages means 90112 16-bit counters, one for each 64MiB on each device.
> 90112 * 64MiB = 5632 GiB (about 6047 GB).
> That is just over the size of each device (5588.90 GiB).

Ah, so it's not based on the 512k chunk size, I see. 

> One bit in the bitmap (one counter in the internal bitmap) corresponds
> to "a set of data that might be out of sync", which, in your case, is a
> 64MB-wide stripe across all devices.

I get that part now, just not where the 64MB came from, or how I ended
up with 44 pages of intent maps if it's not based on my chunk size.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-06  3:57 ` NeilBrown
  2017-06-07  3:03   ` Marc MERLIN
@ 2017-06-20 18:27   ` Marc MERLIN
  2017-06-20 18:31     ` Marc MERLIN
  1 sibling, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2017-06-20 18:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Tue, Jun 06, 2017 at 01:57:27PM +1000, NeilBrown wrote:
> Had you run "mdadm --examine-bitmap /dev/sdk1" before the re-add, it
> would have told you how many bits were set at that time.
> 
> That "x/y pages" information never should have appeared in /proc/mdstat
> - it is really just of interest to developers.  But it is there now, so
> removing it is awkward.

So, I hit the problem again: I re-added a drive that had only been missing
for maybe 2 minutes, and this time I'm getting a very long (but not full)
recovery:

gargamel:~# mdadm --examine-bitmap  /dev/sdh1
        Filename : /dev/sdh1
           Magic : 6d746962
         Version : 4
            UUID : 589f1176:8ee48905:d102340b:23f98ca1
          Events : 11588
  Events Cleared : 10625
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953380928 (1862.89 GiB 2000.26 GB)
          Bitmap : 29807 bits (chunks), 267 dirty (0.9%)

0.9% dirty still gives me a 5-hour recovery estimate?


md8 : active raid5 sdi1[5] sdd1[0] sdh1[3] sdg1[2] sdf1[1]
      7813523712 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  recovery = 12.4% (243429660/1953380928) finish=297.6min speed=95741K/sec
      bitmap: 5/15 pages [20KB], 65536KB chunk

gargamel:~# mdadm --query --detail /dev/md8
/dev/md8:
        Version : 1.2
  Creation Time : Sun May 14 08:59:13 2017
     Raid Level : raid5
     Array Size : 7813523712 (7451.56 GiB 8001.05 GB)
  Used Dev Size : 1953380928 (1862.89 GiB 2000.26 GB)
   Raid Devices : 5
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jun 20 11:01:02 2017
          State : clean, degraded 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : gargamel.svh.merlins.org:8  (local to host gargamel.svh.merlins.org)
           UUID : 589f1176:8ee48905:d102340b:23f98ca1
         Events : 11588

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       81        1      active sync   /dev/sdf1
       2       8       97        2      active sync   /dev/sdg1
       3       8      113        3      active sync   /dev/sdh1
       -       0        0        4      removed

This was the drive that got kicked out (shown before it was re-added):
gargamel:~# mdadm --examine-bitmap  /dev/sdi1
        Filename : /dev/sdi1
           Magic : 6d746962
         Version : 4
            UUID : 589f1176:8ee48905:d102340b:23f98ca1
          Events : 10630
  Events Cleared : 10625
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953380928 (1862.89 GiB 2000.26 GB)
          Bitmap : 29807 bits (chunks), 0 dirty (0.0%)


Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-20 18:27   ` Marc MERLIN
@ 2017-06-20 18:31     ` Marc MERLIN
  2017-06-20 18:40       ` Roman Mamedov
  2017-06-20 21:02       ` NeilBrown
  0 siblings, 2 replies; 10+ messages in thread
From: Marc MERLIN @ 2017-06-20 18:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Tue, Jun 20, 2017 at 11:27:45AM -0700, Marc MERLIN wrote:
> On Tue, Jun 06, 2017 at 01:57:27PM +1000, NeilBrown wrote:
> > Had you run "mdadm --examine-bitmap /dev/sdk1" before the re-add, it
> > would have told you how many bits were set at that time.
> > 
> > That "x/y pages" information never should have appeared in /proc/mdstat
> > - it is really just of interest to developers.  But it is there now, so
> > removing it is awkward.
> 
> So, I hit the problem again: I re-added a drive that had only been missing
> for maybe 2 minutes, and this time I'm getting a very long (but not full)
> recovery:
 
Mmmh, this is puzzling.

The progress meter was wrong; it recovered in 3 minutes:
Jun 20 11:19:28 gargamel kernel: [  916.007017] md: recovery of RAID array md8
Jun 20 11:22:41 gargamel kernel: [ 1108.395580] md: md8: recovery done.
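
And the numbers line up with the bitmap dump from my previous mail; a
quick Python check:

    # 267 dirty chunks of 64MiB, rewritten between 11:19:28 and 11:22:41.
    dirty, chunk_mib, secs = 267, 64, 193
    print(dirty * chunk_mib / 1024)   # ~16.7 GiB actually resynced
    print(dirty * chunk_mib / secs)   # ~88.5 MiB/s, a plausible drive speed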

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-20 18:31     ` Marc MERLIN
@ 2017-06-20 18:40       ` Roman Mamedov
  2017-06-20 21:02       ` NeilBrown
  1 sibling, 0 replies; 10+ messages in thread
From: Roman Mamedov @ 2017-06-20 18:40 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-raid

On Tue, 20 Jun 2017 11:31:54 -0700
Marc MERLIN <marc@merlins.org> wrote:

> On Tue, Jun 20, 2017 at 11:27:45AM -0700, Marc MERLIN wrote:
> > On Tue, Jun 06, 2017 at 01:57:27PM +1000, NeilBrown wrote:
> > > Had you run "mdadm --examine-bitmap /dev/sdk1" before the re-add, it
> > > would have told you how many bits were set at that time.
> > > 
> > > That "x/y pages" information never should have appeared in /proc/mdstat
> > > - it is really just of interest to developers.  But it is there now, so
> > > removing it is awkward.
> > 
> > So, I hit the problem again: I re-added a drive that had only been missing
> > for maybe 2 minutes, and this time I'm getting a very long (but not full)
> > recovery:
>  
> Mmmh, this is puzzling.
> 
> The progress meter was wrong; it recovered in 3 minutes:
> Jun 20 11:19:28 gargamel kernel: [  916.007017] md: recovery of RAID array md8
> Jun 20 11:22:41 gargamel kernel: [ 1108.395580] md: md8: recovery done.

Yes, the progress bar in effect just visualizes the entire drive area, and
thanks to the bitmap, md only syncs where it needs to, skipping over the
parts that are already in sync. So the progress will instantly jump by
dozens of percent (skipping hundreds of GB), pausing only here and there
for a while.

-- 
With respect,
Roman

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-20 18:31     ` Marc MERLIN
  2017-06-20 18:40       ` Roman Mamedov
@ 2017-06-20 21:02       ` NeilBrown
  2017-06-20 21:32         ` Marc MERLIN
  2017-06-21 11:08         ` Nix
  1 sibling, 2 replies; 10+ messages in thread
From: NeilBrown @ 2017-06-20 21:02 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-raid

On Tue, Jun 20 2017, Marc MERLIN wrote:

> On Tue, Jun 20, 2017 at 11:27:45AM -0700, Marc MERLIN wrote:
>> On Tue, Jun 06, 2017 at 01:57:27PM +1000, NeilBrown wrote:
>> > Had you run "mdadm --examine-bitmap /dev/sdk1" before the re-add, it
>> > would have told you how many bits were set at that time.
>> > 
>> > That "x/y pages" information never should have appeared in /proc/mdstat
>> > - it is really just of interest to developers.  But it is there now, so
>> > removing it is awkward.
>> 
>> So, I hit the problem again: I re-added a drive that had only been missing
>> for maybe 2 minutes, and this time I'm getting a very long (but not full)
>> recovery:
>  
> Mmmh, this is puzzling.
>
> The progress meter was wrong; it recovered in 3 minutes:
> Jun 20 11:19:28 gargamel kernel: [  916.007017] md: recovery of RAID array md8
> Jun 20 11:22:41 gargamel kernel: [ 1108.395580] md: md8: recovery done.

It is a progress bar - haven't you learned by now that they are *always*
wrong :-)

recovery always reports progress in sectors completed, and estimates
time based on how many sectors were processed in the last 30 seconds,
and how many are left.

With a bitmap-based recovery, most sectors are handled very quickly
(instantly?), while some take milliseconds.  That makes the estimate
imprecise.
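
In spirit, the estimate is just this (a sketch, not the kernel's actual
code):

    # ETA from a 30-second window of progress; units are the 1K blocks
    # shown in /proc/mdstat.
    def eta_minutes(done_last_30s, left):
        rate = done_last_30s / 30.0
        return left / rate / 60.0

    # Your earlier sample: speed=95741K/sec, 243429660/1953380928 done.
    print(eta_minutes(95741 * 30, 1953380928 - 243429660))   # ~297.7 min

While md is skimming over clean regions the measured rate is huge and the
estimate tiny; the moment it hits a run of dirty chunks the rate collapses
and the estimate balloons.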

NeilBrown


>
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-20 21:02       ` NeilBrown
@ 2017-06-20 21:32         ` Marc MERLIN
  2017-06-21 11:08         ` Nix
  1 sibling, 0 replies; 10+ messages in thread
From: Marc MERLIN @ 2017-06-20 21:32 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1403 bytes --]

On Wed, Jun 21, 2017 at 07:02:33AM +1000, NeilBrown wrote:
> > The progress meter was wrong; it recovered in 3 minutes:
> > Jun 20 11:19:28 gargamel kernel: [  916.007017] md: recovery of RAID array md8
> > Jun 20 11:22:41 gargamel kernel: [ 1108.395580] md: md8: recovery done.
> 
> It is a progress bar - haven't you learned by now that they are *always*
> wrong :-)
> 
> recovery always reports progress in sectors completed, and estimates
> time based on how many sectors were processed in the last 30 seconds,
> and how many are left.
> 
> With a bitmap-based recovery, most sectors are handled very quickly
> (instantly?), while some take milliseconds.  That makes the estimate
> imprecise.

Indeed.
Sorry I jumped the gun; it looked like it was going to take hours, and
then it completed a few seconds after I had sent the email.

Sorry for the noise; obviously it's awesome that it worked so quickly
again.
(I've been dealing with an unstable SAS card that tends to kick drives
unless the cable is seated perfectly, even after a good blast of
compressed air into the connector and the cable's plug.)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

* Re: 9 second recovery when re-adding a drive that got kicked out?
  2017-06-20 21:02       ` NeilBrown
  2017-06-20 21:32         ` Marc MERLIN
@ 2017-06-21 11:08         ` Nix
  1 sibling, 0 replies; 10+ messages in thread
From: Nix @ 2017-06-21 11:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: Marc MERLIN, linux-raid

On 20 Jun 2017, NeilBrown verbalised:

> On Tue, Jun 20 2017, Marc MERLIN wrote:
>> The progress meter was wrong; it recovered in 3 minutes:
>> Jun 20 11:19:28 gargamel kernel: [  916.007017] md: recovery of RAID array md8
>> Jun 20 11:22:41 gargamel kernel: [ 1108.395580] md: md8: recovery done.
>
> It is a progress bar - haven't you learned by now that they are *always*
> wrong :-)

I'd call this progress bar remarkably right by the standards of the
field -- it never jumps backwards :)

-- 
NULL && (void)
