* Understanding raid array status: Active vs Clean
@ 2014-05-26 20:08 George Duffield
  2014-05-28 18:05 ` George Duffield
  2014-05-29  5:16 ` NeilBrown
  0 siblings, 2 replies; 15+ messages in thread
From: George Duffield @ 2014-05-26 20:08 UTC (permalink / raw)
  To: linux-raid

I recently created a RAID 5 array under Arch Linux running on an HP
Microserver, using pretty much the same topology as I do under Ubuntu
Server.  The creation process went fine and the array is accessible;
however, from the outset it has only ever reported its status as Active
rather than Clean.

After creating the array, watch -d cat /proc/mdstat returned:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdc1[2] sde1[5] sdb1[1] sdd1[3]
      11720536064 blocks super 1.2 level 5, 512k chunk, algorithm 2
[5/5] [UUUUU]
      bitmap: 2/22 pages [8KB], 65536KB chunk

unused devices: <none>

which to me pretty much looks like the array sync completed successfully.

I then updated the config file, assembled the array and formatted it using:
mdadm --detail --scan >> /etc/mdadm.conf
mdadm --assemble --scan
mkfs.ext4 -v -L offsitestorage -b 4096 -E stride=128,stripe-width=512 /dev/md0
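
(For reference, those ext4 stripe values follow from the RAID geometry:
stride = chunk size / block size = 512KiB / 4KiB = 128, and
stripe-width = stride * data disks = 128 * 4 = 512, since a 5-drive
RAID5 stripe holds data on 4 drives.)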

mdadm --detail /dev/md0 returns:

/dev/md0:
        Version : 1.2
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Apr 17 18:55:01 2014
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : audioliboffsite:0  (local to host audioliboffsite)
           UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
         Events : 11306

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       5       8       65        4      active sync   /dev/sde1

So, I'm now left wondering why the state of the array isn't "clean".
Is it normal for arrays to show a state of "active" instead of "clean"
under Arch - is it simply that Arch packages a more recent version of
mdadm than Ubuntu Server?

Thx


* Re: Understanding raid array status: Active vs Clean
  2014-05-26 20:08 Understanding raid array status: Active vs Clean George Duffield
@ 2014-05-28 18:05 ` George Duffield
  2014-05-29  5:16 ` NeilBrown
  1 sibling, 0 replies; 15+ messages in thread
From: George Duffield @ 2014-05-28 18:05 UTC (permalink / raw)
  To: linux-raid

Anyone able to provide some insight please?


* Re: Understanding raid array status: Active vs Clean
  2014-05-26 20:08 Understanding raid array status: Active vs Clean George Duffield
  2014-05-28 18:05 ` George Duffield
@ 2014-05-29  5:16 ` NeilBrown
  2014-05-29  5:52   ` forumscollective
  1 sibling, 1 reply; 15+ messages in thread
From: NeilBrown @ 2014-05-29  5:16 UTC (permalink / raw)
  To: George Duffield; +Cc: linux-raid


On Mon, 26 May 2014 22:08:40 +0200 George Duffield
<forumscollective@gmail.com> wrote:

> I recently created a RAID 5 array under Arch Linux running on an HP
> Microserver, using pretty much the same topology as I do under Ubuntu
> Server.  The creation process went fine and the array is accessible;
> however, from the outset it has only ever reported its status as Active
> rather than Clean.
> 
> After creating the array, watch -d cat /proc/mdstat returned:
> 
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdc1[2] sde1[5] sdb1[1] sdd1[3]
>       11720536064 blocks super 1.2 level 5, 512k chunk, algorithm 2
> [5/5] [UUUUU]
>       bitmap: 2/22 pages [8KB], 65536KB chunk
> 
> unused devices: <none>
> 
> which to me pretty much looks like the array sync completed successfully.
> 
> I then updated the config file, assembled the array and formatted it using:
> mdadm --detail --scan >> /etc/mdadm.conf
> mdadm --assemble --scan
> mkfs.ext4 -v -L offsitestorage -b 4096 -E stride=128,stripe-width=512 /dev/md0
> 
> mdadm --detail /dev/md0 returns:
> 
> /dev/md0:
>         Version : 1.2
>   Creation Time : Thu Apr 17 01:13:52 2014
>      Raid Level : raid5
>      Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
>   Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
>    Raid Devices : 5
>   Total Devices : 5
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>     Update Time : Thu Apr 17 18:55:01 2014
>           State : active
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : audioliboffsite:0  (local to host audioliboffsite)
>            UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
>          Events : 11306
> 
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       8       17        1      active sync   /dev/sdb1
>        2       8       33        2      active sync   /dev/sdc1
>        3       8       49        3      active sync   /dev/sdd1
>        5       8       65        4      active sync   /dev/sde1
> 
> So, I'm now left wondering why the state of the array isn't "clean".
> Is it normal for arrays to show a state of "active" instead of "clean"
> under Arch - is it simply that Arch packages a more recent version of
> mdadm than Ubuntu Server?

I doubt there is a difference between Ubuntu and Arch here.

The array should show "active" in "mdadm --detail" output for 200ms after the
last write, and then switch to 'clean'.
So if you are writing every 100ms, it will always say "active".
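
If you want to see that happen, something along these lines (array name
taken from your mdstat, interval arbitrary) should show the state flip
from 'active' to 'clean' shortly after writes stop:

   watch -n 0.2 cat /sys/block/md0/md/array_state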

NeilBrown


* Re: Understanding raid array status: Active vs Clean
  2014-05-29  5:16 ` NeilBrown
@ 2014-05-29  5:52   ` forumscollective
  2014-05-29  6:06     ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: forumscollective @ 2014-05-29  5:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

> I doubt there is a difference between Ubuntu and Arch here.
> 
> The array should show "active" in "mdadm --detail" output for 200ms after the
> last write, and then switch to 'clean'.
> So if you are writing every 100ms, it will always say "active".
> 
> NeilBrown

For some reason this array has never shown Clean status, only Active.

It isn't written to other than when NFS mounted for adding new content.

Any idea what would cause constant writing - I presume from what I see that the initial array sync completed?


* Re: Understanding raid array status: Active vs Clean
  2014-05-29  5:52   ` forumscollective
@ 2014-05-29  6:06     ` NeilBrown
  2014-06-17 14:31       ` George Duffield
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2014-05-29  6:06 UTC (permalink / raw)
  To: forumscollective; +Cc: linux-raid


On Thu, 29 May 2014 07:52:02 +0200 forumscollective@gmail.com wrote:

> > I doubt there is a difference between Ubuntu and Arch here.
> > 
> > The array should show "active" in "mdadm --detail" output for 200ms after the
> > last write, and then switch to 'clean'.
> > So if you are writing every 100ms, it will always say "active".
> > 
> > NeilBrown
> 
> For some reason this array has never shown Clean status, only Active.
> 
> It isn't written to other than when NFS mounted for adding new content.
> 
> Any idea what would cause constant writing - I presume from what I see that the initial array sync completed?

Hmmm...
Do the numbers in /proc/diskstats change?

  watch -d 'grep md0 /proc/diskstats'

What is in /sys/block/md0/md/safe_mode_delay?
What if you change that to a different number (it is in seconds and can be
fractional)?
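
For example (the value itself is arbitrary):

  echo 5 > /sys/block/md0/md/safe_mode_delay

would make md wait 5 seconds after the last write before marking the
array clean.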

What  kernel version (uname -a)?

NeilBrown


* Re: Understanding raid array status: Active vs Clean
  2014-05-29  6:06     ` NeilBrown
@ 2014-06-17 14:31       ` George Duffield
  2014-06-18 13:25         ` George Duffield
  0 siblings, 1 reply; 15+ messages in thread
From: George Duffield @ 2014-06-17 14:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Apologies for the long delay in responding - I had further issues with
Microservers trashing the first drive in the backplane, including one
of the drives for the array in question (in the case of the array it
seems the drive lost power and dropped out of the array, albeit it's
fully functional now and passes SMART testing).  As a result I've
built new machines using mini-ITX motherboards and made a clean
install of Arch Linux - finished that last night, so now have the
array migrated to the new machine and powered up, albeit in degraded
mode.  I'd appreciate some advice re rebuilding this array (by adding
back the drive in question).  I've set out below pertinent info
relating to the array and hard drives in the system as well as my
intended recovery strategy.  As can be seen from lsblk, /dev/sdb1 is
the drive that is no longer recognised as being part of the array.  It
has not been written to since the incident occurred.  Is there a quick
& easy way to reintegrate it into the array, or is my only option to run:
# mdadm /dev/md0 --add /dev/sdb1

and let it take its course?
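
Before doing that I assume it makes sense to compare event counts
across the members, with something like:

# mdadm --examine /dev/sd[b-f]1 | grep -E '/dev/|Events'

so I can see how far /dev/sdb1 has drifted from the rest.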

The machine has a 3.5Ghz i3 CPU and currently has 8GB ram installed, I
can swap out the 4GB chips and replace with 8GB chips if 16GB RAM will
significantly increase the rebuild speed.  I'd also like to speed up
the rebuild as far as possible, so my plan is to set the following
parameters, (but I've no idea what safe numbers would be).

dev.raid.speed_limit_min =
dev.raid.speed_limit_max =

Current values are:
# sysctl dev.raid.speed_limit_min
dev.raid.speed_limit_min = 1000
# sysctl dev.raid.speed_limit_max
dev.raid.speed_limit_max = 200000

Set readahead:
# blockdev --setra 65536 /dev/md0

Set stripe_cache_size to 32 MiB:
# echo 32768 > /sys/block/md0/md/stripe_cache_size

Turn on bitmaps:
# mdadm --grow --bitmap=internal /dev/md0

Rebuild the array by reintegrating /dev/sdb1:
# mdadm /dev/md0 --add /dev/sdb1

Turn off bitmaps after rebuild is completed:
# mdadm --grow --bitmap=none /dev/md0
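
While it rebuilds I plan to keep an eye on progress and throughput with
something like (paths as for the existing md0):

# watch -d cat /proc/mdstat
# cat /sys/block/md0/md/sync_speed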


Thanks for your time and patience.


Current Array and hardware stats:
-------------------------------------------------

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
   Raid Devices : 5
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jun  3 17:38:15 2014
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : audioliboffsite:0  (local to host audioliboffsite)
           UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
         Events : 11314

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
       3       8       33        3      active sync   /dev/sdc1
       5       8       49        4      active sync   /dev/sdd1

# lsblk -i
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda       8:0    1  7.5G  0 disk
|-sda1    8:1    1  512M  0 part  /boot
`-sda2    8:2    1    7G  0 part  /
sdb       8:16   0  2.7T  0 disk
`-sdb1    8:17   0  2.7T  0 part
sdc       8:32   0  2.7T  0 disk
`-sdc1    8:33   0  2.7T  0 part
  `-md0   9:0    0 10.9T  0 raid5
sdd       8:48   0  2.7T  0 disk
`-sdd1    8:49   0  2.7T  0 part
  `-md0   9:0    0 10.9T  0 raid5
sde       8:64   0  2.7T  0 disk
`-sde1    8:65   0  2.7T  0 part
  `-md0   9:0    0 10.9T  0 raid5
sdf       8:80   0  2.7T  0 disk
`-sdf1    8:81   0  2.7T  0 part
  `-md0   9:0    0 10.9T  0 raid5







I've answered your questions below as best I can:

>> Any idea what would cause constant writing - I presume from what I see that the initial array sync completed?
>
> Hmmm...
> Do the numbers in /proc/diskstats change?
>
>   watch -d 'grep md0 /proc/diskstats'


Nope, they remain constant


> What is in /sys/block/md0/md/safe_mode_delay?

0.203 is the value at present - I can try changing it after
rebuilding the array.


> What if you change that to a different number (it is in seconds and can be
> fractional)?
>
> What  kernel version (uname -a)?

3.14.6-1-ARCH #1 SMP PREEMPT Sun Jun 8 10:08:38 CEST 2014 x86_64 GNU/Linux


* Re: Understanding raid array status: Active vs Clean
  2014-06-17 14:31       ` George Duffield
@ 2014-06-18 13:25         ` George Duffield
  2014-06-18 14:31           ` George Duffield
  2014-06-18 15:03           ` Robin Hill
  0 siblings, 2 replies; 15+ messages in thread
From: George Duffield @ 2014-06-18 13:25 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

A little more information if it helps deciding on the best recovery
strategy.  As can be seen all drives still in the array have event
count:
Events : 11314

The drive that fell out of the array has an event count of:
Events : 11306

Unless mdadm writes to the drives when a machine is booted or the
array partitioned I know for certain that the array has not been
written to i.e. no files have been added or deleted.

Per https://raid.wiki.kernel.org/index.php/RAID_Recovery it would seem
to me the following guidance applies:
If the event count closely matches but not exactly, use "mdadm
--assemble --force /dev/mdX <list of devices>" to force mdadm to
assemble the array anyway using the devices with the closest possible
event count. If the event count of a drive is way off, this probably
means that drive has been out of the array for a long time and
shouldn't be included in the assembly. Re-add it after the assembly so
it's sync:ed up using information from the drives with closest event
counts.

However, in my case the array has been auto-assembled by mdadm at boot
time.  How would I best go about adding /dev/sdb1 back into the array?
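
(Purely for reference, I assume the forced route would look something
like:

# mdadm --stop /dev/md0
# mdadm --assemble --force /dev/md0 /dev/sd[b-f]1

but I'd rather not stop an array that's already running unless that's
confirmed as the right approach.)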


Superblock information:

# mdadm --examine /dev/sd[bcdef]1

/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
           Name : audioliboffsite:0  (local to host audioliboffsite)
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : e9663464:5b912bb1:a5617fe9:19abfc55

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jun  3 17:31:02 2014
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : fb31415f - correct
         Events : 11306

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
           Name : audioliboffsite:0  (local to host audioliboffsite)
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 71052522:8b78da02:3e0cd6da:f3b3eb3e

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jun  3 17:38:15 2014
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : e5177c43 - correct
         Events : 11314

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : .AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
           Name : audioliboffsite:0  (local to host audioliboffsite)
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 2bd0953f:2319fe92:2dbe7e53:4b16fc80

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jun  3 17:38:15 2014
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4d64fbdf - correct
         Events : 11314

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : .AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
           Name : audioliboffsite:0  (local to host audioliboffsite)
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 3e1155bb:a4b65803:caf487e4:9bb01396

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jun  3 17:38:15 2014
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : df9fab5c - correct
         Events : 11314

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : .AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
           Name : audioliboffsite:0  (local to host audioliboffsite)
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 1714ea64:c1610064:b8603f47:eaaffc3c

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jun  3 17:38:15 2014
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : f37cc48f - correct
         Events : 11314

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : .AAAA ('A' == active, '.' == missing, 'R' == replacing)




Checking event count on all drives making up the array (and the member
that "failed"):

[root@audioliboffsite ~]# mdadm --examine /dev/sdb
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
[root@audioliboffsite ~]# mdadm --examine /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
[root@audioliboffsite ~]# mdadm --examine /dev/sdd
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
[root@audioliboffsite ~]# mdadm --examine /dev/sde
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
[root@audioliboffsite ~]# mdadm --examine /dev/sdf
/dev/sdf:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
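
(I assume these only show the protective MBR entry (type ee) because
the disks are GPT-partitioned and the 1.2 superblocks live on the
partitions, so the event counts being compared are the ones reported
for /dev/sd[b-f]1 above.)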



* Re: Understanding raid array status: Active vs Clean
  2014-06-18 13:25         ` George Duffield
@ 2014-06-18 14:31           ` George Duffield
  2014-06-18 15:03           ` Robin Hill
  1 sibling, 0 replies; 15+ messages in thread
From: George Duffield @ 2014-06-18 14:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Please ignore my reference to the array being partitioned; what I'd
intended to write follows:
Unless mdadm writes to the drives when a machine is booted or the
array is MOUNTED, I know for certain that the array has not been
written to, i.e. no files have been added or deleted from a user
perspective.  The degraded array has been mounted and files have been
read from it, but that's it.

Would really appreciate some input here so I can get on with growing
my main array once this "backup" machine is fully functional and I
know the underlying files are intact.


* Re: Understanding raid array status: Active vs Clean
  2014-06-18 13:25         ` George Duffield
  2014-06-18 14:31           ` George Duffield
@ 2014-06-18 15:03           ` Robin Hill
  2014-06-18 15:57             ` George Duffield
  1 sibling, 1 reply; 15+ messages in thread
From: Robin Hill @ 2014-06-18 15:03 UTC (permalink / raw)
  To: George Duffield; +Cc: linux-raid


On Wed Jun 18, 2014 at 03:25:27PM +0200, George Duffield wrote:

> A little more information if it helps deciding on the best recovery
> strategy.  As can be seen all drives still in the array have event
> count:
> Events : 11314
> 
> The drive that fell out of the array has an event count of:
> Events : 11306
> 
> Unless mdadm writes to the drives when a machine is booted or the
> array partitioned I know for certain that the array has not been
> written to i.e. no files have been added or deleted.
> 
> Per https://raid.wiki.kernel.org/index.php/RAID_Recovery it would seem
> to me the following guidance applies:
> If the event count closely matches but not exactly, use "mdadm
> --assemble --force /dev/mdX <list of devices>" to force mdadm to
> assemble the array anyway using the devices with the closest possible
> event count. If the event count of a drive is way off, this probably
> means that drive has been out of the array for a long time and
> shouldn't be included in the assembly. Re-add it after the assembly so
> it's sync:ed up using information from the drives with closest event
> counts.
> 
> However, in my case the array has been auto-assembled by mdadm at boot
> time.  How would I best go about adding /dev/sdb1 back into the array?
> 
That doesn't matter here - a force assemble would have left out the
drive with the lower event count as well. As there's a bitmap on the
array then either a --re-add or a --add (these should be treated the
same for arrays with persistent superblocks) should just synch any
differences since the disk was failed.
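
So in this case (device name taken from your lsblk output) something
like:

    mdadm /dev/md0 --re-add /dev/sdb1

should only need to resync whatever the bitmap has flagged as dirty.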

> On Tue, Jun 17, 2014 at 4:31 PM, George Duffield
> <forumscollective@gmail.com> wrote:
> > Apologies for the long delay in responding - I had further issues with
> > Microservers trashing the first drive in the backplane, including one
> > of the drives for the array in question (in the case of the array it
> > seems the drive lost power and dropped out of the array, albeit it's
> > fully functional now and passes SMART testing).  As a result I've
> > built new machines using mini-ITX motherboards and made a clean
> > install of Arch Linux - finished that last night, so now have the
> > array migrated to the new machine and powered up, albeit in degraded
> > mode.  I'd appreciate some advice re rebuilding this array (by adding
> > back the drive in question).  I've set out below pertinent info
> > relating to the array and hard drives in the system as well as my
> > intended recovery strategy.  As can be seen from lsblk, /dev/sdb1 is
> > the drive that is no longer recognised as being part of the array.  It
> > has not been written to since the incident occurred.  Is there a quick
> > & easy way to reintegrate it into the array, or is my only option to run:
> > # mdadm /dev/md0 --add /dev/sdb1
> >
> > and let it take its course?
> >
> > The machine has a 3.5Ghz i3 CPU and currently has 8GB ram installed, I
> > can swap out the 4GB chips and replace with 8GB chips if 16GB RAM will
> > significantly increase the rebuild speed.  I'd also like to speed up
> > the rebuild as far as possible, so my plan is to set the following
> > parameters, (but I've no idea what safe numbers would be).
> >
> > dev.raid.speed_limit_min =
> > dev.raid.speed_limit_max =
> >
> > Current values are:
> > # sysctl dev.raid.speed_limit_min
> > dev.raid.speed_limit_min = 1000
> > # sysctl dev.raid.speed_limit_max
> > dev.raid.speed_limit_max = 200000
> >
You can set these as high as you like, though it can impact other tasks.
I'd suggest bumping the speed_limit_min up gradually and seeing how it
goes (unless you're hitting speed_limit_max already).
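
e.g. (the number is purely illustrative):

    sysctl -w dev.raid.speed_limit_min=50000

then watch the speed reported in /proc/mdstat to see whether it
actually makes a difference.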

> > Set readahead:
> > # blockdev --setra 65536 /dev/md0
> >
> > Set stripe_cache_size to 32 MiB:
> > # echo 32768 > /sys/block/md0/md/stripe_cache_size
> >
> > Turn on bitmaps:
> > # mdadm --grow --bitmap=internal /dev/md0
> >
> > Rebuild the array by reintegrating /dev/sdb1:
> > # mdadm /dev/md0 --add /dev/sdb1
> >
> > Turn off bitmaps after rebuild is completed:
> > # mdadm --grow --bitmap=none /dev/md0
> >
I'm not sure you can modify the bitmaps on degraded arrays anyway, and
adding one before replacing a failed member won't do any good
regardless.  The bitmap is only used if the disk used to be an active
member of the array, so it will be ignored until the disk is fully
synched.  If you were adding a new disk (rather than just re-adding the
existing failed disk) then it might speed things up to drop the bitmap
until the array rebuild is complete (if that is possible).
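
You can check what's already there with e.g.:

    mdadm --detail /dev/md0 | grep -i bitmap

and your earlier --detail output already shows "Intent Bitmap :
Internal", so the --grow --bitmap=internal step shouldn't be needed.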

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


* Re: Understanding raid array status: Active vs Clean
  2014-06-18 15:03           ` Robin Hill
@ 2014-06-18 15:57             ` George Duffield
  2014-06-18 16:04               ` George Duffield
  0 siblings, 1 reply; 15+ messages in thread
From: George Duffield @ 2014-06-18 15:57 UTC (permalink / raw)
  To: George Duffield, linux-raid

Thx Robin

I've run:
# mdadm --manage /dev/md0 --re-add /dev/sdb1
mdadm: re-added /dev/sdb1

# mdadm --detail /dev/md0 now returns:

/dev/md0:
        Version : 1.2
  Creation Time : Thu Apr 17 01:13:52 2014
     Raid Level : raid5
     Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Jun 18 19:46:38 2014
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : audioliboffsite:0  (local to host audioliboffsite)
           UUID : aba348c6:8dc7b4a7:4e282ab5:40431aff
         Events : 11319

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
       3       8       33        3      active sync   /dev/sdc1
       5       8       49        4      active sync   /dev/sdd1


# watch cat /proc/mdstat returns:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[0] sdd1[5] sdc1[3] sde1[1] sdf1[2]
      11720536064 blocks super 1.2 level 5, 512k chunk, algorithm 2
[5/5] [UUUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>


# watch -d 'grep md0 /proc/diskstats' returns:
   9       0 md0 348 0 2784 0 0 0 0 0 0 0 0

and the output never changes.

So the array seems OK, and I'm back to the question that started this
thread - why would this array's state be Active rather than Clean?




On Wed, Jun 18, 2014 at 5:03 PM, Robin Hill <robin@robinhill.me.uk> wrote:
> That doesn't matter here - a force assemble would have left out the
> drive with the lower event count as well. As there's a bitmap on the
> array then either a --re-add or a --add (these should be treated the
> same for arrays with persistent superblocks) should just synch any
> differences since the disk was failed.


* Re: Understanding raid array status: Active vs Clean
  2014-06-18 15:57             ` George Duffield
@ 2014-06-18 16:04               ` George Duffield
  2014-06-22 14:32                 ` George Duffield
  0 siblings, 1 reply; 15+ messages in thread
From: George Duffield @ 2014-06-18 16:04 UTC (permalink / raw)
  To: George Duffield, linux-raid

# cat /sys/block/md0/md/safe_mode_delay returns:

0.203

changing the value to 0.503:
# echo 0.503 > /sys/block/md0/md/safe_mode_delay

makes no difference to the array state.




* Re: Understanding raid array status: Active vs Clean
  2014-06-18 16:04               ` George Duffield
@ 2014-06-22 14:32                 ` George Duffield
  2014-06-23  2:01                   ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: George Duffield @ 2014-06-22 14:32 UTC (permalink / raw)
  To: linux-raid, NeilBrown

Can anyone give me some hints as to why this array would remain Active
rather than report Clean?  Any help/ insights much appreciated.


* Re: Understanding raid array status: Active vs Clean
  2014-06-22 14:32                 ` George Duffield
@ 2014-06-23  2:01                   ` NeilBrown
  2014-06-28  3:01                     ` George Duffield
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2014-06-23  2:01 UTC (permalink / raw)
  To: George Duffield; +Cc: linux-raid


On Sun, 22 Jun 2014 16:32:31 +0200 George Duffield
<forumscollective@gmail.com> wrote:

> Can anyone give me some hints as to why this array would remain Active
> rather than report Clean?  Any help/ insights much appreciated.

It is a mystery.

> 
> On Wed, Jun 18, 2014 at 6:04 PM, George Duffield
> <forumscollective@gmail.com> wrote:
> > # cat /sys/block/md0/md/safe_mode_delay returns:
> >
> > 0.203
> >
> > changing the value to 0.503:
> > # echo 0.503 > /sys/block/md0/md/safe_mode_delay
> >
> > makes no difference to the array state.

What if you write a smaller number?  e.g. 0.1

What does /sys/block/md0/md/array_state show?
If you "echo clean" into that file, does that "fix" it?
Does it stay fixed if you write to the array?
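
i.e. something like:

  cat /sys/block/md0/md/array_state
  echo clean > /sys/block/md0/md/array_state

(the write only nudges the metadata state - it doesn't touch any data).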

NeilBrown



* Re: Understanding raid array status: Active vs Clean
  2014-06-23  2:01                   ` NeilBrown
@ 2014-06-28  3:01                     ` George Duffield
  2014-06-28  5:29                       ` NeilBrown
  0 siblings, 1 reply; 15+ messages in thread
From: George Duffield @ 2014-06-28  3:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>> > changing the value to 0.503:
>> > # echo 0.503 > /sys/block/md0/md/safe_mode_delay
>> >
>> > makes no difference to the array state.
>
> What if you write a smaller number?  e.g. 0.1

No change to array state.

> What does /sys/block/md0/md/array_state show?

Funnily enough, it shows Clean


* Re: Understanding raid array status: Active vs Clean
  2014-06-28  3:01                     ` George Duffield
@ 2014-06-28  5:29                       ` NeilBrown
  0 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2014-06-28  5:29 UTC (permalink / raw)
  To: George Duffield; +Cc: linux-raid


On Sat, 28 Jun 2014 05:01:00 +0200 George Duffield
<forumscollective@gmail.com> wrote:

> >> > changing the value to 0.503:
> >> > # echo 0.503 > /sys/block/md0/md/safe_mode_delay
> >> >
> >> > makes no difference to the array state.
> >
> > What if you write a smaller number?  e.g. 0.1
> 
> No change to array state.
> 
> > What does /sys/block/md0/md/array_state show?
> 
> Funnily enough, it shows Clean


Ahh - I found it.
In get_array_info() in drivers/md/md.c:

	if (mddev->in_sync)
		info.state = (1<<MD_SB_CLEAN);
	if (mddev->bitmap && mddev->bitmap_info.offset)
		info.state = (1<<MD_SB_BITMAP_PRESENT);


that last line should be "|=".

Because you have a bitmap, the 'clean' state is being hidden.

Though if you have a bitmap, the 'clean' state isn't really important because
the bitmap knows which regions are 'clean' and which are not.

But it should be fixed.  I'll send a patch next week.

Thanks for persisting.

NeilBrown

