* RAID-1 does not rebuild after hot-add
From: David Chow @ 2003-08-03 14:43 UTC
  To: linux-raid

Dear Neil,

I have a problem hot-adding a disk to an existing RAID array. I was 
converting my root fs and other filesystems to md: I created the array 
degraded, using the failed-disk directive, and moved my data onto the 
new degraded md device, but after I hot-add a new disk to the md it 
doesn't start rebuilding. The syslog output is shown below; it looks 
like the recovery thread got woken up and finished right away... why? 
My kernel is 2.4.18-3smp, the RH 7.3 vendor kernel, and I've seen the 
same problem on other 2.4.20 RH kernels. I ended up using "mkraid 
--force" to recreate it as a new array just to get a resync. The 
/proc/mdstat also looks weird in that one drive shows as down "[_U]" 
when in fact both drives are healthy. I've tried "mdadm --manage", 
which produces the same result. I've also tried dd'ing the partitions 
to all zeros before the add, with the same result. Please give 
direction, as moving the root somewhere else and starting over with 
mkraid is really stupid (in my opinion); besides, I have no spare disk 
for that this time.

regards,
David Chow

Aug  4 06:25:32 www2 kernel: RAID1 conf printout:
Aug  4 06:25:33 www2 kernel:  --- wd:1 rd:2 nd:3
Aug  4 06:25:33 www2 kernel:  disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb3
Aug  4 06:25:33 www2 kernel:  disk 2, s:1, o:0, n:2 rd:2 us:1 dev:sda3
Aug  4 06:25:33 www2 kernel:  disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 5, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 12, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 13, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 14, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 15, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 16, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 17, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 18, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 19, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 20, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 21, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 22, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 23, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 24, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 25, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel:  disk 26, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Aug  4 06:25:33 www2 kernel: md: updating md2 RAID superblock on device
Aug  4 06:25:33 www2 kernel: md: sda3 [events: 0000000f]<6>(write) sda3's sb offset: 3076352
Aug  4 06:25:33 www2 kernel: md: sdb3 [events: 0000000f]<6>(write) sdb3's sb offset: 3076352
Aug  4 06:25:33 www2 kernel: md: recovery thread got woken up ...
Aug  4 06:25:33 www2 kernel: md: recovery thread finished ...

[root@www2 root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md1 : active raid1 sdb2[1] sda2[0]
      1052160 blocks [2/2] [UU]
     
md2 : active raid1 sdb3[1]
      3076352 blocks [2/1] [_U]
     
md3 : active raid1 sdb5[1]
      1052160 blocks [2/1] [_U]
     
md4 : active raid1 sdb6[1]
      12635008 blocks [2/1] [_U]
     
unused devices: <none>
[root@www2 root]# raidhotadd /dev/md2 /dev/sda3
[root@www2 root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md1 : active raid1 sdb2[1] sda2[0]
      1052160 blocks [2/2] [UU]
     
md2 : active raid1 sda3[2] sdb3[1]
      3076352 blocks [2/1] [_U]
     
md3 : active raid1 sdb5[1]
      1052160 blocks [2/1] [_U]
     
md4 : active raid1 sdb6[1]
      12635008 blocks [2/1] [_U]
     
unused devices: <none>
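
(For context, the failed-disk conversion being attempted here normally
goes roughly as sketched below. This is an illustrative reconstruction,
not taken from the original mail; the raidtab directives are standard
raidtools syntax, and the device names are assumed from the mdstat
output above.)

  # /etc/raidtab: a two-way mirror with the old root partition marked
  # failed-disk, so the array starts degraded on the new disk only
  raiddev /dev/md2
      raid-level              1
      nr-raid-disks           2
      nr-spare-disks          0
      persistent-superblock   1
      chunk-size              4
      device                  /dev/sdb3
      raid-disk               1
      device                  /dev/sda3
      failed-disk             0

  mkraid /dev/md2                   # creates the degraded array on sdb3
  mke2fs /dev/md2                   # new filesystem on the mirror
  mount /dev/md2 /mnt               # copy the old root fs across
  cp -ax / /mnt
  raidhotadd /dev/md2 /dev/sda3     # finally, hot-add the old partition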




* Re: RAID-1 does not rebuild after hot-add
From: Stephen Lee @ 2003-08-03 16:55 UTC
  To: Raid

On Sun, 2003-08-03 at 07:43, David Chow wrote:
> Dear Neil,
> 
> I have a problem hot-adding a disk to an existing RAID array. I was 
> converting my root fs and other filesystems to md: I created the array 
> degraded, using the failed-disk directive, and moved my data onto the 
> new degraded md device, but after I hot-add a new disk to the md it 
> doesn't start rebuilding. The syslog output is shown below; it looks 
> like the recovery thread got woken up and finished right away... why? 
> My kernel is 2.4.18-3smp, the RH 7.3 vendor kernel, and I've seen the 
> same problem on other 2.4.20 RH kernels. I ended up using "mkraid 
> --force" to recreate it as a new array just to get a resync. The 
> /proc/mdstat also looks weird in that one drive shows as down "[_U]" 
> when in fact both drives are healthy. I've tried "mdadm --manage", 
> which produces the same result. I've also tried dd'ing the partitions 
> to all zeros before the add, with the same result. Please give 
> direction, as moving the root somewhere else and starting over with 
> mkraid is really stupid (in my opinion); besides, I have no spare disk 
> for that this time.
> 
> regards,
> David Chow
<snip>

> [root@www2 root]# cat /proc/mdstat
> Personalities : [raid1]
> read_ahead 1024 sectors
> md0 : active raid1 sdb1[1] sda1[0]
>       104320 blocks [2/2] [UU]
>      
> md1 : active raid1 sdb2[1] sda2[0]
>       1052160 blocks [2/2] [UU]
>      
> md2 : active raid1 sdb3[1]
>       3076352 blocks [2/1] [_U]
>      
> md3 : active raid1 sdb5[1]
>       1052160 blocks [2/1] [_U]
>      
> md4 : active raid1 sdb6[1]
>       12635008 blocks [2/1] [_U]
<snip>

Did you try:
mdadm /dev/md2 -a /dev/sda3
mdadm /dev/md3 -a /dev/sda5
mdadm /dev/md4 -a /dev/sda6

If this doesn't work then what are the exact error messages?

Stephen
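
(If the add appears to succeed but no resync starts, a quick way to see
how md classified the new device is, for example:

  mdadm --detail /dev/md2    # per-device state: active, spare or faulty
  cat /proc/mdstat           # "[2/1] [_U]" means one of two mirrors missing

Both commands are standard mdadm/md usage; the device name is simply the
one from this report.)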



* Re: RAID-1 does not rebuild after hot-add
From: Neil Brown @ 2003-08-04  0:38 UTC
  To: David Chow; +Cc: linux-raid

On Sunday August 3, davidchow@shaolinmicro.com wrote:
> Dear Neil,
> 
> I have a problem hot-adding a disk to an existing RAID array. I was 
> converting my root fs and other filesystems to md: I created the array 
> degraded, using the failed-disk directive, and moved my data onto the 
> new degraded md device, but after I hot-add a new disk to the md it 
> doesn't start rebuilding. The syslog output is shown below; it looks 
> like the recovery thread got woken up and finished right away... why? 
> My kernel is 2.4.18-3smp, the RH 7.3 vendor kernel, and I've seen the 
> same problem on other 2.4.20 RH kernels. I ended up using "mkraid 
> --force" to recreate it as a new array just to get a resync. The 
> /proc/mdstat also looks weird in that one drive shows as down "[_U]" 
> when in fact both drives are healthy. I've tried "mdadm --manage", 
> which produces the same result. I've also tried dd'ing the partitions 
> to all zeros before the add, with the same result. Please give 
> direction, as moving the root somewhere else and starting over with 
> mkraid is really stupid (in my opinion); besides, I have no spare disk 
> for that this time.

I'm afraid I've got no idea what would be causing this.  
I can only suggest you try a plain 2.4.21 kernel and if the problem
persists we can add some extra printk's to find out what is happening.

NeilBrown



* Re: RAID-1 does not rebuild after hot-add
From: Andrew Rechenberg @ 2003-08-04 12:41 UTC
  To: David Chow; +Cc: linux-raid

I've seen this behavior occur on my RAID-1 arrays after using mdadm to
hotadd drives.  If I wait a little while the recovery thread does wake
up and starts the resync, but it doesn't occur immediately after the
hotadd.

I'm not sure if this type of behavior is expected, but I have seen it.

Running RH 2.4.18-27bigmem with the /proc/mdstat seq_file and megaraid2
patches.



On Sun, 2003-08-03 at 10:43, David Chow wrote:
> Dear Neil,
> 
> I have a problem hot-adding a disk to an existing RAID array. I was 
> converting my root fs and other filesystems to md: I created the array 
> degraded, using the failed-disk directive, and moved my data onto the 
> new degraded md device, but after I hot-add a new disk to the md it 
> doesn't start rebuilding. The syslog output is shown below; it looks 
> like the recovery thread got woken up and finished right away... why? 
> My kernel is 2.4.18-3smp, the RH 7.3 vendor kernel, and I've seen the 
> same problem on other 2.4.20 RH kernels. I ended up using "mkraid 
> --force" to recreate it as a new array just to get a resync. The 
> /proc/mdstat also looks weird in that one drive shows as down "[_U]" 
> when in fact both drives are healthy. I've tried "mdadm --manage", 
> which produces the same result. I've also tried dd'ing the partitions 
> to all zeros before the add, with the same result. Please give 
> direction, as moving the root somewhere else and starting over with 
> mkraid is really stupid (in my opinion); besides, I have no spare disk 
> for that this time.
> 
> regards,
> David Chow
<snip>
-- 
Andrew Rechenberg <arechenberg@shermfin.com>
Infrastructure Team, Sherman Financial Group


* Re: RAID-1 does not rebuild after hot-add
From: David Chow @ 2003-08-04 16:32 UTC
  To: linux-raid

Andrew Rechenberg wrote:

>I've seen this behavior occur on my RAID-1 arrays after using mdadm to
>hotadd drives.  If I wait a little while the recovery thread does wake
>up and starts the resync, but it doesn't occur immediately after the
>hotadd.
>
>I'm not sure if this type of behavior is expected, but I have seen it.
>
>Running RH 2.4.18-27bigmem with the /proc/mdstat seq_file and megaraid2
>patches.
>  
>
Yes, thanks. My RAID hot-add has been sitting since last night and it's
still the same: the resync thread never starts.

regards,
David Chow




* Re: RAID-1 does not rebuild after hot-add
From: David Chow @ 2003-08-04 16:34 UTC
  To: Stephen Lee; +Cc: Raid

>
>
>>[root@www2 root]# cat /proc/mdstat
>>Personalities : [raid1]
>>read_ahead 1024 sectors
>>md0 : active raid1 sdb1[1] sda1[0]
>>      104320 blocks [2/2] [UU]
>>     
>>md1 : active raid1 sdb2[1] sda2[0]
>>      1052160 blocks [2/2] [UU]
>>     
>>md2 : active raid1 sdb3[1]
>>      3076352 blocks [2/1] [_U]
>>     
>>md3 : active raid1 sdb5[1]
>>      1052160 blocks [2/1] [_U]
>>     
>>md4 : active raid1 sdb6[1]
>>      12635008 blocks [2/1] [_U]
>>    
>>
><snip>
>
>Did you try:
>mdadm /dev/md2 -a /dev/sda3
>mdadm /dev/md3 -a /dev/sda5
>mdadm /dev/md4 -a /dev/sda6
>  
>
Already tried. I've noticed this problem in Red Hat distributions from 
7.3 through 9: when attempting to hot-add disks, the array doesn't 
rebuild. I am wondering whether there are ioctls I can use to tell md 
to start the rebuild without waiting for an auto-rebuild action.

In my experience the only way to fix it is to freshly mkraid the 
array. I guess something is wrong with the superblock.

If I could manually dd a superblock onto the new disk, I might fool 
the md driver into thinking the disk is already part of the array. 
However, I think I would have to edit the event count and reboot the 
machine. Please suggest directions.

regards,
David Chow

>If this doesn't work then what are the exact error messages?
>
>Stephen
>  
>



* Re: RAID-1 does not rebuild after hot-add
From: David Chow @ 2003-08-04 16:41 UTC
  To: Neil Brown; +Cc: linux-raid

>
>
>I'm afraid I've got no idea what would be causing this.  
>I can only suggest you try a plain 2.4.21 kernel and if the problem
>persists we can add some extra printk's to find out what is happening.
>
>NeilBrown
>
Actually, I will try recompiling a plain vanilla 2.4.21 kernel and see 
what happens. However, it seems the problem occurs whenever the RAID 
is created by mkraid with one of the disks set to failed-disk; 
hot-adding other disks to the degraded array then shows this 
behaviour. I suspect something is wrong in the superblock, because I 
can only make a normal RAID, with no failed-disk, using "mkraid 
--force" or mdadm, which will of course resync right after the RAID 
starts. Is there any chance that failed-disk information gets 
recorded in the RAID superblocks (I mean, the failed-disk being 
recorded on the good disk's superblock)? I thought that wouldn't make 
sense, but it did happen and is repeatable (you can try it if you 
want). This is the only case left to deal with, because we can never 
keep the superblock of the already-started "good" disk, which was 
created in degraded mode with a failed-disk. Also, I've made sure the 
other hot-added partitions had been dd'ed to zeros beforehand. Maybe 
I can hexdump a copy of the superblock for you to look at: what is 
the offset and size of the superblock of a RAID-1 device? I am sure 
this could settle the problem right away.

regards,
David Chow
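
(On the offset question: in the 0.90 superblock format used here, the
superblock is 4 KiB and lives in a 64 KiB reserved area at the end of
the device, aligned down to a 64 KiB boundary. A sketch of how one
might dump it, assuming blockdev reports 512-byte sectors:

  SECTORS=$(blockdev --getsize /dev/sda3)   # device size in 512-byte sectors
  SB=$(( SECTORS / 128 * 128 - 128 ))       # 64 KiB-aligned, 64 KiB from the end
  dd if=/dev/sda3 bs=512 skip=$SB count=8 2>/dev/null | hexdump -C | head
  # the dump should start with the md magic, fc 4e 2b a9 (0xa92b4efc)

As Neil notes in the next message, mdadm --examine decodes all of this
for you.)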



* Re: RAID-1 does not rebuild after hot-add
From: Ross Vandegrift @ 2003-08-04 17:06 UTC
  To: David Chow; +Cc: Raid

On Tue, Aug 05, 2003 at 12:34:22AM +0800, David Chow wrote:
> >Did you try:
> >mdadm /dev/md2 -a /dev/sda3
> >mdadm /dev/md3 -a /dev/sda5
> >mdadm /dev/md4 -a /dev/sda6
> > 
> >
> Already tried. I've noticed this problem in Red Hat distributions from 
> 7.3 through 9: when attempting to hot-add disks, the array doesn't 
> rebuild. I am wondering whether there are ioctls I can use to tell md 
> to start the rebuild without waiting for an auto-rebuild action.

I wonder - have you accidentally created a mirror and then added a
hot-spare?  I've run into this problem before when switching to mdadm,
and it confused me pretty well.

What's your /etc/raidtab look like?


-- 
Ross Vandegrift
ross@willow.seitz.com

A Pope has a Water Cannon.                               It is a Water Cannon.
He fires Holy-Water from it.                        It is a Holy-Water Cannon.
He Blesses it.                                 It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it.          It is a Wholly Holy Holy-Water Cannon.
He has it pierced.                It is a Holey Wholly Holy Holy-Water Cannon.
He makes it official.       It is a Canon Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive.                                       He shoots them.


* Re: RAID-1 does not rebuild after hot-add
From: Paul Clements @ 2003-08-04 17:46 UTC
  To: David Chow; +Cc: Ross Vandegrift, Raid

Ross Vandegrift wrote:
 
> I wonder - have you accidentally created a mirror and then added a
> hot-spare?  I've run into this problem before when switching to mdadm,
> and it confused me pretty well.

Hmm, good thought, but I don't think that would be it, since his mdstat
is showing 2 raid-disks for each array.

David, could you send the rest of your log from the time you create the
array until after the hot add? I'm wondering what's in the log when the
hot add is done. Are there errors?

Also, would it be possible to run "mdadm --examine /dev/sd[ab]3" and/or
run this:

perl -e '$io_num = 0x0913;   # 0x0913 is the md PRINT_RAID_DEBUG ioctl
         open (FD, "/dev/md2") or die "open: $!";
         ioctl(FD, $io_num, $null) or die "ioctl: $!";'

and send the log output, so we can see what's in the superblocks?

--
Paul


* Re: RAID-1 does not rebuild after hot-add
From: Neil Brown @ 2003-08-05  2:16 UTC
  To: David Chow; +Cc: linux-raid

On Tuesday August 5, davidchow@shaolinmicro.com wrote:
> >
> >
> >I'm afraid I've got no idea what would be causing this.  
> >I can only suggest you try a plain 2.4.21 kernel and if the problem
> >persists we can add some extra printk's to find out what is happening.
> >
> >NeilBrown
> >
> Actually, I will try recompiling a plain vanilla 2.4.21 kernel and see 
> what happens. However, it seems the problem occurs whenever the RAID 
> is created by mkraid with one of the disks set to failed-disk; 
> hot-adding other disks to the degraded array then shows this 
> behaviour. I suspect something is wrong in the superblock, because I 
> can only make a normal RAID, with no failed-disk, using "mkraid 
> --force" or mdadm, which will of course resync right after the RAID 
> starts. Is there any chance that failed-disk information gets 
> recorded in the RAID superblocks (I mean, the failed-disk being 
> recorded on the good disk's superblock)? I thought that wouldn't make 
> sense, but it did happen and is repeatable (you can try it if you 
> want). This is the only case left to deal with, because we can never 
> keep the superblock of the already-started "good" disk, which was 
> created in degraded mode with a failed-disk. Also, I've made sure the 
> other hot-added partitions had been dd'ed to zeros beforehand. Maybe 
> I can hexdump a copy of the superblock for you to look at: what is 
> the offset and size of the superblock of a RAID-1 device? I am sure 
> this could settle the problem right away.
> 
> regards,
> David Chow

Rather than a hex dump, just use

  mdadm --examine /dev/XXX

That is the easiest way to view the superblock on the device.

NeilBrown


* Re: RAID-1 does not rebuild after hot-add
From: David Chow @ 2003-08-05  6:32 UTC
  To: Neil Brown; +Cc: linux-raid

>
>
>Rather than a hex dump, just use
>
>  mdadm --examine /dev/XXX
>
>That is the easiest way to view the superblock on the device.
>
>NeilBrown
>
Neil,

OK, I've found a problem in the superblock, described below: the 
"Total Devices" count is wrong. It should be 2 instead of 3. This is 
probably a bug in mkraid (from raidtools): when I created the array 
with a "failed-disk" in degraded mode, mkraid wrote an incorrect 
superblock even though I put "nr-raid-disks 2" and "nr-spare-disks 0" 
in the raidtab. OK, the superblock is incorrect, but how can I change 
the total-devices and spare-devices parameters without re-initializing 
the array?

David Chow

[root@www2 /]# mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 39042a54:7c2ca55a:939c1846:94f73fb8
  Creation Time : Sat Aug  2 21:00:17 2003
     Raid Level : raid1
    Device Size : 3076352 (2.93 GiB 3.15 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 2

    Update Time : Mon Aug  4 06:25:32 2003
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 578aa21 - correct
         Events : 0.15


      Number   Major   Minor   RaidDevice State
this     2       8        3        2        /dev/sda3
   0     0       0        0       -1      faulty
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8        3        2        /dev/sda3
[root@www2 /]# mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 39042a54:7c2ca55a:939c1846:94f73fb8
  Creation Time : Sat Aug  2 21:00:17 2003
     Raid Level : raid1
    Device Size : 3076352 (2.93 GiB 3.15 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 2

    Update Time : Mon Aug  4 06:25:32 2003
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 578aa35 - correct
         Events : 0.15


      Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3
   0     0       0        0       -1      faulty
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8        3        2        /dev/sda3




* Re: RAID-1 does not rebuild after hot-add
From: Neil Brown @ 2003-08-05  6:53 UTC
  To: David Chow; +Cc: linux-raid

On Tuesday August 5, davidchow@shaolinmicro.com wrote:
> >
> >
> >Rather than a hex dump, just use
> >
> >  mdadm --examine /dev/XXX
> >
> >That is the easiest way to view the superblock on the device.
> >
> >NeilBrown
> >
> Neil,
> 
> OK, I've found a problem in the superblock, described below: the 
> "Total Devices" count is wrong. It should be 2 instead of 3. This is 
> probably a bug in mkraid (from raidtools): when I created the array 
> with a "failed-disk" in degraded mode, mkraid wrote an incorrect 
> superblock even though I put "nr-raid-disks 2" and "nr-spare-disks 0" 
> in the raidtab. OK, the superblock is incorrect, but how can I change 
> the total-devices and spare-devices parameters without re-initializing 
> the array?

It is actually "Active Devices" that is the problem.  As this is the
same as "Raid Devices", md doesn't bother doing a reconstruction.

The following patch adds --update=summaries to mdadm-1.3.0, which
updates the various summary fields in the superblock (the Total,
Active, Working, Failed, and Spare device counts).
It is untested but should work.
If you patch mdadm-1.3.0, compile it, and then run

  mdadm --assemble /dev/md2 --update=summaries /dev/sda3 /dev/sdb3

it should update these fields and start the array (you might need
a --run as well).

Let me know how it goes.

NeilBrown


 ----------- Diffstat output ------------
 ./Assemble.c |   25 +++++++++++++++++++++++++
 ./ReadMe.c   |    4 ++--
 ./mdadm.c    |    4 +++-
 3 files changed, 30 insertions(+), 3 deletions(-)

diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~	2003-08-05 16:40:07.000000000 +1000
+++ ./Assemble.c	2003-08-05 16:49:00.000000000 +1000
@@ -292,6 +292,31 @@ int Assemble(char *mddev, int mdfd,
 					fprintf(stderr, Name ": updating superblock of %s with minor number %d\n",
 						devname, super.md_minor);
 			}
+			if (strcmp(update, "summaries") == 0) {
+				/* set nr_disks, active_disks, working_disks,
+				 * failed_disks, spare_disks based on disks[] 
+				 * array in superblock
+				 */
+				super.nr_disks = super.active_disks =
+					super.working_disks = super.failed_disks =
+					super.spare_disks = 0;
+				for (i=0; i < MD_SB_DISKS ; i++) 
+					if (super.disks[i].major ||
+					    super.disks[i].minor) {
+						int state = super.disks[i].state;
+						if (state & (1<<MD_DISK_REMOVED))
+							continue;
+						super.nr_disks++;
+						if (state & (1<<MD_DISK_ACTIVE))
+							super.active_disks++;
+						if (state & (1<<MD_DISK_FAULTY))
+							super.failed_disks++;
+						else
+							super.working_disks++;
+						if (state == 0)
+							super.spare_disks++;
+					}
+			}
 			super.sb_csum = calc_sb_csum(&super);
 			dfd = open(devname, O_RDWR, 0);
 			if (dfd < 0) 

diff ./ReadMe.c~current~ ./ReadMe.c
--- ./ReadMe.c~current~	2003-08-05 16:38:03.000000000 +1000
+++ ./ReadMe.c	2003-08-05 16:38:35.000000000 +1000
@@ -221,7 +221,7 @@ char OptionHelp[] =
 "  --config=     -c   : config file\n"
 "  --scan        -s   : scan config file for missing information\n"
 "  --force       -f   : Assemble the array even if some superblocks appear out-of-date\n"
-"  --update=     -U   : Update superblock: either sparc2.2 or super-minor\n"
+"  --update=     -U   : Update superblock: one of sparc2.2, super-minor or summaries\n"
 "\n"
 " For detail or examine:\n"
 "  --brief       -b   : Just print device name and UUID\n"
@@ -344,7 +344,7 @@ char Help_assemble[] =
 "                       for a full array are present\n"
 "  --force       -f   : Assemble the array even if some superblocks appear\n"
 "                     : out-of-date.  This involves modifying the superblocks.\n"
-"  --update=     -U   : Update superblock: either sparc2.2 or super-minor\n"
+"  --update=     -U   : Update superblock: one of sparc2.2, super-minor or summaries\n"
 ;
 
 char Help_manage[] =

diff ./mdadm.c~current~ ./mdadm.c
--- ./mdadm.c~current~	2003-08-05 16:39:52.000000000 +1000
+++ ./mdadm.c	2003-08-05 16:52:37.000000000 +1000
@@ -397,7 +397,9 @@ int main(int argc, char *argv[])
 			if (strcmp(update, "sparc2.2")==0) continue;
 			if (strcmp(update, "super-minor") == 0)
 				continue;
-			fprintf(stderr, Name ": '--update %s' invalid.  Only 'sparc2.2' or 'super-minor' supported\n",update);
+			if (strcmp(update, "summaries")==0)
+				continue;
+			fprintf(stderr, Name ": '--update %s' invalid.  Only 'sparc2.2', 'super-minor' or 'summaries' supported\n",update);
 			exit(2);
 
 		case O(ASSEMBLE,'c'): /* config file */
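
(A possible way to apply and check the patch, assuming it is saved as
mdadm-summaries.patch beside an unpacked mdadm-1.3.0 tree; the file
name is arbitrary, only the -p1 level matters for the ./-style paths
in the diff:

  cd mdadm-1.3.0
  patch -p1 < ../mdadm-summaries.patch
  make
  # after assembling with --update=summaries, the counters should agree:
  ./mdadm --examine /dev/sda3 | egrep 'Total|Active|Working|Failed|Spare'
)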


* Re: RAID-1 does not rebuild after hot-add
From: David Chow @ 2003-08-05 11:58 UTC
  To: Neil Brown; +Cc: linux-raid

>
>
>>
>>OK, I've found a problem in the superblock, described below: the 
>>"Total Devices" count is wrong. It should be 2 instead of 3. This is 
>>probably a bug in mkraid (from raidtools): when I created the array 
>>with a "failed-disk" in degraded mode, mkraid wrote an incorrect 
>>superblock even though I put "nr-raid-disks 2" and "nr-spare-disks 0" 
>>in the raidtab. OK, the superblock is incorrect, but how can I change 
>>the total-devices and spare-devices parameters without re-initializing 
>>the array?
>>    
>>
>
>It is actually "Active Devices" that is the problem.  As this is the
>same as "Raid Devices", md doesn't bother doing a reconstruction.
>
>The following patch adds --update=summaries to mdadm-1.3.0, which
>updates the various summary fields in the superblock (the Total,
>Active, Working, Failed, and Spare device counts).
>It is untested but should work.
>If you patch mdadm-1.3.0, compile it, and then run
>
>  mdadm --assemble /dev/md2 --update=summaries /dev/sda3 /dev/sdb3
>
>it should update these fields and start the array (you might need
>a --run as well).
>
>Let me know how it goes.
>
>  
>
The point is that my root is on md2, so I can never reassemble the RAID 
devices; this is just the same as using --force with mkraid, or mdadm 
with reassemble. Is there no other way to update buggy superblocks 
online, or even with a reboot? Can I just modify the field and reset 
the machine after a "sync"?

David



* Re: RAID-1 does not rebuild after hot-add
From: Neil Brown @ 2003-08-06  1:01 UTC
  To: David Chow; +Cc: linux-raid

On Tuesday August 5, davidchow@shaolinmicro.com wrote:
> The point is that my root is on md2, so I can never reassemble the RAID 
> devices; this is just the same as using --force with mkraid, or mdadm 
> with reassemble. Is there no other way to update buggy superblocks 
> online, or even with a reboot? Can I just modify the field and reset 
> the machine after a "sync"?

You could try, but I wouldn't recommend it.

You could upgrade to 2.6, which probably gets this right, but as it is
still in -test, you might not want to.

The best solution is to boot a rescue disk and do it from there.

Sorry I cannot be more helpful.

NeilBrown
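
(The rescue-disk route would look roughly like this, assuming the
rescue environment has, or can be given, the patched mdadm binary, and
that the root array stays unmounted while it runs; device names are the
ones from this thread:

  # booted from rescue media, root fs not mounted:
  mdadm --assemble /dev/md2 --update=summaries --run /dev/sda3 /dev/sdb3
  cat /proc/mdstat    # the resync should now be under way
  # wait for recovery to finish, then reboot into the repaired array
)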

