* disaster. raid1 drive failure rsync=DELAYED why?? please help
@ 2005-03-13  4:51 Mitchell Laks
  2005-03-13  9:49 ` David Greaves
  2005-03-13 15:49 ` David Greaves
  0 siblings, 2 replies; 10+ messages in thread
From: Mitchell Laks @ 2005-03-13  4:51 UTC (permalink / raw)
  To: linux-raid

Hi,
I have a remote system with a RAID1 data array. I got a call from the 
person using the system that the application that writes to the data disk was 
not working.

The system drive is /dev/hda, with separate partitions for /, /var, /home, and /tmp.
The data drive is a Linux software RAID1, /dev/md0, built from /dev/hdc1 and /dev/hde1.

I logged in remotely and discovered that the /var partition was full because 
of many write errors from /dev/hde1 logged in /var/log/syslog.

When I looked at cat /proc/mdstat I discovered that /dev/md0 was degraded 
because /dev/hdc1 had failed (there was an F there) and /dev/hde1 was 
carrying the load.

I shut down the applications running in the background. I emptied out /var/log/syslog. 
I then removed /dev/hdc1 from the array /dev/md0.

I had another pair of drives in the system that were part of another mirrored 
array, /dev/md1, with no useful information stored on them:

/dev/md1  /dev/hdf1 /dev/hdh1 

I thought, OK, let me detach /dev/hdf1 from the other array /dev/md1 and try to 
attach it to /dev/md0 and rebuild the array /dev/md0. That way I would rescue 
the data on the threatening drive /dev/hde1, which is spewing error 
messages into my /var/log/syslog and threatening to die!

So, stupidly (probably), I did:

mdadm /dev/md1  --fail /dev/hdf1 --remove /dev/hdf1

Then I did:
mdadm /dev/md0 --add /dev/hdf1

Now when I do cat /proc/mdstat I see:

md0 : active raid1 hdf1[2] hde1[0]
      244195904 blocks [2/1] [U_]
        resync=DELAYED

I don't see any rebuilding action going on.

Did I have to do something like fdisk on the drive /dev/hdf1 before adding it 
to the array? I didn't do anything to zero out the data that was on the disk 
(no real data, just whatever was created on the disk when I made it part of 
the ext3 RAID /dev/md1). I had fdisked it a while ago as a Linux raid type 
partition...

What do I do to rebuild the raid?
Thanks millions for your help!!
Mitchell


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-13  4:51 disaster. raid1 drive failure rsync=DELAYED why?? please help Mitchell Laks
@ 2005-03-13  9:49 ` David Greaves
  2005-03-13 14:32   ` Mitchell Laks
  2005-03-13 15:49 ` David Greaves
  1 sibling, 1 reply; 10+ messages in thread
From: David Greaves @ 2005-03-13  9:49 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

kernel version, mdadm version?

David


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-13  9:49 ` David Greaves
@ 2005-03-13 14:32   ` Mitchell Laks
  2005-03-13 15:23     ` David Greaves
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Laks @ 2005-03-13 14:32 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid

On Sunday 13 March 2005 04:49 am, you wrote:
> kernel version, mdadm version?

debian sarge:

wustl@A1:~$ uname -a
Linux A1 2.6.8-1-386 #1 Thu Nov 25 04:24:08 UTC 2004 i686 GNU/Linux
wustl@A1:~$ export COLUMNS=200;dpkg -l|grep mdadm
ii  mdadm                                        1.8.1-1                       
wustl@A1:~$ apt-cache show mdadm
Package: mdadm
Priority: optional
Section: admin
Installed-Size: 252
Maintainer: Mario Joussen <joussen@debian.org>
Architecture: i386
Version: 1.8.1-1
Replaces: mdctl
Depends: libc6 (>= 2.3.2.ds1-4), makedev, debconf (>> 0.5)
Conflicts: mdctl (<< 0.7.2), raidtools2 (<< 1.00.3-12.1)
Filename: mdadm_1.8.1-1_i386.deb
Size: 104964
MD5sum: 0550c71ce7c24d77b93bac373cd98839
Description: Manage MD devices aka Linux Software Raid
 mdadm is a program that can be used to create, manage, and monitor MD
 devices.  As such it provides a similar set  of functionality  to the
 raidtools packages.
 .
 Unlike raidtools, mdadm can perform (almost) all of its functions
 without having a configuration file.


Thank you.
Mitchell Laks

>
> David


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-13 14:32   ` Mitchell Laks
@ 2005-03-13 15:23     ` David Greaves
  0 siblings, 0 replies; 10+ messages in thread
From: David Greaves @ 2005-03-13 15:23 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

Mitchell Laks wrote:

>On Sunday 13 March 2005 04:49 am, you wrote:
>  
>
>>kernel version, mdadm version?
>>    
>>
>
>debian sarge:
>
>wustl@A1:~$ uname -a
>Linux A1 2.6.8-1-386 #1 Thu Nov 25 04:24:08 UTC 2004 i686 GNU/Linux
>wustl@A1:~$ export COLUMNS=200;dpkg -l|grep mdadm
>ii  mdadm                                        1.8.1-1   
>
Ah!
This is an 'experimental' version of mdadm.
I'll interrupt the answer I'm writing to your other emails for now...
upgrade to mdadm 1.9 immediately! (Or down to 1.8.0; it doesn't matter.)

The Debian maintainer should have read the release notes - bad maintainer!!

This won't affect the resync=DELAYED; that's in the kernel
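On Debian the upgrade would look something like this (a sketch; the version
you actually get depends on which apt sources the box has configured):

apt-get update && apt-get install mdadm   # pull in a newer mdadm package
mdadm --version                           # confirm you are no longer on 1.8.1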

David


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-13  4:51 disaster. raid1 drive failure rsync=DELAYED why?? please help Mitchell Laks
  2005-03-13  9:49 ` David Greaves
@ 2005-03-13 15:49 ` David Greaves
  2005-03-14  7:43   ` Mitchell Laks
  1 sibling, 1 reply; 10+ messages in thread
From: David Greaves @ 2005-03-13 15:49 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

Mitchell Laks wrote:

>Hi,
>I have a remote system with a RAID1 data array. I got a call from the 
>person using the system that the application that writes to the data disk was 
>not working.
>
>The system drive is /dev/hda, with separate partitions for /, /var, /home, and /tmp.
>The data drive is a Linux software RAID1, /dev/md0, built from /dev/hdc1 and /dev/hde1.
>
>I logged in remotely and discovered that the /var partition was full because 
>of many write errors from /dev/hde1 logged in /var/log/syslog.
>
>When I looked at cat /proc/mdstat I discovered that /dev/md0 was degraded 
>because /dev/hdc1 had failed (there was an F there) and /dev/hde1 was 
>carrying the load.
>
>I shut down the applications running in the background. I emptied out /var/log/syslog. 
>I then removed /dev/hdc1 from the array /dev/md0.
>
>I had another pair of drives in the system that were part of another mirrored 
>array, /dev/md1, with no useful information stored on them:
>
>/dev/md1  /dev/hdf1 /dev/hdh1 
>
>I thought, OK, let me detach /dev/hdf1 from the other array /dev/md1 and try to 
>attach it to /dev/md0 and rebuild the array /dev/md0. That way I would rescue 
>the data on the threatening drive /dev/hde1, which is spewing error 
>messages into my /var/log/syslog and threatening to die!
>
>So, stupidly (probably), I did:
>
>mdadm /dev/md1  --fail /dev/hdf1 --remove /dev/hdf1
>  
>
OK
what does mdadm --detail /dev/md1 show?

>Then I did:
>mdadm /dev/md0 --add /dev/hdf1
>  
>
hmm - I don't know. I would have zeroed it :)
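Zeroing would look something like this (a sketch, assuming your mdadm build
supports the --zero-superblock misc mode, run only after the partition is
already out of md1):

mdadm --zero-superblock /dev/hdf1   # wipe the stale md superblock left over from md1
mdadm /dev/md0 --add /dev/hdf1      # then add the clean partition to md0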

>Now when I do cat /proc/mdstat I see:
>
>md0 : active raid1 hdf1[2] hde1[0]
>      244195904 blocks [2/1] [U_]
>        resync=DELAYED
>
>I don't see any rebuilding action going on.
>  
>
I see the full /proc/mdstat appears later...

From the source (md.c):
    /* we overload curr_resync somewhat here.
     * 0 == not engaged in resync at all
     * 2 == checking that there is no conflict with another sync
     * 1 == like 2, but have yielded to allow conflicting resync to
     *        commense
     * other == active in resync - this many blocks
     *
     * Before starting a resync we must have set curr_resync to
     * 2, and then checked that every "conflicting" array has curr_resync
     * less than ours.  When we find one that is the same or higher
     * we wait on resync_wait.  To avoid deadlock, we reduce curr_resync
     * to 1 if we choose to yield (based arbitrarily on address of mddev structure).
     * This will mean we have to start checking from the beginning again.

You are in state 1 or 2.
Hmmm.


next email:

Mitchell Laks wrote:

>1) I tried to add the new spare device to /dev/md0 on Friday afternoon. It
>still has not rebuilt.
>
problem 1.

>I am also unable to do an "ls" of the directory on the drive.
>
problem 2 - this shouldn't be happening

>2) I had another idea: why not umount the drive and then run fsck.ext3 on it?
>Maybe it needs fsck? When I tried that I got the message:
>  
>
nope - rebuilding happens deep underneath the filesystem.

>A1:~# umount /home/big0
>umount: /home/big0: device is busy
>umount: /home/big0: device is busy
>
>(/dev/md0 is mounted on /home/big0).
>  
>
This just means that some process has a file handle open on /home/big0.
lsof plus grep can help to find candidate processes.
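For example (a sketch, using the /home/big0 mount point from your transcript):

lsof | grep /home/big0   # processes holding file handles under the mount
fuser -vm /home/big0     # alternative (psmisc): list users of the filesystem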

>A1:~# cat /proc/mdstat
>Personalities : [raid1]
>md0 : active raid1 hdi1[2] hdg1[0]
>      244195904 blocks [2/1] [U_]
>        resync=DELAYED
>md1 : active raid1 hdc1[1]
>      244195904 blocks [2/1] [_U]
>
>md2 : active raid1 hde1[1]
>      244195904 blocks [2/1] [_U]
>
>unused devices: 

next email:

>I had some more bright ideas and here is what happened:
>
>I am unable to even do ls on the directory mounted on this raid device.
>
>So, I said, maybe the problem is that I need to run fsck.ext3 on the drive 
>first. So I tried to umount it and I got the error message:
>
>A1:~# umount /home/big0
>umount: /home/big0: device is busy
>umount: /home/big0: device is busy
>
>So I said maybe the problem is the resyncing. So maybe an idea is to fail the 
>newly added device /dev/hdi1 and then remove /dev/hdi1, moving back to degraded 
>mode. Do an umount of the drive, then an fsck.ext3 on the drive, and then I 
>can do a reboot and add the drive back in.
>
>Hey why not?
>  
>
'cos I can't figure out what's going on!

>Ok. So I tried. Here is the transcript of the session:
>
>A1:~# cat /proc/mdstat
>Personalities : [raid1]
>md0 : active raid1 hdi1[2] hdg1[0]
>      244195904 blocks [2/1] [U_]
>        resync=DELAYED
>md1 : active raid1 hdc1[1]
>      244195904 blocks [2/1] [_U]
>
>md2 : active raid1 hde1[1]
>      244195904 blocks [2/1] [_U]
>
>unused devices: <none>
>A1:~# umount /home/big0
>umount: /home/big0: device is busy
>umount: /home/big0: device is busy
>A1:~# whoami
>root
>A1:~# mdadm /dev/md0 -fail /dev/hdi1 --remove /dev/hdi1
>mdadm: hot add failed for /dev/hdi1: Invalid argument
>
>A1:~# cat /proc/mdstat
>Personalities : [raid1]
>md0 : active raid1 hdi1[2] hdg1[0]
>      244195904 blocks [2/1] [U_]
>        resync=DELAYED
>md1 : active raid1 hdc1[1]
>      244195904 blocks [2/1] [_U]
>
>md2 : active raid1 hde1[1]
>      244195904 blocks [2/1] [_U]
>
>unused devices: <none>
>A1:~# mdadm --manage --set-faulty /dev/md0  /dev/hdi1
>mdadm: set /dev/hdi1 faulty in /dev/md0
>A1:~# mdadm --detail /dev/md0
>/dev/md0:
>        Version : 00.90.01
>  Creation Time : Wed Jan 12 14:19:21 2005
>     Raid Level : raid1
>     Array Size : 244195904 (232.88 GiB 250.06 GB)
>    Device Size : 244195904 (232.88 GiB 250.06 GB)
>   Raid Devices : 2
>  Total Devices : 2
>Preferred Minor : 0
>    Persistence : Superblock is persistent
>
>    Update Time : Sun Mar 13 01:28:06 2005
>          State : clean, degraded
> Active Devices : 1
>Working Devices : 1
> Failed Devices : 1
>  Spare Devices : 0
>
>           UUID : 6b8b4567:327b23c6:643c9869:66334873
>         Events : 0.343413
>
>    Number   Major   Minor   RaidDevice State
>       0      34        1        0      active sync   /dev/hdg1
>       1       0        0        -      removed
>
>       2      56        1        1      faulty   /dev/hdi1
>A1:~# mdadm /dev/md0 -r /dev/hdi1
>mdadm: hot remove failed for /dev/hdi1: Device or resource busy
>  
>
Could this be an mdadm 1.8.1 issue?? It seemed like the right thing to do.

>A1:~# cat /proc/mdstat
>Personalities : [raid1]
>md0 : active raid1 hdi1[2](F) hdg1[0]
>      244195904 blocks [2/1] [U_]
>        resync=DELAYED
>md1 : active raid1 hdc1[1]
>      244195904 blocks [2/1] [_U]
>
>md2 : active raid1 hde1[1]
>      244195904 blocks [2/1] [_U]
>
>unused devices: <none>
>A1:~# mdadm /dev/md0 -r /dev/hdi1
>mdadm: hot remove failed for /dev/hdi1: Device or resource busy
>A1:~#                                                                 
>
>Any ideas on what I can do now?
>  
>
upgrade mdadm and try the remove again.
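One more thing worth checking: your transcript shows the command typed as
'mdadm /dev/md0 -fail /dev/hdi1' with a single dash, which mdadm likely parses
as the bundled short options -f -a -i -l; that would explain the strange
'hot add failed' reply. After the upgrade, a sketch of the retry with the long
options spelled out:

mdadm /dev/md0 --fail /dev/hdi1     # note the double dash
mdadm /dev/md0 --remove /dev/hdi1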

next email:

>One more bit of information:
>
>this was a bit of info from 
>
>tail /var/log/kern.log
>
>Mar 11 04:42:11 A1 kernel:
>Mar 11 04:42:11 A1 kernel: hdg: drive not ready for command
>Mar 11 04:42:11 A1 kernel: raid1: hdg1: rescheduling sector 215908496
>Mar 11 04:42:11 A1 kernel: raid1: hdg1: redirecting sector 215908496 to another mirror
>Mar 11 04:42:11 A1 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
>Mar 11 04:42:11 A1 kernel:
>Mar 11 04:42:11 A1 kernel: hdg: drive not ready for command
>Mar 11 04:42:11 A1 kernel: raid1: hdg1: rescheduling sector 215908496
>Mar 11 04:42:11 A1 kernel: raid1: hdg1: redirecting sector 215908496 to another mirror
>
>But that was all from Mar 11, and today is Mar 13...
>  
>
Well, it may explain why things went bad.


I think you need to:
* upgrade mdadm.
* Then cat /proc/mdstat
* then mdadm --detail on all md devices

Then note which md devices are 'important'.

Also:
what does mount say?
Is the filesystem on /dev/md0 usable? (It should be fine.)
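A sketch of that survey, using the three arrays from your mdstat output:

cat /proc/mdstat
for md in /dev/md0 /dev/md1 /dev/md2; do
    mdadm --detail $md
done
mount | grep md   # which arrays are mounted, and where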

Is the box safe to reboot?

When you reply to my inline questions, remove all the context to trim 
the mail right down :)

David


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-13 15:49 ` David Greaves
@ 2005-03-14  7:43   ` Mitchell Laks
  2005-03-14  9:49     ` David Greaves
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Laks @ 2005-03-14  7:43 UTC (permalink / raw)
  To: linux-raid

On Sunday 13 March 2005 10:49 am, David Greaves wrote many helpful remarks:
> 
David I am grateful that you were there for me.
I went to the site. I connected a monitor to the headless machine. I saw the 
screen flooded with write errors to the spare drive in the original raid1.

The terminals on tty1-6 were flooded, so I had to log in remotely over the 
network. I tried "shutdown -h now". I tried "init 1" as root. No go. I had to 
hard power down the machine. On bootup, Debian dropped me into single-user 
mode, and I took /dev/md0 out of /etc/fstab to get the machine to boot 
past the fsck.
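For the record, the /etc/fstab change was along these lines (a sketch; the
options on the original line may have differed):

# keep the damaged array from being mounted and fsck'd at boot
#/dev/md0   /home/big0   ext3   defaults   0   2
/dev/md0    /home/big0   ext3   noauto     0   0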

I looked through /var/log/messages.0. I found that last Wednesday at 10am 
drive /dev/hda1 failed. The paired drive (actually /dev/hdg1) in the 
raid1 began to issue bad kernel messages immediately. These filled the /var 
partition to 100% and seem to have caused all the bad behavior we saw.

I had noticed initially that /var was full, and made room by cutting out much 
of /var/log/messages.

I probably could not run "shutdown -h now" successfully because the /var 
partition needed some kind of fsck to deal with having been filled, and the 
many, many processes that had been writing to it were very "angry" and in a 
"messy state". They needed a powerdown. (Very M$-like.)

I am not sure I mentioned here (though I discussed it on another mailing list :)) 
that my main application on the server is a database-backed application 
running off a postgresql backend.

Postgresql was also put into a weird state by this incident - not because 
there was anything wrong with it, but because filling the /var partition caused 
multiple effects, since the postgresql databases live in /var/lib/postgres. I 
could not run pg_ctl stop or pg_ctl stop -m fast, only pg_ctl stop -m 
immediate. Luckily I was able to rescue the database.
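For reference, the three shutdown modes (standard pg_ctl usage; the exact
invocation depends on Debian's packaging and the postgres user's environment):

pg_ctl stop -m smart       # wait for all clients to disconnect
pg_ctl stop -m fast        # abort client connections, then shut down cleanly
pg_ctl stop -m immediate   # no clean shutdown; recovery runs on next start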

My assessment (correct me if I am wrong) is that I have to rethink my 
architecture. As I continue to work with software RAID, I will likely have to 
move the postgresql database to a separate partition, so that I do not mix 
points of failure.

I took the two drives /dev/hda and /dev/hdg out of the machine. I restored 
my systems from the most recent backup, losing only a few days' worth of suspect 
data (Wed/Thu/Fri...). I replaced them with new hard drives. It's good to have 
duplicate servers and RAIDs; both are necessary, I see.

I will play with /dev/hdg1 a little on a different machine to see how it 
behaves. I suspect that with all those errors it is really dead too.

I just had bad luck. A double disk failure.

Thank you David again!

Mitchell


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-14  7:43   ` Mitchell Laks
@ 2005-03-14  9:49     ` David Greaves
  0 siblings, 0 replies; 10+ messages in thread
From: David Greaves @ 2005-03-14  9:49 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

Mitchell Laks wrote:

>On Sunday 13 March 2005 10:49 am, David Greaves wrote many helpful remarks:
>  
>
>David I am grateful that you were there for me.
>  
>
No probs - we've all been there!

>My assessment (correct me if I am wrong) is that I have to rethink my 
>architecture. As I continue to work with software RAID, I will likely have to 
>move the postgresql database to a separate partition, so that I do not mix 
>points of failure.
>
Well, once things are calmer, post your layout and new thinking, and I'm 
sure people will have input.
Amongst other things, mdadm can allow you to keep one or more hot spares 
in a system and 'share' them between multiple raid1 mirrors.
This kind of trick (learnt by hanging out here) may be the answer to 
multiple failures.
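A sketch of that setup in /etc/mdadm/mdadm.conf (hypothetical device lists;
the spare-group tag is what lets a monitoring mdadm move a spare between
arrays in the same group):

DEVICE /dev/hd*1
ARRAY /dev/md0 devices=/dev/hdc1,/dev/hde1 spare-group=shared
ARRAY /dev/md1 devices=/dev/hdf1,/dev/hdh1 spare-group=shared

Then run mdadm in monitor mode so it can shuffle a spare to a degraded array:

mdadm --monitor --scan --daemonise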

David
PS don't forget the mdadm upgrade.


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
@ 2005-03-13  7:22 Mitchell Laks
  0 siblings, 0 replies; 10+ messages in thread
From: Mitchell Laks @ 2005-03-13  7:22 UTC (permalink / raw)
  To: linux-raid

One more bit of information:

This was a bit of info from tail /var/log/kern.log:

Mar 11 04:42:11 A1 kernel:
Mar 11 04:42:11 A1 kernel: hdg: drive not ready for command
Mar 11 04:42:11 A1 kernel: raid1: hdg1: rescheduling sector 215908496
Mar 11 04:42:11 A1 kernel: raid1: hdg1: redirecting sector 215908496 to another mirror
Mar 11 04:42:11 A1 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 11 04:42:11 A1 kernel:
Mar 11 04:42:11 A1 kernel: hdg: drive not ready for command
Mar 11 04:42:11 A1 kernel: raid1: hdg1: rescheduling sector 215908496
Mar 11 04:42:11 A1 kernel: raid1: hdg1: redirecting sector 215908496 to another mirror

But that was all from Mar 11, and today is Mar 13...

thanks

Mitchell


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
  2005-03-13  6:23 Mitchell Laks
@ 2005-03-13  6:45 ` Mitchell Laks
  0 siblings, 0 replies; 10+ messages in thread
From: Mitchell Laks @ 2005-03-13  6:45 UTC (permalink / raw)
  To: linux-raid

I had some more bright ideas and here is what happened:

I am unable to even do ls on the directory mounted on this raid device.

So, I said, maybe the problem is that I need to run fsck.ext3 on the drive 
first. So I tried to umount it and I got the error message:

A1:~# umount /home/big0
umount: /home/big0: device is busy
umount: /home/big0: device is busy

So I said maybe the problem is the resyncing. So maybe an idea is to fail the 
newly added device /dev/hdi1 and then remove /dev/hdi1, moving back to degraded 
mode. Do an umount of the drive, then an fsck.ext3 on the drive, and then I 
can do a reboot and add the drive back in.

Hey why not?

Ok. So I tried. Here is the transcript of the session:

A1:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hdi1[2] hdg1[0]
      244195904 blocks [2/1] [U_]
        resync=DELAYED
md1 : active raid1 hdc1[1]
      244195904 blocks [2/1] [_U]

md2 : active raid1 hde1[1]
      244195904 blocks [2/1] [_U]

unused devices: <none>
A1:~# umount /home/big0
umount: /home/big0: device is busy
umount: /home/big0: device is busy
A1:~# whoami
root
A1:~# mdadm /dev/md0 -fail /dev/hdi1 --remove /dev/hdi1
mdadm: hot add failed for /dev/hdi1: Invalid argument

A1:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hdi1[2] hdg1[0]
      244195904 blocks [2/1] [U_]
        resync=DELAYED
md1 : active raid1 hdc1[1]
      244195904 blocks [2/1] [_U]

md2 : active raid1 hde1[1]
      244195904 blocks [2/1] [_U]

unused devices: <none>
A1:~# mdadm --manage --set-faulty /dev/md0  /dev/hdi1
mdadm: set /dev/hdi1 faulty in /dev/md0
A1:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Jan 12 14:19:21 2005
     Raid Level : raid1
     Array Size : 244195904 (232.88 GiB 250.06 GB)
    Device Size : 244195904 (232.88 GiB 250.06 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Mar 13 01:28:06 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : 6b8b4567:327b23c6:643c9869:66334873
         Events : 0.343413

    Number   Major   Minor   RaidDevice State
       0      34        1        0      active sync   /dev/hdg1
       1       0        0        -      removed

       2      56        1        1      faulty   /dev/hdi1
A1:~# mdadm /dev/md0 -r /dev/hdi1
mdadm: hot remove failed for /dev/hdi1: Device or resource busy
A1:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hdi1[2](F) hdg1[0]
      244195904 blocks [2/1] [U_]
        resync=DELAYED
md1 : active raid1 hdc1[1]
      244195904 blocks [2/1] [_U]

md2 : active raid1 hde1[1]
      244195904 blocks [2/1] [_U]

unused devices: <none>
A1:~# mdadm /dev/md0 -r /dev/hdi1
mdadm: hot remove failed for /dev/hdi1: Device or resource busy
A1:~#                                                                 

Any ideas on what I can do now?

thanks
Mitchell


* Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
@ 2005-03-13  6:23 Mitchell Laks
  2005-03-13  6:45 ` Mitchell Laks
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Laks @ 2005-03-13  6:23 UTC (permalink / raw)
  To: linux-raid

Hi, some additional information:

1) I tried to add the new spare device to /dev/md0 on Friday afternoon. It 
still has not rebuilt. I am also unable to do an "ls" of the directory on the 
drive.
2) I had another idea: why not umount the drive and then run fsck.ext3 on it? 
Maybe it needs fsck? When I tried that I got the message:

A1:~# umount /home/big0
umount: /home/big0: device is busy
umount: /home/big0: device is busy

(/dev/md0 is mounted on /home/big0).

Here is the output from mdadm --detail.

Note: I accidentally misnamed the drives in the original posting (sorry..). 
This is the 'rebuilt' setup. It has been 2 days of rebuilding time with 
no change. What do I do to restart the rebuild???

A1: mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Jan 12 14:19:21 2005
     Raid Level : raid1
     Array Size : 244195904 (232.88 GiB 250.06 GB)
    Device Size : 244195904 (232.88 GiB 250.06 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Mar 11 11:40:23 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

           UUID : 6b8b4567:327b23c6:643c9869:66334873
         Events : 0.343412

    Number   Major   Minor   RaidDevice State
       0      34        1        0      active sync   /dev/hdg1
       1       0        0        -      removed

       2      56        1        1      spare rebuilding   /dev/hdi1
A1:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hdi1[2] hdg1[0]
      244195904 blocks [2/1] [U_]
        resync=DELAYED
md1 : active raid1 hdc1[1]
      244195904 blocks [2/1] [_U]

md2 : active raid1 hde1[1]
      244195904 blocks [2/1] [_U]

unused devices: 

