* Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-21 18:29 UTC
  To: linux-raid

Hi there.
Please pardon my lack of experience and expertise here, as this is my
first time posting.

Where I work there is a fairly old fileserver.
It is running CentOS 4, kernel 2.6.9-100.EL.
Recently it failed; now it tries to boot, but fails part way with:
RAID5: not enough operational devices for md1 (2/4 failed).

This machine has data for a number of users, and, of course, it seems
the backup has not been properly done for a few months (the responsible
staff member left).
I am in the position of being the only likely person with a chance of
recovering the data for a few users on this machine.
And I am certainly NOT an expert!

So, here is what I have done so far:
On further inspection, I disconnected the drives one at a time and
determined which 2 are "failed".
I pulled those out, and on another machine ran Seagate SeaTools for
Linux to test them.
They both came out as healthy, although one apparently has a lot of
uncommitted bad sectors, or so the disk tool on a Fedora 14 machine
tells me.
I looked at the layout and see that each of the 4 disks present has 2 partitions.
After testing I was able to see the partitions on each disk with fdisk.
I did not try to mount as these are simply RAID members, and I know
there is no complete filesystem to mount on any single drive here.

The first partition on each drive is small (/boot), and it seems to be
RAID1 across all 4 drives.
Those are healthy enough to get partially into a boot.

The machine still boots to the point of trying to get access to / and
then kernel panics.
The / and other parts are on a RAID5 made from the second partition of
the 4 disks.

I have returned all 4 disks to the machine and, using CentOS
install/recovery media, have the machine up in rescue mode.
At this point I believe that I need to rebuild the RAID5.

I understand that I probably only get one chance to do this right, so I
am writing here today to ask for help, so that I do not lose other
people's data.

Can anyone make me a suggestion?


Thanks in advance for any help!



John Valarti - under a lot of pressure..


* Re: Server down-failed RAID5-asking for some assistance
From: David Brown @ 2011-04-21 19:59 UTC
  To: linux-raid

On 21/04/11 20:29, John Valarti wrote:
> [...]

My first thought would be to get /all/ the disks, not just the "failed" 
ones, out of the machine.  You want to make full images of them (with 
ddrescue or something similar) to files on another disk, and then work 
with those images.  Don't touch the original disks after that - otherwise
you could very quickly lose any chance you have of recovering your data.  But once
you've got the images, you can copy them and try out recovery strategies 
- all it costs is some disk space and some time, and you've no risk of 
making things worse.
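
As a very rough sketch of that imaging step (device and path names here
are assumptions - substitute your own), with GNU ddrescue:

   # first pass: grab everything easily readable, skipping bad areas
   ddrescue -n /dev/sdb /mnt/big/originalB.image /mnt/big/originalB.log
   # second pass: go back and retry the bad areas a few times
   ddrescue -r3 /dev/sdb /mnt/big/originalB.image /mnt/big/originalB.log

The log file lets ddrescue resume and refine the copy instead of
starting over.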

Once you've got some (hopefully most) of your data recovered from the 
images, buy four /new/ disks to put in the machine, and work on your 
restore.  You don't want to reuse the failing disks, and probably the 
other two equally old and worn disks will be high risk too.




* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-22  2:32 UTC
  To: linux-raid

On Thu, Apr 21, 2011 at 1:59 PM, David Brown <david.brown@hesbynett.no> wrote:
.
> My first thought would be to get /all/ the disks, not just the "failed"
> ones, out of the machine.  You want to make full images of them (with
> ddrescue or something similar) to files on another disk, and then work with
> those images.  ..
> Once you've got some (hopefully most) of your data recovered from the
> images, buy four /new/ disks to put in the machine, and work on your
> restore.  You don't want to reuse the failing disks, and probably the other
> two equally old and worn disks will be high risk too.

OK, I think I understand.
Does that mean I need to buy 8 disks, all the same size or bigger?
The originals are 250GB SATA so that should be OK, I guess.

I read some more and found out I should run mdadm --examine.

Should I not be able to just add the one disk partition sdc2 back to the RAID?


Here is the result of --examine:

/dev/sda2:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : ddf4d448:36afa319:f0917855:03f8bbe8
 Creation Time : Mon May 15 16:38:05 2006
    Raid Level : raid5
 Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
    Array Size : 734925312 (700.88 GiB 752.56 GB)
  Raid Devices : 4
 Total Devices : 3
Preferred Minor : 1

   Update Time : Mon Apr 18 07:48:54 2011
         State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
      Checksum : 5674ce60 - correct
        Events : 28580020

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

  0     0       8        2        0      active sync   /dev/sda2
  1     1       8       18        1      active sync   /dev/sdb2
  2     2       8       34        2      active sync   /dev/sdc2
  3     3       0        0        3      faulty removed
/dev/sdb2:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : ddf4d448:36afa319:f0917855:03f8bbe8
 Creation Time : Mon May 15 16:38:05 2006
    Raid Level : raid5
 Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
    Array Size : 734925312 (700.88 GiB 752.56 GB)
  Raid Devices : 4
 Total Devices : 4
Preferred Minor : 1

   Update Time : Sun Oct 18 10:04:06 2009
         State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
 Spare Devices : 0
      Checksum : 5171dcb2 - correct
        Events : 20333614

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     3       8       50        3      active sync   /dev/sdd2

  0     0       8        2        0      active sync   /dev/sda2
  1     1       8       18        1      active sync   /dev/sdb2
  2     2       8       34        2      active sync   /dev/sdc2
  3     3       8       50        3      active sync   /dev/sdd2
/dev/sdc2:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : ddf4d448:36afa319:f0917855:03f8bbe8
 Creation Time : Mon May 15 16:38:05 2006
    Raid Level : raid5
 Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
    Array Size : 734925312 (700.88 GiB 752.56 GB)
  Raid Devices : 4
 Total Devices : 3
Preferred Minor : 1

   Update Time : Mon Apr 18 07:48:51 2011
         State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
      Checksum : 5674ce6b - correct
        Events : 28580018

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     2       8       34        2      active sync   /dev/sdc2

  0     0       8        2        0      active sync   /dev/sda2
  1     1       8       18        1      active sync   /dev/sdb2
  2     2       8       34        2      active sync   /dev/sdc2
  3     3       0        0        3      faulty removed
/dev/sdd2:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : ddf4d448:36afa319:f0917855:03f8bbe8
 Creation Time : Mon May 15 16:38:05 2006
    Raid Level : raid5
 Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
    Array Size : 734925312 (700.88 GiB 752.56 GB)
  Raid Devices : 4
 Total Devices : 3
Preferred Minor : 1

   Update Time : Mon Apr 18 07:48:54 2011
         State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
      Checksum : 5674ce4e - correct
        Events : 28580020

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

  0     0       8        2        0      active sync   /dev/sda2
  1     1       8       18        1      active sync   /dev/sdb2
  2     2       8       34        2      active sync   /dev/sdc2
  3     3       0        0        3      faulty removed

* Re: Server down-failed RAID5-asking for some assistance
From: NeilBrown @ 2011-04-22  2:57 UTC
  To: John Valarti; +Cc: linux-raid

On Thu, 21 Apr 2011 20:32:57 -0600 John Valarti <mdadmuser@gmail.com> wrote:

> [...]
> 
> Should I not be able to just add the one disk partition sdc2 back to the RAID?

Possibly.

It looks like sdb2 failed in October 2009 !!!! and nobody noticed.  So your
array has been running degraded since then.

If you

 mdadm -A /dev/md1 --force /dev/sd[acd]2

Then you will have your array back, though there could be a small amount of
data corruption if the array was in the middle of writing when the system
crashed/died/lost-power/whatever-happened.

This will give you access to your data.
How much you trust your drives to continue to give access to your data is up
to you.  But you would be wise to at least buy a 1TB drive to copy all the
data onto before you put too much stress on your old drives.
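
A sketch of that safety copy (the mount points are made up, and it
assumes md1 holds a filesystem directly):

   mount -o ro /dev/md1 /mnt/raid
   mount /dev/sde1 /mnt/backup        # the new 1TB drive
   rsync -aH /mnt/raid/ /mnt/backup/

Mounting the array read-only keeps the copy from changing anything
while the drives are still suspect.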

Once you have a safe copy, you could

 mdadm /dev/md1 --add /dev/sdb2

This will add sdb2 to the array and it will recover the data for sdb2 from
the data and parity on the other drives.  If this works - great.  However
there is a reasonable chance you will hit a read error, in which case the
recovery will abort and you will still have your data on the degraded array.

You could possibly run some bad-blocks tests on each drive (which will be
destructive - but you have a backup on the 1TB drive) and decide if you want
to throw them out or keep using them.
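
Such a destructive test could look like this (only once the backup is
safe - the -w pass overwrites the entire drive):

   badblocks -wsv /dev/sdb

which writes and reads back test patterns over every sector.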


Whatever you do, once you again have a working array that you feel happy to
trust, make sure a 'check' run happens regularly.  Some distros provide a
cron job to do this for you.  It involves simply
   echo check > /sys/block/md0/md/sync_action

This will read every block on every device to make sure there are no sleeping
bad blocks.  Every month is probably a reasonable frequency to run it.
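
If your distro doesn't ship such a cron job, a hypothetical
/etc/crontab entry for it might look like:

   # run an md consistency check at 01:00 on the first of each month
   0 1 1 * *  root  echo check > /sys/block/md1/md/sync_action

(adjust md1 to your array name).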

Also run "mdadm --monitor" configured to send you email if there is a drive
failure.  Also run "mdadm --monitor --oneshot" from a cron tab every day so
that if you have a degraded array it will nag you about it every day.
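
For example (the mail address is a placeholder):

   # run the monitor as a daemon, mailing on failure events
   mdadm --monitor --scan --daemonise --mail=root@localhost

plus a daily /etc/crontab entry to nag about degraded arrays:

   0 8 * * *  root  mdadm --monitor --scan --oneshot --mail=root@localhost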

Good luck,
NeilBrown


* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-22  3:31 UTC
  To: NeilBrown, linux-raid

On Thu, Apr 21, 2011 at 8:57 PM, NeilBrown <neilb@suse.de> wrote:
..
> Possibly.
>
> It looks like sdb2 failed in October 2009 !!!! and nobody noticed.  So your
> array has been running degraded since then.

Hmm, I would say "Oops" but I guess I am happy it was not my job to watch this.
> If you
>
>  mdadm -A /dev/md1 --force /dev/sd[acd]2
>
> Then you will have your array back,..

I will be grabbing a set of 4 new drives, plus a spare tomorrow morning.

Thanks VERY much for this!

> Also run "mdadm --monitor" configured to send you email if there is a drive
> failure.

I will do some more reading to figure out how to do that, I guess.
Good to know that is possible.

>  Also run "mdadm --monitor --oneshot" from a cron tab every day so
> that if you have a degraded array it will nag you about it every day.

More good stuff.
Is there a reasonably good book that one can read to learn this stuff?
I read the man pages and my head is still fuzzy.

> Good luck,
> NeilBrown

Thank you, and much obliged!
I **hope** I will not have to bug you any more about this.

* Re: Server down-failed RAID5-asking for some assistance
From: David Brown @ 2011-04-22 11:19 UTC
  To: linux-raid

On 22/04/11 04:32, John Valarti wrote:
> On Thu, Apr 21, 2011 at 1:59 PM, David Brown<david.brown@hesbynett.no>  wrote:
> .
>> My first thought would be to get /all/ the disks, not just the "failed"
>> ones, out of the machine.  You want to make full images of them (with
>> ddrescue or something similar) to files on another disk, and then work with
>> those images.  ..
>> Once you've got some (hopefully most) of your data recovered from the
>> images, buy four /new/ disks to put in the machine, and work on your
>> restore.  You don't want to reuse the failing disks, and probably the other
>> two equally old and worn disks will be high risk too.
>
> OK, I think I understand.
> Does that mean I need to buy 8 disks, all the same size or bigger?
> The originals are 250GB SATA so that should be OK, I guess.
>

The way I would handle this is to get a couple of big disks (2 TB). 
They can be external USB drives if that's the most convenient (I have a 
nice hot-plug USB/eSATA enclosure that I find handy for messing about 
with temporary disks).  Put an ext4 (or xfs, if you like) filesystem on
these disks.
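
Something like this, assuming the big disk turns up as /dev/sde and has
one big partition:

   mkfs.ext4 /dev/sde1
   mkdir -p /mnt/big
   mount /dev/sde1 /mnt/big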

Note that none of this need be done on the original computer - use 
whatever is convenient.  And if you already have lots of temporary disk 
space, you don't need to buy new disks yet.

Make images of your original disks - i.e., copy the whole 250 GB disk 
into a file on your big disk, so that you have four 250 GB files 
"originalA.image", "originalB.image", etc.  You can probably forget 
about the oldest dead disk - if it's been dead since 2009 there is 
little chance of it being useful.

Those "original" files are your safety copies - keep them, so that you 
can always get back to where you started without stressing the original 
disks any more.

Then copy those files to new files "diskA.image", etc.  Attach these to 
loop devices ("losetup /dev/loop1 diskA.image", etc.).  Then use these 
loop devices as devices for re-assembling your raid.  I'm not going to 
make any suggestions for that part - Neil is the expert.

The point is, if you mess up you can simply go back a couple of steps 
and re-copy your "original" image files and try again.  You lose 
nothing but a bit of time.
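
A minimal sketch of that workflow, with hypothetical file names:

   cp originalA.image diskA.image
   losetup /dev/loop1 diskA.image
   # the raid member is the *second* partition inside each image; kpartx
   # can map it (creating /dev/mapper/loop1p1, loop1p2):
   kpartx -a /dev/loop1
   # or, if the cylinder arithmetic from fdisk holds (partition 2 starts
   # at sector 17 * 16065 = 273105), attach it directly at that offset:
   losetup -o $((273105 * 512)) /dev/loop2 diskA.image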

Once you have a re-assembled raid that looks like it contains your data, 
you can work on the restore process.

Restore is done by buying 4 new disks for the original server, setting 
them up as a new raid5, and copying the data over.  It can be very 
convenient to use something like a system rescue cd during this 
operation, so that you are not trying to run from the disks while doing 
the restore.

Once you are done, you will want to check for missing data or file 
system corruption.


Best regards,

David



* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-24  0:07 UTC
  To: NeilBrown, linux-raid

On Sat, Apr 23, 2011 at 3:51 PM, NeilBrown <neilb@suse.de> wrote:
> On Sat, 23 Apr 2011 10:19:31 -0600 John Valarti <mdadmuser@gmail.com> wrote:
>
>> On Sat, Apr 23, 2011 at 2:48 AM, NeilBrown <neilb@suse.de> wrote:
>> >> No luck
>> >> Same thing:
>> >> "no devices found.."
>> >
>> > I'm sure it said more than just that.  Complete error messages really are
>> > helpful..
>> >
>> > But the implication seems to be that /dev/sd[acd] don't exist... That is
>> > weird.
>> > What does "cat /proc/partitions" show?
>> > If e.g.  sda is there but sda2 is not, does "blockdev --rereadpt /dev/sda"
>> > help?
>> >
>> > NeilBrown
>>
>> They exist.
>> I have physically removed the dead drive, so now the 3 remaining ones
>> are at sd[abc]
>> I have a usb thumb drive plugged in to capture outputs as needed, it is sdd.
>> blockdev does not seem to exist on this system.
>>
>> If useful I now have some more of the same 250GB drives, and I can
>> plug one in to the sdb port and make partitions on it, and so on..
>>
>> For example fdisk -l shows all my partitions. I also show the output
>> you asked for proc/partitions:
>>
>> Disk /dev/sda: 251.0 GB, 251000193024 bytes
>> 255 heads, 63 sectors/track, 30515 cylinders
>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sda1   *           1          17      136521   fd  Linux raid autodetect
>> /dev/sda2              18       30515   244975185   fd  Linux raid autodetect
>>
>> Disk /dev/sdb: 251.0 GB, 251000193024 bytes
>> 255 heads, 63 sectors/track, 30515 cylinders
>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sdb1   *           1          17      136521   fd  Linux raid autodetect
>> /dev/sdb2              18       30515   244975185   fd  Linux raid autodetect
>>
>> Disk /dev/sdc: 251.0 GB, 251000193024 bytes
>> 255 heads, 63 sectors/track, 30515 cylinders
>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sdc1   *           1          17      136521   fd  Linux raid autodetect
>> /dev/sdc2              18       30515   244975185   fd  Linux raid autodetect
>>
>> Disk /dev/sdd: 1039 MB, 1039663104 bytes
>> 255 heads, 63 sectors/track, 126 cylinders
>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sdd1   *           1         127     1015264+   c  W95 FAT32 (LBA)
>> Partition 1 has different physical/logical endings:
>>      phys=(125, 254, 63) logical=(126, 101, 39)
>> ===================================
>> major minor  #blocks  name
>>
>>    7     0     111000 loop0
>>    8     0  245117376 sda
>>    8     1     136521 sda1
>>    8     2  244975185 sda2
>>    8    16  245117376 sdb
>>    8    17     136521 sdb1
>>    8    18  244975185 sdb2
>>    8    32  245117376 sdc
>>    8    33     136521 sdc1
>>    8    34  244975185 sdc2
>>    8    48    1015296 sdd
>>    8    49    1015264 sdd1
>>
>> --
>> John
>
>
> I really cannot help you until you show me the output of "mdadm --assemble
> --verbose ..." like I asked.
>
> NeilBrown
>

I really WAS NOT lying!

I just did it again and redirected the output to a file.

The file (located on my USB stick) contains:

mdadm: looking for devices for /dev/md1
mdadm: no devices found for /dev/md1

--
John

* Re: Server down-failed RAID5-asking for some assistance
From: John Robinson @ 2011-04-24  0:37 UTC
  To: John Valarti; +Cc: NeilBrown, linux-raid

On 24/04/2011 01:07, John Valarti wrote:
> On Sat, Apr 23, 2011 at 3:51 PM, NeilBrown<neilb@suse.de>  wrote:
[...]
>> I really cannot help you until you show me the output of "mdadm --assemble
>> --verbose ..." like I asked.
>>
>> NeilBrown
>
> I really WAS NOT lying!
>
> I just did it again, and redirected output to a file
>
> The file ( located on my USB stick) contains:
>
> mdadm: looking for devices for /dev/md1
> mdadm: no devices found for /dev/md1

If I can butt in here... presumably while the system can't assemble your 
array, you're running from rescue media or whatever? You may not have a 
suitable mdadm.conf in the rescue environment.

If you haven't already posted `mdadm --examine --scan --verbose 
--verbose` aka `mdadm -Esvv`, or at the very least `mdadm --examine 
--verbose /dev/sd[abc][12]`, then please do.

Please also try `mdadm --assemble --scan --verbose` and post the result.

Another thing that occurs to me is that since you said you had CentOS 4, 
you likely have kernel 2.6.9 and mdadm 1.12, which are very old. You 
might try SystemRescueCD or similar to get mdadm 3.1.x, which will be 
much better at finding and fixing any problems.

Cheers,

John.



* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-24  1:49 UTC
  To: John Robinson; +Cc: NeilBrown, linux-raid

On Sat, Apr 23, 2011 at 6:37 PM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
..
>> mdadm: looking for devices for /dev/md1
>> mdadm: no devices found for /dev/md1
>
> If I can butt in here... presumably while the system can't assemble your
> array, you're running from rescue media or whatever? You may not have a
> suitable mdadm.conf in the rescue environment.

Please feel free to "butt in" - I can use any suggestions.
This is looking gloomy now.

There definitely is NO mdadm.conf
I checked.
I used "find"

This is rescue media. I cannot boot the system off the local disk.
I am booting with CentOS 5.5 install media using "linux rescue".

> If you haven't already posted `mdadm --examine --scan --verbose --verbose`
> aka `mdadm -Esvv`, or at the very least `mdadm --examine --verbose
> /dev/sd[abc][12]`, then please do.

Originally there were 4 disks here. sd[abcd]
sdb was the first and worst faulty one, so I removed it.
We now see sd[abc]

HERE:
mdadm --examine --scan --verbose --verbose /dev/sd[abc]2:


   Update Time : Mon Apr 18 07:48:54 2011
         State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
 Spare Devices : 0
      Checksum : 5674ce4e - correct
        Events : 28580020

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

  0     0       8        2        0      active sync   /dev/sda2
  1     1       8       18        1      active sync   /dev/sdb2
  2     2       8       34        2      active sync   /dev/sdc2
  3     3       0        0        3      faulty removed

AND:
mdadm -Esvv /dev/sd[abc]2:
  mdadm: --examine/-E cannot be given with -w


> Please also try `mdadm --assemble --scan --verbose` and post the result.
HERE:
mdadm: /dev/sda2 not identified in config file.
mdadm: /dev/sdb2 not identified in config file.
mdadm: /dev/sdc2 not identified in config file.


> Another thing that occurs to me is that since you said you had CentOS 4, you
> likely have kernel 2.6.9 and mdadm 1.12 which are very old. You might try
> SystemRescueCD or similar to get a mdadm 3.1.x which will be much better at
> finding and fixing any problems.

The rescue media I am using for this is CentOS 5.5:
   uname -a:
Linux localhost.localdomain 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15
EST 2011 x86_64 x86_64 x86_64 GNU/Linux

Hope this helps.

--
John


* Re: Server down-failed RAID5-asking for some assistance
From: John Robinson @ 2011-04-24  2:12 UTC
  To: John Valarti; +Cc: NeilBrown, linux-raid

On 24/04/2011 02:49, John Valarti wrote:
[...]
> This is looking gloomy now.

No it isn't!

> There definitely is NO mdadm.conf
> I checked.
> I used "find"

OK, that's fine - I didn't really expect there to be, unless you were 
using an emergency shell half-way through booting, in which case if 
there was one it might have been wrong since you've had to remove drives.

[...]
> mdadm -Esvv /dev/sd[abc]2:
>    mdadm: --examine/-E cannot be given with -w

That was the letter v twice, not the letter w - and I didn't mean you to 
specify any devices, just `mdadm -Esvv` on its own.

>> Please also try `mdadm --assemble --scan --verbose` and post the result.
> HERE:
> mdadm: /dev/sda2 not identified in config file.
> mdadm: /dev/sdb2 not identified in config file.
> mdadm: /dev/sdc2 not identified in config file.

Again, I didn't mean you to specify any devices, just exactly `mdadm 
--assemble --scan --verbose` on its own.

I'm not quite sure where you are but your timezone is -0600 while mine 
is +0100, it's gone 3 in the morning, so I'm going to bed now. Neil 
Brown is +1000 so it's Easter Sunday for him, which makes me think it's 
unlikely you'll see a reply from him today/tomorrow. Just stay calm and 
don't panic, we'll get there.

Cheers,

John.


* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-24  2:28 UTC
  To: John Robinson; +Cc: NeilBrown, linux-raid

On Sat, Apr 23, 2011 at 8:12 PM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
..
> That was the letter v twice, not the letter w - and I didn't mean you to
> specify any devices, just `mdadm -Esvv` on its own.

Oops! sorry.
Here you go (sdd is my USB thumb drive):

mdadm: No md superblock detected on /dev/sdd1.
mdadm: No md superblock detected on /dev/sdd.
mdadm: No md superblock detected on /dev/sdc.
mdadm: No md superblock detected on /dev/sdb.
mdadm: No md superblock detected on /dev/sda.
mdadm: No md superblock detected on /dev/loop0.
/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
  Creation Time : Mon May 15 16:38:05 2006
     Raid Level : raid5
  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
     Array Size : 734925312 (700.88 GiB 752.56 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Mon Apr 18 07:48:54 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5674ce4e - correct
         Events : 28580020

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 79435baa:74a2b2c3:68d7e34f:d95ad478
  Creation Time : Mon May 15 16:38:08 2006
     Raid Level : raid1
  Used Dev Size : 136448 (133.27 MiB 139.72 MB)
     Array Size : 136448 (133.27 MiB 139.72 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Apr 18 07:36:56 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6b5ae9fa - correct
         Events : 4878


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
  Creation Time : Mon May 15 16:38:05 2006
     Raid Level : raid5
  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
     Array Size : 734925312 (700.88 GiB 752.56 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Mon Apr 18 07:48:51 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5674ce6b - correct
         Events : 28580018

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       34        2      active sync   /dev/sdc2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 79435baa:74a2b2c3:68d7e34f:d95ad478
  Creation Time : Mon May 15 16:38:08 2006
     Raid Level : raid1
  Used Dev Size : 136448 (133.27 MiB 139.72 MB)
     Array Size : 136448 (133.27 MiB 139.72 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Apr 18 07:36:56 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6b5aea1e - correct
         Events : 4878


      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
  Creation Time : Mon May 15 16:38:05 2006
     Raid Level : raid5
  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
     Array Size : 734925312 (700.88 GiB 752.56 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Mon Apr 18 07:48:54 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5674ce60 - correct
         Events : 28580020

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 79435baa:74a2b2c3:68d7e34f:d95ad478
  Creation Time : Mon May 15 16:38:08 2006
     Raid Level : raid1
  Used Dev Size : 136448 (133.27 MiB 139.72 MB)
     Array Size : 136448 (133.27 MiB 139.72 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Apr 18 07:36:56 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6b5aea0c - correct
         Events : 4878


      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1


>>> Please also try `mdadm --assemble --scan --verbose` and post the result.
mdadm: No arrays found in config file


Happy Easter bunny day to you..

Thanks John.

--
John V


* Re: Server down-failed RAID5-asking for some assistance
From: NeilBrown @ 2011-04-24  2:54 UTC
  To: John Valarti; +Cc: linux-raid

On Sat, 23 Apr 2011 18:07:37 -0600 John Valarti <mdadmuser@gmail.com> wrote:

> [...]
> 
> I really WAS NOT lying!

I never thought you were lying.  But you were summarising.  And when people
summarise instead of giving exact messages I cannot be sure exactly what was
excluded.  I could guess - but the chances are that I would be wrong.

> 
> I just did it again and redirected the output to a file.
> 
> The file (located on my USB stick) contains:
> 
> mdadm: looking for devices for /dev/md1
> mdadm: no devices found for /dev/md1

Thank you.  This is helpful.
It implies that mdadm didn't actually try including any devices in the
array, which is very odd because you listed some devices on the command line
for it to try.
So I'm a bit confused.

Could you please run this set of commands exactly as given and send me the
complete output.  Maybe put them in a file and run
  sh -x file > /path/on/usb 2>&1


cat /proc/mdstat
mdadm --stop --verbose /dev/md1
ls -l /dev/sd[acd]2
mdadm -E /dev/sd[acd]2
mdadm -Afvv /dev/md1 /dev/sd[acd]2
cat /proc/mdstat
dmesg | tail -100


Thanks,
NeilBrown



* Re: Server down-failed RAID5-asking for some assistance
From: NeilBrown @ 2011-04-24  2:58 UTC
  To: John Valarti; +Cc: John Robinson, linux-raid

On Sat, 23 Apr 2011 20:28:34 -0600 John Valarti <mdadmuser@gmail.com> wrote:

> [...]
> 
> >>> Please also try `mdadm --assemble --scan --verbose` and post the result.
> mdadm: No arrays found in config file
> 

I should have also asked for "mdadm --version".

However, given the above, you want to:


   mdadm --assemble --verbose --force /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2

NeilBrown



* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-24  6:30 UTC
  To: NeilBrown; +Cc: John Robinson, linux-raid

On Sat, Apr 23, 2011 at 8:58 PM, NeilBrown <neilb@suse.de> wrote:
..
> I should have also asked for "mdadm --version".
2.6.9 10th March 2009

> However given that above, you want to:
>
>   mdadm --assemble --verbose --force /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2

mdadm: looking for devices for /dev/md1
mdadm: no devices found for /dev/md1

* Re: Server down-failed RAID5-asking for some assistance
From: John Valarti @ 2011-04-24  7:06 UTC
  To: NeilBrown; +Cc: linux-raid

On Sat, Apr 23, 2011 at 8:54 PM, NeilBrown <neilb@suse.de> wrote:
..
> Could you please run this set of commands exactly as given and send me the
> complete output.  Maybe put them in a file and run
>  sh -x file > /path/on/usb 2>&1
>
>
> cat /proc/mdstat
> mdadm --stop --verbose /dev/md1
> ls -l /dev/sd[acd]2
> mdadm -E /dev/sd[acd]2
> mdadm -Afvv /dev/md1 /dev/sd[acd]2
> cat /proc/mdstat
> dmesg | tail -100
>

Here you go. I warn you it is a bit long!

+ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
+ mdadm --stop --verbose /dev/md1
mdadm: stopped /dev/md1
+ ls -l /dev/sda2 /dev/sdb2 /dev/sdc2
brw------- 1 root root 8,  2 Apr 23 14:39 /dev/sda2
brw------- 1 root root 8, 18 Apr 23 14:39 /dev/sdb2
brw------- 1 root root 8, 34 Apr 23 14:39 /dev/sdc2
+ mdadm -E /dev/sda2 /dev/sdb2 /dev/sdc2
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
  Creation Time : Mon May 15 16:38:05 2006
     Raid Level : raid5
  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
     Array Size : 734925312 (700.88 GiB 752.56 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Mon Apr 18 07:48:54 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5674ce60 - correct
         Events : 28580020

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
  Creation Time : Mon May 15 16:38:05 2006
     Raid Level : raid5
  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
     Array Size : 734925312 (700.88 GiB 752.56 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Mon Apr 18 07:48:51 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5674ce6b - correct
         Events : 28580018

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       34        2      active sync   /dev/sdc2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
  Creation Time : Mon May 15 16:38:05 2006
     Raid Level : raid5
  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
     Array Size : 734925312 (700.88 GiB 752.56 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Mon Apr 18 07:48:54 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5674ce4e - correct
         Events : 28580020

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
+ mdadm -Afvv /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: looking for devices for /dev/md1
mdadm: no devices found for /dev/md1
+ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
+ dmesg
+ tail -100
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<6> sdd: sdd1
<5>sd 3:0:0:0: Attached scsi removable disk sdd
<7>usb-storage: device scan complete
<7>SELinux: initialized (dev sdd1, type vfat), uses genfs_contexts
<6>md: md0 stopped.
<6>md: md0 stopped.
<6>usb 1-3: USB disconnect, address 3
<6>md: md1 stopped.
<6>usb 1-3: new high speed USB device using ehci_hcd and address 4
<6>usb 1-3: configuration #1 chosen from 1 choice
<6>scsi4 : SCSI emulation for USB Mass Storage devices
<7>usb-storage: device found at 4
<7>usb-storage: waiting for device to settle before scanning
<5>  Vendor: Kingston  Model: DataTraveler R    Rev: PMAP
<5>  Type:   Direct-Access                      ANSI SCSI revision: 00
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<6> sdd: sdd1
<5>sd 4:0:0:0: Attached scsi removable disk sdd
<7>usb-storage: device scan complete
<7>SELinux: initialized (dev sdd1, type vfat), uses genfs_contexts
<6>usb 1-3: USB disconnect, address 4
<6>usb 1-3: new high speed USB device using ehci_hcd and address 5
<6>usb 1-3: configuration #1 chosen from 1 choice
<6>scsi5 : SCSI emulation for USB Mass Storage devices
<7>usb-storage: device found at 5
<7>usb-storage: waiting for device to settle before scanning
<5>  Vendor: Kingston  Model: DataTraveler R    Rev: PMAP
<5>  Type:   Direct-Access                      ANSI SCSI revision: 00
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<6> sdd: sdd1
<5>sd 5:0:0:0: Attached scsi removable disk sdd
<7>usb-storage: device scan complete
<7>SELinux: initialized (dev sdd1, type vfat), uses genfs_contexts
<6>usb 1-3: USB disconnect, address 5
<6>usb 1-3: new high speed USB device using ehci_hcd and address 6
<6>usb 1-3: configuration #1 chosen from 1 choice
<6>scsi6 : SCSI emulation for USB Mass Storage devices
<7>usb-storage: device found at 6
<7>usb-storage: waiting for device to settle before scanning
<5>  Vendor: Kingston  Model: DataTraveler R    Rev: PMAP
<5>  Type:   Direct-Access                      ANSI SCSI revision: 00
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<6> sdd: sdd1
<5>sd 6:0:0:0: Attached scsi removable disk sdd
<7>usb-storage: device scan complete
<7>SELinux: initialized (dev sdd1, type vfat), uses genfs_contexts
<6>usb 1-3: USB disconnect, address 6
<6>md: md1 stopped.
<6>md: md1 stopped.
<6>md: md1 stopped.
<6>md: md1 stopped.
<6>usb 1-3: new high speed USB device using ehci_hcd and address 7
<6>usb 1-3: configuration #1 chosen from 1 choice
<6>scsi7 : SCSI emulation for USB Mass Storage devices
<7>usb-storage: device found at 7
<7>usb-storage: waiting for device to settle before scanning
<5>  Vendor: Kingston  Model: DataTraveler R    Rev: PMAP
<5>  Type:   Direct-Access                      ANSI SCSI revision: 00
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<5>SCSI device sdd: 2030592 512-byte hdwr sectors (1040 MB)
<5>sdd: Write Protect is off
<7>sdd: Mode Sense: 23 00 00 00
<3>sdd: assuming drive cache: write through
<6> sdd: sdd1
<5>sd 7:0:0:0: Attached scsi removable disk sdd
<7>usb-storage: device scan complete
<7>SELinux: initialized (dev sdd1, type vfat), uses genfs_contexts
<6>md: md1 stopped.
<6>md: md1 stopped.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24  7:06                         ` John Valarti
@ 2011-04-24  8:41                           ` NeilBrown
  2011-04-24 11:57                             ` John Robinson
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2011-04-24  8:41 UTC (permalink / raw)
  To: John Valarti; +Cc: linux-raid

On Sun, 24 Apr 2011 01:06:52 -0600 John Valarti <mdadmuser@gmail.com> wrote:

> On Sat, Apr 23, 2011 at 8:54 PM, NeilBrown <neilb@suse.de> wrote:
> ..
> > Could you please run this set of commands exactly as given and send me the
> > complete output.  Maybe put them in a file and run
> >  sh -x file > /path/on/usb 2>&1
> >
> >
> > cat /proc/mdstat
> > mdadm --stop --verbose /dev/md1
> > ls -l /dev/sd[acd]2
> > mdadm -E /dev/sd[acd]2
> > mdadm -Afvv /dev/md1 /dev/sd[acd]2
> > cat /proc/mdstat
> > dmesg | tail -100
> >
> 
> Here you go. I warn you it is a bit long!

Thanks.  Length is no problem.

Only it doesn't make sense at all.  I cannot see how mdadm would possibly be
generating just those messages.
It appears to be rejecting each device for some reason, but it is not
reporting why it is rejecting the device...

What version of mdadm is this?
   mdadm --version

I should have asked that before.

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24  8:41                           ` NeilBrown
@ 2011-04-24 11:57                             ` John Robinson
  2011-04-24 12:29                               ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: John Robinson @ 2011-04-24 11:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Valarti, linux-raid

On 24/04/2011 09:41, NeilBrown wrote:
> On Sun, 24 Apr 2011 01:06:52 -0600 John Valarti<mdadmuser@gmail.com>  wrote:
[...]
>> Here you go. I warn you it is a bit long!
>
> Thanks.  Length is no problem.
>
> Only it doesn't make sense at all.  I cannot see how mdadm would possibly be
> generating just those messages.
> It appears to be rejecting each device for some reason, but it is not
> reporting why it is rejecting the device...
>
> What version of mdadm is this?
>     mdadm --version
>
> I should have asked that before.

I think John said 2.6.9 on the CentOS 5.5 rescue media. I think it's 
time to try something more recent: John, could you try SystemRescueCD 
from http://www.sysresccd.org/ and run
   mdadm -Evvs
and if that shows your RAID5 members again,
   mdadm -Afvv /dev/md1

Cheers,

John.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24 11:57                             ` John Robinson
@ 2011-04-24 12:29                               ` NeilBrown
  2011-04-24 16:04                                 ` John Valarti
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2011-04-24 12:29 UTC (permalink / raw)
  To: John Robinson; +Cc: John Valarti, linux-raid

On Sun, 24 Apr 2011 12:57:56 +0100 John Robinson
<john.robinson@anonymous.org.uk> wrote:

> On 24/04/2011 09:41, NeilBrown wrote:
> > On Sun, 24 Apr 2011 01:06:52 -0600 John Valarti<mdadmuser@gmail.com>  wrote:
> [...]
> >> Here you go. I warn you it is a bit long!
> >
> > Thanks.  Length is no problem.
> >
> > Only it doesn't make sense at all.  I cannot see how mdadm would possibly be
> > generating just those messages.
> > It appears to be rejecting each device for some reason, but it is not
> > reporting why it is rejecting the device...
> >
> > What version of mdadm is this?
> >     mdadm --version
> >
> > I should have asked that before.
> 
> I think John said 2.6.9


Ahh, I see it.  This is a bug in there:  ->used isn't set to zero after 'dv'
is allocated.  This was fixed in 3.0.  I don't remember that bug...

I cannot see any easy way to work around that bug.
You could possibly:

  echo DEV /dev/sd[abc]2 > /tmp/mdadm.conf   # limit device scanning to these three
  mdadm -Eb /dev/sda2 >> /tmp/mdadm.conf     # append a brief ARRAY line for md1
  mdadm -Afvv /dev/md1 -c /tmp/mdadm.conf    # assemble using only that config file

I think that would work - but no promises.

> on the CentOS 5.5 rescue media. I think it's
> time to try something more recent: John, could you try SystemRescueCD 
> from http://www.sysresccd.org/ and run
>    mdadm -Evvs
> and if that shows your RAID5 members again,
>    mdadm -Afvv /dev/md1

Getting a newer mdadm is definitely a good idea.

Safest to explicitly list the devices that you want:
     mdadm -Afvv /dev/md1 /dev/sd[abc]2


NeilBrown

> 
> Cheers,
> 
> John.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24 12:29                               ` NeilBrown
@ 2011-04-24 16:04                                 ` John Valarti
  2011-04-24 16:15                                   ` John Valarti
  0 siblings, 1 reply; 22+ messages in thread
From: John Valarti @ 2011-04-24 16:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Robinson, linux-raid

On Sun, Apr 24, 2011 at 6:29 AM, NeilBrown <neilb@suse.de> wrote:

> Ahh, I see it.  This is a bug in there:  ->used isn't set to zero after 'dv'
> is allocated.  This was fixed in 3.0.  I don't remember that bug...
>
> I cannot see any easy way to work around that bug.
..
>  on the CentOS 5.5 rescue media. I think it's
>> time to try something more recent: John, could you try SystemRescueCD
>> from http://www.sysresccd.org/ and run
>>    mdadm -Evvs
>> and if that shows your RAID5 members again,
>>    mdadm -Afvv /dev/md1
>
> Getting a newer mdadm is definitely a good idea.
>
> Safest to explicitly list the devices that you want
>     mdadm -Afvv /dev/md1 /dev/sd[abc]2
>
>
> NeilBrown

OK, I have Fedora 14 install media handy, so I booted from that.
Once at a shell:
   mdadm --version:   3.1.2

mdadm -S /dev/md1
mdadm -Afvv /dev/md1 /dev/sd[abc]2
WORKED!

cat /proc/mdstat

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear]
md1 : active raid5 sdc2[0] sdb2[2] sda2[1]
      734925312 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
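
If I read the "[4/3] [UUU_]" right, the array wants 4 devices but is
running on 3, with slot 3 missing, so it is up but degraded.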

Rebooted the system, and it sees my RAID and my OS again.
As I write, it is busy replaying journals and running fsck.
So far the only dubious part seems to be /tmp; no worry about that.

So, now for the next important part:
What to do next?
Attach another disk bigger than the RAID and copy everything to it?

Assuming yes, then what?
Speculating a bit here:

Add a new good disk and rebuild?
After that, remove the other disk that failed and that we just forced
back in, and rebuild again?
Then work my way through the other two old disks and rebuild two more times?

If yes, I could use some command line syntax to make sure I do it the
right way; my guess is below.
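
Something like this, maybe? I am guessing at the device names, and the
new disk would first need a partition the same size as the old members:

  mdadm /dev/md1 --add /dev/sde2    # new disk; rebuild starts, watch /proc/mdstat
  mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2   # once the rebuild is clean
  mdadm /dev/md1 --add /dev/sdf2    # its replacement; repeat for the rest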

If "no" I am all ears as to what to do next.

Oh, and btw:
Thank you
Happy Easter.

--
John V
In a much better mood today.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24 16:04                                 ` John Valarti
@ 2011-04-24 16:15                                   ` John Valarti
  2011-04-24 16:31                                     ` Mathias Burén
  0 siblings, 1 reply; 22+ messages in thread
From: John Valarti @ 2011-04-24 16:15 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Robinson, linux-raid

On Sun, Apr 24, 2011 at 10:04 AM, John Valarti <mdadmuser@gmail.com> wrote:
..
> WORKED!

It seems I spoke a bit too soon.  sigh..

fsck ran quite a while on /dev/Main/samba

And then it bailed out with "an error occurred"

I know this is not exactly an mdadm question at this point, but you
folks know what you are doing far better than I do.

It dropped me to a basic root shell.

Any suggestions what to try next?

--
John V

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24 16:15                                   ` John Valarti
@ 2011-04-24 16:31                                     ` Mathias Burén
  2011-04-24 18:41                                       ` John Valarti
  0 siblings, 1 reply; 22+ messages in thread
From: Mathias Burén @ 2011-04-24 16:31 UTC (permalink / raw)
  To: John Valarti; +Cc: NeilBrown, John Robinson, linux-raid

On 24 April 2011 17:15, John Valarti <mdadmuser@gmail.com> wrote:
> On Sun, Apr 24, 2011 at 10:04 AM, John Valarti <mdadmuser@gmail.com> wrote:
> ..
>> WORKED!
>
> It seems I spoke a bit too soon.  sigh..
>
> fsck ran quite a while on /dev/Main/samba
>
> And then it bailed out with "an error occurred"
>
> I know this is not exactly an mdadm question at this point, but you
> folks know what you are doing far better than I do.
>
> It dropped me to a basic root shell.
>
> Any suggestions what to try next?
>
> --
> John V

Rescue all the HDDs to image files and try to perform the
assembly/data rescue on those images.

Run fsck manually on your array's filesystems, and capture the output.
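
GNU ddrescue is the usual tool for the imaging, and fsck -n only
reads. Something along these lines, with devices and paths only as
examples:

  ddrescue -f -n /dev/sda /mnt/big/sda.img /mnt/big/sda.map   # fast first pass
  ddrescue -f -r3 /dev/sda /mnt/big/sda.img /mnt/big/sda.map  # retry the bad areas
  fsck -n /dev/Main/samba 2>&1 | tee /tmp/fsck-samba.log      # check, fix nothing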

Regards,
// M

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Server down-fail​ed RAID5-asking for some assistance
  2011-04-24 16:31                                     ` Mathias Burén
@ 2011-04-24 18:41                                       ` John Valarti
  0 siblings, 0 replies; 22+ messages in thread
From: John Valarti @ 2011-04-24 18:41 UTC (permalink / raw)
  To: Mathias Burén; +Cc: NeilBrown, John Robinson, linux-raid

On Sun, Apr 24, 2011 at 10:31 AM, Mathias Burén <mathias.buren@gmail.com> wrote:
..
> Rescue all the HDDs to image files and try to perform the
> assembly/data rescue on those images.
>
> Run fsck manually on your array's filesystems, and capture the output.
>
> Regards,
> // M

Thanks; what I did in the meantime:
Booted again from the rescue media (Fedora 14) and ran fsck on the
filesystems using a modern fsck.
Once they were clean, rebooted and checked, and it is now OK and boots
to login.
Now I am installing CentOS 5.6 on new disks (2 x 1TB, RAID1).
Then I will reboot one more time and selectively copy useful data and
configuration info from the old RAID to the new, roughly as below.
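
Probably something like this; the paths are guesses until the new
system is in place:

  vgchange -ay Main                      # activate the old volume group
  mount -o ro /dev/Main/samba /mnt/old   # mount the old data read-only
  rsync -aHAX /mnt/old/ /srv/samba/      # copy; destination is a guess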

A VERY big "Thank You!" to all.

Enjoy your chocolate.

p.s.: Note for the future:
Start by asking: please run "mdadm --version" and report back.
If it is less than 3.x, tell the person to boot with something newer.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-04-24 18:41 UTC | newest]

Thread overview: 22+ messages
2011-04-21 18:29 Server down-fail​ed RAID5-asking for some assistance John Valarti
2011-04-21 19:59 ` David Brown
     [not found]   ` <BANLkTim18Sx6JdZO5PiAqnrakDPzy5PNJQ@mail.gmail.com>
2011-04-22  2:32     ` John Valarti
2011-04-22  2:57       ` Server " NeilBrown
2011-04-22  3:31         ` John Valarti
     [not found]         ` <BANLkTin0SoBzRAear8Jt+26MnVJWouXoNA@mail.gmail.com>
     [not found]           ` <20110423074411.78fef94f@notabene.brown>
     [not found]             ` <BANLkTik_ZY4uoV3E=ua1p+tUD9g8xqQDVg@mail.gmail.com>
     [not found]               ` <20110423184824.55ee7893@notabene.brown>
     [not found]                 ` <BANLkTi=sCfFFfmZTzj2g8-aDNhDqVK8e-A@mail.gmail.com>
     [not found]                   ` <20110424075101.6763309f@notabene.brown>
2011-04-24  0:07                     ` John Valarti
2011-04-24  0:37                       ` John Robinson
2011-04-24  1:49                         ` John Valarti
2011-04-24  2:12                           ` John Robinson
2011-04-24  2:28                             ` John Valarti
2011-04-24  2:58                               ` NeilBrown
2011-04-24  6:30                                 ` John Valarti
2011-04-24  2:54                       ` NeilBrown
2011-04-24  7:06                         ` John Valarti
2011-04-24  8:41                           ` NeilBrown
2011-04-24 11:57                             ` John Robinson
2011-04-24 12:29                               ` NeilBrown
2011-04-24 16:04                                 ` John Valarti
2011-04-24 16:15                                   ` John Valarti
2011-04-24 16:31                                     ` Mathias Burén
2011-04-24 18:41                                       ` John Valarti
2011-04-22 11:19       ` David Brown
