* How to un-degrade an array after a totally spurious failure?
@ 2009-05-20 23:10 Nix
  2009-05-21  2:49 ` NeilBrown
  2020-01-12  9:48 ` mdadm not sending email Leslie Rhorer
  0 siblings, 2 replies; 19+ messages in thread
From: Nix @ 2009-05-20 23:10 UTC (permalink / raw)
  To: linux-raid

So this just happened on one of my older machines:

sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0
sd 0:0:0:0: [sda] ABORT operation started
sd 0:0:0:0: ABORT operation timed-out.
sd 0:0:0:0: [sda] ABORT operation started
sd 0:0:0:0: ABORT operation timed-out.
sd 0:0:2:0: [sdb] ABORT operation started
sd 0:0:2:0: ABORT operation timed-out.
sd 0:0:0:0: [sda] DEVICE RESET operation started
sd 0:0:0:0: DEVICE RESET operation timed-out.
sd 0:0:2:0: [sdb] DEVICE RESET operation started
sd 0:0:2:0: DEVICE RESET operation timed-out.
sd 0:0:0:0: [sda] BUS RESET operation started
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
sd 0:0:0:0: BUS RESET operation complete.
end_request: I/O error, dev sdb, sector 128591
md: super_written gets error=-5, uptodate=0
raid5: Disk failure on sdb6, disabling device.
raid5: Operation continuing on 2 devices.
RAID5 conf printout:
 --- rd:3 wd:2
 disk 0, o:1, dev:sda6
 disk 1, o:0, dev:sdb6
 disk 2, o:1, dev:sdd5
RAID5 conf printout:
 --- rd:3 wd:2
 disk 0, o:1, dev:sda6
 disk 2, o:1, dev:sdd5

This failure is quasi-spurious: nothing is actually wrong with the disks
(just one cable, which throws an error like this about once every two
years and otherwise works perfectly well, though the error has never
overlapped with a RAID superblock write before), so I'd like the drive
to be pulled back into the array sharpish. But it's quite unclear how to
do that. I can't afford to take the array down, but will accept (because
I must) the background hit of an array reconstruction.

Normally I'd just try things until one works, but if I get a command
wrong now then several rather important and long-running (months)
processes trickling writes to that array will be interrupted and I'll be
in rather a lot of trouble.

So, anyone got a command that would help? I'm not even sure if this is
assembly or growth: it doesn't quite fit into either of those
categories. There must be a way to do this, surely?


* Re: How to un-degrade an array after a totally spurious failure?
  2009-05-20 23:10 How to un-degrade an array after a totally spurious failure? Nix
@ 2009-05-21  2:49 ` NeilBrown
  2009-05-21  7:32   ` Nix
  2009-05-26  8:25   ` Leslie Rhorer
  2020-01-12  9:48 ` mdadm not sending email Leslie Rhorer
  1 sibling, 2 replies; 19+ messages in thread
From: NeilBrown @ 2009-05-21  2:49 UTC (permalink / raw)
  To: Nix; +Cc: linux-raid

On Thu, May 21, 2009 9:10 am, Nix wrote:

> So, anyone got a command that would help? I'm not even sure if this is
> assembly or growth: it doesn't quite fit into either of those
> categories. There must be a way to do this, surely?

It is neither.  It is management.

 mdadm --manage /dev/mdX --remove /dev/sdb6
 mdadm --manage /dev/mdX --add /dev/sdb6

(The --manage is not actually needed, but it doesn't hurt).

NeilBrown



* Re: How to un-degrade an array after a totally spurious failure?
  2009-05-21  2:49 ` NeilBrown
@ 2009-05-21  7:32   ` Nix
  2009-05-26  8:25   ` Leslie Rhorer
  1 sibling, 0 replies; 19+ messages in thread
From: Nix @ 2009-05-21  7:32 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 21 May 2009, NeilBrown outgrape:

> On Thu, May 21, 2009 9:10 am, Nix wrote:
>
>> So, anyone got a command that would help? I'm not even sure if this is
>> assembly or growth: it doesn't quite fit into either of those
>> categories. There must be a way to do this, surely?
>
> It is neither.  It is management.

And it says as much in the manpage quite clearly, now that I look
specifically for it. Maybe trying to fix things at 11pm isn't the best idea.

It's reconstructing now (*slowly*: this ancient Symbios SCSI card goes
into a sort of ultra-sluggish mode when it loses parity --- I think it
stops doing DMA --- so we're seeing astounding 1Mb/s transfer rates.)

But it should be finished by, oh, tomorrow, and the annealing jobs can
keep running, as long as they're not too discomfited by their filesystem
suddenly deciding to imitate a slow tape drive. :)
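For anyone watching a similar rebuild, a rough sketch of how to monitor it
and raise the kernel's rebuild throttles (the md device name and the speed
values are placeholders, and raising the limits only helps when the bus
itself isn't the bottleneck, as it clearly is here):

  # watch resync progress and the estimated finish time
  cat /proc/mdstat
  mdadm --detail /dev/md0

  # kernel-wide rebuild speed limits, in KiB/s
  echo 10000  > /proc/sys/dev/raid/speed_limit_min
  echo 200000 > /proc/sys/dev/raid/speed_limit_max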

>  mdadm --manage /dev/mdX --remove /dev/sdb6
>  mdadm --manage /dev/mdX --add /dev/sdb6
>
> (The --manage is not actually needed, but it doesn't hurt).

Thanks!


* RE: How to un-degrade an array after a totally spurious failure?
  2009-05-21  2:49 ` NeilBrown
  2009-05-21  7:32   ` Nix
@ 2009-05-26  8:25   ` Leslie Rhorer
  2009-05-26 10:47     ` NeilBrown
  1 sibling, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2009-05-26  8:25 UTC (permalink / raw)
  To: 'NeilBrown', 'Nix'; +Cc: linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of NeilBrown
> Sent: Wednesday, May 20, 2009 9:49 PM
> To: Nix
> Cc: linux-raid@vger.kernel.org
> Subject: Re: How to un-degrade an array after a totally spurious failure?
> 
> On Thu, May 21, 2009 9:10 am, Nix wrote:
> 
> > So, anyone got a command that would help? I'm not even sure if this is
> > assembly or growth: it doesn't quite fit into either of those
> > categories. There must be a way to do this, surely?
> 
> It is neither.  It is management.
> 
>  mdadm --manage /dev/mdX --remove /dev/sdb6
>  mdadm --manage /dev/mdX --add /dev/sdb6
> 
> (The --manage is not actually needed, but it doesn't hurt).
> 
> NeilBrown

	I have exactly the same situation, except there are two "failed"
disks on a RAID5 array.  As with the OP, the "failures" are spurious.
Running the remove and then the add command puts the disks back in as spare
disks, not live ones, and then the array just sits there, doing nothing.  I
tried the trick of doing

echo repair > /sys/block/md0/md/sync_action

but the array still just sits there saying it is "clean, degraded", with 2
spare and 5 working devices.
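For reference, a sketch of the read-only checks that show what state each
member is actually in (device names are placeholders); with hindsight these
are worth running before issuing any --add, since they reveal whether the
kicked members still carry usable metadata:

  # array-level view: active / spare / faulty counts
  mdadm --detail /dev/md0
  cat /proc/mdstat

  # per-member superblock: device role, event count, update time
  mdadm --examine /dev/sdc1
  mdadm --examine /dev/sdd1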



* RE: How to un-degrade an array after a totally spurious failure?
  2009-05-26  8:25   ` Leslie Rhorer
@ 2009-05-26 10:47     ` NeilBrown
  2009-06-08  1:43       ` Leslie Rhorer
  2009-08-03  7:30       ` Leslie Rhorer
  0 siblings, 2 replies; 19+ messages in thread
From: NeilBrown @ 2009-05-26 10:47 UTC (permalink / raw)
  To: lrhorer; +Cc: 'Nix', linux-raid

On Tue, May 26, 2009 6:25 pm, Leslie Rhorer wrote:
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>> owner@vger.kernel.org] On Behalf Of NeilBrown
>> Sent: Wednesday, May 20, 2009 9:49 PM
>> To: Nix
>> Cc: linux-raid@vger.kernel.org
>> Subject: Re: How to un-degrade an array after a totally spurious
>> failure?
>>
>> On Thu, May 21, 2009 9:10 am, Nix wrote:
>>
>> > So, anyone got a command that would help? I'm not even sure if this is
>> > assembly or growth: it doesn't quite fit into either of those
>> > categories. There must be a way to do this, surely?
>>
>> It is neither.  It is management.
>>
>>  mdadm --manage /dev/mdX --remove /dev/sdb6
>>  mdadm --manage /dev/mdX --add /dev/sdb6
>>
>> (The --manage is not actually needed, but it doesn't hurt).
>>
>> NeilBrown
>
> 	I have exactly the same situation, except there are two "failed"
> disks on a RAID5 array.  As for the OP, the "failures" are spurious.
> Running the remove and then the add command puts the disks back in as
> spare
> disks, not live ones, and then the array just sits there, doing nothing.
> I
> tried the trick of doing
>
> echo repair > /sys/block/md0/md/sync_action
>
> but the array still just sits there saying it is "clean, degraded", with 2
> spare and 5 working devices.
>

Not such a good option when you have two failures.
If you have two failures you need to stop the array, then assemble
it again using --force.
It is now too late for that:  adding them with "--add" will have erased
the old metadata.

Your only option is to re-create the array.  Make sure you use the
same parameters (e.g. chunk size) as when you first created the array.
You can check the correct parameters by looking at a device with
"--examine".
Also make sure you put the devices in the correct order.

The best thing to do is to try creating the array, using
"--assume-clean" so it won't trigger a resync.
 Then use "fsck", the "mount" to make sure the data is good.
Once you are satisfied that the data is good and that you created
the array with the right parameters, use "echo repair > ...."
to make sure the array really is 'clean'.
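Spelled out as a rough sketch (every device name and parameter value below
is a placeholder and must be read off the real --examine output; the device
order has to match the original creation order, and the fsck line assumes
an ext-family filesystem):

  # read the original geometry off a surviving member
  mdadm --examine /dev/sda1

  # re-create over the same devices, in the original order, without a resync
  mdadm --create /dev/md0 --assume-clean --metadata=1.2 --level=5 \
        --raid-devices=3 --chunk=64 --layout=left-symmetric \
        /dev/sda1 /dev/sdb1 /dev/sdc1

  # check the data read-only before trusting the layout
  fsck -n /dev/md0
  mount -o ro /dev/md0 /mnt

  # only once satisfied, rewrite parity so the array really is clean
  echo repair > /sys/block/md0/md/sync_action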

I guess I should stop mdadm from trashing the superblock when you
add a spare to an array which has failed....

NeilBrown




* RE: How to un-degrade an array after a totally spurious failure?
  2009-05-26 10:47     ` NeilBrown
@ 2009-06-08  1:43       ` Leslie Rhorer
  2009-06-08  1:54         ` Carlos Carvalho
  2009-08-03  7:30       ` Leslie Rhorer
  1 sibling, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2009-06-08  1:43 UTC (permalink / raw)
  To: linux-raid

> >> -----Original Message-----
> >> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> >> owner@vger.kernel.org] On Behalf Of NeilBrown
> >> Sent: Wednesday, May 20, 2009 9:49 PM
> >> To: Nix
> >> Cc: linux-raid@vger.kernel.org
> >> Subject: Re: How to un-degrade an array after a totally spurious
> >> failure?
> >>
> >> On Thu, May 21, 2009 9:10 am, Nix wrote:
> >>
> >> > So, anyone got a command that would help? I'm not even sure if this
> is
> >> > assembly or growth: it doesn't quite fit into either of those
> >> > categories. There must be a way to do this, surely?
> >>
> >> It is neither.  It is management.
> >>
> >>  mdadm --manage /dev/mdX --remove /dev/sdb6
> >>  mdadm --manage /dev/mdX --add /dev/sdb6
> >>
> >> (The --manage is not actually needed, but it doesn't hurt).
> >>
> >> NeilBrown
> >
> > 	I have exactly the same situation, except there are two "failed"
> > disks on a RAID5 array.  As for the OP, the "failures" are spurious.
> > Running the remove and then the add command puts the disks back in as
> > spare
> > disks, not live ones, and then the array just sits there, doing nothing.
> > I
> > tried the trick of doing
> >
> > echo repair > /sys/block/md0/md/sync_action
> >
> > but the array still just sits there saying it is "clean, degraded", with
> 2
> > spare and 5 working devices.
> >
> 
> Not such a good option when you have two failures.
> If you have two failures you need to stop the array, then assemble
> it again using --force.
> It is now too late for that:  adding them with "--add" will have erased
> the old metadata.

OK, what about a RAID 6 array with one failed disk?  Is the remove & add the
best option there, or should one apply a different method?



* RE: How to un-degrade an array after a totally spurious failure?
  2009-06-08  1:43       ` Leslie Rhorer
@ 2009-06-08  1:54         ` Carlos Carvalho
  2009-06-08  1:56           ` Leslie Rhorer
  0 siblings, 1 reply; 19+ messages in thread
From: Carlos Carvalho @ 2009-06-08  1:54 UTC (permalink / raw)
  To: linux-raid

Leslie Rhorer (lrhorer@satx.rr.com) wrote on 7 June 2009 20:43:
 >OK, what about a RAID 6 array with one failed disk?  Is the remove & add the
 >best option there, or should one apply a different method?

It is, since that's what the raid is for. It's still operational.
We've done it here and it works.


* RE: How to un-degrade an array after a totally spurious failure?
  2009-06-08  1:54         ` Carlos Carvalho
@ 2009-06-08  1:56           ` Leslie Rhorer
  2009-06-08  7:51             ` Robin Hill
  0 siblings, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2009-06-08  1:56 UTC (permalink / raw)
  To: linux-raid



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Carlos Carvalho
> Sent: Sunday, June 07, 2009 8:54 PM
> To: linux-raid@vger.kernel.org
> Subject: RE: How to un-degrade an array after a totally spurious failure?
> 
> Leslie Rhorer (lrhorer@satx.rr.com) wrote on 7 June 2009 20:43:
>  >OK, what about a RAID 6 array with one failed disk?  Is the remove & add
> the
>  >best option there, or should one apply a different method?
> 
> It is, since that's what the raid is for. It's still operational.
> We've done it here and it works.

Well, yes, I know it should work.  The question is, "Is it the best method?"



* Re: How to un-degrade an array after a totally spurious failure?
  2009-06-08  1:56           ` Leslie Rhorer
@ 2009-06-08  7:51             ` Robin Hill
  2009-06-08 13:12               ` Carlos Carvalho
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Hill @ 2009-06-08  7:51 UTC (permalink / raw)
  To: linux-raid


On Sun Jun 07, 2009 at 08:56:42PM -0500, Leslie Rhorer wrote:

> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of Carlos Carvalho
> > Sent: Sunday, June 07, 2009 8:54 PM
> > To: linux-raid@vger.kernel.org
> > Subject: RE: How to un-degrade an array after a totally spurious failure?
> > 
> > Leslie Rhorer (lrhorer@satx.rr.com) wrote on 7 June 2009 20:43:
> >  >OK, what about a RAID 6 array with one failed disk?  Is the remove & add
> >  >the best option there, or should one apply a different method?
> > 
> > It is, since that's what the raid is for. It's still operational.
> > We've done it here and it works.
> 
> Well, yes, I know it should work.  The question is, "Is it the best method?"
> 
Yes - as the array is still active, it's likely to have changed
since the drive failed.  This means that stopping the array and
re-assembling, forcing the failed drive active, could lead to
corruption.  Removing & adding it will ensure the data is updated to
match the current array state.

Of course, neither option should be pursued unless you know _exactly_
why the drive failed and know that it's safe to re-add it to the array.
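As a hedged sketch of that sequence (device names are placeholders, and the
health check is only a sanity pass, not proof the cable or controller is
sound):

  # first confirm the disk itself is healthy and find the original error
  smartctl -a /dev/sdc
  dmesg | grep -i sdc

  # then cycle the member back in and let md rebuild it against the live array
  mdadm /dev/md0 --remove /dev/sdc1
  mdadm /dev/md0 --add /dev/sdc1
  cat /proc/mdstat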

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: How to un-degrade an array after a totally spurious failure?
  2009-06-08  7:51             ` Robin Hill
@ 2009-06-08 13:12               ` Carlos Carvalho
  2009-06-09  1:55                 ` Leslie Rhorer
  0 siblings, 1 reply; 19+ messages in thread
From: Carlos Carvalho @ 2009-06-08 13:12 UTC (permalink / raw)
  To: linux-raid

 >On Sun Jun 07, 2009 at 08:56:42PM -0500, Leslie Rhorer wrote:
 >
 >> > -----Original Message-----
 >> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
 >> > owner@vger.kernel.org] On Behalf Of Carlos Carvalho
 >> > Sent: Sunday, June 07, 2009 8:54 PM
 >> > To: linux-raid@vger.kernel.org
 >> > Subject: RE: How to un-degrade an array after a totally spurious failure?
 >> > 
 >> > Leslie Rhorer (lrhorer@satx.rr.com) wrote on 7 June 2009 20:43:
 >> >  >OK, what about a RAID 6 array with one failed disk?  Is the remove & add
 >> >  >the best option there, or should one apply a different method?
 >> > 
 >> > It is, since that's what the raid is for. It's still operational.
 >> > We've done it here and it works.
 >> 
 >> Well, yes, I know it should work.  The question is, "Is it the best method?"

I don't understand what you mean. It is the only method. Everything
should be perfectly fine because of the redundancy. You only have to
do tricks when you have more failed disks than the array supports,
which always involves a certain risk.


* RE: How to un-degrade an array after a totally spurious failure?
  2009-06-08 13:12               ` Carlos Carvalho
@ 2009-06-09  1:55                 ` Leslie Rhorer
  0 siblings, 0 replies; 19+ messages in thread
From: Leslie Rhorer @ 2009-06-09  1:55 UTC (permalink / raw)
  To: linux-raid

>  >> > Leslie Rhorer (lrhorer@satx.rr.com) wrote on 7 June 2009 20:43:
>  >> >  >OK, what about a RAID 6 array with one failed disk?  Is the remove
> & add
>  >> >  >the best option there, or should one apply a different method?
>  >> >
>  >> > It is, since that's what the raid is for. It's still operational.
>  >> > We've done it here and it works.
>  >>
>  >> Well, yes, I know it should work.  The question is, "Is it the best
> method?"
> 
> I don't understand what you mean. It is the only method.

We've discussed others here, including stopping the array and doing an
assemble --force.  I think re-add will also work, won't it?

> Everything
> should be perfectly fine because of the redundancy. You only have to
> do tricks when you have more failed disks than the array supports,
> which always involves a certain risk.

If there are other "tricks", then by definition a remove followed by an add
is not the only possible method.  I can certainly accept it is the best one,
but I was not certain of it, which is why I asked.  Of course, one can
define "best" in a number of ways, but I would say the fastest recovery
method that still maintains data integrity is the best.  The add + resync is
not exactly fast.
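For what it's worth, the fast path for this kind of spurious drop-out is a
write-intent bitmap plus --re-add, which only resyncs the stripes dirtied
while the member was out; a sketch, assuming a reasonably recent mdadm and
placeholder device names:

  # add an internal write-intent bitmap (can be done while the array is running)
  mdadm --grow --bitmap=internal /dev/md0

  # after a spurious failure, re-attach the member instead of doing a full --add
  mdadm /dev/md0 --re-add /dev/sdc1

  # recovery should now cover only the blocks written while it was missing
  cat /proc/mdstat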



* RE: How to un-degrade an array after a totally spurious failure?
  2009-05-26 10:47     ` NeilBrown
  2009-06-08  1:43       ` Leslie Rhorer
@ 2009-08-03  7:30       ` Leslie Rhorer
  2009-08-03  7:43         ` NeilBrown
  1 sibling, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2009-08-03  7:30 UTC (permalink / raw)
  To: linux-raid



> >> NeilBrown
> >
> > 	I have exactly the same situation, except there are two "failed"
> > disks on a RAID5 array.  As for the OP, the "failures" are spurious.
> > Running the remove and then the add command puts the disks back in as
> > spare
> > disks, not live ones, and then the array just sits there, doing nothing.
> > I
> > tried the trick of doing
> >
> > echo repair > /sys/block/md0/md/sync_action
> >
> > but the array still just sits there saying it is "clean, degraded", with
> 2
> > spare and 5 working devices.
> >
> 
> Not such a good option when you have two failures.
> If you have two failures you need to stop the array, then assemble
> it again using --force.
> It is now too late for that:  adding them with "--add" will have erased
> the old metadata.
> 
> Your only option is to re-create the array.  Make sure you use the
> same parameters (e.g. chunk size) as when you first created the array.
> You can check the correct parameters by looking at a device with
> "--examine".
> Also make sure you put the devices in the correct order.
> 
> The best thing to do is to try creating the array, using
> "--assume-clean" so it won't trigger a resync.
> Then use "fsck", then "mount", to make sure the data is good.
> Once you are satisfied that the data is good and that you created
> the array with the right parameters, use "echo repair > ...."
> to make sure the array really is 'clean'.
> 
> I guess I should stop mdadm from trashing the superblock when you
> add a spare to an array which has failed....
> 
> NeilBrown

OK, Neil, I've had this occur again.  A prolonged power failure took one of
the systems offline, and now it's convicting 3 of 7 disks in a RAID5 array.
I've done nothing but stop the array.  Prior to stopping the array, mdadm
reported:

Backup:/# mdadm -Dt /dev/md0
/dev/md0:
        Version : 01.02
  Creation Time : Sun Jul 12 20:44:02 2009
     Raid Level : raid5
     Array Size : 8790830592 (8383.59 GiB 9001.81 GB)
  Used Dev Size : 2930276864 (2794.53 GiB 3000.60 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Aug  2 04:01:52 2009
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

           Name : 'Backup':0
           UUID : 940ae4e4:04057ffc:5e92d2fb:63e3efb7
         Events : 14

    Number   Major   Minor   RaidDevice State
       0       8       80        0      active sync   /dev/sdf
       1       8       96        1      active sync   /dev/sdg
       2       8        0        2      active sync   /dev/sda
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       0        0        5      removed
       6       0        0        6      removed

       4       8       32        -      faulty spare   /dev/sdc
       5       8       48        -      faulty spare   /dev/sdd
       6       8       64        -      faulty spare   /dev/sde

	The array did have a bitmap, but mdadm didn't report it.  When I
attempt using --assemble and --force, I get:

Backup:/# mdadm --assemble --force /dev/md0
mdadm: /dev/md0 not identified in config file.



* RE: How to un-degrade an array after a totally spurious failure?
  2009-08-03  7:30       ` Leslie Rhorer
@ 2009-08-03  7:43         ` NeilBrown
  2009-08-03  8:28           ` Leslie Rhorer
  0 siblings, 1 reply; 19+ messages in thread
From: NeilBrown @ 2009-08-03  7:43 UTC (permalink / raw)
  To: Leslie Rhorer; +Cc: linux-raid

On Mon, August 3, 2009 5:30 pm, Leslie Rhorer wrote:
>
>
>> >> NeilBrown
>> >
>> > 	I have exactly the same situation, except there are two "failed"
>> > disks on a RAID5 array.  As for the OP, the "failures" are spurious.
>> > Running the remove and then the add command puts the disks back in as
>> > spare
>> > disks, not live ones, and then the array just sits there, doing
>> nothing.
>> > I
>> > tried the trick of doing
>> >
>> > echo repair > /sys/block/md0/md/sync_action
>> >
>> > but the array still just sits there saying it is "clean, degraded",
>> with
>> 2
>> > spare and 5 working devices.
>> >
>>
>> Not such a good option when you have two failures.
>> If you have two failures you need to stop the array, then assemble
>> it again using --force.
>> It is now too late for that:  adding them with "--add" will have erased
>> the old metadata.
>>
>> Your only option is to re-create the array.  Make sure you use the
>> same parameters (e.g. chunk size) as when you first created the array.
>> You can check the correct parameters by looking at a device with
>> "--examine".
>> Also make sure you put the devices in the correct order.
>>
>> The best thing to do is to try creating the array, using
>> "--assume-clean" so it won't trigger a resync.
>> Then use "fsck", then "mount", to make sure the data is good.
>> Once you are satisfied that the data is good and that you created
>> the array with the right parameters, use "echo repair > ...."
>> to make sure the array really is 'clean'.
>>
>> I guess I should stop mdadm from trashing the superblock when you
>> add a spare to an array which has failed....
>>
>> NeilBrown
>
> OK, Neil, I've had this occur again.  A prolonged power failure took one
> of
> the systems offline, and now it's convicting 3 of 7 disks in a RAID5
> array.
> I've done nothing but stop the array.  Prior to stopping the array, mdadm
> reported:
>
> Backup:/# mdadm -Dt /dev/md0
> /dev/md0:
>         Version : 01.02
>   Creation Time : Sun Jul 12 20:44:02 2009
>      Raid Level : raid5
>      Array Size : 8790830592 (8383.59 GiB 9001.81 GB)
>   Used Dev Size : 2930276864 (2794.53 GiB 3000.60 GB)
>    Raid Devices : 7
>   Total Devices : 7
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Sun Aug  2 04:01:52 2009
>           State : clean, degraded
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 3
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 256K
>
>            Name : 'Backup':0
>            UUID : 940ae4e4:04057ffc:5e92d2fb:63e3efb7
>          Events : 14
>
>     Number   Major   Minor   RaidDevice State
>        0       8       80        0      active sync   /dev/sdf
>        1       8       96        1      active sync   /dev/sdg
>        2       8        0        2      active sync   /dev/sda
>        3       8       16        3      active sync   /dev/sdb
>        4       0        0        4      removed
>        5       0        0        5      removed
>        6       0        0        6      removed
>
>        4       8       32        -      faulty spare   /dev/sdc
>        5       8       48        -      faulty spare   /dev/sdd
>        6       8       64        -      faulty spare   /dev/sde
>
> 	The array did have a bitmap, but mdadm didn't report it.  When I
> attempt using --assemble and --force, I get:
>
> Backup:/# mdadm --assemble --force /dev/md0
> mdadm: /dev/md0 not identified in config file.

So try
  mdadm --assemble --force /dev/md0 /dev/sd[abcdefg]

NeilBrown


>
> --
>



* RE: How to un-degrade an array after a totally spurious failure?
  2009-08-03  7:43         ` NeilBrown
@ 2009-08-03  8:28           ` Leslie Rhorer
  0 siblings, 0 replies; 19+ messages in thread
From: Leslie Rhorer @ 2009-08-03  8:28 UTC (permalink / raw)
  To: 'NeilBrown'; +Cc: linux-raid

> >            UUID : 940ae4e4:04057ffc:5e92d2fb:63e3efb7
> >          Events : 14
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8       80        0      active sync   /dev/sdf
> >        1       8       96        1      active sync   /dev/sdg
> >        2       8        0        2      active sync   /dev/sda
> >        3       8       16        3      active sync   /dev/sdb
> >        4       0        0        4      removed
> >        5       0        0        5      removed
> >        6       0        0        6      removed
> >
> >        4       8       32        -      faulty spare   /dev/sdc
> >        5       8       48        -      faulty spare   /dev/sdd
> >        6       8       64        -      faulty spare   /dev/sde
> >
> > 	The array did have a bitmap, but mdadm didn't report it.  When I
> > attempt using --assemble and --force, I get:
> >
> > Backup:/# mdadm --assemble --force /dev/md0
> > mdadm: /dev/md0 not identified in config file.
> 
> So try
>   mdadm --assemble --force /dev/md0 /dev/sd[abcdefg]

	Thanks.  I managed to get it working using

-u 940ae4e4:04057ffc:5e92d2fb:63e3efb7

	I take it that if I append the output of --examine --scan to the
mdadm.conf file, I won't have the problem in the future?  The output of the
command does not include the bitmap file, but the man page for mdadm.conf
does list the bitmap= parameter.  I'm not quite clear, however, if the
bitmap= parameter is supposed to go on the same line as the output from
--examine --scan, or on a separate line.  I went ahead and put it all on
one line:

ARRAY /dev/md/0 level=raid5 metadata=1.2 num-devices=7
UUID=940ae4e4:04057ffc:5e92d2fb:63e3efb7 name='Backup':0
bitmap=/etc/mdadm/bitmap/md0.map
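For completeness, a sketch of how one might generate and append that line
(the update-initramfs step is Debian-specific and an assumption about this
install; the mdadm.conf man page lists bitmap= among the ARRAY-line
keywords, so keeping it on that line looks consistent with the docs):

  # capture the array definitions mdadm itself reports and append them
  mdadm --examine --scan >> /etc/mdadm/mdadm.conf

  # make sure early boot sees the same configuration
  update-initramfs -u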



* mdadm not sending email
  2009-05-20 23:10 How to un-degrade an array after a totally spurious failure? Nix
  2009-05-21  2:49 ` NeilBrown
@ 2020-01-12  9:48 ` Leslie Rhorer
  2020-01-12 12:47   ` John Stoffel
  1 sibling, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2020-01-12  9:48 UTC (permalink / raw)
  To: linux-raid

    I recently upgraded one of my servers to Debian Buster.  I have
been using sSMTP as my MTA, but unfortunately it is no longer
maintained.  I installed msmtp, instead, but now my messages are no
longer going out from mdadm.  I can run the command:

echo "Subject: Test: | sendmail lesrhorer@att.net

and it works.  If I try:

mdadm --monitor --scan --test -1

I get:

sendmail: the server did not accept the mail
sendmail: server message: 550 Request failed; Mailbox unavailable
sendmail: could not send mail (account default from /etc/msmtprc)

and from /var/log/mail.err:

Jan 12 03:43:40 RAID-Server msmtp: host=outbound.att.net tls=on auth=on 
user=lesrhorer@att.net from=lesrhorer@att.net 
recipients=lesrhorer@att.net smtpstatus=550 smtpmsg='550 Request failed; 
Mailbox unavailable' errormsg='the server did not accept the mail' 
exitcode=EX_UNAVAILABLE


* Re: mdadm not sending email
  2020-01-12  9:48 ` mdadm not sending email Leslie Rhorer
@ 2020-01-12 12:47   ` John Stoffel
  2020-01-13 14:00     ` Leslie Rhorer
  0 siblings, 1 reply; 19+ messages in thread
From: John Stoffel @ 2020-01-12 12:47 UTC (permalink / raw)
  To: Leslie Rhorer; +Cc: linux-raid


Leslie> I recently upgraded one of my servers to Debian Buster.  I have
Leslie> been using sSMTP as my MTA, but unfortunately it is no longer
Leslie> maintained.  I installed msmtp, instead, but now my messages are
Leslie> no longer going out from mdadm.  I can run the command:

This is a problem with your mail setup, not with mdadm.  I suspect you
need to configure msmtp to use TLS and/or to submit the email to port
587 on att.net, where you do a full authenticated login.

Look at the examples here:

https://wiki.alpinelinux.org/wiki/Relay_email_to_gmail_(msmtp,_mailx,_sendmail
https://wiki.debian.org/msmtp

And follow the debugging info these guides give.  Once you get email
working properly, mdadm will send emails.  
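A quick way to see exactly what the relay is rejecting is to drive msmtp by
hand with debugging turned on; a rough sketch (the recipient address is a
placeholder):

  # send a minimal message and print the full SMTP dialogue
  printf 'Subject: mdadm test\n\ntest body\n' | msmtp --debug recipient@example.com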


Leslie> echo "Subject: Test" | sendmail lesrhorer@att.net

Leslie> and it works.  If I try:

Leslie> mdadm --monitor --scan --test -1

Leslie> I get:

Leslie> sendmail: the server did not accept the mail
Leslie> sendmail: server message: 550 Request failed; Mailbox unavailable
Leslie> sendmail: could not send mail (account default from /etc/msmtprc)

Leslie> and from /var/log/mail.err:

Leslie> Jan 12 03:43:40 RAID-Server msmtp: host=outbound.att.net tls=on auth=on 
Leslie> user=lesrhorer@att.net from=lesrhorer@att.net 
Leslie> recipients=lesrhorer@att.net smtpstatus=550 smtpmsg='550 Request failed; 
Leslie> Mailbox unavailable' errormsg='the server did not accept the mail' 
Leslie> exitcode=EX_UNAVAILABLE


* Re: mdadm not sending email
  2020-01-12 12:47   ` John Stoffel
@ 2020-01-13 14:00     ` Leslie Rhorer
  2020-01-29 20:46       ` Leslie Rhorer
  0 siblings, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2020-01-13 14:00 UTC (permalink / raw)
  To: John Stoffel, linux-raid

	I forgot to send this out to the list.  I apologize for any duplicates.

On 1/12/2020 6:47 AM, John Stoffel wrote:
> Leslie> I recently upgraded one of my servers to Debian Buster.  I have
> Leslie> been using sSMTP as my MTA, but unfortunately it is no longer
> Leslie> maintained.  I installed msmtp, instead, but now my messages are
> Leslie> no longer going out from mdadm.  I can run the command:
>
> This is a problem with your mail setup, not with mdadm.  I suspect you
> need to configure msmtp to use TLS and/or to submit the email to port
     It is using TLS.
> 587 on att.net, where you do a full authenticated login.

     Nope, 465, which by the way is the default for SSL/TLS, and I am 
doing a full authenticated login.  Now, it is certainly arguable, 
perhaps even likely, my mail setup has a problem, but without knowing 
specifically what mdadm is sending out, I am going to be hard pressed to 
know what I need to modify in my mail setup.

     In the earlier version of mdadm, the mail utility (specified in 
/etc/mdadm/mdadm.conf) was the script /usr/bin/mdadm_notify.  I have no 
idea how or what the newer version of mdadm sends out.

>
> Look at the examples here:
>
> https://wiki.alpinelinux.org/wiki/Relay_email_to_gmail_(msmtp,_mailx,_sendmail
> https://wiki.debian.org/msmtp


     I had already looked at both of those, and although the 
configuration for att.net is different than for gmail, nothing jumps out 
at me.


Here is my configuration for msmtp:

# Example for a system wide configuration file

# A system wide configuration file is optional.
# If it exists, it usually defines a default account.
# This allows msmtp to be used like /usr/sbin/sendmail.
account default

# The SMTP smarthost
host outbound.att.net

# Use TLS on port 465
port 465
tls on
tls_starttls off

# Construct envelope-from addresses of the form "user@oursite.example"
#auto_from on
#maildomain att.net

from lesrhorer@att.net
user lesrhorer@att.net
auth on
password XXXXXXXXXXX

# Syslog logging with facility LOG_MAIL instead of the default LOG_USER
syslog LOG_MAIL

Mail is working on another server that still uses ssmtp.  Here is the 
configuration:

# Config file for sSMTP sendmail
#
# The person who gets all mail for userids < 1000
# Make this empty to disable rewriting.
root=lesrhorer

# The place where the mail goes. The actual machine name is required no
# MX records are consulted. Commonly mailhosts are named mail.domain.com
mailhub=outbound.att.net

# Where will the mail seem to come from?
rewriteDomain=att.net

# The full hostname
hostname=localhost

# Are users allowed to set their own From: address?
# YES - Allow the user to specify their own From: address
# NO - Use the system generated From: address
FromLineOverride=YES

AuthUser=lesrhorer@att.net
AuthPass=XXXXXXXXXXXXX
UseTLS=YES

>
> And follow the debuging info these guides give.  Once you get email

     I don't see any debugging recommendations in either document.


     One interesting thing: When I run the monitor / test command on the 
older system, sendmail complains about the mailbox being unavailable, 
but it still sends out the email.


* Re: mdadm not sending email
  2020-01-13 14:00     ` Leslie Rhorer
@ 2020-01-29 20:46       ` Leslie Rhorer
       [not found]         ` <CALc6PW4mf0kkU2y8mPvQsM3N-EMG2kLV3Y9-8EV-XQgLBmy_YA@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Leslie Rhorer @ 2020-01-29 20:46 UTC (permalink / raw)
  To: linux-raid

     I believe I have fixed the problem.  For anyone else who runs 
across this:

1. Add both the following lines into /etc/mdadm/mdadm.conf

MAILADDR <recipient>

MAILFROM <sender>

2. Create a symlink in /usr/sbin

cd /usr/sbin

ln -s ../bin/msmtp sendmail
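To confirm the whole path end to end, a quick check (--oneshot is just the
long form of the -1 used earlier in this thread; the mail log path is the
usual Debian default and an assumption about your syslog setup):

  # generate a TestMessage event for every array and exit
  mdadm --monitor --scan --test --oneshot

  # then confirm the message reached the relay
  tail /var/log/mail.log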

On 1/13/2020 8:00 AM, Leslie Rhorer wrote:
>     I forgot to send this out to the list.  I apologize for any 
> duplicates.
>
> On 1/12/2020 6:47 AM, John Stoffel wrote:
>> Leslie> I recently upgraded one of my servers to Debian Buster.  I have
>> Leslie> been using sSMTP as my MTA, but unfortunately it is no longer
>> Leslie> maintained.  I installed msmtp, instead, but now my messages are
>> Leslie> no longer going out from mdadm.  I can run the command:
>>
>> This is a problem with your mail setup, not with mdadm.  I suspect you
>> need to configure msmtp to use TLS and/or to submit the email to port
>     It is using TLS.
>> 587 on att.net, where you do a full authenticated login.
>
>     Nope, 465, which by the way is the default for SSL/TLS, and I am 
> doing a full authenticated login.  Now, it is certainly arguable, 
> perhaps even likely, my mail setup has a problem, but without knowing 
> specifically what mdadm is sending out, I am going to be hard pressed 
> to know what I need to modify in my mail setup.
>
>     In the earlier version of mdadm, the mail utility (specified in 
> /etc/mdadm/mdadm.conf) was the script /usr/bin/mdadm_notify.  I have 
> no idea how or what the newer version of mdadm sends out.
>
>>
>> Look at the examples here:
>>
>> https://wiki.alpinelinux.org/wiki/Relay_email_to_gmail_(msmtp,_mailx,_sendmail 
>>
>> https://wiki.debian.org/msmtp
>
>
>     I had already looked at both of those, and although the 
> configuration for att.net is different than for gmail, nothing jumps 
> out at me.
>
>
> Here is my configuration for msmtp:
>
> # Example for a system wide configuration file
>
> # A system wide configuration file is optional.
> # If it exists, it usually defines a default account.
> # This allows msmtp to be used like /usr/sbin/sendmail.
> account default
>
> # The SMTP smarthost
> host outbound.att.net
>
> # Use TLS on port 465
> port 465
> tls on
> tls_starttls off
>
> # Construct envelope-from addresses of the form "user@oursite.example"
> #auto_from on
> #maildomain att.net
>
> from lesrhorer@att.net
> user lesrhorer@att.net
> auth on
> password XXXXXXXXXXX
>
> # Syslog logging with facility LOG_MAIL instead of the default LOG_USER
> syslog LOG_MAIL
>
> Mail is working on another server that still uses ssmtp.  Here is the 
> configuration:
>
> # Config file for sSMTP sendmail
> #
> # The person who gets all mail for userids < 1000
> # Make this empty to disable rewriting.
> root=lesrhorer
>
> # The place where the mail goes. The actual machine name is required no
> # MX records are consulted. Commonly mailhosts are named mail.domain.com
> mailhub=outbound.att.net
>
> # Where will the mail seem to come from?
> rewriteDomain=att.net
>
> # The full hostname
> hostname=localhost
>
> # Are users allowed to set their own From: address?
> # YES - Allow the user to specify their own From: address
> # NO - Use the system generated From: address
> FromLineOverride=YES
>
> AuthUser=lesrhorer@att.net
> AuthPass=XXXXXXXXXXXXX
> UseTLS=YES
>
>>
>> And follow the debugging info these guides give.  Once you get email
>
>     I don't see any debugging recommendations in either document.
>
>
>     One interesting thing: When I run the monitor / test command on 
> the older system, sendmail complains about the mailbox being 
> unavailable, but it still sends out the email.


* Re: mdadm not sending email
       [not found]         ` <CALc6PW4mf0kkU2y8mPvQsM3N-EMG2kLV3Y9-8EV-XQgLBmy_YA@mail.gmail.com>
@ 2020-01-30 22:14           ` Leslie Rhorer
  0 siblings, 0 replies; 19+ messages in thread
From: Leslie Rhorer @ 2020-01-30 22:14 UTC (permalink / raw)
  To: William Morgan; +Cc: linux-raid

mdadm /dev/mdX --monitor --test

On 1/29/2020 4:56 PM, William Morgan wrote:
> How do you test this? Is there a way to simulate an issue to see if 
> the mail will be sent properly?
>
> Thanks,
> Bill
>
> On Wed, Jan 29, 2020, 14:47 Leslie Rhorer <lesrhorer@att.net> wrote:
>
>          I believe I have fixed the problem.  For anyone else who runs
>     across this:
>
>     1. Add both the following lines into /etc/mdadm/mdadm.conf
>
>     MAILADDR <recipient>
>
>     MAILFROM <sender>
>
>     2. Create a symlink in /usr/sbin
>
>     cd /usr/sbin
>
>     ln -s ../bin/msmtp sendmail
>
>     On 1/13/2020 8:00 AM, Leslie Rhorer wrote:
>     >     I forgot to send this out to the list.  I apologize for any
>     > duplicates.
>     >
>     > On 1/12/2020 6:47 AM, John Stoffel wrote:
>     >> Leslie> I recently upgraded one of my servers to Debian Buster.
>     >> Leslie> I have been using sSMTP as my MTA, but unfortunately it is
>     >> Leslie> no longer maintained.  I installed msmtp, instead, but now
>     >> Leslie> my messages are no longer going out from mdadm.  I can run
>     >> Leslie> the command:
>     >>
>     >> This is a problem with your mail setup, not with mdadm.  I
>     suspect you
>     >> need to configure msmtp to use TLS and/or to submit the email
>     to port
>     >     It is using TLS.
>     >> 587 on att.net, where you do a full authenticated login.
>     >
>     >     Nope, 465, which by the way is the default for SSL/TLS, and
>     I am
>     > doing a full authenticated login.  Now, it is certainly arguable,
>     > perhaps even likely, my mail setup has a problem, but without
>     knowing
>     > specifically what mdadm is sending out, I am going to be hard
>     pressed
>     > to know what I need to modify in my mail setup.
>     >
>     >     In the earlier version of mdadm, the mail utility (specified in
>     > /etc/mdadm/mdadm.conf) was the script /usr/bin/mdadm_notify.  I have
>     > no idea how or what the newer version of mdadm sends out.
>     >
>     >>
>     >> Look at the examples here:
>     >>
>     >>
>     https://wiki.alpinelinux.org/wiki/Relay_email_to_gmail_(msmtp,_mailx,_sendmail
>
>     >>
>     >> https://wiki.debian.org/msmtp
>     >
>     >
>     >     I had already looked at both of those, and although the
>     > configuration for att.net is different than for gmail, nothing jumps
>     > out at me.
>     >
>     >
>     > Here is my configuration for msmtp:
>     >
>     > # Example for a system wide configuration file
>     >
>     > # A system wide configuration file is optional.
>     > # If it exists, it usually defines a default account.
>     > # This allows msmtp to be used like /usr/sbin/sendmail.
>     > account default
>     >
>     > # The SMTP smarthost
>     > host outbound.att.net
>     >
>     > # Use TLS on port 465
>     > port 465
>     > tls on
>     > tls_starttls off
>     >
>     > # Construct envelope-from addresses of the form
>     "user@oursite.example"
>     > #auto_from on
>     > #maildomain att.net
>     >
>     > from lesrhorer@att.net
>     > user lesrhorer@att.net
>     > auth on
>     > password XXXXXXXXXXX
>     >
>     > # Syslog logging with facility LOG_MAIL instead of the default
>     LOG_USER
>     > syslog LOG_MAIL
>     >
>     > Mail is working on another server that still uses ssmtp. Here is
>     the
>     > configuration:
>     >
>     > # Config file for sSMTP sendmail
>     > #
>     > # The person who gets all mail for userids < 1000
>     > # Make this empty to disable rewriting.
>     > root=lesrhorer
>     >
>     > # The place where the mail goes. The actual machine name is
>     required no
>     > # MX records are consulted. Commonly mailhosts are named mail.domain.com
>     > mailhub=outbound.att.net
>     >
>     > # Where will the mail seem to come from?
>     > rewriteDomain=att.net
>     >
>     > # The full hostname
>     > hostname=localhost
>     >
>     > # Are users allowed to set their own From: address?
>     > # YES - Allow the user to specify their own From: address
>     > # NO - Use the system generated From: address
>     > FromLineOverride=YES
>     >
>     > AuthUser=lesrhorer@att.net
>     > AuthPass=XXXXXXXXXXXXX
>     > UseTLS=YES
>     >
>     >>
>     >> And follow the debugging info these guides give.  Once you get email
>     >
>     >     I don't see any debugging recommendations in either document.
>     >
>     >
>     >     One interesting thing: When I run the monitor / test command on
>     > the older system, sendmail complains about the mailbox being
>     > unavailable, but it still sends out the email.
>

