* Need help with degraded raid 5
From: William Morgan @ 2020-03-05  0:31 UTC
  To: linux-raid

Hello,

I'm working with a 4-disk RAID 5. In the past I experienced a
problem that resulted in the array being set to "inactive", but with
some guidance from the list I was able to rebuild with no loss of
data. Recently I ran into a slightly different situation where one
disk was "removed" and marked as "spare", so the array is still
active, but degraded.

I've been monitoring the array, and I got a "Fail event" notification
right after a power blip which showed this mdstat:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : active raid5 sdm1[4](F) sdj1[0] sdk1[1] sdl1[2]
      23441679360 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      bitmap: 0/59 pages [0KB], 65536KB chunk

unused devices: <none>

A little while later I got a "DegradedArray event" notification with
the following mdstat:

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
[raid1] [raid10]
md0 : active raid5 sdl1[4] sdj1[1] sdk1[2] sdi1[0]
      23441679360 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [>....................]  recovery =  0.0% (12600/7813893120)
finish=113621.8min speed=1145K/sec
      bitmap: 2/59 pages [8KB], 65536KB chunk

unused devices: <none>

which seemed to imply that sdl was being rebuilt, but then I got
another "DegradedArray event" notification with this:

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
[raid1] [raid10]
md0 : active raid5 sdl1[4](S) sdj1[1] sdk1[2] sdi1[0]
      23441679360 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      bitmap: 2/59 pages [8KB], 65536KB chunk

unused devices: <none>


However, I don't think anything is really wrong with the removed disk.
So, assuming I've got backups, what do I need to do to reinsert the
disk and get the array back to a normal state? Or does that disk's
data need to be completely rebuilt, and if so, how do I initiate that?

I'm using the latest mdadm and a very recent kernel. Currently I get this:

bill@bill-desk:~$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Sat Sep 22 19:10:10 2018
        Raid Level : raid5
        Array Size : 23441679360 (22355.73 GiB 24004.28 GB)
     Used Dev Size : 7813893120 (7451.91 GiB 8001.43 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Mar  2 17:41:32 2020
             State : clean, degraded
    Active Devices : 3
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 1

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : bitmap

              Name : bill-desk:0  (local to host bill-desk)
              UUID : 06ad8de5:3a7a15ad:88116f44:fcdee150
            Events : 10407

    Number   Major   Minor   RaidDevice State
       0       8      129        0      active sync   /dev/sdi1
       1       8      145        1      active sync   /dev/sdj1
       2       8      161        2      active sync   /dev/sdk1
       -       0        0        3      removed

       4       8      177        -      spare   /dev/sdl1


* Re: Need help with degraded raid 5
From: Jinpu Wang @ 2020-03-05 14:53 UTC
  To: William Morgan; +Cc: linux-raid

On Thu, Mar 5, 2020 at 1:33 AM William Morgan <therealbrewer@gmail.com> wrote:
>
> Hello,
>
> I'm working with a 4-disk RAID 5. In the past I experienced a
> problem that resulted in the array being set to "inactive", but with
> some guidance from the list I was able to rebuild with no loss of
> data. Recently I ran into a slightly different situation where one
> disk was "removed" and marked as "spare", so the array is still
> active, but degraded.
>
> I've been monitoring the array, and I got a "Fail event" notification
> right after a power blip which showed this mdstat:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : active raid5 sdm1[4](F) sdj1[0] sdk1[1] sdl1[2]
>       23441679360 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       bitmap: 0/59 pages [0KB], 65536KB chunk
>
> unused devices: <none>
>
> A little while later I got a "DegradedArray event" notification with
> the following mdstat:
>
> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
> [raid1] [raid10]
> md0 : active raid5 sdl1[4] sdj1[1] sdk1[2] sdi1[0]
>       23441679360 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [>....................]  recovery =  0.0% (12600/7813893120)
> finish=113621.8min speed=1145K/sec
>       bitmap: 2/59 pages [8KB], 65536KB chunk
>
> unused devices: <none>
>
> which seemed to imply that sdl was being rebuilt, but then I got
> another "DegradedArray event" notification with this:
>
> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
> [raid1] [raid10]
> md0 : active raid5 sdl1[4](S) sdj1[1] sdk1[2] sdi1[0]
>       23441679360 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       bitmap: 2/59 pages [8KB], 65536KB chunk
>
> unused devices: <none>
>
>
> However, I don't think anything is really wrong with the removed disk.
> So, assuming I've got backups, what do I need to do to reinsert the
> disk and get the array back to a normal state? Or does that disk's
> data need to be completely rebuilt, and if so, how do I initiate that?
>
> I'm using the latest mdadm and a very recent kernel. Currently I get this:
>
> bill@bill-desk:~$ sudo mdadm --detail /dev/md0
> /dev/md0:
>            Version : 1.2
>      Creation Time : Sat Sep 22 19:10:10 2018
>         Raid Level : raid5
>         Array Size : 23441679360 (22355.73 GiB 24004.28 GB)
>      Used Dev Size : 7813893120 (7451.91 GiB 8001.43 GB)
>       Raid Devices : 4
>      Total Devices : 4
>        Persistence : Superblock is persistent
>
>      Intent Bitmap : Internal
>
>        Update Time : Mon Mar  2 17:41:32 2020
>              State : clean, degraded
>     Active Devices : 3
>    Working Devices : 4
>     Failed Devices : 0
>      Spare Devices : 1
>
>             Layout : left-symmetric
>         Chunk Size : 64K
>
> Consistency Policy : bitmap
>
>               Name : bill-desk:0  (local to host bill-desk)
>               UUID : 06ad8de5:3a7a15ad:88116f44:fcdee150
>             Events : 10407
>
>     Number   Major   Minor   RaidDevice State
>        0       8      129        0      active sync   /dev/sdi1
>        1       8      145        1      active sync   /dev/sdj1
>        2       8      161        2      active sync   /dev/sdk1
>        -       0        0        3      removed
>
>        4       8      177        -      spare   /dev/sdl1

"mdadm /dev/md0 -a /dev/sdl1"  should work for you to add the disk
back to array, maybe you can check first with "mdadm -E /dev/sdl1" to
make sure.
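
For example, a minimal sketch (assuming, as in your mdstat output,
that /dev/sdl1 is the kicked member and /dev/md0 is the array; adjust
the names for your system):

  # inspect the superblock on the removed disk first
  sudo mdadm -E /dev/sdl1
  # if the Array UUID matches md0's, add the disk back
  sudo mdadm /dev/md0 -a /dev/sdl1
  # then watch the recovery progress
  watch cat /proc/mdstat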

Regards,
Jack Wang


* Re: Need help with degraded raid 5
From: Wols Lists @ 2020-03-05 17:22 UTC
  To: Jinpu Wang, William Morgan; +Cc: linux-raid

On 05/03/20 14:53, Jinpu Wang wrote:
> "mdadm /dev/md0 -a /dev/sdl1"  should work for you to add the disk
> back to array, maybe you can check first with "mdadm -E /dev/sdl1" to
> make sure.

Or better, use --re-add. If mdadm can find the relevant data in the
superblock (bitmap, journal, etc.), it will just bring the disk up to
date. If it can't, --re-add functions just like --add, so you've lost
nothing but might gain a lot.
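
A minimal sketch (device names assumed from the earlier mdstat
output):

  # try to re-add; with a valid write-intent bitmap this only syncs
  # the blocks that changed while the disk was out
  sudo mdadm /dev/md0 --re-add /dev/sdl1
  # if mdadm refuses the re-add, fall back to a full add/rebuild
  sudo mdadm /dev/md0 --add /dev/sdl1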

Cheers,
Wol


* Re: Need help with degraded raid 5
From: William Morgan @ 2020-03-06 21:33 UTC
  To: Wols Lists; +Cc: Jinpu Wang, linux-raid

On Thu, Mar 5, 2020 at 11:22 AM Wols Lists <antlists@youngman.org.uk> wrote:
>
> On 05/03/20 14:53, Jinpu Wang wrote:
> > "mdadm /dev/md0 -a /dev/sdl1"  should work for you to add the disk
> > back to array, maybe you can check first with "mdadm -E /dev/sdl1" to
> > make sure.
>
> Or better, use --re-add. If mdadm can find the relevant data in the
> superblock (bitmap, journal, etc.), it will just bring the disk up to
> date. If it can't, --re-add functions just like --add, so you've lost
> nothing but might gain a lot.
>
> Cheers,
> Wol

I tried re-add and I get the following error:

bill@bill-desk:~$ sudo mdadm /dev/md0 --re-add /dev/sdl1
mdadm: Cannot open /dev/sdl1: Device or resource busy

sdl is not mounted, and it doesn't seem to be a device mapper issue:

bill@bill-desk:~$ sudo dmsetup table
No devices found

Here is the current state of sdl:

bill@bill-desk:~$ sudo mdadm -E /dev/sdl1
/dev/sdl1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x9
     Array UUID : 06ad8de5:3a7a15ad:88116f44:fcdee150
           Name : bill-desk:0  (local to host bill-desk)
  Creation Time : Sat Sep 22 19:10:10 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 15627786240 (7451.91 GiB 8001.43 GB)
     Array Size : 23441679360 (22355.73 GiB 24004.28 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : 8c628aed:802a5dc8:9d8a8910:9794ec02

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Mar  2 17:41:32 2020
  Bad Block Log : 512 entries available at offset 40 sectors - bad
blocks present.
       Checksum : 7b89f1e6 - correct
         Events : 10749

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)

What am I overlooking?


* Re: Need help with degraded raid 5
From: David C. Rankin @ 2020-03-06 22:55 UTC
  To: mdraid

On 03/06/2020 03:33 PM, William Morgan wrote:
> I tried re-add and I get the following error:
> 
> bill@bill-desk:~$ sudo mdadm /dev/md0 --re-add /dev/sdl1
> mdadm: Cannot open /dev/sdl1: Device or resource busy
> 
> sdl is not mounted, and it doesn't seem to be a device mapper issue:

  cat /proc/mdstat  and/or  cat /proc/partitions

  and see if the disk was brought up as an array of its own (something
like /dev/md127, etc.). If so, simply

  sudo mdadm --stop /dev/md127

  Then try your re-add again. I recently had that occur when I put in a
replacement disk for a raid1 array. Even though I had just cut the
plastic anti-static bag off the brand-new drive, when I booted the
system it came up as an array (of what, I don't know). I got the same
"device busy" error and simply used --stop on the obviously
not-an-array array, and the --re-add worked just fine afterwards.
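
Put together, the recovery sequence would look something like this
(md127 is just an example name; use whatever stray array shows up):

  cat /proc/mdstat                         # look for an unexpected array
  sudo mdadm --stop /dev/md127             # release the disk it holds
  sudo mdadm /dev/md0 --re-add /dev/sdl1   # now the re-add should work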

-- 
David C. Rankin, J.D.,P.E.


* Re: Need help with degraded raid 5
From: Jack Wang @ 2020-03-09  8:39 UTC
  To: William Morgan; +Cc: Wols Lists, Jinpu Wang, linux-raid

On Fri, Mar 6, 2020 at 10:35 PM William Morgan <therealbrewer@gmail.com> wrote:
>
> On Thu, Mar 5, 2020 at 11:22 AM Wols Lists <antlists@youngman.org.uk> wrote:
> >
> > On 05/03/20 14:53, Jinpu Wang wrote:
> > > "mdadm /dev/md0 -a /dev/sdl1"  should work for you to add the disk
> > > back to array, maybe you can check first with "mdadm -E /dev/sdl1" to
> > > make sure.
> >
> > Or better, use --re-add. If mdadm can find the relevant data in the
> > superblock (bitmap, journal, etc.), it will just bring the disk up to
> > date. If it can't, --re-add functions just like --add, so you've lost
> > nothing but might gain a lot.
> >
> > Cheers,
> > Wol
>
> I tried re-add and I get the following error:
>
> bill@bill-desk:~$ sudo mdadm /dev/md0 --re-add /dev/sdl1
> mdadm: Cannot open /dev/sdl1: Device or resource busy
>
> sdl is not mounted, and it doesn't seem to be a device mapper issue:
>
> bill@bill-desk:~$ sudo dmsetup table
> No devices found
This is strange.
Have you checked whether any other process is using sdl1?
"sudo lsof /dev/sdl1"


>
> Here is the current state of sdl:
>
> bill@bill-desk:~$ sudo mdadm -E /dev/sdl1
> /dev/sdl1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x9
>      Array UUID : 06ad8de5:3a7a15ad:88116f44:fcdee150
>            Name : bill-desk:0  (local to host bill-desk)
>   Creation Time : Sat Sep 22 19:10:10 2018
>      Raid Level : raid5
>    Raid Devices : 4
>
>  Avail Dev Size : 15627786240 (7451.91 GiB 8001.43 GB)
>      Array Size : 23441679360 (22355.73 GiB 24004.28 GB)
>     Data Offset : 264192 sectors
>    Super Offset : 8 sectors
>    Unused Space : before=264112 sectors, after=0 sectors
>           State : clean
>     Device UUID : 8c628aed:802a5dc8:9d8a8910:9794ec02
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Mon Mar  2 17:41:32 2020
>   Bad Block Log : 512 entries available at offset 40 sectors - bad
> blocks present.
>        Checksum : 7b89f1e6 - correct
>          Events : 10749
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>    Device Role : spare
>    Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
>
The metadata looks fine.
If nothing holds the disk, the last resort would be to zero out the
metadata and add the disk back; but first try David's suggestion:
stop the stray array, then try the re-add again.
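
A sketch of that last resort (it wipes only the md metadata on that
partition, but triple-check the device name before running it):

  # make the disk look brand new to mdadm
  sudo mdadm --zero-superblock /dev/sdl1
  # add it back; this triggers a full rebuild of that member
  sudo mdadm /dev/md0 --add /dev/sdl1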

