* Reshape using drives not partitions, RAID gone after reboot
@ 2020-07-14 19:50 Adam Barnett
  2020-07-14 22:54 ` antlists
  0 siblings, 1 reply; 7+ messages in thread
From: Adam Barnett @ 2020-07-14 19:50 UTC (permalink / raw)
  To: linux-raid

Hi everyone, I had something happen to my mdadm RAID array after a 
reshape and reboot.

My mdadm RAID5 array just underwent a 5-to-8 disk grow and reshape. This 
took several days and ran uninterrupted. When cat /proc/mdstat reported 
it was complete, I rebooted the system, and now the array no longer 
appears.

One potential problem I can see is that I used the full disk when adding 
the new drives (e.g. /dev/sda rather than /dev/sda1). However, these 
drives already had partitions on them spanning the entire drive. I now 
realize this was pretty dumb.

I have tried:

$ sudo mdadm --assemble --scan
mdadm: No arrays found in config file or automatically

The three newly added drives do not appear to have md superblocks:

$ sudo mdadm --examine /dev/sd[kln]
/dev/sdk:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdl:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdn:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

$ sudo mdadm --examine /dev/sd[kln]1
mdadm: No md superblock detected on /dev/sdk1.
mdadm: No md superblock detected on /dev/sdl1.
mdadm: No md superblock detected on /dev/sdn1.

The other five do, and show the correct stats for the array:

$ sudo mdadm --examine /dev/sd[ijmop]1
/dev/sdi1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 7399b735:98d9a6fb:2e0f3ee8:7fb9397e
            Name : Freedom-2:127
   Creation Time : Mon Apr  2 18:09:19 2018
      Raid Level : raid5
    Raid Devices : 8

  Avail Dev Size : 15627795456 (7451.91 GiB 8001.43 GB)
      Array Size : 54697259008 (52163.37 GiB 56009.99 GB)
   Used Dev Size : 15627788288 (7451.91 GiB 8001.43 GB)
     Data Offset : 254976 sectors
    Super Offset : 8 sectors
    Unused Space : before=254888 sectors, after=7168 sectors
           State : clean
     Device UUID : ca3cd591:665d102b:7ab8921f:f1b55d62

Internal Bitmap : 8 sectors from superblock
     Update Time : Tue Jul 14 11:46:37 2020
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 6a1bca88 - correct
          Events : 401415

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

...etc. (the other four partitioned drives show similar output)

Forcing the assembly does not work:

$ sudo mdadm /dev/md1 --assemble --force /dev/sd[ijmop]1 /dev/sd[kln]
mdadm: /dev/sdi1 is busy - skipping
mdadm: /dev/sdj1 is busy - skipping
mdadm: /dev/sdm1 is busy - skipping
mdadm: /dev/sdo1 is busy - skipping
mdadm: /dev/sdp1 is busy - skipping
mdadm: Cannot assemble mbr metadata on /dev/sdk
mdadm: /dev/sdk has no superblock - assembly aborted

From my looking around, I now know that adding whole drives can 
sometimes lead to md superblocks being destroyed, and that I may be able 
to recover with the --create --assume-clean command. I would like a 
second opinion before I go that route, as it is something of a last 
resort. I may also need help understanding how to transition from the 
whole drives to partitions on those drives if I do go that way.

Thank you so much for any and all help.


-Adam


* Re: Reshape using drives not partitions, RAID gone after reboot
  2020-07-14 19:50 Reshape using drives not partitions, RAID gone after reboot Adam Barnett
@ 2020-07-14 22:54 ` antlists
  2020-07-14 23:27   ` Roger Heflin
  0 siblings, 1 reply; 7+ messages in thread
From: antlists @ 2020-07-14 22:54 UTC (permalink / raw)
  To: Adam Barnett, linux-raid

On 14/07/2020 20:50, Adam Barnett wrote:
> Forcing the assembly does not work:
> 
> $ sudo mdadm /dev/md1 --assemble --force /dev/sd[ijmop]1 /dev/sd[kln]
> mdadm: /dev/sdi1 is busy - skipping
> mdadm: /dev/sdj1 is busy - skipping
> mdadm: /dev/sdm1 is busy - skipping
> mdadm: /dev/sdo1 is busy - skipping
> mdadm: /dev/sdp1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sdk
> mdadm: /dev/sdk has no superblock - assembly aborted

Did you do an "mdadm --stop /dev/md1" before trying that? That's a 
classic error from an array that's previously been partially assembled 
and failed ...

The other thing I'd do is make sure there aren't any other unexpected 
partially assembled arrays. I doubt it applies here, but I have come 
across mirrors that get broken in half and you get two failed arrays 
instead of one working one ...

Cheers,
Wol


* Re: Reshape using drives not partitions, RAID gone after reboot
  2020-07-14 22:54 ` antlists
@ 2020-07-14 23:27   ` Roger Heflin
  2020-07-15  0:24     ` antlists
  2020-07-15  2:28     ` Adam Barnett
  0 siblings, 2 replies; 7+ messages in thread
From: Roger Heflin @ 2020-07-14 23:27 UTC (permalink / raw)
  To: antlists; +Cc: Adam Barnett, Linux RAID

Did you create the partition before you added the disk to mdadm or
after?  If after, was it a dos or a gpt?  Dos should have only cleared
the first 512-byte block.  If gpt, it will have written to the first
block and to at least one more location on the disk, possibly causing
data loss.

If before, then you at least need to get rid of the partition table
completely.  Having a partition table on a device will often cause a
number of things to ignore the whole disk.  I have debugged "lost" PVs
where the partition table effectively "blocked" LVM from even looking
at the entire device that the PV was on.

If it is a dos partition table, then save a backup of each disk first
(always a good idea -- you can dd it back if you screwed up, so long as
you carefully create the backups and name them properly):

dd if=/dev/sdX of=/root/sdXbackup.img bs=512 count=1

then clear the partition table space.  Rebooting is probably the
easiest way to clear out the mappings afterwards; it can be done
without rebooting, but I have to do it by trial and error to figure out
the exact order and commands.

dd if=/dev/zero of=/dev/sdX bs=512 count=1
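
For the three new drives here that would be something along these
lines -- a sketch only, double-check the drive letters against your
system first:

for d in sdk sdl sdn; do
    dd if=/dev/$d of=/root/${d}backup.img bs=512 count=1   # keep a copy of the first sector
    dd if=/dev/zero of=/dev/$d bs=512 count=1              # then clear it
done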

On Tue, Jul 14, 2020 at 6:11 PM antlists <antlists@youngman.org.uk> wrote:
>
> On 14/07/2020 20:50, Adam Barnett wrote:
> > Forcing the assembly does not work:
> >
> > $ sudo mdadm /dev/md1 --assemble --force /dev/sd[ijmop]1 /dev/sd[kln]
> > mdadm: /dev/sdi1 is busy - skipping
> > mdadm: /dev/sdj1 is busy - skipping
> > mdadm: /dev/sdm1 is busy - skipping
> > mdadm: /dev/sdo1 is busy - skipping
> > mdadm: /dev/sdp1 is busy - skipping
> > mdadm: Cannot assemble mbr metadata on /dev/sdk
> > mdadm: /dev/sdk has no superblock - assembly aborted
>
> Did you do an "mdadm --stop /dev/md1" before trying that? That's a
> classic error from an array that's previously been partially assembled
> and failed ...
>
> The other thing I'd do is make sure there aren't any other unexpected
> partially assembled arrays. I doubt it applies here, but I have come
> across mirrors that get broken in half and you get two failed arrays
> instead of one working one ...
>
> Cheers,
> Wol


* Re: Reshape using drives not partitions, RAID gone after reboot
  2020-07-14 23:27   ` Roger Heflin
@ 2020-07-15  0:24     ` antlists
  2020-07-15 20:48       ` AdsGroup
  2020-07-15  2:28     ` Adam Barnett
  1 sibling, 1 reply; 7+ messages in thread
From: antlists @ 2020-07-15  0:24 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Adam Barnett, Linux RAID

On 15/07/2020 00:27, Roger Heflin wrote:
> Did you create the partition before you added the disk to mdadm or
> after?  If after, was it a dos or a gpt?  Dos should have only cleared
> the first 512-byte block.  If gpt, it will have written to the first
> block and to at least one more location on the disk, possibly causing
> data loss.
> 
> If before, then you at least need to get rid of the partition table
> completely.  Having a partition table on a device will often cause a
> number of things to ignore the whole disk.  I have debugged "lost" PVs
> where the partition table effectively "blocked" LVM from even looking
> at the entire device that the PV was on.

If an explicit assemble works, then if you can get hold of a temporary 
spare/loan disk, I'd slowly move the new disks across to partitions by 
doing a --replace, not a --remove / --add. A replace will both keep the 
array protected against failure, and also not stress the array because 
it will just copy the old disk to the new, rather than rebuilding the 
new disk from all the others.

I'm not sure about the commands, but iirc mdadm has a --wipe-superblock 
command or something, and fdisk has something to wipe a gpt, so make 
sure you clear that stuff out before re-initialising a disk.
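
Roughly, one round of that migration might look like this (just a
sketch -- I'm assuming the loan disk shows up as /dev/sdS and you start
with /dev/sdk; check the device names and the man page before running
anything):

# give the loan disk a single partition spanning the whole drive
parted -s /dev/sdS mklabel gpt mkpart primary 1MiB 100%
# add the partition as a spare, then replace the whole-disk member with it
mdadm /dev/md1 --add /dev/sdS1
mdadm /dev/md1 --replace /dev/sdk --with /dev/sdS1
# when the copy finishes, sdk is marked faulty/replaced; retire and clear
# it, and it becomes the loan disk for the next round
mdadm /dev/md1 --remove /dev/sdk
mdadm --zero-superblock /dev/sdk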

Cheers,
Wol


* Re: Reshape using drives not partitions, RAID gone after reboot
  2020-07-14 23:27   ` Roger Heflin
  2020-07-15  0:24     ` antlists
@ 2020-07-15  2:28     ` Adam Barnett
  2020-07-15 16:10       ` Roger Heflin
  1 sibling, 1 reply; 7+ messages in thread
From: Adam Barnett @ 2020-07-15  2:28 UTC (permalink / raw)
  To: Roger Heflin, antlists; +Cc: Linux RAID

Thanks for all the replies. The drives had single gpt partitions that 
were created before adding the drives to the array. So I'll try removing 
the partition tables and forcing reassembly.

I also tried stopping the array before forcing reassembly, but the issue 
is that the newly added drives appear to have no superblocks, so mdadm 
aborts the assembly.

My current plan is to try to --create --assume-clean the array, but I 
have been reading about using overlay files to preserve the drives. If 
anyone could help me understand exactly how that is done, I would be 
very appreciative.

I don't think the list allows links(?), but I'm following the steps on 
the kernel wiki under "Recovering_a_failed_software_RAID"; the bash 
commands there are a bit confusing due to the use of the parallel 
command.
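
As far as I can tell, for a single member it boils down to something
like this (untested on my side, device and file names are just
illustrative -- please correct me if I've misread the wiki):

# sparse file to catch all writes, attached to a loop device
truncate -s 4G /tmp/overlay-sdi1
loop=$(losetup -f --show /tmp/overlay-sdi1)
# device-mapper snapshot: reads come from the real disk, writes land in
# the overlay file
size=$(blockdev --getsz /dev/sdi1)
dmsetup create sdi1-overlay --table "0 $size snapshot /dev/sdi1 $loop P 8"
# ...then run the assemble/create attempts against
# /dev/mapper/sdi1-overlay (one overlay per member) so the real disks
# are never written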

Thanks again all!

-Adam

On 7/14/20 5:27 PM, Roger Heflin wrote:
> Did you create the partition before you added the disk to mdadm or
> after?  If after, was it a dos or a gpt?  Dos should have only cleared
> the first 512-byte block.  If gpt, it will have written to the first
> block and to at least one more location on the disk, possibly causing
> data loss.
>
> If before, then you at least need to get rid of the partition table
> completely.  Having a partition table on a device will often cause a
> number of things to ignore the whole disk.  I have debugged "lost" PVs
> where the partition table effectively "blocked" LVM from even looking
> at the entire device that the PV was on.
>
> If it is a dos partition table, then save a backup of each disk first
> (always a good idea -- you can dd it back if you screwed up, so long as
> you carefully create the backups and name them properly):
> dd if=/dev/sdX of=/root/sdXbackup.img bs=512 count=1
> then clear the partition table space.  Rebooting is probably the
> easiest way to clear out the mappings afterwards; it can be done
> without rebooting, but I have to do it by trial and error to figure out
> the exact order and commands.
> dd if=/dev/zero of=/dev/sdX bs=512 count=1
>
> On Tue, Jul 14, 2020 at 6:11 PM antlists <antlists@youngman.org.uk> wrote:
>> On 14/07/2020 20:50, Adam Barnett wrote:
>>> Forcing the assembly does not work:
>>>
>>> $ sudo mdadm /dev/md1 --assemble --force /dev/sd[ijmop]1 /dev/sd[kln]
>>> mdadm: /dev/sdi1 is busy - skipping
>>> mdadm: /dev/sdj1 is busy - skipping
>>> mdadm: /dev/sdm1 is busy - skipping
>>> mdadm: /dev/sdo1 is busy - skipping
>>> mdadm: /dev/sdp1 is busy - skipping
>>> mdadm: Cannot assemble mbr metadata on /dev/sdk
>>> mdadm: /dev/sdk has no superblock - assembly aborted
>> Did you do an "mdadm --stop /dev/md1" before trying that? That's a
>> classic error from an array that's previously been partially assembled
>> and failed ...
>>
>> The other thing I'd do is make sure there aren't any other unexpected
>> partially assembled arrays. I doubt it applies here, but I have come
>> across mirrors that get broken in half and you get two failed arrays
>> instead of one working one ...
>>
>> Cheers,
>> Wol


* Re: Reshape using drives not partitions, RAID gone after reboot
  2020-07-15  2:28     ` Adam Barnett
@ 2020-07-15 16:10       ` Roger Heflin
  0 siblings, 0 replies; 7+ messages in thread
From: Roger Heflin @ 2020-07-15 16:10 UTC (permalink / raw)
  To: Adam Barnett; +Cc: antlists, Linux RAID

There is a chance that the partition tables that should not be there
are what is blocking the devices, and that removing them will just fix
it.

So once you have a copy of the disks, remove the partition table
completely from the three and reboot; it may just start, since it will
be able to find the devices.  Or it may now be startable with the
command you attempted before.
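
Since these are gpt, the table has a backup copy at the end of the disk
as well as the one at the start, so something like sgdisk is easier
than raw dd for this part (a sketch, assuming gdisk/sgdisk is installed
-- verify the letters first):

for d in sdk sdl sdn; do
    sgdisk --backup=/root/${d}-gpt.bak /dev/$d   # save the protective MBR and GPT data
    sgdisk --zap-all /dev/$d                     # wipe both GPT copies and the MBR
done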

I have had partitioned devices (say /dev/sda1) get a partition table
put on them, which "blocked" LVM from looking at /dev/sda1; removing
the table and removing the mappings immediately made the missing PVs
show up.  I am pretty sure mdadm follows the same rules.


You must have exactly the right device order for --assume-clean to
work; one mistake and there will be data issues.
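
If you do end up trying it, run it against the overlays first and pin
every parameter to what --examine reported.  Roughly, and only as a
sketch (not verified against your setup):

mdadm --create /dev/md1 --assume-clean --verbose \
      --level=5 --raid-devices=8 --metadata=1.2 \
      --chunk=512 --layout=left-symmetric \
      --data-offset=127488 \
      /dev/mapper/overlay-0 ... /dev/mapper/overlay-7
# the eight overlay devices must be listed in the original "Device Role"
# order (0-7); 127488 KiB corresponds to the 254976-sector Data Offset
# that --examine showed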

On Tue, Jul 14, 2020 at 9:28 PM Adam Barnett
<adamtravisbarnett@gmail.com> wrote:
>
> Thanks for all the replies. The drives had single gpt partitions that
> were created before adding the drives to the array. So I'll try
> removing the partition tables and forcing reassembly.
>
> I also tried stopping the array before forcing reassembly, but the
> issue is that the newly added drives appear to have no superblocks, so
> mdadm aborts the assembly.
>
> My current plan is to try to --create --assume-clean the array, but I
> have been reading about using overlay files to preserve the drives. If
> anyone could help me understand exactly how that is done I would be very
> appreciative.
>
> I don't think the list allows links(?), but I'm following the steps on
> the kernel wiki under "Recovering_a_failed_software_RAID"; the bash
> commands there are a bit confusing due to the use of the parallel command.
>
> Thanks again all!
>
> -Adam
>
> On 7/14/20 5:27 PM, Roger Heflin wrote:
> > Did you create the partition before you added the disk to mdadm or
> > after?  If after, was it a dos or a gpt?  Dos should have only cleared
> > the first 512-byte block.  If gpt, it will have written to the first
> > block and to at least one more location on the disk, possibly causing
> > data loss.
> >
> > If before, then you at least need to get rid of the partition table
> > completely.  Having a partition table on a device will often cause a
> > number of things to ignore the whole disk.  I have debugged "lost" PVs
> > where the partition table effectively "blocked" LVM from even looking
> > at the entire device that the PV was on.
> >
> > If it is a dos partition table, then save a backup of each disk first
> > (always a good idea -- you can dd it back if you screwed up, so long
> > as you carefully create the backups and name them properly):
> > dd if=/dev/sdX of=/root/sdXbackup.img bs=512 count=1
> > then clear the partition table space.  Rebooting is probably the
> > easiest way to clear out the mappings afterwards; it can be done
> > without rebooting, but I have to do it by trial and error to figure
> > out the exact order and commands.
> > dd if=/dev/zero of=/dev/sdX bs=512 count=1
> >
> > On Tue, Jul 14, 2020 at 6:11 PM antlists <antlists@youngman.org.uk> wrote:
> >> On 14/07/2020 20:50, Adam Barnett wrote:
> >>> Forcing the assembly does not work:
> >>>
> >>> $ sudo mdadm /dev/md1 --assemble --force /dev/sd[ijmop]1 /dev/sd[kln]
> >>> mdadm: /dev/sdi1 is busy - skipping
> >>> mdadm: /dev/sdj1 is busy - skipping
> >>> mdadm: /dev/sdm1 is busy - skipping
> >>> mdadm: /dev/sdo1 is busy - skipping
> >>> mdadm: /dev/sdp1 is busy - skipping
> >>> mdadm: Cannot assemble mbr metadata on /dev/sdk
> >>> mdadm: /dev/sdk has no superblock - assembly aborted
> >> Did you do an "mdadm --stop /dev/md1" before trying that? That's a
> >> classic error from an array that's previously been partially assembled
> >> and failed ...
> >>
> >> The other thing I'd do is make sure there aren't any other unexpected
> >> partially assembled arrays. I doubt it applies here, but I have come
> >> across mirrors that get broken in half and you get two failed arrays
> >> instead of one working one ...
> >>
> >> Cheers,
> >> Wol


* Re: Reshape using drives not partitions, RAID gone after reboot
  2020-07-15  0:24     ` antlists
@ 2020-07-15 20:48       ` AdsGroup
  0 siblings, 0 replies; 7+ messages in thread
From: AdsGroup @ 2020-07-15 20:48 UTC (permalink / raw)
  To: antlists, Roger Heflin; +Cc: Adam Barnett, Linux RAID

On 2020-07-14 6:24 p.m., antlists wrote:
> On 15/07/2020 00:27, Roger Heflin wrote:
>> Did you create the partition before you added the disk to mdadm or
>> after?  If after, was it a dos or a gpt?  Dos should have only cleared
>> the first 512-byte block.  If gpt, it will have written to the first
>> block and to at least one more location on the disk, possibly causing
>> data loss.
>>
>> If before, then you at least need to get rid of the partition table
>> completely.  Having a partition table on a device will often cause a
>> number of things to ignore the whole disk.  I have debugged "lost" PVs
>> where the partition table effectively "blocked" LVM from even looking
>> at the entire device that the PV was on.
>
> If an explicit assemble works, then if you can get hold of a temporary 
> spare/loan disk, I'd slowly move the new disks across to partitions by 
> doing a --replace, not a --remove / --add. A replace will both keep 
> the array protected against failure, and also not stress the array 
> because it will just copy the old disk to the new, rather than 
> rebuilding the new disk from all the others.
>
> I'm not sure about the commands, but iirc mdadm has a 
> --wipe-superblock command or something, and fdisk has something to 
> wipe a gpt, so make sure you clear that stuff out before 
> re-initialising a disk.
>
> Cheers,
> Wol

The mdadm command is --zero-superblock.

Gdisk has an expert command (option x) called zap (z) that wipes both 
the gpt and mbr.

In addition, I also use dd when re-using/re-purposing a disk.
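
For what it's worth, my usual sequence when re-purposing a former
member disk is roughly this (a sketch -- only on a disk that is no
longer part of any array; sgdisk is the scriptable equivalent of
gdisk's x/z zap):

mdadm --zero-superblock /dev/sdX             # clear the md metadata
sgdisk --zap-all /dev/sdX                    # wipe both GPT copies and the MBR
dd if=/dev/zero of=/dev/sdX bs=1M count=16   # belt and braces on the first few MB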

