* Two disk RAID10 inactive on boot if partition is missing
From: Peter Kay @ 2016-05-17  1:52 UTC
  To: linux-raid

I have a two-disk RAID10 array consisting of a partition on one SSD
(/dev/sda12) and the whole of another SSD (/dev/sdc). The resulting
mdraid device acts as a cache for bcache.

This works fine if the system boots with both the partition and the SSD
present, and also if one device goes offline while the system is up and
is then added back again.

If the whole-disk SSD is missing at boot, the md array fails to appear
(/dev/md/mdcache is not present): it is reported as inactive and the
partition is listed as a spare, so the bcache device fails to appear as
well.
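
(In that state the array can usually be brought up by hand with something
along these lines - a sketch only, using the device names from this setup:

  mdadm --stop /dev/md127
  mdadm --assemble --run /dev/md/mdcache /dev/sda12

where --run tells mdadm to start the array even though it is degraded.)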

One factor, which is probably not significant given the metadata, is that
the kernel will map /dev/sdc to a different disk if the SSD is absent at
startup: because the BIOS skips over it, /dev/sdc becomes a hard drive
belonging to another array rather than the missing SSD. That shouldn't be
a problem, as the devices are not hardcoded:

I had no mdadm.conf; I've now created one with the following, with no effect:

DEVICE /dev/sda12 /dev/sd*
ARRAY /dev/md/mdcache  metadata=1.2 UUID=e634085b:95d697c9:7a422bc2:c94b142d name=gladstone:mdcache
ARRAY /dev/md/mdbigraid  metadata=1.2 UUID=b5d09362:28d19835:21556221:36531da3 name=gladstone:mdbigraid
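
(To test that file by hand, something like this attempts assembly of
everything listed in mdadm.conf with verbose output - a sketch, not
necessarily the invocation used at boot:

  mdadm --assemble --scan -v)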

uname -a

Linux gladstone 4.5.2-xen #1 SMP PREEMPT Wed Apr 27 02:12:36 BST 2016
x86_64 Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz GenuineIntel
GNU/Linux

(This is a Xen dom0. I have also tried an earlier Linux kernel on bare
metal: no difference.)

mdadm --detail of the array:

/dev/md/mdcache:
        Version : 1.2
  Creation Time : Sun Apr 10 22:51:53 2016
     Raid Level : raid10
     Array Size : 117151744 (111.72 GiB 119.96 GB)
  Used Dev Size : 117151744 (111.72 GiB 119.96 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Tue May 17 01:14:29 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 512K

           Name : gladstone:mdcache  (local to host gladstone)
           UUID : e634085b:95d697c9:7a422bc2:c94b142d
         Events : 125

    Number   Major   Minor   RaidDevice State
       0       8       12        0      active sync   /dev/sda12
       2       8       32        1      active sync   /dev/sdc

cat /proc/mdstat when it's working (yes, I'm testing a rebuild of
another four-disk RAID10):

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
[raid4] [multipath]
md126 : active raid10 sdf[0] sde[4] sdb[3] sdd[2]
      1953261568 blocks super 1.2 512K chunks 2 offset-copies [4/3] [U_UU]
      [=====>...............]  recovery = 25.0% (244969472/976630784) finish=213.5min speed=57106K/sec

md127 : active raid10 sda12[0] sdc[2]
      117151744 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]

unused devices: <none>

mdadm --examine of the partition:

/dev/sda12:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e634085b:95d697c9:7a422bc2:c94b142d
           Name : gladstone:mdcache  (local to host gladstone)
  Creation Time : Sun Apr 10 22:51:53 2016
     Raid Level : raid10
   Raid Devices : 2

 Avail Dev Size : 234305410 (111.73 GiB 119.96 GB)
     Array Size : 117151744 (111.72 GiB 119.96 GB)
  Used Dev Size : 234303488 (111.72 GiB 119.96 GB)
    Data Offset : 131072 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 35309acc:c28c8b78:360f2d49:e87870db

    Update Time : Tue May 17 01:29:23 2016
       Checksum : 90634a13 - correct
         Events : 125

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing)

and of the whole disk:

/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e634085b:95d697c9:7a422bc2:c94b142d
           Name : gladstone:mdcache  (local to host gladstone)
  Creation Time : Sun Apr 10 22:51:53 2016
     Raid Level : raid10
   Raid Devices : 2

 Avail Dev Size : 234310576 (111.73 GiB 119.97 GB)
     Array Size : 117151744 (111.72 GiB 119.96 GB)
  Used Dev Size : 234303488 (111.72 GiB 119.96 GB)
    Data Offset : 131072 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 76c6dba4:36e4a87a:ff54f25d:9e7a8970

    Update Time : Tue May 17 02:09:25 2016
       Checksum : 14a09cd9 - correct
         Events : 125

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)

The partition was of type 83 (Linux); I have changed it to type FD (Linux
raid autodetect) with no difference.
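
(For reference, one way to make that type change is along these lines,
assuming a util-linux new enough to have sfdisk --part-type:

  sfdisk --part-type /dev/sda 12 fd    # set partition 12 to fd (Linux raid autodetect)

With 1.2 metadata the partition type shouldn't matter either way, since
in-kernel autodetect only applies to 0.90 superblocks.)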


Any clues? If necessary I can blow this away and repartition sdc so the
array isn't mixing a partition and a whole disk, but I can't see why the
current setup shouldn't work and would rather get to the root cause.

If I tell mdadm to activate the raid I get no errors, and nothing happens.

Cheers!

PK


* Re: Two disk RAID10 inactive on boot if partition is missing
From: Mikael Abrahamsson @ 2016-05-17  8:40 UTC
  To: Peter Kay; +Cc: linux-raid

On Tue, 17 May 2016, Peter Kay wrote:

> If the whole-disk SSD is missing at boot, the md array fails to appear
> (/dev/md/mdcache is not present): it is reported as inactive and the
> partition is listed as a spare, so the bcache device fails to appear as well.

I didn't see this mentioned in your email; does this help?

https://help.ubuntu.com/community/Installation/SoftwareRAID#Boot_from_Degraded_Disk

Boot from Degraded Disk

If the default HDD fails, RAID will ask you to boot from a degraded
disk. If your server is located in a remote area, the best practice may be
to configure this to occur automatically:

  edit /etc/initramfs-tools/conf.d/mdadm
  change "BOOT_DEGRADED=false" to "BOOT_DEGRADED=true"
  (note: this option is not supported from mdadm-3.2.5-5ubuntu3 /
  Ubuntu 14.04 onwards)

Additionally, this can be specified on the kernel boot line with
bootdegraded=[true|false]. You can also use "dpkg-reconfigure mdadm"
rather than editing the file by hand.
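
For concreteness, on a Debian/Ubuntu-style initramfs that amounts to
something like this (a sketch only, and Debian/Ubuntu specific):

  # /etc/initramfs-tools/conf.d/mdadm
  BOOT_DEGRADED=true

followed by rebuilding the initramfs with "update-initramfs -u", or
alternatively appending "bootdegraded=true" to the kernel command line in
the bootloader.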

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Two disk RAID10 inactive on boot if partition is missing
From: Mikael Abrahamsson @ 2016-05-17 10:13 UTC
  To: Peter Kay; +Cc: linux-raid

On Tue, 17 May 2016, Peter Kay wrote:

> I'm not trying to boot from it, and I'm using Salix (Slackware), not a 
> debian based distribution. My boot partitions are not raided.

The setting would still be valid if your distribution defaults to not 
assembling and using degraded arrays on boot.

dmesg output from when "it doesn't work" would probably tell us whether
this is the reason it's not being started properly.

> If I tell mdadm to activate the raid I get no errors, and nothing
> happens.

It would be interesting to see the exact mdadm command and options you use
here, the mdadm output with verbose mode (-v) turned on, and what is
written to dmesg when you do this.
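
Something along these lines would capture all of that (a sketch only;
substitute whatever command and device names you are actually using):

  mdadm --assemble -v /dev/md/mdcache /dev/sda12
  dmesg | tail -n 50
  cat /proc/mdstat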

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Two disk RAID10 inactive on boot if partition is missing
From: Phil Turmel @ 2016-05-17 13:31 UTC
  To: Mikael Abrahamsson, Peter Kay; +Cc: linux-raid

On 05/17/2016 06:13 AM, Mikael Abrahamsson wrote:
> On Tue, 17 May 2016, Peter Kay wrote:
> 
>> I'm not trying to boot from it, and I'm using Salix (Slackware), not a
>> debian based distribution. My boot partitions are not raided.
> 
> The setting would still be valid if your distribution defaults to not
> assembling and using degraded arrays on boot.

See the "Unclean Shutdown" section of "man md".

The kernel parameter you need is "md_mod.start_dirty_degraded=1".
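
As a sketch, on a GRUB 2 system that means appending it to the kernel
command line, e.g. in /etc/default/grub (lilo and other bootloaders have
an equivalent append option):

  # keep any existing options and add the md parameter
  GRUB_CMDLINE_LINUX="md_mod.start_dirty_degraded=1"

and then regenerating the GRUB config (grub-mkconfig -o /boot/grub/grub.cfg).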

Doing this is a really good way to end up with split brain.  Why do you
need to regularly boot without the devices that were present at shutdown?

Phil



* Re: Two disk RAID10 inactive on boot if partition is missing
From: Peter Kay @ 2016-05-18  4:16 UTC
  To: Phil Turmel; +Cc: Mikael Abrahamsson, linux-raid

On 17 May 2016 at 14:31, Phil Turmel <philip@turmel.org> wrote:
> See the "Unclean Shutdown" section of "man md".
>
> The kernel parameter you need is "md_mod.start_dirty_degraded=1".
>
> Doing this is a really good way to end up with split brain.  Why do you
> need to regularly boot without the devices that were present at shutdown?
Thanks for the md reference.

I don't need to regularly boot without devices that were present at
shutdown, but this is RAID! Given that it's not a boot device (and
frankly, even if it were), I would expect the default to be to start up
anyway, with replacing the missing device afterwards as an option.
Otherwise it removes a lot of the point of the resilience.

I'll have a look at this further later tonight. According to '-v', the
reason it is failing is that /dev/sda12 is 'busy'. It looks like I need to
read the docs further; I got the array back up by using stop and assemble,
but yes, it looks like split brain is a real possibility when I do that.
Fortunately I'm not using this for anything important yet - I'm doing this
precisely so that I know what to do when a device does fail. Once it's all
working I'm going to leave it alone.
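
(For the split-brain concern, the check I'll do before putting the second
device back is roughly this - a sketch with this box's device names:

  mdadm --examine /dev/sda12 /dev/sdc | grep -E 'Events|Update Time'

If the event counts differ, only the member with the higher count has the
current data, so the stale one needs to be re-added and resynced rather
than assembled on its own.)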

