* RAID header in XFS area?
@ 2017-11-04 18:10 David F.
  2017-11-04 18:30 ` Wols Lists
  0 siblings, 1 reply; 12+ messages in thread
From: David F. @ 2017-11-04 18:10 UTC (permalink / raw)
  To: linux-raid

Question: We had a customer remove a drive from a NAS device that was
mirrored using mdadm; the file system ID for the partitions was 0xFD
(Linux RAID autodetect).  They put the drive on a USB port and booted
Linux, which attempts to mount any RAID devices.  The XFS had some
issues, so looking at it I see some type of RAID header for MyBook:2 at
offset 4K.  Searching the Internet on mdadm found:

Version 1.2: The superblock is 4 KiB after the beginning of the device.

I wouldn't think the RAID area would be available to the file system,
but assuming it is, there must be some way to find out where the real
data for that area went.  Or perhaps mdadm messed it up when trying to
mount while the other drive didn't exist.  Here are the details:

Output of 'fdisk -l' (using fdisk v2.25.2 from util-linux for GPT support)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x6212a1ca

Device     Boot Start       End   Sectors   Size Id Type
/dev/sda1  *       63 468857024 468856962 223.6G  7 HPFS/NTFS/exFAT

Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000300

Device     Boot   Start        End    Sectors   Size Id Type
/dev/sdb1         64320    3984191    3919872   1.9G fd Linux raid autodetect
/dev/sdb2       3984192    4498175     513984   251M fd Linux raid autodetect
/dev/sdb3       4498176    6474175    1976000 964.9M fd Linux raid autodetect
/dev/sdb4       6474176 1953520064 1947045889 928.4G fd Linux raid autodetect

Disk /dev/sdd: 1.9 GiB, 2038431744 bytes, 3981312 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc3072e18

Device     Boot Start     End Sectors  Size Id Type
/dev/sdd1  *       64 3981311 3981248  1.9G  c W95 FAT32 (LBA)

Disk /dev/sdc: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x09b2e054




Contents of /proc/mdstat (Linux software RAID status):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
[raid4] [multipath]
md2 : inactive sdb4[1](S)
      973522808 blocks super 1.2

md3 : inactive sdb3[1](S)
      987904 blocks

md1 : inactive sdb2[1](S)
      256896 blocks

md0 : inactive sdb1[1](S)
      1959872 blocks

unused devices: <none>

Contents of /run/mdadm/map (Linux software RAID arrays):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
md2 1.2 8cc3c938:cd679958:86493280:2ae3f5f1 /dev/md/2
md1 0.90 5972f4e9:22fa2576:3b6d8c49:0838e6b9 /dev/md1
md0 0.90 06ba2d0c:8282ab7e:3b6d8c49:0838e6b9 /dev/md0
md3 0.90 f18a5702:7247eda1:3b6d8c49:0838e6b9 /dev/md3

Contents of /etc/mdadm/mdadm.conf (Linux software RAID config file):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

ARRAY /dev/md0 UUID=06ba2d0c:8282ab7e:3b6d8c49:0838e6b9
ARRAY /dev/md1 UUID=5972f4e9:22fa2576:3b6d8c49:0838e6b9
ARRAY /dev/md3 UUID=f18a5702:7247eda1:3b6d8c49:0838e6b9
ARRAY /dev/md/2  metadata=1.2 UUID=38c9c38c:589967cd:80324986:f1f5e32a
name=MyBook:2

Contents of mdadm.txt (mdadm troubleshooting data captured when
'start-md' is executed):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mdadm - v3.4 - 28th January 2016
Output of 'mdadm --examine --scan'
ARRAY /dev/md0 UUID=06ba2d0c:8282ab7e:3b6d8c49:0838e6b9
ARRAY /dev/md1 UUID=5972f4e9:22fa2576:3b6d8c49:0838e6b9
ARRAY /dev/md3 UUID=f18a5702:7247eda1:3b6d8c49:0838e6b9
ARRAY /dev/md/2  metadata=1.2 UUID=38c9c38c:589967cd:80324986:f1f5e32a
name=MyBook:2
Output of 'mdadm --assemble --scan --no-degraded -v'
mdadm: looking for devices for /dev/md0
mdadm: Cannot assemble mbr metadata on /dev/sdc
mdadm: Cannot assemble mbr metadata on /dev/sdd1
mdadm: Cannot assemble mbr metadata on /dev/sdd
mdadm: /dev/sdb4 has wrong uuid.
mdadm: /dev/sdb3 has wrong uuid.
mdadm: /dev/sdb2 has wrong uuid.
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sr0 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sr0
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: /dev/md0 assembled from 1 drive (out of 2), but not started.
mdadm: looking for devices for /dev/md1
mdadm: Cannot assemble mbr metadata on /dev/sdc
mdadm: Cannot assemble mbr metadata on /dev/sdd1
mdadm: Cannot assemble mbr metadata on /dev/sdd
mdadm: /dev/sdb4 has wrong uuid.
mdadm: /dev/sdb3 has wrong uuid.
mdadm: /dev/sdb1 has wrong uuid.
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sr0 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sr0
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdb2 is identified as a member of /dev/md1, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md1
mdadm: added /dev/sdb2 to /dev/md1 as 1
mdadm: /dev/md1 assembled from 1 drive (out of 2), but not started.
mdadm: looking for devices for /dev/md3
mdadm: Cannot assemble mbr metadata on /dev/sdc
mdadm: Cannot assemble mbr metadata on /dev/sdd1
mdadm: Cannot assemble mbr metadata on /dev/sdd
mdadm: /dev/sdb4 has wrong uuid.
mdadm: /dev/sdb2 has wrong uuid.
mdadm: /dev/sdb1 has wrong uuid.
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sr0 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sr0
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdb3 is identified as a member of /dev/md3, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md3
mdadm: added /dev/sdb3 to /dev/md3 as 1
mdadm: /dev/md3 assembled from 1 drive (out of 2), but not started.
mdadm: looking for devices for /dev/md/2
mdadm: No super block found on /dev/sdc (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdc
mdadm: No super block found on /dev/sdd1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd1
mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd
mdadm: No super block found on /dev/sdb3 (Expected magic a92b4efc, got 0000003e)
mdadm: no RAID superblock on /dev/sdb3
mdadm: No super block found on /dev/sdb2 (Expected magic a92b4efc, got 37383333)
mdadm: no RAID superblock on /dev/sdb2
mdadm: No super block found on /dev/sdb1 (Expected magic a92b4efc, got 00000002)
mdadm: no RAID superblock on /dev/sdb1
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 45300000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sr0 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sr0
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 000014b9)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdb4 is identified as a member of /dev/md/2, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md/2
mdadm: added /dev/sdb4 to /dev/md/2 as 1
mdadm: /dev/md/2 assembled from 1 drive (out of 2), but not started.
Output of 'dmesg | grep md:'
md: md0 stopped.
md: md1 stopped.
md: md3 stopped.
md: md2 stopped.


Output of 'mdadm --examine /dev/sda'
/dev/sda:
   MBR Magic : aa55
Partition[0] :    468856962 sectors at           63 (type 07)


Output of 'mdadm --examine /dev/sdb'
/dev/sdb:
   MBR Magic : aa55
Partition[0] :      3919872 sectors at        64320 (type fd)
Partition[1] :       513984 sectors at      3984192 (type fd)
Partition[2] :      1976000 sectors at      4498176 (type fd)
Partition[3] :   1947045889 sectors at      6474176 (type fd)


Output of 'mdadm --examine /dev/sdc'
/dev/sdc:
   MBR Magic : aa55


Output of 'mdadm --examine /dev/sdd'
/dev/sdd:
   MBR Magic : aa55
Partition[0] :      3981248 sectors at           64 (type 0c)


Output of 'mdadm --detail /dev/md*', if any:
/dev/md0:
        Version : 0.90
     Raid Level : raid0
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

          State : inactive

           UUID : 06ba2d0c:8282ab7e:3b6d8c49:0838e6b9
         Events : 0.6876

    Number   Major   Minor   RaidDevice

       -       8       17        -        /dev/sdb1
/dev/md1:
        Version : 0.90
     Raid Level : raid0
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

          State : inactive

           UUID : 5972f4e9:22fa2576:3b6d8c49:0838e6b9
         Events : 0.2

    Number   Major   Minor   RaidDevice

       -       8       18        -        /dev/sdb2
/dev/md2:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : MyBook:2
           UUID : 38c9c38c:589967cd:80324986:f1f5e32a
         Events : 2

    Number   Major   Minor   RaidDevice

       -       8       20        -        /dev/sdb4
/dev/md3:
        Version : 0.90
     Raid Level : raid0
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

          State : inactive

           UUID : f18a5702:7247eda1:3b6d8c49:0838e6b9
         Events : 0.1200

    Number   Major   Minor   RaidDevice

       -       8       19        -        /dev/sdb3

Contents of /dev/mapper directory:
crw------T    1 root     root       10, 236 Jul  8 11:50 control


* Re: RAID header in XFS area?
  2017-11-04 18:10 RAID header in XFS area? David F.
@ 2017-11-04 18:30 ` Wols Lists
  2017-11-04 18:34   ` Reindl Harald
       [not found]   ` <CAGRSmLuoauKaSZ5Z73+Tg19e_1q9Tc-A0ZjqMgr4Lv9Tfer6QQ@mail.gmail.com>
  0 siblings, 2 replies; 12+ messages in thread
From: Wols Lists @ 2017-11-04 18:30 UTC (permalink / raw)
  To: David F., linux-raid

On 04/11/17 18:10, David F. wrote:
> Question: We had a customer remove a drive from a NAS device that was
> mirrored using mdadm; the file system ID for the partitions was 0xFD
> (Linux RAID autodetect).  They put the drive on a USB port and booted
> Linux, which attempts to mount any RAID devices.  The XFS had some
> issues, so looking at it I see some type of RAID header for MyBook:2 at
> offset 4K.  Searching the Internet on mdadm found:

First things first. DO NOT mount the array read/write over a USB
connection. There's a good chance you'll regret it (raid and USB don't
like each other).
> 
> Version 1.2: The superblock is 4 KiB after the beginning of the device.
> 
> I wouldn't think the RAID area would be available to the file system,
> but assuming it is, there must be some way to find out where the real
> data for that area went.  Or perhaps mdadm messed it up when trying to
> mount while the other drive didn't exist.  Here are the details:

mdadm did exactly what it is supposed to do. A mirror with one drive is
degraded, so it assembled the array AND STOPPED. Once you force it past
this point, I think it will happily go past it again with no problem,
but it's designed to refuse to proceed with a damaged array if the array
was fully okay the previous time.

So, in other words, the disk and everything else is fine.

What's happened is that mdadm has assembled the array, realised a disk
is missing, AND STOPPED.

What should happen next is that the array runs, so you need to do
mdadm --run /dev/md0
or something like that. You may well need to add the --force option.

Finally you need to mount the array READ ONLY!
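
A minimal sketch of those two steps, assuming the degraded mirror shows
up as /dev/md0 (add --force to the --run only if it refuses, as noted
above):

   mdadm --run /dev/md0        # start the assembled-but-stopped, degraded mirror
   mount -o ro /dev/md0 /mnt   # -o ro mounts the filesystem read-only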

At this point, your filesystem should be available for access.
Everything's fine, mdadm is just playing it safe, because all it knows
is that a disk has disappeared.

And you need to play it safe, because USB places the array in danger.

Cheers,
Wol


* Re: RAID header in XFS area?
  2017-11-04 18:30 ` Wols Lists
@ 2017-11-04 18:34   ` Reindl Harald
  2017-11-04 19:27     ` Wol's lists
  2017-11-06 21:31     ` Phil Turmel
       [not found]   ` <CAGRSmLuoauKaSZ5Z73+Tg19e_1q9Tc-A0ZjqMgr4Lv9Tfer6QQ@mail.gmail.com>
  1 sibling, 2 replies; 12+ messages in thread
From: Reindl Harald @ 2017-11-04 18:34 UTC (permalink / raw)
  To: Wols Lists, David F., linux-raid



Am 04.11.2017 um 19:30 schrieb Wols Lists:
> mdadm did exactly what it is supposed to do. A mirror with one drive is
> degraded, so it assembled the array AND STOPPED. Once you force it past
> this point, I think it will happily go past again no problem, but it's
> designed to refuse to proceed with a damaged array, if the array was
> fully okay previous time.
> 
> So, in other words, the disk and everything else is fine.
> 
> What's happened is that mdadm has assembled the array, realised a disk
> is missing, AND STOPPED.

Why would a simple mirror with a missing disk be expected to stop, when
the whole point of mirroring is not to care about one of the disks
dying?


* Re: RAID header in XFS area?
  2017-11-04 18:34   ` Reindl Harald
@ 2017-11-04 19:27     ` Wol's lists
  2017-11-04 20:36       ` Reindl Harald
  2017-11-06 21:31     ` Phil Turmel
  1 sibling, 1 reply; 12+ messages in thread
From: Wol's lists @ 2017-11-04 19:27 UTC (permalink / raw)
  To: Reindl Harald, David F., linux-raid

On 04/11/17 18:34, Reindl Harald wrote:
>> What's happened is that mdadm has assembled the array, realised a disk
>> is missing, AND STOPPED.
> 
> Why would a simple mirror with a missing disk be expected to stop, when
> the whole point of mirroring is not to care about one of the disks
> dying?

While the system is RUNNING, yes. But if the array is STOPPED, mdadm
will refuse to start it. At least, that's certainly how I understand it
to work ...

Do you REALLY want the system to be running, and give you no clue that 
it's not working properly?

Cheers,
Wol


* Re: RAID header in XFS area?
  2017-11-04 19:27     ` Wol's lists
@ 2017-11-04 20:36       ` Reindl Harald
  2017-11-04 21:54         ` Wols Lists
  0 siblings, 1 reply; 12+ messages in thread
From: Reindl Harald @ 2017-11-04 20:36 UTC (permalink / raw)
  To: Wol's lists, David F., linux-raid



Am 04.11.2017 um 20:27 schrieb Wol's lists:
> On 04/11/17 18:34, Reindl Harald wrote:
>>> What's happened is that mdadm has assembled the array, realised a disk
>>> is missing, AND STOPPED.
>>
>> why would it be supposed that a simple mirror with a mising disk is 
>> stopped while the whole point of mirroring is to not care about one of 
>> the disks dying?
> 
> While the system is RUNNING, yes. But if the array is STOPPED, mdadm 
> will refuse to start it. At least, that's certainly how I understand it 
> works ...
> 
> Do you REALLY want the system to be running, and give you no clue that 
> it's not working properly?

mdmon typically writes a mail about a degraded array - so far away from 
"no clue"


* Re: RAID header in XFS area?
  2017-11-04 20:36       ` Reindl Harald
@ 2017-11-04 21:54         ` Wols Lists
  2017-11-05  3:34           ` Reindl Harald
  0 siblings, 1 reply; 12+ messages in thread
From: Wols Lists @ 2017-11-04 21:54 UTC (permalink / raw)
  To: Reindl Harald, linux-raid

On 04/11/17 20:36, Reindl Harald wrote:
>>
>> While the system is RUNNING, yes. But if the array is STOPPED, mdadm
>> will refuse to start it. At least, that's certainly how I understand
>> it works ...
>>
>> Do you REALLY want the system to be running, and give you no clue that
>> it's not working properly?
> 
> mdmon typically writes a mail about a degraded array - so far away from
> "no clue"

mdmon? What's that? Yes, I know, it's the monitor. But how do you know
whether or not it's running? It certainly isn't on my system.

Anyways, as I said, as far as I am aware, "mdadm --run" does NOT work on
a degraded array unless either the previous --run was on a degraded
array, or you use --force.

Which means if you remove a disk from an array, and then try to restart
the array, the restart will fail. Which is exactly the OP's scenario.

Cheers,
Wol


* Re: RAID header in XFS area?
       [not found]   ` <CAGRSmLuoauKaSZ5Z73+Tg19e_1q9Tc-A0ZjqMgr4Lv9Tfer6QQ@mail.gmail.com>
@ 2017-11-04 22:55     ` Wol's lists
       [not found]       ` <CAGRSmLvou+yEb2VLJoounbuiEdfrPSEC+8xBtdp9nfOpj8y-8Q@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Wol's lists @ 2017-11-04 22:55 UTC (permalink / raw)
  To: David F., mdraid

On 04/11/17 21:53, David F. wrote:
> Thanks, but what about the RAID header being in the file system area?
> What happened to the actual sector data that belongs there (is there a
> way to find it from the RAID header?) when it's not an md device?
> 
Sorry, but I don't think you've quite grasped what a device is.

Let's start with sdb, the drive you're concerned about. The first 1MB
is reserved space; the very first 512B is special because it contains
the MBR, which defines the sub-devices sdb1, sdb2, sdb3 and sdb4.

Then mdadm comes along, and is given sdb1 and sd?1. It reserves the
first few megs of the devices it's given (just as fdisk reserves the
first meg) and writes the superblock at position 4K (just as fdisk
writes the MBR at position 0). Then, just as the MBR defines sdb1 as
starting at sector 2049 of sdb, the superblock defines md0 as starting
at a certain offset into sdb1. So that superblock will tell you where
on the disk your filesystem actually starts.
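
For example, mdadm will report that offset straight from the member
device (illustrative output - the real numbers come from the superblock
on sdb4):

   mdadm --examine /dev/sdb4 | grep Offset
   #     Data Offset : 272 sectors    <- the filesystem begins this many sectors into sdb4
   #    Super Offset : 8 sectors      <- the 1.2 superblock itself sits 4 KiB in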

WARNING - unless your superblock is 1.0 (and maybe even then) the start 
of your filesystem will move around if you add or remove devices.

In other words, just as on a normal disk the filesystem doesn't start at 
the beginning of the disk because the MBR is in the way, an array does 
not start at the beginning of the partition because the superblock is in 
the way.

You'll either need to use your knowledge of XFS internals to find the 
start of the filesystem, look at mdadm and work out how to read the 
superblock so it tells you, or just force-assemble the array!

But I think I'm on very safe ground saying your filesystem is safely 
there. It's just not where you think it is because you haven't grasped 
how raid works at the disk level.

Cheers,
Wol
> 
> On Sat, Nov 4, 2017 at 11:30 AM, Wols Lists <antlists@youngman.org.uk> wrote:
>> On 04/11/17 18:10, David F. wrote:
>>> Question,  We had a customer remove a drive from a NAS device that as
>>> mirrored using mdadm, the file system id for the partitions were 0xFD
>>> (linux raid automount). The put it on a USB port and booted Linux
>>> which attempts to mount any RAID devices.  The XFS had some issues, so
>>> looking at it I see some type of RAID header for MyBook:2 at offset
>>> 4K.   Searching Internet on mdadm found:
>>
>> First things first. DO NOT mount the array read/write over a USB
>> connection. There's a good chance you'll regret it (raid and USB don't
>> like each other).
>>>
>>> Version 1.2: The superblock is 4 KiB after the beginning of the device.
>>>
>>> I wouldn't think the RAID area would be available to the file system,
>>> but assuming so, there must be some type of way to find out where the
>>> real data for that went?   Or perhaps mdadm messed it up when trying
>>> to mount and the other drive didn't exist.  Here details of it.
>>
>> mdadm did exactly what it is supposed to do. A mirror with one drive is
>> degraded, so it assembled the array AND STOPPED. Once you force it past
>> this point, I think it will happily go past again no problem, but it's
>> designed to refuse to proceed with a damaged array, if the array was
>> fully okay previous time.
>>
>> So, in other words, the disk and everything else is fine.
>>
>> What's happened is that mdadm has assembled the array, realised a disk
>> is missing, AND STOPPED.
>>
>> What should happen next is that the array runs, so you need to do
>> mdadm --run /dev/md0
>> or something like that. You may well need to add the --force option.
>>
>> Finally you need to mount the array
>> mount /dev/md0 /mnt READ ONLY !!!
>> Sorry, I don't know the correct option for read only
>>
>> At this point, your filesystem should be available for access.
>> Everything's fine, mdadm is just playing it safe, because all it knows
>> is that a disk has disappeared.
>>
>> And you need to play it safe, because USB places the array in danger.
>>
>> Cheers,
>> Wol
> 


* Re: RAID header in XFS area?
       [not found]         ` <CAGRSmLuBEUynKNirFi9FuoJz82F4hDimmJWZSSfpQhoOi_9Rog@mail.gmail.com>
@ 2017-11-05  2:12           ` David F.
  2017-11-05  9:16             ` Wols Lists
  0 siblings, 1 reply; 12+ messages in thread
From: David F. @ 2017-11-05  2:12 UTC (permalink / raw)
  To: linux-raid

Gmail started doing private replies for some reason..

Anyway, looking deeper, I found it.  That partition's XFS information
was old, left-over data.  Searching turned up another XFS header further
in, at byte offset 22000h (sector 110h), and looking at the RAID header
area I found the value 110h, which must be a pointer to where the data
starts (I don't have the mdadm struct available).  Does anyone have the
layout of the RAID structure that uses signature A92B4EFCh?

So the old XFS information was confusing the whole situation.
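
That reading checks out arithmetically, assuming the 110h value is the
superblock's data-offset field counted in 512-byte sectors:

   0x110 sectors * 512 bytes/sector = 272 * 512 = 139264 bytes = 0x22000

i.e. the member's data area - and with it the live XFS superblock -
starts 0x22000 bytes into the partition, exactly where the second header
turned up.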


On Sat, Nov 4, 2017 at 6:58 PM, David F. <df7729@gmail.com> wrote:
> Oh shoot, I forgot to mention.  The customer did run mdadm --run --force
> /dev/md2 (or it may have been /dev/md/2) but got read errors when trying
> to access it. ?
>
> On Sat, Nov 4, 2017 at 6:55 PM, David F. <df7729@gmail.com> wrote:
>> That's what I would expect, which is why it's weird that the
>> signature for metadata 1.2 was 4K within the XFS partition itself: the
>> XFS partition starts after a bunch of other partitions at LBA 6474176,
>> and the XFS superblock is there (the RAID data is at LBA 6474184).
>> The information in that report also shows that when it looked at
>> /dev/sdb4 it found metadata 1.2 ?? I'll see if there is another XFS
>> header after that location.
>>
>> ARRAY /dev/md0 UUID=06ba2d0c:8282ab7e:3b6d8c49:0838e6b9
>> ARRAY /dev/md1 UUID=5972f4e9:22fa2576:3b6d8c49:0838e6b9
>> ARRAY /dev/md3 UUID=f18a5702:7247eda1:3b6d8c49:0838e6b9
>> ARRAY /dev/md/2  metadata=1.2 UUID=38c9c38c:589967cd:80324986:f1f5e32a
>> name=MyBook:2
>>
>>
>> On Sat, Nov 4, 2017 at 3:55 PM, Wol's lists <antlists@youngman.org.uk> wrote:
>>> On 04/11/17 21:53, David F. wrote:
>>>>
>>>> thanks, but what about the RAID header being in the file system area?
>>>> What happened to the actual sector data that belongs there (is there a
>>>> way to find it from the raid header?) when not a md device?
>>>>
>>> Sorry but I don't think You've quite grasped what a device is.
>>>
>>> Let's start with sdb, the drive you're concerned about. The first 1MB is
>>> reserved space, the very first 512B is special because it contains the MBR,
>>> which defines the sub-devices sdb1, sdb2, sdb3 and sdb4.
>>>
>>> Then mdadm comes along, and is given sdb1 and sd?1. It reserves the first
>>> few megs of the devices it's given (just like fdisk reserves the first meg),
>>> writes the superblock at position 4K (just like fdisk writes the MBR at
>>> position 0), and then just like the MBR defines sdb1 as starting at sector
>>> 2049 of sdb, so the superblock defines md0 as starting at a certain offset
>>> into sdb1. So that superblock will tell you where on the disk your
>>> filesystem actually starts.
>>>
>>> WARNING - unless your superblock is 1.0 (and maybe even then) the start of
>>> your filesystem will move around if you add or remove devices.
>>>
>>> In other words, just as on a normal disk the filesystem doesn't start at the
>>> beginning of the disk because the MBR is in the way, an array does not start
>>> at the beginning of the partition because the superblock is in the way.
>>>
>>> You'll either need to use your knowledge of XFS internals to find the start
>>> of the filesystem, look at mdadm and work out how to read the superblock so
>>> it tells you, or just force-assemble the array!
>>>
>>> But I think I'm on very safe ground saying your filesystem is safely there.
>>> It's just not where you think it is because you haven't grasped how raid
>>> works at the disk level.
>>>
>>> Cheers,
>>> Wol
>>>
>>>>
>>>> On Sat, Nov 4, 2017 at 11:30 AM, Wols Lists <antlists@youngman.org.uk>
>>>> wrote:
>>>>>
>>>>> On 04/11/17 18:10, David F. wrote:
>>>>>>
>>>>>> Question,  We had a customer remove a drive from a NAS device that as
>>>>>> mirrored using mdadm, the file system id for the partitions were 0xFD
>>>>>> (linux raid automount). The put it on a USB port and booted Linux
>>>>>> which attempts to mount any RAID devices.  The XFS had some issues, so
>>>>>> looking at it I see some type of RAID header for MyBook:2 at offset
>>>>>> 4K.   Searching Internet on mdadm found:
>>>>>
>>>>>
>>>>> First things first. DO NOT mount the array read/write over a USB
>>>>> connection. There's a good chance you'll regret it (raid and USB don't
>>>>> like each other).
>>>>>>
>>>>>>
>>>>>> Version 1.2: The superblock is 4 KiB after the beginning of the device.
>>>>>>
>>>>>> I wouldn't think the RAID area would be available to the file system,
>>>>>> but assuming so, there must be some type of way to find out where the
>>>>>> real data for that went?   Or perhaps mdadm messed it up when trying
>>>>>> to mount and the other drive didn't exist.  Here details of it.
>>>>>
>>>>>
>>>>> mdadm did exactly what it is supposed to do. A mirror with one drive is
>>>>> degraded, so it assembled the array AND STOPPED. Once you force it past
>>>>> this point, I think it will happily go past again no problem, but it's
>>>>> designed to refuse to proceed with a damaged array, if the array was
>>>>> fully okay previous time.
>>>>>
>>>>> So, in other words, the disk and everything else is fine.
>>>>>
>>>>> What's happened is that mdadm has assembled the array, realised a disk
>>>>> is missing, AND STOPPED.
>>>>>
>>>>> What should happen next is that the array runs, so you need to do
>>>>> mdadm --run /dev/md0
>>>>> or something like that. You may well need to add the --force option.
>>>>>
>>>>> Finally you need to mount the array
>>>>> mount /dev/md0 /mnt READ ONLY !!!
>>>>> Sorry, I don't know the correct option for read only
>>>>>
>>>>> At this point, your filesystem should be available for access.
>>>>> Everything's fine, mdadm is just playing it safe, because all it knows
>>>>> is that a disk has disappeared.
>>>>>
>>>>> And you need to play it safe, because USB places the array in danger.
>>>>>
>>>>> Cheers,
>>>>> Wol
>>>>
>>>>
>>>


* Re: RAID header in XFS area?
  2017-11-04 21:54         ` Wols Lists
@ 2017-11-05  3:34           ` Reindl Harald
  0 siblings, 0 replies; 12+ messages in thread
From: Reindl Harald @ 2017-11-05  3:34 UTC (permalink / raw)
  To: Wols Lists, linux-raid



Am 04.11.2017 um 22:54 schrieb Wols Lists:
> On 04/11/17 20:36, Reindl Harald wrote:
>>>
>>> While the system is RUNNING, yes. But if the array is STOPPED, mdadm
>>> will refuse to start it. At least, that's certainly how I understand
>>> it works ...
>>>
>>> Do you REALLY want the system to be running, and give you no clue that
>>> it's not working properly?
>>
>> mdmon typically writes a mail about a degraded array - so far away from
>> "no clue"
> 
> mdmon? What's that? Yes, I know, it's the monitor. But how do you know
> whether or not it's running? It certainly isn't on my system.

but it should be, and it's the responsibility of the admin to ensure it
runs and that its mails are properly routed so that you receive them

-------- Forwarded Message --------
Subject: Fail event on /dev/md0:srv-rhsoft.rhsoft.net
Date: Wed,  3 Dec 2014 11:32:35 +0100 (CET)
From: mdadm monitoring <root@srv-rhsoft.rhsoft.net>
To: root@srv-rhsoft.rhsoft.net

This is an automatically generated mail message from mdadm
running on srv-rhsoft.rhsoft.net

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdb1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

md0 : active raid1 sda1[4] sdb1[3](F) sdd1[0] sdc1[5]
       511988 blocks super 1.0 [4/3] [UUU_]
       unused devices: <none>

> Anyways, as I said, as far as I am aware, "mdadm --run" does NOT work on
> a degraded array unless either the previous --run was on a degraded
> array, or you use --force.
> 
> Which means if you remove a disk from an array, and then try to restart
> the array, the restart will fail. Which is exactly the OP's scenario

and that's what I don't get in general - why should an array not start,
in whatever situation, if it has enough drives to do so?

I don't get why a mirrored array should stop at all just because it is
degraded, but maybe we are talking about completely different things


* Re: RAID header in XFS area?
  2017-11-05  2:12           ` David F.
@ 2017-11-05  9:16             ` Wols Lists
  2017-11-05 15:59               ` David F.
  0 siblings, 1 reply; 12+ messages in thread
From: Wols Lists @ 2017-11-05  9:16 UTC (permalink / raw)
  To: David F., linux-raid

On 05/11/17 02:12, David F. wrote:
> Gmail started doing private replies for some reason..
> 
> Anyway, looking deeper, I found it.  That partition's XFS information
> was old, left-over data.  Searching turned up another XFS header further
> in, at byte offset 22000h (sector 110h), and looking at the RAID header
> area I found the value 110h, which must be a pointer to where the data
> starts (I don't have the mdadm struct available).  Does anyone have the
> layout of the RAID structure that uses signature A92B4EFCh?
> 
> So the old XFS information was confusing the whole situation.
> 
No surprise. Old data does that :-( That's why I always prefer "dd
if=/dev/zero of=/dev/sdx" to clear a device. It just takes so long ...

What really worried me was whether they'd created the array over the
partitions, then accidentally created XFS on the partitions. That would
have crashed at the first reboot, but there's a good chance that if they
didn't reboot it would have run and run until ...

If you want the raid structure, download mdadm and read the source. I'll
probably document it on the wiki, but I need to read and understand the
source first, too.
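
As a starting point, the v1.2 superblock (magic a92b4efc) is defined as
struct mdp_superblock_1 - in the mdadm sources in super1.c, and in the
kernel header include/uapi/linux/raid/md_p.h - and you can eyeball the
raw copy on the member with something like:

   dd if=/dev/sdb4 bs=512 skip=8 count=2 2>/dev/null | hexdump -C | head
   # skip=8 -> start 4 KiB into the partition, where the 1.2 superblock lives;
   # the dump should begin fc 4e 2b a9 (a92b4efc stored little-endian)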

As for accessing the data, md2 and md/2 are the same thing :-) Raid is
moving to named arrays rather than default numbers. Can they run a fsck
equivalent over the filesystem? Read-only of course, just to see whether
it's minimally damaged or there's something more seriously wrong.
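
For XFS that read-only check would be something along the lines of
(assuming the array comes up as /dev/md2):

   xfs_repair -n /dev/md2    # -n = no-modify: report problems without writing anything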

Cheers,
Wol
> 
> On Sat, Nov 4, 2017 at 6:58 PM, David F. <df7729@gmail.com> wrote:
>> Oh shoot, forgot to mention.  The customer did the mdadm --run --force
>> /dev/md2 (or may have been /dev/md/2) but when trying to access it
>> read errors. ?
>>
>> On Sat, Nov 4, 2017 at 6:55 PM, David F. <df7729@gmail.com> wrote:
>>> That's what I would expect, which is why it's weird that that
>>> signature for metadata 1.2 was 4K within the XFS partition itself (the
>>> XFS partition started after a bunch of other partitions at LBA 6474176
>>> and the xfs superblock is there (the RAID data is at LBA 6474184).
>>> The information in that report also show that when it looked at
>>> /dev/sdb4 it found metadata 1.2 ?? I'll see if there is another xfs
>>> header after that location.
>>>
>>> ARRAY /dev/md0 UUID=06ba2d0c:8282ab7e:3b6d8c49:0838e6b9
>>> ARRAY /dev/md1 UUID=5972f4e9:22fa2576:3b6d8c49:0838e6b9
>>> ARRAY /dev/md3 UUID=f18a5702:7247eda1:3b6d8c49:0838e6b9
>>> ARRAY /dev/md/2  metadata=1.2 UUID=38c9c38c:589967cd:80324986:f1f5e32a
>>> name=MyBook:2
>>>



* Re: RAID header in XFS area?
  2017-11-05  9:16             ` Wols Lists
@ 2017-11-05 15:59               ` David F.
  0 siblings, 0 replies; 12+ messages in thread
From: David F. @ 2017-11-05 15:59 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

It would be good if, when building a RAID with the superblock at 4K
inside a partition, mdadm also cleared the first 4K of the partition.
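
Until then, the stale signatures can be cleared by hand before an array
is (re)created inside a partition - destructive, so only on a member you
intend to rebuild - for example with util-linux's wipefs:

   wipefs -a /dev/sdXn    # erase old filesystem/RAID signatures before mdadm --create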

On Sun, Nov 5, 2017 at 1:16 AM, Wols Lists <antlists@youngman.org.uk> wrote:
> On 05/11/17 02:12, David F. wrote:
>> gmail started doing private replies for some reason..
>>
>> Anyway, looking deeper found it.  That partition xfs information was
>> old left over items.  Searching for another header was found further
>> up, at byte offset 22000h (sector 110h), and looking at the RAID
>> header area, found bytes for 110h which must be a pointer to to where
>> the data starts (don't have the mdadm struct available).   Does anyone
>> have the RAID structure available using signature A92B4EFCh ?
>>
>> So the old XFS information was confusing the whole situation.
>>
> No surprise. Old data does that :-( Why I always prefer "dd if=/dev/zero
> of=/dev/sdx" to clear a device. It just takes so long ...
>
> What really worried me was if they'd created the array over the
> partitions, then accidentally created XFS on the partitions. That would
> have crashed at the first reboot, but there's a good chance that if they
> didn't reboot it would have run and run until ...
>
> If you want the raid structure, download mdadm and read the source. I'll
> probably document it on the wiki, but I need to read and understand the
> source first, too.
>
> As for accessing the data, md2 and md/2 are the same thing :-) Raid is
> moving to named arrays rather than default numbers. Can they run a fsck
> equivalent over the filesystem? Read-only of course, just to see whether
> it's minimally damaged or there's something more seriously wrong.
>
> Cheers,
> Wol
>>
>> On Sat, Nov 4, 2017 at 6:58 PM, David F. <df7729@gmail.com> wrote:
>>> Oh shoot, forgot to mention.  The customer did the mdadm --run --force
>>> /dev/md2 (or may have been /dev/md/2) but when trying to access it
>>> read errors. ?
>>>
>>> On Sat, Nov 4, 2017 at 6:55 PM, David F. <df7729@gmail.com> wrote:
>>>> That's what I would expect, which is why it's weird that that
>>>> signature for metadata 1.2 was 4K within the XFS partition itself (the
>>>> XFS partition started after a bunch of other partitions at LBA 6474176
>>>> and the xfs superblock is there (the RAID data is at LBA 6474184).
>>>> The information in that report also show that when it looked at
>>>> /dev/sdb4 it found metadata 1.2 ?? I'll see if there is another xfs
>>>> header after that location.
>>>>
>>>> ARRAY /dev/md0 UUID=06ba2d0c:8282ab7e:3b6d8c49:0838e6b9
>>>> ARRAY /dev/md1 UUID=5972f4e9:22fa2576:3b6d8c49:0838e6b9
>>>> ARRAY /dev/md3 UUID=f18a5702:7247eda1:3b6d8c49:0838e6b9
>>>> ARRAY /dev/md/2  metadata=1.2 UUID=38c9c38c:589967cd:80324986:f1f5e32a
>>>> name=MyBook:2
>>>>
>


* Re: RAID header in XFS area?
  2017-11-04 18:34   ` Reindl Harald
  2017-11-04 19:27     ` Wol's lists
@ 2017-11-06 21:31     ` Phil Turmel
  1 sibling, 0 replies; 12+ messages in thread
From: Phil Turmel @ 2017-11-06 21:31 UTC (permalink / raw)
  To: Reindl Harald, Wols Lists, David F., linux-raid

On 11/04/2017 02:34 PM, Reindl Harald wrote:
> Am 04.11.2017 um 19:30 schrieb Wols Lists:

>> What's happened is that mdadm has assembled the array, realised a 
>> disk is missing, AND STOPPED.
> 
> Why would a simple mirror with a missing disk be expected to stop, when
> the whole point of mirroring is not to care about one of the disks
> dying?

When a member device dies or is kicked out while running, the remaining
devices' superblocks are updated with that status.  (Can't update the
superblock on the one that died cause it's, you know, dead.)

If the system is rebooted at this point, mdadm can see on the
still-running drive that the missing drive is known to be failed and
will happily start up.

If for some reason the *other* drive wakes up and works on reboot, and
only that drive is working, mdadm sees that a drive is missing for an
unknown reason and stops.  This is to avoid the disaster known as "split
brain".

Split brain cannot be distinguished from the situation where a
previously non-degraded array is missing device(s) at startup, so mdadm
stops. Administrator input is needed to safely proceed.
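
That administrator input is typically an explicit force, for example (a
sketch only - the surviving member here is /dev/sdb4 from the report
above):

   mdadm --assemble --force --run /dev/md2 /dev/sdb4   # accept the lone member and start degraded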

Phil


end of thread

Thread overview: 12+ messages
2017-11-04 18:10 RAID header in XFS area? David F.
2017-11-04 18:30 ` Wols Lists
2017-11-04 18:34   ` Reindl Harald
2017-11-04 19:27     ` Wol's lists
2017-11-04 20:36       ` Reindl Harald
2017-11-04 21:54         ` Wols Lists
2017-11-05  3:34           ` Reindl Harald
2017-11-06 21:31     ` Phil Turmel
     [not found]   ` <CAGRSmLuoauKaSZ5Z73+Tg19e_1q9Tc-A0ZjqMgr4Lv9Tfer6QQ@mail.gmail.com>
2017-11-04 22:55     ` Wol's lists
     [not found]       ` <CAGRSmLvou+yEb2VLJoounbuiEdfrPSEC+8xBtdp9nfOpj8y-8Q@mail.gmail.com>
     [not found]         ` <CAGRSmLuBEUynKNirFi9FuoJz82F4hDimmJWZSSfpQhoOi_9Rog@mail.gmail.com>
2017-11-05  2:12           ` David F.
2017-11-05  9:16             ` Wols Lists
2017-11-05 15:59               ` David F.
