* md metadata nightmare
@ 2011-11-23  0:05 Kenneth Emerson
  2011-11-23  0:47 ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Kenneth Emerson @ 2011-11-23  0:05 UTC (permalink / raw)
  To: linux-raid

I am looking for any help I can get to explain what happened to me
this past week and what I can possibly do to really fix my problem.  I
apologize in advance for the long diatribe, but I don't want to be
accused of leaving out important details.  I am currently running
Ubuntu Lucid (10.4 LTS) 32 bit with mdadm version 3.1.4.

Original RAID configuration:

(4) 500GB drives partitioned into boot/root/swap/data
    sd[a-d]1 -> Boot
    sd[a-d]2 -> Root
    sd[a-d]3 -> Swap
    sd[a-d]4 -> LVM (data)

    sd[a-d][1-3] --> RAID1 (4 partitions) md[0-2]
    sd[a-d]4 --> RAID5 (4 partitions) md3
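
(For reference, a layout like this would typically have been created with
commands along the following lines; this is purely illustrative and not a
record of what was actually run at the time:)

  mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1  # boot
  mdadm --create /dev/md1 --level=1 --raid-devices=4 /dev/sd[abcd]2  # root
  mdadm --create /dev/md2 --level=1 --raid-devices=4 /dev/sd[abcd]3  # swap
  mdadm --create /dev/md3 --level=5 --raid-devices=4 /dev/sd[abcd]4  # data
  pvcreate /dev/md3    # LVM physical volume on top of the RAID5 array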

1 year later:
Upgraded (4) 500GB drives to (4) 1000GB drives.
Replaced the 500GB drives one at a time, partitioned them and re-synced them.
After all drives replaced, did a grow operation on each of the RAID
devices (md[0-3]
Grew the file systems (md[0-1] -> ext3, md3 -> xfs)
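
(Each round of that upgrade is roughly the following; a sketch of the usual
procedure with device names and mount points assumed, not a transcript:)

  mdadm /dev/md3 --fail /dev/sdX4 --remove /dev/sdX4  # retire one old disk
  # physically swap the disk, partition it to the new size, then:
  mdadm /dev/md3 --add /dev/sdX4      # re-add and wait for the resync
  # once every member has been replaced with a larger partition:
  mdadm --grow /dev/md3 --size=max    # claim the new capacity
  resize2fs /dev/md1                  # grow ext3 on the root array
  xfs_growfs /mnt/data                # grow xfs (mount point assumed)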

1 year ago:
Added a fifth 1000GB drive as spare.
Upgraded mdadm to version 3.1.4 and performed a reshape of md3 from
RAID5 -> RAID6
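
(The reshape itself would have been something along these lines; the backup
file path is an assumption for illustration:)

  mdadm /dev/md3 --add /dev/sde4      # the fifth disk starts out as a spare
  mdadm --grow /dev/md3 --level=6 --raid-devices=5 \
        --backup-file=/root/md3-reshape.backup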

3 months ago:
Upgraded (5) 1000GB drives to (5) 3000GB drives using the same
technique as the 500GB -> 1000GB replacement.
It was at this time that I experienced worrisome results.  The reshape
completed without a problem but after rebooting the kernel had
problems assembling the arrays.  I was dropped into the busybox
initramfs shell with strange arrays that were numbered something like
md125 -> md127.  I was able to stop the arrays (none were active) and
rebuild them manually by specifying the individual partitions for each
array.  After doing that and continuing the boot process, I updated my
mdadm.conf file (using mdadm --detail --scan) and then performed a
mkinitramfs to build a new initrd.img, after which I was able to boot
successfully with the correct md devices (md[0-3]).
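
(In outline, that recovery was something like the following; the exact device
lists are illustrative:)

  mdadm --stop /dev/md125                    # likewise md126 and md127
  mdadm --assemble /dev/md0 /dev/sd[abcd]1   # likewise md1, md2 and md3
  # then, from the running system:
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  update-initramfs -u                        # or mkinitramfs -o <new initrd>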

This past week, one of the 3000GB drives began to fail.  The drives
are in a hot-swap cage and I removed the failed drive and
unintentionally powered down one of the other drives (sdb was the
failed drive; sdd was the other drive that powered down).  Fortunately,
the array rebuilt the parity on sdd without any errors. At this time I
was running a degraded RAID6 missing one drive. I RMA'ed the drive and
used a spare 3000GB drive to restore the array to full health; no
problems here.

Several days later, it was necessary to reboot and things went to h, e,
double hockey sticks in double time.  I ended up with the same md125
-> md127 arrays as I had seen previously, but the devices were even
more messed up.  Two of the devices (sda and sde) were in arrays as
the entire disk instead of  one of the five partitions I had made on
the disk (GPT style) and I was having trouble assembling them
manually.  Using the rescue CD, I tried to assemble the arrays and
then do a chroot to create a new initrd.img, but I found that my sda
drive was not being recognized as partitioned at all by the kernel;
however, if I went into parted and set one of the flags (that was
already set) and exited, the partitions did show up.  I was never
successful in building an initrd.img that would boot and assemble the
arrays; I was always dropped into busybox (BTW, my existing kernel did
see all of the sda partitions -- 2.6.32-33 lucid, while the rescue CD
was running 2.6.38).

Eventually, I was able to assemble all of the arrays in the busybox.
(aside: I admit that I had forgotten how to do a stop on the arrays,
which led me to believe I couldn't rebuild them manually here).
However, loading LVM onto the RAID6 array failed.  Checking dmesg, the
kernel was complaining that the array was too small for the volume
group.  Checking the --examine on each of the partitions, the size was
coming back at about 400+GB! It looked like I had the metadata (all
version 0.90) from the original RAID5 array with the 500GB drives.  It
was getting really late (2am), but I wanted to get this system mounted
and running, so, on a whim, I told mdadm to grow the array to max size
and (lo and behold) the array size changed from 1400GB to 7.5TB.
I thought all was well and good until I looked at /proc/mdstat and
saw that the array was resyncing.  My heart was in my throat, as I
was thinking that it was wiping out everything above the 1400GB
original size, but I figured (correctly) it was better to let it finish
than try something foolish in the middle of the resync.  After getting
some sleep (the resync took about 5 hours), I came back to find the
array healthy, still 7.4TB, with all of the data intact (better to be
lucky than good, I've been told).
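
(Concretely, the grow-to-max step was presumably the equivalent of:)

  mdadm --grow /dev/md3 --size=max   # recompute the per-device size
  cat /proc/mdstat                   # where the unexpected resync showed up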

So here is the existing system: md0, md1 RAID1 with four drives; md3
RAID6 with 4 drives (one missing). I removed sda because it seemed to
be the most messed up and causing problems (just a guess).  Doing an
--examine on the drive itself (sda), not on any partition, provided me
with a superblock and metadata.  The same goes for sde, which I assume
is why the kernel (erroneously) put these whole drives into arrays on a
reboot.  I
intend to zero out the superblock(s) on the sda drive and re-add it to
the arrays, but I haven't done that yet (someone may want to see the
metadata on that drive first).
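
(One caution on that plan: with 0.90 metadata the superblock sits near the
end of the device, so the whole-disk superblock and the last partition's
superblock can turn out to be the very same block.  Comparing them first is
the safe order of operations; a sketch:)

  mdadm --examine /dev/sda    # the stray whole-disk superblock
  mdadm --examine /dev/sda4   # the last partition's superblock
  # only if the two are clearly different superblocks:
  mdadm --zero-superblock /dev/sda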

NOTE: I have set the linux-raid flag on all of the partitions in the
GPT. I think I have read in the linux-raid archives that this is not
recommended. Could this have had an effect on what transpired?

So my question is:

Is there a way, short of backing up the data, completely rebuilding
the arrays, and restoring the data (a real PIA) to rewrite the
metadata given the existing array configurations in the running
system?  Also, is there an explanation as to why the metadata seems so
screwed up that the arrays cannot be assembled automatically by the
kernel?

-- Ken Emerson

======================================================
Some current info:
mdadm.conf:
MAILADDR root
DEVICES /dev/sda* /dev/sdb* /dev/sdc* /dev/sdd* /dev/sde*
ARRAY /dev/md1 metadata=0.90 UUID=90f0aede:03a99d2a:bd811544:edcdae81
#ARRAY /dev/md2 metadata=0.90 UUID=bbb35b74:953e15e4:a6c431d9:d41e95bb
ARRAY /dev/md0 metadata=0.90 UUID=82ab6faa:6c2e2c2a:c44c77eb:7ee19756
ARRAY /dev/md3 metadata=0.90 UUID=bf3d03bc:87aa59eb:3381d0b6:242837d4

========================================================
from mdadm --examine (the four partitions are very similar):

/dev/sdd4:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : bf3d03bc:87aa59eb:3381d0b6:242837d4
  Creation Time : Mon Sep  3 15:11:50 2007
     Raid Level : raid6
  Used Dev Size : -1661870144 (2511.12 GiB 2696.29 GB)
     Array Size : 7899291456 (7533.35 GiB 8088.87 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

    Update Time : Tue Nov 22 17:45:42 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7ac29869 - correct
         Events : 3486116

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       52        4      active sync   /dev/sdd4

   0     0       0        0        0      removed
   1     1       8       20        1      active sync   /dev/sdb4
   2     2       8       36        2      active sync   /dev/sdc4
   3     3       8        4        3      active sync   /dev/sda4
   4     4       8       52        4      active sync   /dev/sdd4
======================================================
From mdadm --detail /dev/md3:
/dev/md3:
        Version : 0.90
  Creation Time : Mon Sep  3 15:11:50 2007
     Raid Level : raid6
     Array Size : 7899291456 (7533.35 GiB 8088.87 GB)
  Used Dev Size : -1
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Tue Nov 22 17:47:20 2011
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : bf3d03bc:87aa59eb:3381d0b6:242837d4
         Events : 0.3486294

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       20        1      active sync   /dev/sdb4
       2       8       36        2      active sync   /dev/sdc4
       3       8        4        3      active sync   /dev/sda4
       4       8       52        4      active sync   /dev/sdd4


* Re: md metadata nightmare
  2011-11-23  0:05 md metadata nightmare Kenneth Emerson
@ 2011-11-23  0:47 ` NeilBrown
  2011-11-23  3:50   ` Phil Turmel
       [not found]   ` <CADzwnhUJ7HbZH9yqa6x9sHFLo8Vg=1k_SyzvZyq2=iQ5YRLhZQ@mail.gmail.com>
  0 siblings, 2 replies; 11+ messages in thread
From: NeilBrown @ 2011-11-23  0:47 UTC (permalink / raw)
  To: Kenneth Emerson; +Cc: linux-raid

On Tue, 22 Nov 2011 18:05:21 -0600 Kenneth Emerson
<kenneth.emerson@gmail.com> wrote:

> NOTE: I have set the linux-raid flag on all of the partitions in the
> GPT. I think I have read in the linux-raid archives that this is not
> recommended. Could this have had an effect on what transpired?

Not recommended, but also totally ineffective.  The Linux-RAID partition type
is only recognised in MS-DOS partition tables.

> 
> So my question is:
> 
> Is there a way, short of backing up the data, completely rebuilding
> the arrays, and restoring the data (a real PIA) to rewrite the
> metadata given the existing array configurations in the running
> system?  Also, is there an explanation as to why the metadata seems so
> screwed up that the arrays cannot be assembled automatically by the
> kernel?

There appear to be two problems here.  Both could be resolved by converting to
v1.0 metadata.  But there are other approaches.  And converting to v1.0 is
not trivial (not enough developers to work on all the tasks!).

One problem is the final partition on at least some of your disks is at a 64K
alignment.  This means that the superblock looks valid for both the whole
device and for the partition.
You can confirm this by running
  mdadm --examine /dev/sda
  mdadm --examine /dev/sda4

(ditto for b,c,d,e,...)

The "sdX4" should show a superblock.  The 'sdX' should not.
I think it will show exactly the same superblock.  It could show a different
superblock... that would be interesting.

If I am correct here then you can "fix" this by changing mdadm.conf to read:

DEVICES /dev/sda? /dev/sdb? /dev/sdc? /dev/sdd? /dev/sde?
or 
DEVICES /dev/sd[abcde][1-4]

or similar.  i.e. tell it to ignore the whole devices.

The other problem is that v0.90 metadata isn't good with very large devices.
It has 32bits to record kilobytes per device.
This should allow 4TB per device but due to a bug (relating to sign bits) it
only works well with 2TB per device.  This bug was introduced in 2.6.29 and
removed in 3.1.
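
(The arithmetic behind those limits, for the record:)

  $ echo '2^32' | bc    # kilobytes representable in an unsigned 32-bit field
  4294967296            # i.e. 4TB per device
  $ echo '2^31' | bc    # what is left once a sign bit gets involved
  2147483648            # i.e. 2TB per device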

So if you can run a 3.1.2 kernel, that would be best.

You could convert to v1.0 if you want.  You only need to do this for the last
partition (sdX4).

Assuming nothing has changed since the "--detail" output you provided, you
should:

 mdadm -S /dev/md3
 mdadm -C /dev/md3 --metadata=1.0 --chunk=64k --level=6 --raid-devices=5 \
      missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 \
      --assume-clean

The order of the disks is important.  You should compare it with the output
of "mdadm --detail" before you start to ensure that it is correct and that I
have not made any typos.  You should of course check the rest as well.
After doing this (and possibly before) you should 'fsck' to ensure the
transition was successful.  If anything goes wrong, ask before risking
further breakage.
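
(A quick sanity check of the result before going any further, as a sketch:)

  mdadm --detail /dev/md3     # Version should now read 1.0, size unchanged
  mdadm --examine /dev/sdb4   # per-member superblocks should agree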

Good luck.

NeilBrown



* Re: md metadata nightmare
  2011-11-23  0:47 ` NeilBrown
@ 2011-11-23  3:50   ` Phil Turmel
  2011-11-23 15:35     ` CoolCold
       [not found]   ` <CADzwnhUJ7HbZH9yqa6x9sHFLo8Vg=1k_SyzvZyq2=iQ5YRLhZQ@mail.gmail.com>
  1 sibling, 1 reply; 11+ messages in thread
From: Phil Turmel @ 2011-11-23  3:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: Kenneth Emerson, linux-raid

Hi Ken,

On 11/22/2011 07:47 PM, NeilBrown wrote:
> On Tue, 22 Nov 2011 18:05:21 -0600 Kenneth Emerson
[...]
> Assuming nothing has changed since the "--detail" output you provided, you
> should:
> 
>  mdadm -S /dev/md3
>  mdadm -C /dev/md3 --metadata=1.0 --chunk=64k --level=6 --raid-devices=5 \
>       missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 \
>       --assume-clean
> 
> The order of the disks is important.  You should compare it with the output
> of "mdadm --detail" before you start to ensure that it is correct and that I
> have not made any typos.  You should of course check the rest as well.
> After doing this (and possibly before) you should 'fsck' to ensure the
> transition was successful.  If anything goes wrong, ask before risking
> further breakage.

A word of warning...  the shell notation /dev/sd[bcad]4, which you might be
tempted to type in the above command line, *will* *not* *work*.  Bash reorders
the [bcad] to [abcd], dropping nonexistent names.  You might know this, and
not be burned, but others on the list have been.  Use {b,c,a,d} to stay safe.
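
(Easy to demonstrate, assuming all four partitions exist; a glob comes back
sorted, while brace expansion keeps the order you typed:)

  $ echo /dev/sd[bcad]4
  /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4
  $ echo /dev/sd{b,c,a,d}4
  /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4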

HTH,

Phil


* Re: md metadata nightmare
  2011-11-23  3:50   ` Phil Turmel
@ 2011-11-23 15:35     ` CoolCold
  0 siblings, 0 replies; 11+ messages in thread
From: CoolCold @ 2011-11-23 15:35 UTC (permalink / raw)
  To: Phil Turmel; +Cc: NeilBrown, Kenneth Emerson, linux-raid

On Wed, Nov 23, 2011 at 7:50 AM, Phil Turmel <philip@turmel.org> wrote:
> Hi Ken,
>
> On 11/22/2011 07:47 PM, NeilBrown wrote:
>> On Tue, 22 Nov 2011 18:05:21 -0600 Kenneth Emerson
> [...]
>> Assuming nothing has changed since the "--detail" output you provided, you
>> should:
>>
>>  mdadm -S /dev/md3
>>  mdadm -C /dev/md3 --metadata=1.0 --chunk=64k --level=6 --raid-devices=5 \
>>       missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 \
>>       --assume-clean
>>
>> The order of the disks is important.  You should compare it with the output
>> of "mdadm --detail" before you start to ensure that it is correct and that I
>> have not made any typos.  You should of course check the rest as well.
>> After doing this (and possibly before) you should 'fsck' to ensure the
>> transition was successful.  If anything goes wrong, ask before risking
>> further breakage.
>

> A word of warning...  the shell notation /dev/sd[bcad]4, which you might be
> tempted to type in the above command line, *will* *not* *work*.  Bash reorders
> the [bcad] to [abcd], dropping nonexistent names.  You might know this, and
> not be burned, but others on the list have been.  Use {b,c,a,d} to stay safe.
Thanks, maybe you've saved my data for some future use ;)
>
> HTH,
>
> Phil



-- 
Best regards,
[COOLCOLD-RIPN]

* Re: md metadata nightmare
       [not found]   ` <CADzwnhUJ7HbZH9yqa6x9sHFLo8Vg=1k_SyzvZyq2=iQ5YRLhZQ@mail.gmail.com>
@ 2011-11-23 22:36     ` NeilBrown
       [not found]       ` <CADzwnhXuW7ShBNGf+kqnZYrtRnWMPSRDWzb2h4Gt69Cih0-yGA@mail.gmail.com>
  2011-11-23 22:38     ` Kenneth Emerson
  1 sibling, 1 reply; 11+ messages in thread
From: NeilBrown @ 2011-11-23 22:36 UTC (permalink / raw)
  To: Kenneth Emerson; +Cc: linux-raid


On Wed, 23 Nov 2011 16:17:52 -0600 Kenneth Emerson
<kenneth.emerson@gmail.com> wrote:

> On Tue, Nov 22, 2011 at 6:47 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 22 Nov 2011 18:05:21 -0600 Kenneth Emerson
> > <kenneth.emerson@gmail.com> wrote:
> >
> >> NOTE: I have set the linux-raid flag on all of the partitions in the
> >> GPT. I think I have read in the linux-raid archives that this is not
> >> recommended. Could this have had an effect on what transpired?
> >
> > Not recommended, but also totally ineffective.  The Linux-RAID partition type
> > is only recognised in MS-DOS partition tables.
> >
> 
> I will remove these flags.
> 
> >>
> >> So my question is:
> >>
> >> Is there a way, short of backing up the data, completely rebuilding
> >> the arrays, and restoring the data (a real PIA) to rewrite the
> >> metadata given the existing array configurations in the running
> >> system?  Also, is there an explanation as to why the metadata seems so
> >> screwed up that the arrays cannot be assembled automatically by the
> >> kernel?
> >
> > There appear to be two problems here.  Both could be resolved by
> converting to
> > v1.0 metadata.  But there are other approaches.  And converting to v1.0 is
> > not trivial (not enough developers to work on all the tasks!).
> >
> 
> Here, I assume you mean providing a utility to upgrade the metadata is
> daunting since
> below you give me instructions on how to do this with a brute-force method.

Yes.

"trivial" would mean you could:

  mdadm --stop /dev/md3
  mdadm --assemble /dev/md3 --update=metadata --metadata=1.0 /dev/sd[abcd]4

and it would "get it right.
Writing the code in mdadm to do that isn't exactly "daunting", it just isn't
near the top of my list.
It would do almost exactly the same steps as I told you do to manually.



> 
> 
> > One problem is the final partition on at least some of your disks is at a
> 64K
> > alignment.  This means that the superblock looks valid for both the whole
> > device and for the partition.
> > You can confirm this by running
> >  mdadm --examine /dev/sda
> >  mdadm --examine /dev/sda4
> >
> > (ditto for b,c,d,e,...)
> >
> > The "sdX4" should show a superblock.  The 'sdX' should not.
> > I think it will show exactly the same superblock.  It could show a
> different
> > superblock... that would be interesting.
> >
> I still have not re-installed the original sda drive, but the sde drive
> (which is now sdd)
> showed a similar problem where the kernel tried to build an array with
> the entire drive.
> When I look at the --examine on sdd and on sdd4 (and sdd1,2,3 as well),
> none are exactly
> the same (I assume that the output would be exactly the same if it were the
> same superblock).
> I get different UUID's and time stamps as well as RAID types.

In that case you could probably just remove them with e.g.
  mdadm --zero /dev/sdd

That will write zeros over the superblock it finds which is another way you
can stop mdadm from being confused by it.


> 
> 
> > If I am correct here then you can "fix" this by changing mdadm.conf to
> read:
> >
> > DEVICES /dev/sda? /dev/sdb? /dev/sdc? /dev/sdd? /dev/sde?
> > or
> > DEVICES /dev/sd[abcde][1-4]
> >
> > or similar.  i.e. tell it to ignore the whole devices.
> 
> I actually did this at one time, and it was better, but it still did not
> assemble the correct arrays.
> I will, however, change my current .conf file to ignore the whole drives.
> 
> >
> > The other problem is that v0.90 metadata isn't good with very large
> devices.
> > It has 32bits to record kilobytes per device.
> > This should allow 4TB per device but due to a bug (relating to sign bits) it
> > only works well with 2TB per device.  This bug was introduced in 2.6.29 and
> > removed in 3.1.
> >
> > So if you can run a 3.1.2 kernel, that would be best.
> >
> OK. Now you have me worried.  Is this "bug" benign or is it a ticking time
> bomb?  If I do
> the conversion (below) to version 1.0 will that circumvent the problem?

Not sure what you mean by "time bomb".
The bug means that when you assemble an array with devices larger than 2TB,
the effective size has 2TB subtracted from it so you only see the beginning
of the array.

1.0 doesn't have this bug (it uses 64bit sizes) so after conversion the bug
will not affect you.


> 
> > You could convert to v1.0 if you want.  You only need to do this for the
> last
> > partition (sdX4).
> >
> > Assuming nothing has changed since the "--detail" output you provided, you
> > should:
> >
> >  mdadm -S /dev/md3
> >  mdadm -C /dev/md3 --metadata=1.0 --chunk=64k --level=6 --raid-devices=5 \
> >      missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 \
> >      --assume-clean
> >
> > The order of the disks is important.  You should compare it with the output
> > of "mdadm --detail" before you start to ensure that it is correct and that
> > I have not made any typos.  You should of course check the rest as well.
> > After doing this (and possibly before) you should 'fsck' to ensure the
> > transition was successful.  If anything goes wrong, ask before risking
> > further breakage.
> >
> I will do this conversion; but I will backup my data as best I can first,
> just in case.
> I still have the 5 1TB drives and my data should fit on there, just a PIA
> to do it.
> (Ahh, that's what weekends are for, right?)
> After the RAID6 is repaired and running OK, I believe I will rebuild the 2
> RAID1 arrays
> as that will be an easy project (since I have 5 copies of everything) which
> will get rid of
> > all vestiges of previous raid arrays.  Do I need to do anything special
> than zeroing
> the superblocks (--zero-superblock)?  Also, shouldn't I do that on the
> RAID6 array before
> doing the create or is that done automagically?

It is done automatically.  When you use "--create", mdadm will zero any
superblocks it finds of any format that it recognises, then write the new
metadata it wants.

> 
> > Good luck.
> >
> Hopefully, luck has nothing to do with it, but I'll take it where I can get
> it.  Lucky is
> better than good any day in my book.  ;-)
> 
> Thank you very much for your insight and experience.  I'll let you know how
> it turns out.
> 
> -- Ken Emerson
> 
> > NeilBrown
> >
> >


:-)

NeilBrown


* md metadata nightmare
       [not found]   ` <CADzwnhUJ7HbZH9yqa6x9sHFLo8Vg=1k_SyzvZyq2=iQ5YRLhZQ@mail.gmail.com>
  2011-11-23 22:36     ` NeilBrown
@ 2011-11-23 22:38     ` Kenneth Emerson
  1 sibling, 0 replies; 11+ messages in thread
From: Kenneth Emerson @ 2011-11-23 22:38 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On Tue, Nov 22, 2011 at 6:47 PM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 22 Nov 2011 18:05:21 -0600 Kenneth Emerson
> <kenneth.emerson@gmail.com> wrote:
>
>> NOTE: I have set the linux-raid flag on all of the partitions in the
>> GPT. I think I have read in the linux-raid archives that this is not
>> recommended. Could this have had an effect on what transpired?
>
> Not recommended, but also totally ineffective.  The Linux-RAID partition type
> is only recognised in MS-DOS partition tables.
>
I will remove these flags.
>>
>> So my question is:
>>
>> Is there a way, short of backing up the data, completely rebuilding
>> the arrays, and restoring the data (a real PIA) to rewrite the
>> metadata given the existing array configurations in the running
>> system?  Also, is there an explanation as to why the metadata seems so
>> screwed up that the arrays cannot be assembled automatically by the
>> kernel?
>
> There appear to be two problems here.  Both could be resolved by converting to
> v1.0 metadata.  But there are other approaches.  And converting to v1.0 is
> not trivial (not enough developers to work on all the tasks!).
>

Here, I assume you mean providing a utility to upgrade the metadata
is daunting since
below you give me instructions on how to do this with a brute-force method.

> One problem is the final partition on at least some of your disks is at a 64K
> alignment.  This means that the superblock looks valid for both the whole
> device and for the partition.
> You can confirm this by running
>  mdadm --examine /dev/sda
>  mdadm --examine /dev/sda4
>
> (ditto for b,c,d,e,...)
>
> The "sdX4" should show a superblock.  The 'sdX' should not.
> I think it will show exactly the same superblock.  It could show a different
> superblock... that would be interesting.
>

I still have not re-installed the original sda drive, but the sde
drive (which is now sdd)
showed a similar problem where the kernel tried to build an array
with the entire drive.
When I look at the --examine on sdd and on sdd4 (and sdd1,2,3 as
well), none are exactly
the same (I assume that the output would be exactly the same if it
were the same superblock).
I get different UUID's and time stamps as well as RAID types.

> If I am correct here then you can "fix" this by changing mdadm.conf to read:
>
> DEVICES /dev/sda? /dev/sdb? /dev/sdc? /dev/sdd? /dev/sde?
> or
> DEVICES /dev/sd[abcde][1-4]
>
> or similar.  i.e. tell it to ignore the whole devices.

I actually did this at one time, and it was better, but it still did
not assemble the correct arrays.
I will, however, change my current .conf file to ignore the whole drives.

>
> The other problem is that v0.90 metadata isn't good with very large devices.
> It has 32bits to record kilobytes per device.
> This should allow 4TB per device but due to a bug (relating to sign bits) it
> only works well with 2TB per device.  This bug was introduced in 2.6.29 and
> removed in 3.1.
>
> So if you can run a 3.1.2 kernel, that would be best.
>

OK. Now you have me worried.  Is this "bug" benign or is it a ticking
time bomb?  If I do
the conversion (below) to version 1.0 will that circumvent the problem?

> You could convert to v1.0 if you want.  You only need to do this for the last
> partition (sdX4).
>
> Assuming nothing has changed since the "--detail" output you provided, you
> should:
>
>  mdadm -S /dev/md3
>  mdadm -C /dev/md3 --metadata=1.0 --chunk=64k --level=6 --raid-devices=5 \
>      missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 \
>      --assume-clean
>
> The order of the disks is important.  You should compare it with the output
> of "mdadm --detail" before you start to ensure that it is correct and that I
> have not made any typos.  You should of course check the rest as well.
> After doing this (and possibly before) you should 'fsck' to ensure the
> transition was successful.  If anything goes wrong, ask before risking
> further breakage.
>

I will do this conversion; but I will backup my data as best I can
first, just in case.
I still have the 5 1TB drives and my data should fit on there, just a
PIA to do it.
(Ahh, that's what weekends are for, right?)
After the RAID6 is repaired and running OK, I believe I will rebuild
the 2 RAID1 arrays
as that will be an easy project (since I have 5 copies of everything)
which will get rid of
all vestiges of previous raid arrays.  Do I need to do anything special
other than zeroing
the superblocks (--zero-superblock)?  Also, shouldn't I do that on the
RAID6 array before
doing the create or is that done automagically?

> Good luck.
>

Hopefully, luck has nothing to do with it, but I'll take it where I
can get it.  Lucky is
better than good any day in my book.  ;-)
Thank you very much for your insight and experience.  I'll let you
know how it turns out.

-- Ken Emerson

> NeilBrown
>
>

* md metadata nightmare
       [not found]         ` <CADzwnhUmHACPJA+c23AeRs3AW_ExuAUQST9jew_=5U1xdMqEFA@mail.gmail.com>
@ 2011-12-03 17:02           ` Kenneth Emerson
  0 siblings, 0 replies; 11+ messages in thread
From: Kenneth Emerson @ 2011-12-03 17:02 UTC (permalink / raw)
  To: linux-raid

> > I will do this conversion; but I will backup my data as best I can first,
> > just in case.
> > I still have the 5 1TB drives and my data should fit on there, just a PIA
> > to do it.
> > (Ahh, that's what weekends are for, right?)
> > After the RAID6 is repaired and running OK, I believe I will rebuild the 2
> > RAID1 arrays
> > as that will be an easy project (since I have 5 copies of everything) which
> > will get rid of
> > all vestiges of previous raid arrays.  Do I need to do anything special other
> > than zeroing
> > the superblocks (--zero-superblock)?  Also, shouldn't I do that on the
> > RAID6 array before
> > doing the create or is that done automagically?
>
> It is done automatically.  When you use "--create", mdadm will zero any
> superblocks it finds of any format that it recognises, then write the new
> metadata it wants.
>
I am still waiting for my replacement Hitachi drive. My RAID6
configuration is made up of a total of 5 drives and I am currently
running degraded (missing one drive).  Can I recreate this raid set,
changing the metadata version while missing one drive, or should I just be
patient and wait for the replacement to arrive?

-- Ken Emerson

* Re: md metadata nightmare
       [not found]         ` <20111204061122.5bb5de4b@notabene.brown>
@ 2011-12-04 17:20           ` Kenneth Emerson
  2011-12-04 17:31             ` wilsonjonathan
  0 siblings, 1 reply; 11+ messages in thread
From: Kenneth Emerson @ 2011-12-04 17:20 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On Sat, Dec 3, 2011 at 1:11 PM, NeilBrown <neilb@suse.de> wrote:
> On Sat, 3 Dec 2011 10:58:43 -0600 Kenneth Emerson <kenneth.emerson@gmail.com>
> wrote:
>
>> >
>> > > I will do this conversion; but I will backup my data as best I can first,
>> > > just in case.
>> > > I still have the 5 1TB drives and my data should fit on there, just a PIA
>> > > to do it.
>> > > (Ahh, that's what weekends are for, right?)
>> > > After the RAID6 is repaired and running OK, I believe I will rebuild the
>> > 2
>> > > RAID1 arrays
>> > > as that will be an easy project (since I have 5 copies of everything)
>> > which
>> > > will get rid of
>> > > all vestiges of previous raid arrays.  Do I need to do anything special
>> > other
>> > > than zeroing
>> > > the superblocks (--zero-superblock)?  Also, shouldn't I do that on the
>> > > RAID6 array before
>> > > doing the create or is that done automagically?
>> >
>> > It is done automatically.  When you use "--create", mdadm will zero any
>> > superblocks it finds of any format that it recognises, then write the new
>> > metadata it wants.
>> >
>> > I am still waiting for my replacement Hitachi drive. My RAID6
>> configuration is made up of a total of 5 drives and I am currently running
>> degraded (missing one drive).  Can I recreate this raid set, changing the
>> metadata version while missing one drive, or should I just be patient and wait for
>> the replacement to arrive?
>>
>> -- Ken Emerson
>
> Yes, you can re-create the raid set as degraded.
> When you list the devices in the "mdadm --create" command, use the word
> "missing" for the device which is missing.
>
> NeilBrown

So I finished my backup and attempted to recreate the array using:

root@mythtv:/home/ken# mdadm -C /dev/md3 --metadata=1.0 --chunk=64k
--level=6 --raid-devices=5 missing /dev/sdb4 /dev/sdc4 /dev/sda4
/dev/sdd4 --assume-clean

and received the error:

mdadm: invalid chunk/rounding value: 64k

What do I do now?

-- Ken E.

* Re: md metadata nightmare
  2011-12-04 17:20           ` Kenneth Emerson
@ 2011-12-04 17:31             ` wilsonjonathan
  2011-12-04 19:39               ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: wilsonjonathan @ 2011-12-04 17:31 UTC (permalink / raw)
  To: Kenneth Emerson; +Cc: NeilBrown, linux-raid


> So I finished my backup and attempted to recreate the array using:
> 
> root@mythtv:/home/ken# mdadm -C /dev/md3 --metadata=1.0 --chunk=64k
> --level=6 --raid-devices=5 missing /dev/sdb4 /dev/sdc4 /dev/sda4
> /dev/sdd4 --assume-clean
> 
> and received the error:
> 
> mdadm: invalid chunk/rounding value: 64k
> 
> What do I do now?

Omit the "k" only M or G are allowed as the default is to assume K


> 
> -- Ken E.

* Re: md metadata nightmare
  2011-12-04 17:31             ` wilsonjonathan
@ 2011-12-04 19:39               ` NeilBrown
  2011-12-05  5:05                 ` Kenneth Emerson
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2011-12-04 19:39 UTC (permalink / raw)
  To: wilsonjonathan; +Cc: Kenneth Emerson, linux-raid


On Sun, 4 Dec 2011 17:31:06 +0000 wilsonjonathan <piercing_male@hotmail.com>
wrote:

> 
> > So I finished my backup and attempted to recreate the array using:
> > 
> > root@mythtv:/home/ken# mdadm -C /dev/md3 --metadata=1.0 --chunk=64k
> > --level=6 --raid-devices=5 missing /dev/sdb4 /dev/sdc4 /dev/sda4
> > /dev/sdd4 --assume-clean
> > 
> > and received the error:
> > 
> > mdadm: invalid chunk/rounding value: 64k
> > 
> > What do I do now?
> 
> Omit the "k" only M or G are allowed as the default is to assume K
> 

Not strictly accurate.  'K' is allowed, though not 'k'.  But the default is
definitely 'K', so with

  --chunk=64
or
  --chunk=64K

should work.
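
(So the create line from earlier, with a chunk size mdadm will accept, would
presumably become:)

  mdadm -C /dev/md3 --metadata=1.0 --chunk=64 --level=6 --raid-devices=5 \
       missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 --assume-clean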

Thanks,
NeilBrown


* Re: md metadata nightmare
  2011-12-04 19:39               ` NeilBrown
@ 2011-12-05  5:05                 ` Kenneth Emerson
  0 siblings, 0 replies; 11+ messages in thread
From: Kenneth Emerson @ 2011-12-05  5:05 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On Sun, Dec 4, 2011 at 1:39 PM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 4 Dec 2011 17:31:06 +0000 wilsonjonathan <piercing_male@hotmail.com>
> wrote:
>
>>
>> > So I finished my backup and attempted to recreate the array using:
>> >
>> > root@mythtv:/home/ken# mdadm -C /dev/md3 --metadata=1.0 --chunk=64k
>> > --level=6 --raid-devices=5 missing /dev/sdb4 /dev/sdc4 /dev/sda4
>> > /dev/sdd4 --assume-clean
>> >
>> > and received the error:
>> >
>> > mdadm: invalid chunk/rounding value: 64k
>> >
>> > What do I do now?
>>
>> Omit the "k" only M or G are allowed as the default is to assume K
>>
>
> Not strictly accurate.  'K' is allowed, though not 'k'.  But the default is
> definitely 'K', so with
>
>  --chunk=64
> or
>  --chunk=64K
>
> should work.
>
> Thanks,
> NeilBrown

I wanted to report back my success (yes, it was successful).  I
appreciate everyone's help, especially Neil Brown.  I was able to
change the metadata from 0.90 to 1.0 without losing any data (which
was good since it would have taken 20+ hours to restore).  I even tried
changing my three RAID1 arrays to version 1.0 but stopped at my boot
partition.  I now believe I remember reading that grub2 only
understands version 0.90; regardless, it wouldn't boot with it changed
to 1.0, so I changed it back to 0.90 and all is good.  I got rid of the
erroneous superblock on the fourth drive, and the arrays are now
assembled automatically on boot without intervention.

Much obliged for all the quick responses.  Now I just wait for my
replacement drive from Hitachi and things will be back to 100%.
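
(When it arrives, restoring full redundancy should just be a matter of
partitioning the new disk to match the others and re-adding it; roughly, with
the new disk's name assumed to be sdX:)

  mdadm /dev/md0 --add /dev/sdX1   # likewise the other RAID1 arrays
  mdadm /dev/md3 --add /dev/sdX4   # fills the "missing" slot and rebuilds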

Regards,

Ken Emerson
