* freshly grown array shrinks after first reboot - major data loss
@ 2011-09-01 15:28 Pim Zandbergen
  2011-09-01 16:12 ` Pim Zandbergen
  0 siblings, 1 reply; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-01 15:28 UTC (permalink / raw)
  To: linux-raid

I replaced every 2TB drive of my 7-drive RAID-5 array with 3TB drives.
After the last replacement I could grow the array from 12 TB to 18 TB using

mdadm --grow /dev/md0 --size max

That worked:
md0: detected capacity change from 12002386771968 to 18003551059968

It worked for quite a while, until the machine had to be rebooted. Then the
array shrank:
md0: detected capacity change from 0 to 4809411526656

The LVM volume group on this array would not activate until I repeated
the mdadm command, which grew the array back to its full size:
md0: detected capacity change from 4809411526656 to 18003551059968

However, this caused major data loss, as everything beyond the perceived
4.8 TB size was wiped by the sync process.

This happened on Fedora 15, using kernel-2.6.38.6-27.fc15.x86_64 and
mdadm-3.2.2-6.fc15.x86_64.

The drives are Hitachi Deskstar 7K3000 HDS723030ALA640. The adapter is an
LSI Logic  SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
(LSI SAS 9211-8i). I had to buy this adapter as my old SAS1068 based card
would not support 3TB drives.

I can probably fix this by creating a fresh new array and restoring my
backups, but now is the time to find the cause of this.

I can reproduce this on demand: I can grow the array again, and it will
shrink immediately after the next reboot.

What should I do to find the cause?

Thanks,
Pim


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 15:28 freshly grown array shrinks after first reboot - major data loss Pim Zandbergen
@ 2011-09-01 16:12 ` Pim Zandbergen
  2011-09-01 16:16   ` Pim Zandbergen
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-01 16:12 UTC (permalink / raw)
  To: linux-raid

On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
>
>
> What should I do to find the cause?

Additional information:

Both the original 2TB drives as well as the new 3TB drives were GPT
formatted with partition type FD00

This is information about the currently shrunk array:


# mdadm --detail  /dev/md0
/dev/md0:
         Version : 0.90
   Creation Time : Wed Feb  8 23:22:15 2006
      Raid Level : raid5
      Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
   Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
    Raid Devices : 7
   Total Devices : 7
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Tue Aug 30 21:50:50 2011
           State : clean
  Active Devices : 7
Working Devices : 7
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 64K

            UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
          Events : 0.3157574

     Number   Major   Minor   RaidDevice State
        0       8      161        0      active sync   /dev/sdk1
        1       8      177        1      active sync   /dev/sdl1
        2       8      193        2      active sync   /dev/sdm1
        3       8      145        3      active sync   /dev/sdj1
        4       8      209        4      active sync   /dev/sdn1
        5       8      225        5      active sync   /dev/sdo1
        6       8      129        6      active sync   /dev/sdi1



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 16:12 ` Pim Zandbergen
@ 2011-09-01 16:16   ` Pim Zandbergen
  2011-09-01 16:48     ` John Robinson
  2011-09-01 16:31   ` Doug Ledford
  2011-09-01 17:03   ` Robin Hill
  2 siblings, 1 reply; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-01 16:16 UTC (permalink / raw)
  To: linux-raid

More info:

# gdisk -l /dev/sdk
GPT fdisk (gdisk) version 0.7.2

Partition table scan:
   MBR: protective
   BSD: not present
   APM: not present
   GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdk: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): BEEBC2FD-A959-4292-8115-AEFA06E0978E
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
    1            2048      5860533134   2.7 TiB     FD00  Linux RAID

# mdadm --examine /dev/sdk1
/dev/sdk1:
           Magic : a92b4efc
         Version : 0.90.03
            UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
   Creation Time : Wed Feb  8 23:22:15 2006
      Raid Level : raid5
   Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
      Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
    Raid Devices : 7
   Total Devices : 7
Preferred Minor : 0

     Update Time : Thu Sep  1 18:11:08 2011
           State : clean
  Active Devices : 7
Working Devices : 7
  Failed Devices : 0
   Spare Devices : 0
        Checksum : 7698c20e - correct
          Events : 3157574

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     0       8      161        0      active sync   /dev/sdk1

    0     0       8      161        0      active sync   /dev/sdk1
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      193        2      active sync   /dev/sdm1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      209        4      active sync   /dev/sdn1
    5     5       8      225        5      active sync   /dev/sdo1
    6     6       8      129        6      active sync   /dev/sdi1



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 16:12 ` Pim Zandbergen
  2011-09-01 16:16   ` Pim Zandbergen
@ 2011-09-01 16:31   ` Doug Ledford
  2011-09-01 17:44     ` Pim Zandbergen
  2011-09-01 17:03   ` Robin Hill
  2 siblings, 1 reply; 21+ messages in thread
From: Doug Ledford @ 2011-09-01 16:31 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: linux-raid

On 09/01/2011 12:12 PM, Pim Zandbergen wrote:
> On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
>>
>>
>> What should I do to find the cause?
>
> Additional information:
>
> Both the original 2TB drives as well as the new 3TB drives were GPT
> formatted with partition type FD00
>
> This is information about the currently shrunk array:
>
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 0.90

Why is your raid metadata using this old version?  mdadm-3.2.2-6.fc15 
will not create this version of raid array by default.  There is a 
reason we have updated to a new superblock.  Does this problem still 
occur if you use a newer superblock format (one of the version 1.x 
versions)?

> Creation Time : Wed Feb 8 23:22:15 2006
> Raid Level : raid5
> Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
> Used Dev Size : 782781824 (746.52 GiB 801.57 GB)

This looks like some sort of sector count wrap, which might be related 
to version 0.90 superblock usage. 3TB - 2.2TB (roughly the wrap point) = 
800GB, which is precisely how much of each device you are using to 
create a 4.8TB array.
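
(A quick sanity check of that arithmetic, using the partition size from the
gdisk output above: sdk1 is 5,860,531,087 sectors = 3,000,591,916,544 bytes;
subtracting the 2 TiB wrap point of 2,199,023,255,552 bytes leaves about
801,568,660,992 bytes, which is the 801.57 GB "Used Dev Size" reported above,
and 801.57 GB x 6 data devices gives the 4.8TB array.)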

> Raid Devices : 7
> Total Devices : 7
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Tue Aug 30 21:50:50 2011
> State : clean
> Active Devices : 7
> Working Devices : 7
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
> Events : 0.3157574
>
> Number Major Minor RaidDevice State
> 0 8 161 0 active sync /dev/sdk1
> 1 8 177 1 active sync /dev/sdl1
> 2 8 193 2 active sync /dev/sdm1
> 3 8 145 3 active sync /dev/sdj1
> 4 8 209 4 active sync /dev/sdn1
> 5 8 225 5 active sync /dev/sdo1
> 6 8 129 6 active sync /dev/sdi1
>



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 16:16   ` Pim Zandbergen
@ 2011-09-01 16:48     ` John Robinson
  2011-09-01 17:21       ` Pim Zandbergen
  0 siblings, 1 reply; 21+ messages in thread
From: John Robinson @ 2011-09-01 16:48 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: linux-raid

On 01/09/2011 17:16, Pim Zandbergen wrote:
> # gdisk -l /dev/sdk
[...]
> Number Start (sector) End (sector) Size Code Name
> 1 2048 5860533134 2.7 TiB FD00 Linux RAID

Partition type FD is only for metadata 0.90 arrays to be auto-assembled 
by the kernel. This is now deprecated; you should be using partition 
type DA (Non-FS data) and an initrd to assemble your arrays.

> # mdadm --examine /dev/sdk1
> /dev/sdk1:
> Magic : a92b4efc
> Version : 0.90.03

Metadata version 0.90 does not support devices over 2TiB. I think it's a 
bug that you weren't warned at some point.

Cheers,

John.


-- 
John Robinson, yuiop IT services
0131 557 9577 / 07771 784 058
46/12 Broughton Road, Edinburgh EH7 4EE


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 16:12 ` Pim Zandbergen
  2011-09-01 16:16   ` Pim Zandbergen
  2011-09-01 16:31   ` Doug Ledford
@ 2011-09-01 17:03   ` Robin Hill
  2 siblings, 0 replies; 21+ messages in thread
From: Robin Hill @ 2011-09-01 17:03 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: linux-raid


On Thu Sep 01, 2011 at 06:12:35PM +0200, Pim Zandbergen wrote:

> On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
> >
> >
> > What should I do to find the cause?
> 
> Additional information:
> 
> Both the original 2TB drives as well as the new 3TB drives were GPT
> formatted with partition type FD00
> 
> This is information about the currently shrunk array:
> 
> 
> # mdadm --detail  /dev/md0
> /dev/md0:
>          Version : 0.90
>    Creation Time : Wed Feb  8 23:22:15 2006
>       Raid Level : raid5
> 
Looks like there's a bug somewhere. The documentation says that 0.90
metadata doesn't support >2TB components for RAID levels 1 and above.
If this is still correct, mdadm should have prevented you from growing
the array in the first place.

I'd suggest recreating the array with 1.x metadata instead and checking
whether that runs into the same issue.

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 16:48     ` John Robinson
@ 2011-09-01 17:21       ` Pim Zandbergen
  2011-09-02  9:02         ` Pim Zandbergen
  0 siblings, 1 reply; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-01 17:21 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

On 1-9-2011 6:48, John Robinson wrote:
> you should be using partition type DA (Non-FS data)
using gdisk (GPT) or fdisk (MBR) ?


> and an initrd to assemble your arrays. 

Booting from the array is not required. I guess the Fedora init scripts
will assemble the array from /etc/mdadm.conf
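
(For reference, a minimal /etc/mdadm.conf sketch for an array like this one;
the UUID is the one from the --detail output above, and whether Fedora's init
scripts pick the file up without an initrd is an assumption to verify:)

   DEVICE partitions
   ARRAY /dev/md0 UUID=1bf1b0e2:82d487c5:f6f36a45:766001d1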

Thanks,
Pim



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 16:31   ` Doug Ledford
@ 2011-09-01 17:44     ` Pim Zandbergen
  2011-09-01 18:17       ` Doug Ledford
  2011-09-02  5:32       ` Simon Matthews
  0 siblings, 2 replies; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-01 17:44 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-raid

On 09/01/2011 06:31 PM, Doug Ledford wrote:
> Why is your raid metadata using this old version?  mdadm-3.2.2-6.fc15 
> will not create this version of raid array by default.  There is a 
> reason we have updated to a new superblock.

As you may have seen, the array was created in 2006, and has gone through
several similar grow procedures.

> Does this problem still occur if you use a newer superblock format 
> (one of the version 1.x versions)?

I suppose not. But that would destroy the "evidence" of a possible bug.
For me, it's too late, but finding it could help others to prevent this 
situation.
If there's anything I could do to help find it, now is the time.

If the people on this list know enough, I will proceed.

Thanks,
Pim



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 17:44     ` Pim Zandbergen
@ 2011-09-01 18:17       ` Doug Ledford
  2011-09-01 18:52         ` Pim Zandbergen
  2011-09-08  1:10         ` NeilBrown
  2011-09-02  5:32       ` Simon Matthews
  1 sibling, 2 replies; 21+ messages in thread
From: Doug Ledford @ 2011-09-01 18:17 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: linux-raid

On 09/01/2011 01:44 PM, Pim Zandbergen wrote:
> On 09/01/2011 06:31 PM, Doug Ledford wrote:
>> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
>> will not create this version of raid array by default. There is a
>> reason we have updated to a new superblock.
>
> As you may have seen, the array was created in 2006, and has gone through
> several similar grow procedures.

Even so, one of the original limitations of the 0.90 superblock was 
maximum usable device size.  I'm not entirely sure that growing a 0.90 
superblock past 2TB wasn't the source of your problem and that the bug 
that needs fixed is that mdadm should have refused to grow a 0.90 
superblock based array beyond the 2TB limit.  Neil would have to speak 
to that.

>> Does this problem still occur if you use a newer superblock format
>> (one of the version 1.x versions)?
>
> I suppose not. But that would destroy the "evidence" of a possible bug.
> For me, it's too late, but finding it could help others to prevent this
> situation.
> If there's anything I could do to help find it, now is the time.
>
> If the people on this list know enough, I will proceed.
>
> Thanks,
> Pim



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 18:17       ` Doug Ledford
@ 2011-09-01 18:52         ` Pim Zandbergen
  2011-09-01 19:41           ` Doug Ledford
  2011-09-08  1:10         ` NeilBrown
  1 sibling, 1 reply; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-01 18:52 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-raid

On 09/01/2011 08:17 PM, Doug Ledford wrote:
> the bug that needs fixed is that mdadm should have refused to grow a 
> 0.90 superblock based array beyond the 2TB limit
Yes, that's exactly what I am aiming for.

I could file a bug on bugzilla.redhat.com if that would help.

I'm not sure whether I need to keep my hosed array around
in order to be able to reproduce things.

Thanks,
Pim



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 18:52         ` Pim Zandbergen
@ 2011-09-01 19:41           ` Doug Ledford
  2011-09-02  9:19             ` Pim Zandbergen
  0 siblings, 1 reply; 21+ messages in thread
From: Doug Ledford @ 2011-09-01 19:41 UTC (permalink / raw)
  To: Pim Zandbergen, linux-raid

On 09/01/2011 02:52 PM, Pim Zandbergen wrote:
> On 09/01/2011 08:17 PM, Doug Ledford wrote:
>> the bug that needs fixed is that mdadm should have refused to grow a
>> 0.90 superblock based array beyond the 2TB limit
> Yes, that's exactly what I am aiming for.
>
> I could file a bug on bugzilla.redhat.com if that would help.

Feel free, it helps me track things.

> I'm not sure whether I need to keep my hosed array around
> in order to be able to reproduce things.

I don't think that's necessary at this point.  It seems pretty obvious 
what's going on and should be easy to reproduce.



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 17:44     ` Pim Zandbergen
  2011-09-01 18:17       ` Doug Ledford
@ 2011-09-02  5:32       ` Simon Matthews
  2011-09-02  8:53         ` Pim Zandbergen
  1 sibling, 1 reply; 21+ messages in thread
From: Simon Matthews @ 2011-09-02  5:32 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: Doug Ledford, linux-raid

On Thu, Sep 1, 2011 at 10:44 AM, Pim Zandbergen
<P.Zandbergen@macroscoop.nl> wrote:
> On 09/01/2011 06:31 PM, Doug Ledford wrote:
>>
>> Why is your raid metadata using this old version?  mdadm-3.2.2-6.fc15 will
>> not create this version of raid array by default.  There is a reason we have
>> updated to a new superblock.
>
> As you may have seen, the array was created in 2006, and has gone through
> several similar grow procedures.
>
>> Does this problem still occur if you use a newer superblock format (one of
>> the version 1.x versions)?
>
> I suppose not. But that would destroy the "evidence" of a possible bug.
> For me, it's too late, but finding it could help others to prevent this
> situation.
> If there's anything I could do to help find it, now is the time.
>
> If the people on this list know enough, I will proceed.
>
> Thanks,
> Pim

I ran into this exact problem some weeks ago. I don't recall any error
or warning messages about growing the array to use 3TB partitions, and
Neil acknowledged that this was a bug. He also gave instructions on how
to recover from this situation and re-start the array using 1.0
metadata.

Here is Neil's comment from that thread:

--------------------------------------------------------------------
Oops.  That array is using 0.90 metadata, which can only handle up to 2TB
devices.  The 'resize' code should catch that you are asking the impossible,
but it seems it doesn't.

You need to simply recreate the array as 1.0.
i.e.
 mdadm -S /dev/md5
 mdadm -C /dev/md5 --metadata 1.0 -l1 -n2 --assume-clean

Then all should be happiness.
--------------------------------------------------------------------
Simon


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-02  5:32       ` Simon Matthews
@ 2011-09-02  8:53         ` Pim Zandbergen
  0 siblings, 0 replies; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-02  8:53 UTC (permalink / raw)
  To: linux-raid

On 09/02/2011 07:32 AM, Simon Matthews wrote:
> He also gave instructions on how
> to recover from this situation and re-start the array using 1.0
> metadata.
If only I had been patient and had not tried to grow the array back...


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 17:21       ` Pim Zandbergen
@ 2011-09-02  9:02         ` Pim Zandbergen
  2011-09-02 10:33           ` Mikael Abrahamsson
  0 siblings, 1 reply; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-02  9:02 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

On 09/01/2011 07:21 PM, Pim Zandbergen wrote:
> On 1-9-2011 6:48, John Robinson wrote:
>> you should be using partition type DA (Non-FS data)
> using gdisk (GPT) or fdisk (MBR) ?

I tried gdisk, but it does not know about DA00.

I tried fdisk and created an array from the resulting partitions.
That would only use the first 2TB of the 3TB disks.

Then I tried again, but used the whole disks for the array.
That seems to work, although mdadm gave a lot of warnings about the
drives being partitioned. The partition table does not seem to have
been wiped, however.

Is the latter way the way it is supposed to be done now?

Thanks,
Pim





* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 19:41           ` Doug Ledford
@ 2011-09-02  9:19             ` Pim Zandbergen
  2011-09-02 11:06               ` John Robinson
  0 siblings, 1 reply; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-02  9:19 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-raid

On 09/01/2011 09:41 PM, Doug Ledford wrote:
>> I could file a bug on bugzilla.redhat.com if that would help.
>
> Feel free, it helps me track things.

https://bugzilla.redhat.com/show_bug.cgi?id=735306


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-02  9:02         ` Pim Zandbergen
@ 2011-09-02 10:33           ` Mikael Abrahamsson
  2011-09-05 10:47             ` Pim Zandbergen
  0 siblings, 1 reply; 21+ messages in thread
From: Mikael Abrahamsson @ 2011-09-02 10:33 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: John Robinson, linux-raid

On Fri, 2 Sep 2011, Pim Zandbergen wrote:

> Is the latter way the way it is supposed to be done now?

I've used whole drives for the past few years, and it's worked great. You
avoid all the hassle of handling partitions and alignment.

So yes, go for the whole device approach. I would make sure the partition 
table is wiped and that I was using v1.2 superblocks (default by now).
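
(A rough sketch of that approach, assuming the same seven member disks
/dev/sd[i-o] as earlier in this thread; mdadm --create is destructive, so
only after the data is safely backed up and the device names double-checked:)

   # clear old signatures on each disk; depending on the wipefs version this
   # may not cover the GPT backup header, so zeroing the first and last MiB
   # with dd is the low-tech alternative
   for d in /dev/sd[i-o]; do wipefs -a "$d"; done

   # create the array on the whole devices with 1.2 metadata
   mdadm --create /dev/md0 --metadata=1.2 --level=5 --raid-devices=7 /dev/sd[i-o]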

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-02  9:19             ` Pim Zandbergen
@ 2011-09-02 11:06               ` John Robinson
  2011-09-09 19:30                 ` Bill Davidsen
  0 siblings, 1 reply; 21+ messages in thread
From: John Robinson @ 2011-09-02 11:06 UTC (permalink / raw)
  To: Pim Zandbergen; +Cc: Doug Ledford, linux-raid

On 02/09/2011 10:19, Pim Zandbergen wrote:
> On 09/01/2011 09:41 PM, Doug Ledford wrote:
>>> I could file a bug on bugzilla.redhat.com if that would help.
>>
>> Feel free, it helps me track things.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=735306

I'm not sure whether it's just the --grow that should complain, or 
perhaps the earlier step of
   mdadm /dev/md/array-using-0.90-metadata --add /dev/3TB
should also complain (even if it'll work with less than 2TiB in use, it 
ought to tell the user they won't be able to grow the array).

Cheers,

John.



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-02 10:33           ` Mikael Abrahamsson
@ 2011-09-05 10:47             ` Pim Zandbergen
  0 siblings, 0 replies; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-05 10:47 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: John Robinson, linux-raid

On 09/02/2011 12:33 PM, Mikael Abrahamsson wrote:
> So yes, go for the whole device approach. I would make sure the 
> partition table is wiped and that I was using v1.2 superblocks 
> (default by now). 

Could I have both? That is, add the whole device to the array, yet have a
protective partition table?

I like the idea of having a protective partition table, similar to the EE type
that protects GPT partitions from non-GPT-aware partitioning software or OSes.

It looks like the 1.2 superblock allows just that, as it is placed 4 KiB from
the start of the device.

So, would it be wise to add the whole device to an array, using 1.2 metadata,
with a fake partition table (type DA)?

Thanks,
Pim


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-01 18:17       ` Doug Ledford
  2011-09-01 18:52         ` Pim Zandbergen
@ 2011-09-08  1:10         ` NeilBrown
  2011-09-08 13:44           ` Pim Zandbergen
  1 sibling, 1 reply; 21+ messages in thread
From: NeilBrown @ 2011-09-08  1:10 UTC (permalink / raw)
  To: Doug Ledford, Pim Zandbergen; +Cc: linux-raid

On Thu, 01 Sep 2011 14:17:22 -0400 Doug Ledford <dledford@redhat.com> wrote:

> On 09/01/2011 01:44 PM, Pim Zandbergen wrote:
> > On 09/01/2011 06:31 PM, Doug Ledford wrote:
> >> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
> >> will not create this version of raid array by default. There is a
> >> reason we have updated to a new superblock.
> >
> > As you may have seen, the array was created in 2006, and has gone through
> > several similar grow procedures.
> 
> Even so, one of the original limitations of the 0.90 superblock was 
> maximum usable device size.  I'm not entirely sure that growing a 0.90 
> superblock past 2TB wasn't the source of your problem and that the bug 
> that needs fixed is that mdadm should have refused to grow a 0.90 
> superblock based array beyond the 2TB limit.  Neil would have to speak 
> to that.

I finally had time to look into this problem.
I'm ashamed to say there is a serious bug here that I should have found and
fixed some time ago, but didn't.  However I don't understand why you lost
any data.

The 0.90 metadata uses an unsigned 32-bit number to record the number of
kilobytes used per device.  This should allow devices up to 4TB.  I don't
know where the "2TB" came from.  Maybe I thought something was signed, or
maybe I just didn't think.

However, in 2.6.29 a bug was introduced in the handling of that count.
It is best to keep everything in the same units, and the preferred unit for
devices seems to be 512-byte sectors, so we changed md to record the available
size on a device in sectors.  So for 0.90 metadata this is:

   rdev->sectors = sb->size * 2;

Do you see the bug?  It multiplies size (a u32) by 2 before casting it to
a sector_t, so we lose the high bit.  This should have been

   rdev->sectors = ((sector_t)sb->size) * 2;

and will be after I submit a patch.
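
(A tiny stand-alone C sketch of that truncation, using a made-up ~3TB
component size rather than anything read from a real superblock:)

   #include <stdio.h>
   #include <stdint.h>

   int main(void)
   {
       uint32_t size_kib = 2930266112u;             /* ~3TB component, in KiB */
       uint64_t wrapped  = size_kib * 2;            /* 32-bit multiply wraps  */
       uint64_t correct  = (uint64_t)size_kib * 2;  /* cast first, no wrap    */

       printf("wrapped: %llu sectors (~%llu GB)\n",
              (unsigned long long)wrapped,
              (unsigned long long)(wrapped * 512 / 1000000000));
       printf("correct: %llu sectors (~%llu GB)\n",
              (unsigned long long)correct,
              (unsigned long long)(correct * 512 / 1000000000));
       return 0;
   }

It prints roughly 801 GB for the wrapped value and 3000 GB for the correct
one, matching the per-device sizes seen earlier in the thread.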

However this should not lead to any data corruption.  When you reassemble the
array after reboot it will be 2TiB per device smaller than it should be
(which is exactly what we see: 18003551059968 - 4809411526656 == 2 * 2^40 * (7-1))
so some data will be missing.  But when you increase the size again it will
check the parity of the "new" space, but as that is all correct it will not
change anything.
So your data *should* have been back exactly as it was.  I am at a loss to
explain why it is not.

I will add a test to mdadm to discourage you from adding 4+TB devices to 0.90
metadata, or 2+TB devices for 3.0 and earlier kernels.

I might also add a test to discourage growing an array beyond 2TB on kernels
before 3.1.  That is more awkward as mdadm doesn't really know how big you
are growing it to.  You ask for 'max' and it just says 'max' to the kernel.
The kernel needs to do the testing - and currently it doesn't.
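
(Until that test exists, a rough manual pre-flight check before growing a
0.90 array, sketched with the partition names used in this thread:)

   # 0.90 metadata stores the per-device size as a 32-bit KiB count, so
   # components must stay below 2^32 KiB (4TiB), and below 2TiB on kernels
   # that still carry the truncation bug described above
   for p in /dev/sd[i-o]1; do
       echo "$p: $(( $(blockdev --getsize64 "$p") / 1024 )) KiB"
   done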

Anyway the following patch will be on its way to Linus in a day or two.
Thanks for your report, and my apologies for your loss.

NeilBrown

From 24e9c8d1a620159df73f9b4a545cae668b6285ef Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 8 Sep 2011 10:54:34 +1000
Subject: [PATCH] md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.

0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.

Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.

Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3742ce8..63f71cc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1138,8 +1138,11 @@ static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version
 			ret = 0;
 	}
 	rdev->sectors = rdev->sb_start;
+	/* Limit to 4TB as metadata cannot record more than that */
+	if (rdev->sectors >= (2ULL << 32))
+		rdev->sectors = (2ULL << 32) - 2;
 
-	if (rdev->sectors < sb->size * 2 && sb->level > 1)
+	if (rdev->sectors < ((sector_t)sb->size) * 2 && sb->level > 1)
 		/* "this cannot possibly happen" ... */
 		ret = -EINVAL;
 
@@ -1173,7 +1176,7 @@ static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 		mddev->clevel[0] = 0;
 		mddev->layout = sb->layout;
 		mddev->raid_disks = sb->raid_disks;
-		mddev->dev_sectors = sb->size * 2;
+		mddev->dev_sectors = ((sector_t)sb->size) * 2;
 		mddev->events = ev1;
 		mddev->bitmap_info.offset = 0;
 		mddev->bitmap_info.default_offset = MD_SB_BYTES >> 9;
@@ -1415,6 +1418,11 @@ super_90_rdev_size_change(mdk_rdev_t *rdev, sector_t num_sectors)
 	rdev->sb_start = calc_dev_sboffset(rdev);
 	if (!num_sectors || num_sectors > rdev->sb_start)
 		num_sectors = rdev->sb_start;
+	/* Limit to 4TB as metadata cannot record more than that.
+	 * 4TB == 2^32 KB, or 2*2^32 sectors.
+	 */
+	if (num_sectors >= (2ULL << 32))
+		num_sectors = (2ULL << 32) - 2;
 	md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size,
 		       rdev->sb_page);
 	md_super_wait(rdev->mddev);


* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-08  1:10         ` NeilBrown
@ 2011-09-08 13:44           ` Pim Zandbergen
  0 siblings, 0 replies; 21+ messages in thread
From: Pim Zandbergen @ 2011-09-08 13:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: Doug Ledford, linux-raid



On 8-9-2011 3:10, NeilBrown wrote:
> So your data *should* have been back exactly as it was.  I am at a loss to
> explain why it is not.
The array contained an LVM VG that would not activate until grown back.
After growing back,
- one ext4 LV was perfectly intact
- one other could be fsck'd back to life without any damage
- a third one could be fsck'd back, leaving some stuff in lost+found
- three others were beyond repair.

The VG was as old as the array itself; the LVs were pretty fragmented.

It looked like the ext4 superblocks were shifted: I could see a superblock
with hexdump, but mount could not find it. fsck first had to repair the
superblock before anything else.

So my report that data was "wiped" by the sync process was incorrect.

> You ask for 'max' and it just says 'max' to the kernel.
> The kernel needs to do the testing - and currently it doesn't.
I hope/assume this is no problem for my newly created array.
>
> Thanks for your report, and my apologies for your loss.
No need to apologize; the limitation was documented, and I could have
upgraded the metadata without data loss, had I waited longer for advice.
And I did have off-site backups for the important stuff.

I'm just reporting this so others may be spared this experience.

Thanks,
Pim



* Re: freshly grown array shrinks after first reboot - major data loss
  2011-09-02 11:06               ` John Robinson
@ 2011-09-09 19:30                 ` Bill Davidsen
  0 siblings, 0 replies; 21+ messages in thread
From: Bill Davidsen @ 2011-09-09 19:30 UTC (permalink / raw)
  To: John Robinson; +Cc: Pim Zandbergen, Doug Ledford, linux-raid, Neil Brown

John Robinson wrote:
> On 02/09/2011 10:19, Pim Zandbergen wrote:
>> On 09/01/2011 09:41 PM, Doug Ledford wrote:
>>>> I could file a bug on bugzilla.redhat.com if that would help.
>>>
>>> Feel free, it helps me track things.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=735306
>
> I'm not sure whether it's just the --grow that should complain, or 
> perhaps the earlier step of
>   mdadm /dev/md/array-using-0.90-metadata --add /dev/3TB
> should also complain (even if it'll work with less than 2TiB in use, 
> it ought to tell the user they won't be able to grow the array).
>
Perhaps Neil can confirm, but the limitation seems to be using more than 2TB
as an array member size. I am reasonably sure that if you had partitioned the
drives into two 1.5TB partitions each, you could have created the array just
fine.

Note that this is just speculation, not a suggestion to allow using 0.90
metadata, and I do realize that this array was created in the dark ages
rather than being created new.

-- 
Bill Davidsen<davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010




