* freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-01 15:28 UTC
To: linux-raid

I replaced every 2TB drive of my 7-drive RAID-5 array with 3TB drives.
After the last replacement I could grow the array from 12 TB to 18 TB using

    mdadm --grow /dev/md0 --size max

That worked:

    md0: detected capacity change from 12002386771968 to 18003551059968

It worked for quite a while, until the machine had to be rebooted. It shrunk:

    md0: detected capacity change from 0 to 4809411526656

The LVM volume group on this array would not be activated until I repeated
the mdadm command. It grew back to the original size:

    md0: detected capacity change from 4809411526656 to 18003551059968

However, this caused major data loss, as everything beyond the perceived
4.8 TB size was wiped by the sync process.

This happened on Fedora 15, using kernel-2.6.38.6-27.fc15.x86_64 and
mdadm-3.2.2-6.fc15.x86_64.

The drives are Hitachi Deskstar 7K3000 HDS723030ALA640. The adapter is an
LSI Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
(LSI SAS 9211-8i). I had to buy this adapter as my old SAS1068-based card
would not support 3TB drives.

I can probably fix this by creating a fresh new array and then start
restoring my backups, but now is the time to seek the cause of this.
I can reproduce this on demand: I can grow the array again, and it will
shrink immediately after the next reboot.

What should I do to find the cause?

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-01 16:12 UTC
To: linux-raid

On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
> What should I do to find the cause?

Additional information:

Both the original 2TB drives as well as the new 3TB drives were GPT
formatted with partition type FD00.

This is information about the currently shrunk array:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Feb  8 23:22:15 2006
     Raid Level : raid5
     Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
  Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Aug 30 21:50:50 2011
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
         Events : 0.3157574

    Number   Major   Minor   RaidDevice State
       0       8      161        0      active sync   /dev/sdk1
       1       8      177        1      active sync   /dev/sdl1
       2       8      193        2      active sync   /dev/sdm1
       3       8      145        3      active sync   /dev/sdj1
       4       8      209        4      active sync   /dev/sdn1
       5       8      225        5      active sync   /dev/sdo1
       6       8      129        6      active sync   /dev/sdi1
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-01 16:16 UTC
To: linux-raid

More info:

# gdisk -l /dev/sdk
GPT fdisk (gdisk) version 0.7.2

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdk: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): BEEBC2FD-A959-4292-8115-AEFA06E0978E
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      5860533134   2.7 TiB    FD00  Linux RAID

# mdadm --examine /dev/sdk1
/dev/sdk1:
          Magic : a92b4efc
        Version : 0.90.03
           UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
  Creation Time : Wed Feb  8 23:22:15 2006
     Raid Level : raid5
  Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
     Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0

    Update Time : Thu Sep  1 18:11:08 2011
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7698c20e - correct
         Events : 3157574

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8      161        0      active sync   /dev/sdk1

   0     0       8      161        0      active sync   /dev/sdk1
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      193        2      active sync   /dev/sdm1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      209        4      active sync   /dev/sdn1
   5     5       8      225        5      active sync   /dev/sdo1
   6     6       8      129        6      active sync   /dev/sdi1
* Re: freshly grown array shrinks after first reboot - major data loss
From: John Robinson @ 2011-09-01 16:48 UTC
To: Pim Zandbergen; +Cc: linux-raid

On 01/09/2011 17:16, Pim Zandbergen wrote:
> # gdisk -l /dev/sdk
[...]
> Number  Start (sector)    End (sector)  Size       Code  Name
>    1            2048      5860533134   2.7 TiB    FD00  Linux RAID

Partition type FD is only for metadata 0.90 arrays to be auto-assembled
by the kernel. This is now deprecated; you should be using partition type
DA (Non-FS data) and an initrd to assemble your arrays.

> # mdadm --examine /dev/sdk1
> /dev/sdk1:
>           Magic : a92b4efc
>         Version : 0.90.03

Metadata version 0.90 does not support devices over 2TiB. I think it's a
bug that you weren't warned at some point.

Cheers,

John.
--
John Robinson, yuiop IT services  0131 557 9577 / 07771 784 058
46/12 Broughton Road, Edinburgh EH7 4EE
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-01 17:21 UTC
To: John Robinson; +Cc: linux-raid

On 1-9-2011 6:48, John Robinson wrote:
> you should be using partition type DA (Non-FS data)

Using gdisk (GPT) or fdisk (MBR)?

> and an initrd to assemble your arrays.

Booting from the array is not required. I guess the Fedora init scripts
will assemble the array from /etc/mdadm.conf.

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-02 9:02 UTC
To: John Robinson; +Cc: linux-raid

On 09/01/2011 07:21 PM, Pim Zandbergen wrote:
> On 1-9-2011 6:48, John Robinson wrote:
>> you should be using partition type DA (Non-FS data)
> using gdisk (GPT) or fdisk (MBR) ?

I tried gdisk, but it does not know about DA00.

I tried fdisk and created an array from the resulting partitions. That
would only use the first 2TB of the 3TB disks.

Then I tried fdisk, but used the whole disks for the array. That seems to
work, although mdadm gave a lot of warnings about the fact that the drives
were partitioned. The partition table does not seem to be wiped, however.

Is the latter way the way it is supposed to be done now?

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Mikael Abrahamsson @ 2011-09-02 10:33 UTC
To: Pim Zandbergen; +Cc: John Robinson, linux-raid

On Fri, 2 Sep 2011, Pim Zandbergen wrote:
> Is the latter way the way it is supposed to be done now?

I've used whole drives for the past few years; it's worked great. You
avoid all the hassle of handling partitions and alignment.

So yes, go for the whole-device approach. I would make sure the partition
table is wiped and that I was using v1.2 superblocks (the default by now).

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-05 10:47 UTC
To: Mikael Abrahamsson; +Cc: John Robinson, linux-raid

On 09/02/2011 12:33 PM, Mikael Abrahamsson wrote:
> So yes, go for the whole device approach. I would make sure the
> partition table is wiped and that I was using v1.2 superblocks
> (default by now).

Could I have both? That is, add the whole device to the array, yet have a
protective partition table?

I like the idea of having a protective partition table, similar to the EE
type that protects GPT partitions from non-GPT-aware partitioning software
or OS's. It looks like the 1.2 superblock allows just that, as it starts
4k past the start of the device.

So, would it be wise to add the whole device to an array, using 1.2
metadata, with a fake partition table (type DA)?

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Doug Ledford @ 2011-09-01 16:31 UTC
To: Pim Zandbergen; +Cc: linux-raid

On 09/01/2011 12:12 PM, Pim Zandbergen wrote:
> Additional information:
>
> Both the original 2TB drives as well as the new 3TB drives were GPT
> formatted with partition type FD00
>
> This is information about the currently shrunk array:
>
> # mdadm --detail /dev/md0
> /dev/md0:
>         Version : 0.90

Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15 will
not create this version of raid array by default. There is a reason we
have updated to a new superblock. Does this problem still occur if you use
a newer superblock format (one of the version 1.x versions)?

>   Creation Time : Wed Feb  8 23:22:15 2006
>      Raid Level : raid5
>      Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
>   Used Dev Size : 782781824 (746.52 GiB 801.57 GB)

This looks like some sort of sector count wrap, which might be related to
version 0.90 superblock usage. 3TB - 2.2TB (roughly the wrap point) =
800GB, which is precisely how much of each device you are using to create
a 4.8TB array.

> [...]
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-01 17:44 UTC
To: Doug Ledford; +Cc: linux-raid

On 09/01/2011 06:31 PM, Doug Ledford wrote:
> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
> will not create this version of raid array by default. There is a
> reason we have updated to a new superblock.

As you may have seen, the array was created in 2006, and has gone through
several similar grow procedures.

> Does this problem still occur if you use a newer superblock format
> (one of the version 1.x versions)?

I suppose not. But that would destroy the "evidence" of a possible bug.
For me, it's too late, but finding it could help others to prevent this
situation. If there's anything I could do to help find it, now is the
time.

If the people on this list know enough, I will proceed.

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Doug Ledford @ 2011-09-01 18:17 UTC
To: Pim Zandbergen; +Cc: linux-raid

On 09/01/2011 01:44 PM, Pim Zandbergen wrote:
> On 09/01/2011 06:31 PM, Doug Ledford wrote:
>> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
>> will not create this version of raid array by default. There is a
>> reason we have updated to a new superblock.
>
> As you may have seen, the array was created in 2006, and has gone through
> several similar grow procedures.

Even so, one of the original limitations of the 0.90 superblock was
maximum usable device size. I'm not entirely sure that growing a 0.90
superblock past 2TB wasn't the source of your problem, and that the bug
that needs to be fixed is that mdadm should have refused to grow a
0.90-superblock-based array beyond the 2TB limit. Neil would have to
speak to that.
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-01 18:52 UTC
To: Doug Ledford; +Cc: linux-raid

On 09/01/2011 08:17 PM, Doug Ledford wrote:
> the bug that needs fixed is that mdadm should have refused to grow a
> 0.90 superblock based array beyond the 2TB limit

Yes, that's exactly what I am aiming for.

I could file a bug on bugzilla.redhat.com if that would help.

I'm not sure whether I need to keep my hosed array around in order to be
able to reproduce things.

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Doug Ledford @ 2011-09-01 19:41 UTC
To: Pim Zandbergen, linux-raid

On 09/01/2011 02:52 PM, Pim Zandbergen wrote:
> On 09/01/2011 08:17 PM, Doug Ledford wrote:
>> the bug that needs fixed is that mdadm should have refused to grow a
>> 0.90 superblock based array beyond the 2TB limit
> Yes, that's exactly what I am aiming for.
>
> I could file a bug on bugzilla.redhat.com if that would help.

Feel free, it helps me track things.

> I'm not sure whether I need to keep my hosed array around
> in order to be able to reproduce things.

I don't think that's necessary at this point. It seems pretty obvious
what's going on and should be easy to reproduce.
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-02 9:19 UTC
To: Doug Ledford; +Cc: linux-raid

On 09/01/2011 09:41 PM, Doug Ledford wrote:
>> I could file a bug on bugzilla.redhat.com if that would help.
>
> Feel free, it helps me track things.

https://bugzilla.redhat.com/show_bug.cgi?id=735306
* Re: freshly grown array shrinks after first reboot - major data loss
From: John Robinson @ 2011-09-02 11:06 UTC
To: Pim Zandbergen; +Cc: Doug Ledford, linux-raid

On 02/09/2011 10:19, Pim Zandbergen wrote:
> On 09/01/2011 09:41 PM, Doug Ledford wrote:
>>> I could file a bug on bugzilla.redhat.com if that would help.
>>
>> Feel free, it helps me track things.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=735306

I'm not sure whether it's just the --grow that should complain, or
perhaps the earlier step of

    mdadm /dev/md/array-using-0.90-metadata --add /dev/3TB

should also complain (even if it'll work with less than 2TiB in use, it
ought to tell the user they won't be able to grow the array).

Cheers,

John.
* Re: freshly grown array shrinks after first reboot - major data loss
From: Bill Davidsen @ 2011-09-09 19:30 UTC
To: John Robinson; +Cc: Pim Zandbergen, Doug Ledford, linux-raid, Neil Brown

John Robinson wrote:
> I'm not sure whether it's just the --grow that should complain, or
> perhaps the earlier step of
>     mdadm /dev/md/array-using-0.90-metadata --add /dev/3TB
> should also complain (even if it'll work with less than 2TiB in use,
> it ought to tell the user they won't be able to grow the array).

Perhaps Neil can confirm, but the limitation seems to be using 2TB as an
array member size. I am reasonably sure that if you had partitioned the
drives into two 1.5TB partitions you could have created the array just
fine. Note that this is just speculation, not a suggestion to allow using
0.90 metadata, and I do realize that this array was created in the dark
ages, not created new.

--
Bill Davidsen <davidsen@tmr.com>
  We are not out of the woods yet, but we know the direction and have
  taken the first step. The steps are many, but finite in number, and if
  we persevere we will reach our destination. -me, 2010
* Re: freshly grown array shrinks after first reboot - major data loss
From: NeilBrown @ 2011-09-08 1:10 UTC
To: Doug Ledford, Pim Zandbergen; +Cc: linux-raid

On Thu, 01 Sep 2011 14:17:22 -0400 Doug Ledford <dledford@redhat.com> wrote:
> Even so, one of the original limitations of the 0.90 superblock was
> maximum usable device size. I'm not entirely sure that growing a 0.90
> superblock past 2TB wasn't the source of your problem and that the bug
> that needs fixed is that mdadm should have refused to grow a 0.90
> superblock based array beyond the 2TB limit. Neil would have to speak
> to that.

I finally had time to look into this problem.

I'm ashamed to say there is a serious bug here that I should have found
and fixed some time ago, but didn't. However I don't understand why you
lost any data.

The 0.90 metadata uses an unsigned 32bit number to record the number of
kilobytes used per device. This should allow devices up to 4TB. I don't
know where the "2TB" came from. Maybe I thought something was signed?
Or maybe I just didn't think.

However in 2.6.29 a bug was introduced in the handling of the count. It
is best to keep everything in the same units, and the preferred units for
devices seems to be 512-byte sectors, so we changed md to record the
available size of a device in sectors. For 0.90 metadata this is:

    rdev->sectors = sb->size * 2;

Do you see the bug? It will multiply size (a u32) by 2 before casting it
to a sector_t, so we lose the high bit. This should have been

    rdev->sectors = ((sector_t)sb->size) * 2;

and will be after I submit a patch.

However this should not lead to any data corruption. When you reassemble
the array after reboot it will be 2TB per device smaller than it should
be (which is exactly what we see:
18003551059968 - 4809411526656 == 2*2^40*(7-1)), so some data will be
missing. But when you increase the size again it will check the parity of
the "new" space, but as that is all correct it will not change anything.
So your data *should* have been back exactly as it was. I am at a loss to
explain why it is not.

I will add a test to mdadm to discourage you from adding 4+TB devices to
0.90 metadata, or 2+TB devices for 3.0 and earlier kernels.

I might also add a test to discourage growing an array beyond 2TB on
kernels before 3.1. That is more awkward as mdadm doesn't really know how
big you are growing it to. You ask for 'max' and it just says 'max' to
the kernel. The kernel needs to do the testing - and currently it
doesn't.

Anyway the following patch will be on its way to Linus in a day or two.

Thanks for your report, and my apologies for your loss.

NeilBrown

From 24e9c8d1a620159df73f9b4a545cae668b6285ef Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 8 Sep 2011 10:54:34 +1000
Subject: [PATCH] md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.

0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device. This should allow up to 4TB per
device.

However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.

Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle. So
make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in use.

Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3742ce8..63f71cc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1138,8 +1138,11 @@ static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version
 		ret = 0;
 	}
 	rdev->sectors = rdev->sb_start;
+	/* Limit to 4TB as metadata cannot record more than that */
+	if (rdev->sectors >= (2ULL << 32))
+		rdev->sectors = (2ULL << 32) - 2;
 
-	if (rdev->sectors < sb->size * 2 && sb->level > 1)
+	if (rdev->sectors < ((sector_t)sb->size) * 2 && sb->level > 1)
 		/* "this cannot possibly happen" ... */
 		ret = -EINVAL;
 
@@ -1173,7 +1176,7 @@ static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 		mddev->clevel[0] = 0;
 		mddev->layout = sb->layout;
 		mddev->raid_disks = sb->raid_disks;
-		mddev->dev_sectors = sb->size * 2;
+		mddev->dev_sectors = ((sector_t)sb->size) * 2;
 		mddev->events = ev1;
 		mddev->bitmap_info.offset = 0;
 		mddev->bitmap_info.default_offset = MD_SB_BYTES >> 9;
@@ -1415,6 +1418,11 @@ super_90_rdev_size_change(mdk_rdev_t *rdev, sector_t num_sectors)
 		rdev->sb_start = calc_dev_sboffset(rdev);
 	if (!num_sectors || num_sectors > rdev->sb_start)
 		num_sectors = rdev->sb_start;
+	/* Limit to 4TB as metadata cannot record more than that.
+	 * 4TB == 2^32 KB, or 2*2^32 sectors.
+	 */
+	if (num_sectors >= (2ULL << 32))
+		num_sectors = (2ULL << 32) - 2;
 	md_super_write(rdev->mddev, rdev, rdev->sb_start,
 		       rdev->sb_size, rdev->sb_page);
 	md_super_wait(rdev->mddev);
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-08 13:44 UTC
To: NeilBrown; +Cc: Doug Ledford, linux-raid

On 8-9-2011 3:10, NeilBrown wrote:
> So your data *should* have been back exactly as it was. I am at a loss
> to explain why it is not.

The array contained an LVM VG that would not activate until grown back.
After growing back:

- one ext4 LV was perfectly intact
- one other could be fsck'd back to life without any damage
- a third one could be fsck'd back, leaving some stuff in lost+found
- three others were beyond repair.

The VG was as old as the array itself; the LVs were pretty fragmented.

It looked like the ext4 superblocks were shifted. I could see the
superblock with hexdump, but mount would not. fsck first had to repair
the superblock before anything else. So my report that data was "wiped"
by the sync process was incorrect.

> You ask for 'max' and it just says 'max' to the kernel.
> The kernel needs to do the testing - and currently it doesn't.

I hope/assume this is no problem for my newly created array.

> Thanks for your report, and my apologies for your loss.

No need to apologize; the limitation was documented, and I could have
upgraded the metadata without data loss, had I waited longer for advice.
And I did have off-site backups for the important stuff. I'm just
reporting this so others may be spared this experience.

Thanks,
Pim
* Re: freshly grown array shrinks after first reboot - major data loss
From: Simon Matthews @ 2011-09-02 5:32 UTC
To: Pim Zandbergen; +Cc: Doug Ledford, linux-raid

On Thu, Sep 1, 2011 at 10:44 AM, Pim Zandbergen
<P.Zandbergen@macroscoop.nl> wrote:
> I suppose not. But that would destroy the "evidence" of a possible bug.
> For me, it's too late, but finding it could help others to prevent this
> situation.
> If there's anything I could do to help find it, now is the time.

I ran into this exact problem some weeks ago. I don't recall any error or
warning messages about growing the array to use 3TB partitions, and Neil
acknowledged that this was a bug. He also gave instructions on how to
recover from this situation and re-start the array using 1.0 metadata.

Here is Neil's comment from that thread:

--------------------------------------------------------------------
Oops. That array is using 0.90 metadata which can only handle up to 2TB
devices. The 'resize' code should catch that you are asking the
impossible, but it doesn't, it seems.

You need to simply recreate the array as 1.0. i.e.

    mdadm -S /dev/md5
    mdadm -C /dev/md5 --metadata 1.0 -l1 -n2 --assume-clean

Then all should be happiness.
--------------------------------------------------------------------

Simon
* Re: freshly grown array shrinks after first reboot - major data loss
From: Pim Zandbergen @ 2011-09-02 8:53 UTC
To: linux-raid

On 09/02/2011 07:32 AM, Simon Matthews wrote:
> He also gave instructions on how
> to recover from this situation and re-start the array using 1.0
> metadata.

If only I had been patient and had not tried to grow the array back...
* Re: freshly grown array shrinks after first reboot - major data loss
From: Robin Hill @ 2011-09-01 17:03 UTC
To: Pim Zandbergen; +Cc: linux-raid

On Thu Sep 01, 2011 at 06:12:35PM +0200, Pim Zandbergen wrote:
> Additional information:
>
> Both the original 2TB drives as well as the new 3TB drives were GPT
> formatted with partition type FD00
>
> This is information about the currently shrunk array:
>
> # mdadm --detail /dev/md0
> /dev/md0:
>         Version : 0.90
>   Creation Time : Wed Feb  8 23:22:15 2006
>      Raid Level : raid5

Looks like there's a bug somewhere. The documentation says that 0.90
metadata doesn't support >2TB components for RAID levels 1 and above. If
this is still correct, mdadm should have prevented you growing the array
in the first place.

I'd suggest recreating the array with 1.x metadata instead and checking
whether that runs into the same issue.

Cheers,
    Robin

--
Robin Hill <robin@robinhill.me.uk>
end of thread, other threads: [~2011-09-09 19:30 UTC | newest]

Thread overview: 21+ messages
-- links below jump to the message on this page --
2011-09-01 15:28 freshly grown array shrinks after first reboot - major data loss Pim Zandbergen
2011-09-01 16:12 ` Pim Zandbergen
2011-09-01 16:16   ` Pim Zandbergen
2011-09-01 16:48     ` John Robinson
2011-09-01 17:21       ` Pim Zandbergen
2011-09-02  9:02         ` Pim Zandbergen
2011-09-02 10:33           ` Mikael Abrahamsson
2011-09-05 10:47             ` Pim Zandbergen
2011-09-01 16:31   ` Doug Ledford
2011-09-01 17:44     ` Pim Zandbergen
2011-09-01 18:17       ` Doug Ledford
2011-09-01 18:52         ` Pim Zandbergen
2011-09-01 19:41           ` Doug Ledford
2011-09-02  9:19             ` Pim Zandbergen
2011-09-02 11:06               ` John Robinson
2011-09-09 19:30                 ` Bill Davidsen
2011-09-08  1:10       ` NeilBrown
2011-09-08 13:44         ` Pim Zandbergen
2011-09-02  5:32     ` Simon Matthews
2011-09-02  8:53       ` Pim Zandbergen
2011-09-01 17:03   ` Robin Hill