* Accidentally resized array to 9
@ 2017-09-29  4:23 Eli Ben-Shoshan
  2017-09-29 12:38 ` John Stoffel
  2017-09-29 12:55 ` Roman Mamedov
  0 siblings, 2 replies; 13+ messages in thread
From: Eli Ben-Shoshan @ 2017-09-29  4:23 UTC (permalink / raw)
To: linux-raid

I needed to add another disk to my array (/dev/md128) when I
accidentally did an array resize with the wrong command.

First I added the disk to the array with the following:

mdadm --manage /dev/md128 --add /dev/sdl

This was a RAID6 with 8 devices. Instead of using --grow with
--raid-devices set to 9, I did the following:

mdadm --grow /dev/md128 --size 9

This happily returned without any errors, so I went to look at
/proc/mdstat and did not see a resize operation running. So I shook my
head, read the output of --grow --help, and did the right thing, which is:

mdadm --grow /dev/md128 --raid-devices=9

Right after that, everything hit the fan. dmesg reported a lot of
filesystem errors. I quickly stopped all processes that were using this
device and unmounted the filesystems. I then, stupidly, decided to
reboot before looking around.

I am now booted and can assemble this array, but it seems like there is
no data there.
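(Aside, not part of the original email: the two --grow options change different things. --raid-devices changes the member count; --size sets the amount of space used from *each* member, in kibibytes by default, so "--size 9" shrank every component to 9 KiB. The figures below come from this thread; the snippet is a plain-arithmetic illustration.)

```shell
# What "--grow --size 9" actually did, in numbers (no devices touched).

raid_devices=9      # member count after the intended --raid-devices=9
component_kib=9     # what "mdadm --grow --size 9" set each member to

# RAID6 stores two syndromes, so usable capacity is (n - 2) * component size.
usable_kib=$(( (raid_devices - 2) * component_kib ))
echo "usable capacity after the mistake: ${usable_kib} KiB"
```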
Here is the output of --misc --examine:

ganon raid # cat md128
/dev/md128:
        Version : 1.2
  Creation Time : Sat Aug 30 22:01:09 2014
     Raid Level : raid6
  Used Dev Size : unknown
   Raid Devices : 9
  Total Devices : 9
    Persistence : Superblock is persistent

    Update Time : Thu Sep 28 19:44:39 2017
          State : clean, Not Started
 Active Devices : 9
Working Devices : 9
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ganon:ganon - large raid6  (local to host ganon)
           UUID : 2b3f41d5:ac904000:965be496:dd3ae4ae
         Events : 84345

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       6       8      128        2      active sync   /dev/sdi
       3       8       96        3      active sync   /dev/sdg
       4       8       80        4      active sync   /dev/sdf
       8       8      160        5      active sync   /dev/sdk
       7       8       64        6      active sync   /dev/sde
       9       8      112        7      active sync   /dev/sdh
      10       8      176        8      active sync   /dev/sdl

You will note that the "Used Dev Size" is unknown.

The output of --misc --examine on each disk looks similar to this:

/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 2b3f41d5:ac904000:965be496:dd3ae4ae
           Name : ganon:ganon - large raid6  (local to host ganon)
  Creation Time : Sat Aug 30 22:01:09 2014
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 0
  Used Dev Size : 0
    Data Offset : 239616 sectors
   Super Offset : 8 sectors
   Unused Space : before=239528 sectors, after=3906789552 sectors
          State : clean
    Device UUID : b1bd681a:36849191:b3fdad44:22567d99

    Update Time : Thu Sep 28 19:44:39 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : bca7b1d5 - correct
         Events : 84345

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

I followed directions to create overlays and I tried to re-create the
array with the following:

mdadm --create /dev/md150 --assume-clean --metadata=1.2 \
  --data-offset=117M --level=6 --layout=ls --chunk=512 --raid-devices=9 \
  /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sdi /dev/mapper/sdg \
  /dev/mapper/sdf /dev/mapper/sdk /dev/mapper/sde /dev/mapper/sdh \
  /dev/mapper/sdl

While this creates a /dev/md150, it is basically empty. There should be
an LVM PV label on this device, but pvck returns:

Could not find LVM label on /dev/md150

The output of --misc --examine looks like this with the overlay:

/dev/md150:
        Version : 1.2
  Creation Time : Fri Sep 29 00:22:11 2017
     Raid Level : raid6
     Array Size : 13673762816 (13040.32 GiB 14001.93 GB)
  Used Dev Size : 1953394688 (1862.90 GiB 2000.28 GB)
   Raid Devices : 9
  Total Devices : 9
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Sep 29 00:22:11 2017
          State : clean
 Active Devices : 9
Working Devices : 9
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ganon:150  (local to host ganon)
           UUID : 84098bfe:74c1f70c:958a7d8a:ccb2ef74
         Events : 0

    Number   Major   Minor   RaidDevice State
       0     252       11        0      active sync   /dev/dm-11
       1     252        9        1      active sync   /dev/dm-9
       2     252       16        2      active sync   /dev/dm-16
       3     252       17        3      active sync   /dev/dm-17
       4     252       10        4      active sync   /dev/dm-10
       5     252       14        5      active sync   /dev/dm-14
       6     252       12        6      active sync   /dev/dm-12
       7     252       13        7      active sync   /dev/dm-13
       8     252       15        8      active sync   /dev/dm-15

What do you think? Am I hosed here? Is there any way I can get my data back?

^ permalink raw reply	[flat|nested] 13+ messages in thread
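(Aside, not part of the original email: the re-created geometry can be sanity-checked arithmetically. 117 MiB equals the 239616-sector data offset recorded in the old superblocks, and a 9-device RAID6 exposes 7 data members, which matches the Array Size that mdadm reports for /dev/md150. All figures are taken from the output quoted above.)

```shell
# Sanity-check the re-created geometry against the superblock numbers
# quoted above (pure arithmetic; no devices touched).

data_offset_sectors=$(( 117 * 1024 * 1024 / 512 ))
echo "117M data offset = ${data_offset_sectors} sectors"   # 239616, as in --examine

used_dev_kib=1953394688          # Used Dev Size reported for /dev/md150
array_kib=$(( (9 - 2) * used_dev_kib ))
echo "expected RAID6 array size = ${array_kib} KiB"        # 13673762816, as reported
```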
* Re: Accidentally resized array to 9
  2017-09-29  4:23 Accidentally resized array to 9 Eli Ben-Shoshan
@ 2017-09-29 12:38 ` John Stoffel
  2017-09-29 14:47   ` Eli Ben-Shoshan
  2017-09-29 12:55 ` Roman Mamedov
  1 sibling, 1 reply; 13+ messages in thread
From: John Stoffel @ 2017-09-29 12:38 UTC (permalink / raw)
To: Eli Ben-Shoshan; +Cc: linux-raid

>>>>> "Eli" == Eli Ben-Shoshan <eli@benshoshan.com> writes:

Eli> I need to add another disk to my array (/dev/md128) when I accidentally
Eli> did an array resize to 9 with the following command:

Eli> First I add the disk to the array with the following:

Eli> mdadm --manage /dev/md128 --add /dev/sdl

Eli> This was a RAID6 with 8 devices. Instead of using --grow with
Eli> --raid-devices set to 9, I did the following:

Eli> mdadm --grow /dev/md128 --size 9

Eli> This happily returned without any errors so I went to go look at
Eli> /proc/mdstat and did not see a resize operation going. So I shook my
Eli> head and read the output of --grow --help and did the right thing which is:

Eli> mdadm --grow /dev/md128 --raid-devices=9

Eli> Right after that everything hit the fan. dmesg reported a lot of
Eli> filesystem errors. I quickly stopped all processes that were using this
Eli> device and unmounted the filesystems. I then, stupidly, decided to
Eli> reboot before looking around.

I think you *might* be able to fix this with just a simple:

mdadm --grow /dev/md128 --size max

And then try to scan for your LVM configuration, then fsck your volume
on there. I hope you had backups.

And maybe there should be a warning when re-sizing raid array elements
without a --force option if going smaller than the current size?

Eli> I am now booted and can assemble this array but it seems like there is
Eli> no data there.
* Re: Accidentally resized array to 9
  2017-09-29 12:38 ` John Stoffel
@ 2017-09-29 14:47   ` Eli Ben-Shoshan
  2017-09-29 19:33     ` John Stoffel
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Ben-Shoshan @ 2017-09-29 14:47 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid

On 09/29/2017 08:38 AM, John Stoffel wrote:
> I think you *might* be able to fix this with just a simple:
>
> mdadm --grow /dev/md128 --size max
>
> And then try to scan for your LVM configuration, then fsck your volume
> on there. I hope you had backups.
>
> And maybe there should be a warning when re-sizing raid array elements
> without a --force option if going smaller than the current size?
I just tried that and got the following error:

mdadm: Cannot set device size in this type of array

Trying to go further down this path, I also tried to set the size
explicitly with:

mdadm --grow /dev/md150 --size 1953383512

but got:

mdadm: Cannot set device size in this type of array

I am curious if my data is actually still there on disk.

What does --size with --grow actually do?
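(Aside, not part of the original email: the explicit figure above is the per-disk Avail Dev Size from the superblock, 3906767024 512-byte sectors, converted to the kibibytes that --grow --size expects. Plain arithmetic only.)

```shell
# Where the explicit --size figure comes from: the superblock reports
# Avail Dev Size in 512-byte sectors, while mdadm --grow --size takes KiB.

avail_sectors=3906767024                     # Avail Dev Size from --examine
size_kib=$(( avail_sectors * 512 / 1024 ))   # i.e. avail_sectors / 2
echo "per-device size: ${size_kib} KiB"
```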
* Re: Accidentally resized array to 9
  2017-09-29 14:47   ` Eli Ben-Shoshan
@ 2017-09-29 19:33     ` John Stoffel
  2017-09-29 21:04       ` Eli Ben-Shoshan
  0 siblings, 1 reply; 13+ messages in thread
From: John Stoffel @ 2017-09-29 19:33 UTC (permalink / raw)
To: Eli Ben-Shoshan; +Cc: John Stoffel, linux-raid

>>>>> "Eli" == Eli Ben-Shoshan <eli@benshoshan.com> writes:
Eli> I just tried that and got the following error:

Eli> mdadm: Cannot set device size in this type of array

Eli> Trying to go further down this path, I also tried to set the size
Eli> explicitly with:

Eli> mdadm --grow /dev/md150 --size 1953383512

Eli> but got:

Eli> mdadm: Cannot set device size in this type of array

Eli> I am curious if my data is actually still there on disk.

Eli> What does the --size with --grow actually do?

It changes the size of each member of the array.  The man page
explains it, though not ... obviously.

Are you still running with the overlays?  That would explain why it
can't resize them bigger.  But I'm also behind on email today...

John
* Re: Accidentally resized array to 9
  2017-09-29 19:33     ` John Stoffel
@ 2017-09-29 21:04       ` Eli Ben-Shoshan
  2017-09-29 21:17         ` John Stoffel
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Ben-Shoshan @ 2017-09-29 21:04 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid

On 09/29/2017 03:33 PM, John Stoffel wrote:
> It changes the size of each member of the array.  The man page
> explains it, though not ... obviously.
>
> Are you still running with the overlays?  That would explain why it
> can't resize them bigger.  But I'm also behind on email today...
>
> John

I was still using the overlay. I just tried the grow without the overlay
and got the same error.
* Re: Accidentally resized array to 9
  2017-09-29 21:04       ` Eli Ben-Shoshan
@ 2017-09-29 21:17         ` John Stoffel
  2017-09-29 21:49           ` Eli Ben-Shoshan
  0 siblings, 1 reply; 13+ messages in thread
From: John Stoffel @ 2017-09-29 21:17 UTC (permalink / raw)
To: Eli Ben-Shoshan; +Cc: John Stoffel, linux-raid

>>>>> "Eli" == Eli Ben-Shoshan <eli@benshoshan.com> writes:
Eli> I was still using the overlay. I just tried the grow without the
Eli> overlay and got the same error.

Hmm.. what do the partitions on the disk look like now?  You might
need to do more digging.  But I would say that using --grow and having
it *shrink* without any warnings is a bad idea for the mdadm tools.
It should scream loudly and only run when forced to like that.

Aw crap... you used the whole disk.  I don't like doing this because
A) if I get a disk slightly *smaller* than what I currently have, it
will be painful, and B) it's easy to use a partition that starts 4 MB
from the beginning of the disk and ends a few hundred MB (or even a GB)
short of the end.

In your case, can you try the 'mdadm --grow /dev/md### --size max'
again, but with a version of mdadm compiled with debugging info, or at
least using the latest version of the code if at all possible?

Grab it from https://github.com/neilbrown/mdadm and when you
configure it, make sure you enable debugging.  Or grab it from
https://www.kernel.org/pub/linux/utils/raid/mdadm/ and try the same
thing.

Can you show the output of: cat /proc/partitions as well?
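(Aside, not part of the original email: John's "leave headroom" advice can be sketched numerically. The disk size is the sdc figure from the /proc/partitions output later in the thread; the 4 MiB start offset and 1 GiB end margin are illustrative choices echoing his suggestion, not values from the thread.)

```shell
# Sketch: size an md member partition short of the raw disk so that a
# marginally smaller replacement drive still fits (arithmetic only).

disk_kib=1953514584            # raw sdc size from /proc/partitions
start_kib=4096                 # start 4 MiB in, for alignment
margin_kib=$(( 1024 * 1024 ))  # leave ~1 GiB of slack at the end
member_kib=$(( disk_kib - start_kib - margin_kib ))
echo "member partition size: ${member_kib} KiB"
```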
* Re: Accidentally resized array to 9
  2017-09-29 21:17         ` John Stoffel
@ 2017-09-29 21:49           ` Eli Ben-Shoshan
  0 siblings, 0 replies; 13+ messages in thread
From: Eli Ben-Shoshan @ 2017-09-29 21:49 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid

On 09/29/2017 05:17 PM, John Stoffel wrote:
> Maybe you need to do:
>
> mdadm --grow <dev> --size ########
>
> which is the smallest of the max size of all your disks.  Might
> work...

ganon mdadm-4.0 # cat /proc/partitions
major minor  #blocks  name

   1        0       8192 ram0
   1        1       8192 ram1
   1        2       8192 ram2
   1        3       8192 ram3
   1        4       8192 ram4
   1        5       8192 ram5
   1        6       8192 ram6
   1        7       8192 ram7
   1        8       8192 ram8
   1        9       8192 ram9
   1       10       8192 ram10
   1       11       8192 ram11
   1       12       8192 ram12
   1       13       8192 ram13
   1       14       8192 ram14
   1       15       8192 ram15
   8        0  234431064 sda
   8        1     262144 sda1
   8        2  204472320 sda2
   8        3    2097152 sda3
   8       16  234431064 sdb
   8       17     262144 sdb1
   8       18  204472320 sdb2
   8       19    2097152 sdb3
   8       32 1953514584 sdc
   8       48 1953514584 sdd
   8       64 1953514584 sde
   8       80 1953514584 sdf
   8       96 1953514584 sdg
   9      126     262080 md126
   9      127  204341248 md127
 252        0    1572864 dm-0
 252        1    9437184 dm-1
 252        2    4194304 dm-2
 252        3   16777216 dm-3
 252        4   25165824 dm-4
 252        5    7340032 dm-5
 252        6    6291456 dm-6
 252        7   53477376 dm-7
 252        8    1048576 dm-8
   8      112 1953514584 sdh
   8      128 1953514584 sdi
   8      144 1953514584 sdj
   8      145    4194304 sdj1
   8      146     524288 sdj2
   8      147 1948793912 sdj3
   8      160 1953514584 sdk
   8      176 1953514584 sdl
   8      192 1953514584 sdm
   8      193    4194304 sdm1
   8      194     524288 sdm2
   8      195 1948793912 sdm3
   9      131 1948662656 md131
   9      129    4192192 md129
   9      130     523968 md130

I got mdadm-4.0 compiled with debug flags.
Here is the output, starting with --assemble --scan:

ganon mdadm-4.0 # ./mdadm --assemble --scan
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:5/end_device-14:5/target14:0:5/14:0:5:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:4/end_device-14:4/target14:0:4/14:0:4:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:3/end_device-14:3/target14:0:3/14:0:3:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:2/end_device-14:2/target14:0:2/14:0:2:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:1/end_device-14:1/target14:0:1/14:0:1:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:0/end_device-14:0/target14:0:0/14:0:0:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/ata13/host12/target12:0:0/12:0:0:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata10/host9/target9:0:0/9:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata9/host8/target8:0:0/8:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata8/host7/target7:0:0/7:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: start_array: /dev/md128 has been started with 9 drives.
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:5/end_device-14:5/target14:0:5/14:0:5:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:4/end_device-14:4/target14:0:4/14:0:4:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:3/end_device-14:3/target14:0:3/14:0:3:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:2/end_device-14:2/target14:0:2/14:0:2:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:1/end_device-14:1/target14:0:1/14:0:1:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host14/port-14:0/end_device-14:0/target14:0:0/14:0:0:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/ata13/host12/target12:0:0/12:0:0:0
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata10/host9/target9:0:0/9:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata9/host8/target8:0:0/8:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata8/host7/target7:0:0/7:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720
mdadm: path_attached_to_hba: hba: /sys/devices/pci0000:00/0000:00:1f.2 - disk: /sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0
mdadm: scan: ptr->vendorID: 1103 __le16_to_cpu(ptr->deviceID): 2720

and now an attempt to --grow with --size max:

ganon mdadm-4.0 # ./mdadm --grow /dev/md128 --size max
mdadm: Grow_reshape: Cannot set device size in this type of array.

I am not using overlays with the above commands.
* Re: Accidentally resized array to 9
From: Roman Mamedov @ 2017-09-29 12:55 UTC
To: Eli Ben-Shoshan
Cc: linux-raid, John Stoffel

On Fri, 29 Sep 2017 00:23:28 -0400 Eli Ben-Shoshan <eli@benshoshan.com> wrote:

> This was a RAID6 with 8 devices. Instead of using --grow with
> --raid-devices set to 9, I did the following:
>
> mdadm --grow /dev/md128 --size 9
>
> This happily returned without any errors so I went to go look at
> /proc/mdstat and did not see a resize operation going. So I shook my
> head and read the output of --grow --help and did the right thing which is:
>
> mdadm --grow /dev/md128 --raid-devices=9

The output of the first command is:

# mdadm --grow /dev/md0 --size 9
mdadm: component size of /dev/md0 has been set to 9K
unfreeze

Did it not occur to you that you FIRST need to restore the "component size" back to what it was previously?

And yes, as John says, there should be a confirmation request on reducing the array size. In fact, I couldn't believe there isn't one already; that's why I went checking. But no: there are no warnings or confirmation requests for reducing either --size or --array-size. I may be remembering ones from LVM, not MD.

--
With respect,
Roman
* Re: Accidentally resized array to 9
From: Eli Ben-Shoshan @ 2017-09-29 14:53 UTC
To: Roman Mamedov
Cc: linux-raid, John Stoffel

On 09/29/2017 08:55 AM, Roman Mamedov wrote:
> On Fri, 29 Sep 2017 00:23:28 -0400
> Eli Ben-Shoshan <eli@benshoshan.com> wrote:
>
>> This was a RAID6 with 8 devices. Instead of using --grow with
>> --raid-devices set to 9, I did the following:
>>
>> mdadm --grow /dev/md128 --size 9
>>
>> This happily returned without any errors so I went to go look at
>> /proc/mdstat and did not see a resize operation going. So I shook my
>> head and read the output of --grow --help and did the right thing which is:
>>
>> mdadm --grow /dev/md128 --raid-devices=9
>
> The output of the first command is:
>
> # mdadm --grow /dev/md0 --size 9
> mdadm: component size of /dev/md0 has been set to 9K
> unfreeze
>
> It didn't occur to you that you FIRST need to restore the "component size"
> back to what it was previously?

I am not sure that I actually got any response at all from setting --size. I am running version:

mdadm - v3.4 - 28th January 2016

If I did get output, I totally missed it. I get that this is my fault for using the wrong command and any data loss is totally due to my stupidity. I am just hoping that there might be a way that I can get the data back.

> And yes, as John says, there should be a confirmation request on reducing
> the array size. In fact, I couldn't believe there isn't one already; that's
> why I went checking. But no: there are no warnings or confirmation requests
> for reducing either --size or --array-size. I may be remembering ones from
> LVM, not MD.
* Re: Accidentally resized array to 9
From: Roman Mamedov @ 2017-09-29 19:50 UTC
To: Eli Ben-Shoshan
Cc: linux-raid, John Stoffel

On Fri, 29 Sep 2017 10:53:57 -0400 Eli Ben-Shoshan <eli@benshoshan.com> wrote:

> I am just hoping that there might be a way that I can get the
> data back.

In theory, what you did was cut the array size to only use 9 KB of each device, then reshape THAT tiny array from 8 to 9 devices, with the rest left completely untouched.

So you could try removing the "new" disk, then try --create --assume-clean with the old devices only and --raid-devices=8.

But I'm not sure how you would get the device order right.

Ideally, what you can hope for is getting the bulk of the array data intact, with only the first 9 KB of each device, times the (8-2) data devices, so about the first 54 KB of data on the md array, corrupted and unusable. It is likely that LVM and the filesystem tools will not recognize anything because of that, so you will need data recovery software to look for and save the data.

--
With respect,
Roman
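[Editor's note: Roman's 54 KB figure is the shrunken component size multiplied by the number of data-bearing members in the old RAID6 layout. A one-liner sanity check of that arithmetic, assuming the values from this thread (9 KB components, 8 devices, 2 of them parity per stripe):]

```shell
#!/bin/sh
# A RAID6 stripe across 8 devices carries data on 8 - 2 = 6 of them.
COMPONENT_KB=9   # size each component was accidentally shrunk to
DEVICES=8        # members of the original array
PARITY=2         # RAID6 devotes two members per stripe to parity
echo "$(( COMPONENT_KB * (DEVICES - PARITY) )) KB of array data overwritten"
# prints: 54 KB of array data overwritten
```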
* Re: Accidentally resized array to 9
From: Phil Turmel @ 2017-09-30 16:21 UTC
To: Roman Mamedov, Eli Ben-Shoshan
Cc: linux-raid, John Stoffel

On 09/29/2017 03:50 PM, Roman Mamedov wrote:
> On Fri, 29 Sep 2017 10:53:57 -0400
> Eli Ben-Shoshan <eli@benshoshan.com> wrote:
>
>> I am just hoping that there might be a way that I can get the
>> data back.
>
> In theory what you did was cut the array size to only use 9 KB of each device,
> then reshaped THAT tiny array from 8 to 9 devices, with the rest left
> completely untouched.
>
> So you could try removing the "new" disk, then try --create --assume-clean
> with old devices only and --raid-devices=8.
>
> But I'm not sure how you would get the device order right.
>
> Ideally what you can hope for is you would get the bulk of array data intact,
> only with the first 9 KB of each device, times (8-2), so about the first 54 KB
> of data on the md array, corrupted and unusable. It is likely the LVM and
> filesystem tools will not recognize anything due to that, so you will need to
> use some data recovery software to look for and save the data.

I agree with Roman. Most of your array should still be on the 8-disk layout. But the array was mounted with writing processes immediately after the broken grow, so there is probably further corruption from writes laid down in the 9-disk pattern across the 8 disks.

Roman's suggestion is the best plan, but even after restoring LVM, expect breakage all over. Use overlays.

Phil
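[Editor's note: the overlays Phil refers to are the device-mapper snapshot technique commonly recommended on this list: each member disk becomes the read-only origin of a dm snapshot whose copy-on-write store is a sparse file, so experimental --create attempts never write to the real drives. A sketch under the assumption that the eight old members are the device names from this thread; it only PRINTS the commands (a dry run), so review the output and feed it to a root shell yourself. The overlay file size and chunk size are illustrative choices, not values from the thread:]

```shell
#!/bin/sh
# Dry-run generator for per-disk dm-snapshot overlays. Each printed
# command group creates a sparse COW file, attaches it to a loop
# device, and builds a snapshot target over the real disk.
DEVICES="/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdk"
CMDS=""
for dev in $DEVICES; do
    name=${dev##*/}
    CMDS="$CMDS
truncate -s 4G /tmp/overlay-$name
loopdev=\$(losetup -f --show /tmp/overlay-$name)
dmsetup create ${name}-ovl --table \"0 \$(blockdev --getsz $dev) snapshot $dev \$loopdev P 8\""
done
printf '%s\n' "$CMDS"
```

The experiments then run against /dev/mapper/sdc-ovl and friends; tearing the overlays down (dmsetup remove, losetup -d, delete the files) discards all experimental writes.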
* Re: Accidentally resized array to 9
From: Roman Mamedov @ 2017-09-30 16:29 UTC
To: Phil Turmel
Cc: Eli Ben-Shoshan, linux-raid, John Stoffel

On Sat, 30 Sep 2017 12:21:20 -0400 Phil Turmel <philip@turmel.org> wrote:

>> Ideally what you can hope for is you would get the bulk of array data intact,
>> only with the first 9 KB of each device, times (8-2), so about the first 54 KB
>> of data on the md array, corrupted and unusable. It is likely the LVM and
>> filesystem tools will not recognize anything due to that, so you will need to
>> use some data recovery software to look for and save the data.
>
> I agree with Roman. Most of your array should still be on the 8-disk
> layout. But the array was mounted with writing processes immediately
> after the broken grow, so there is probably further corruption from
> writes laid down in the 9-disk pattern across the 8 disks.

Adding one afterthought that I had: you could probably salvage the 54 KB in question by reading it in (and saving it to a file) from the current "9-device array of 9 KB devices" that you got.

--
With respect,
Roman
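[Editor's note: Roman's salvage idea amounts to a single dd from the still-assembled tiny array before it is torn down. A sketch, with a scratch file standing in for /dev/md128 so it can run anywhere; for the real thing, point SRC at /dev/md128 after assembling the array read-only. File names are illustrative:]

```shell
#!/bin/sh
# Save the first 54 KB of array data, i.e. the region that still lives
# inside the shrunken 9 KB components of the current 9-device array.
SRC=/tmp/fake-md128      # stand-in; use /dev/md128 (read-only) for real
DST=/tmp/first-54k.img
dd if=/dev/zero of="$SRC" bs=1K count=1024 2>/dev/null   # create the stand-in
dd if="$SRC" of="$DST" bs=1K count=54 2>/dev/null        # the salvage step
wc -c < "$DST"   # 54 KB = 55296 bytes
```

That saved image could later be written back over the corrupted head of the recreated 8-disk array.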
* Re: Accidentally resized array to 9
From: John Stoffel @ 2017-09-30 23:30 UTC
To: Phil Turmel
Cc: Roman Mamedov, Eli Ben-Shoshan, linux-raid

>>>>> "Phil" == Phil Turmel <philip@turmel.org> writes:

Phil> On 09/29/2017 03:50 PM, Roman Mamedov wrote:
>> In theory what you did was cut the array size to only use 9 KB of each device,
>> then reshaped THAT tiny array from 8 to 9 devices, with the rest left
>> completely untouched.
>>
>> So you could try removing the "new" disk, then try --create --assume-clean
>> with old devices only and --raid-devices=8.
>>
>> But I'm not sure how you would get the device order right.

Phil> I agree with Roman. Most of your array should still be on the
Phil> 8-disk layout. But you were mounted and had writing processes
Phil> immediately after the broken grow, so there's probably other
Phil> corruption due to writes in the 9-disk pattern on the 8 disks.

Phil> Roman's suggestion is the best plan, but even after restoring
Phil> LVM, expect breakage all over. Use overlays.

Maybe the answer is to remove the added disk, set up an overlay on the eight remaining disks, and then try mdadm --create ... on each of the permutations. Then you would bring up the LVs on there and see if you can fsck them and get some data back. I think the grow isn't going to work; it's really quite hosed at this point.

If I find some time, I'll try to spin up a patch to mdadm to stop mistakes like this from happening, by requiring explicit confirmation for a --size to a smaller size, overridable by a flag to force the shrink. Since it's so damn painful.

I don't have a lot of hope for you here, unfortunately. I think you're now at the stage where a --create using the original eight disks is the way to go. You *might* be able to find RAID backups at some offset into the disks to tell you what order each disk is in.

So the steps, roughly, would be:

1. stop /dev/md128
2. remove the new disk.
3. set up overlays again.
4. mdadm --create /dev/md128 --level 6 -n 8 /dev/sd{cdefghij}
5. vgchange -ay
6. lvs
7. fsck ...
8. if nothing, loop back to step four with a different order of devices.

If you have any output of /proc/mdstat from before, that would be helpful, as would a mapping of device names (/dev/sd*) to serial numbers.

[ There's a neat script called 'lsdrv' which you can grab here (https://github.com/pturmel/lsdrv) to collect and show all this data. But it's busted for lvcache devices. Oops! Time for more hacking! ]

But I hate to say it... I suspect you're toast. But don't listen to me.

John
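[Editor's note: step 8 above ("loop back to step four with a different order") can be scripted. A sketch that just enumerates candidate device orderings to feed into the --create attempts; the mdadm invocation itself is only shown in a comment because it must run against overlays, by hand, after review. The demo uses three names; the real eight disks give 8! = 40320 orderings, so in practice you would prune the list using the Device Role fields that mdadm --examine still reports:]

```shell
#!/bin/sh
# Print every ordering of the given devices, one per line. Each line is
# a candidate order for:
#   mdadm --create /dev/md128 --level=6 -n 8 --assume-clean <order>
permute() {
    # $1 = devices chosen so far, $2 = devices still unplaced
    if [ -z "$2" ]; then
        echo "$1"
        return
    fi
    for d in $2; do
        rest=""
        for r in $2; do [ "$r" = "$d" ] || rest="$rest $r"; done
        permute "$1 $d" "$rest"
    done
}
permute "" "sdc sdd sde"   # demo with 3 devices: prints 3! = 6 orderings
```

Positional parameters are saved and restored across POSIX shell function calls, so the recursion is safe even without local variables.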
end of thread, other threads: [~2017-09-30 23:30 UTC | newest]

Thread overview: 13+ messages
2017-09-29  4:23 Accidentally resized array to 9  Eli Ben-Shoshan
2017-09-29 12:38 ` John Stoffel
2017-09-29 14:47 ` Eli Ben-Shoshan
2017-09-29 19:33 ` John Stoffel
2017-09-29 21:04 ` Eli Ben-Shoshan
2017-09-29 21:17 ` John Stoffel
2017-09-29 21:49 ` Eli Ben-Shoshan
2017-09-29 12:55 ` Roman Mamedov
2017-09-29 14:53 ` Eli Ben-Shoshan
2017-09-29 19:50 ` Roman Mamedov
2017-09-30 16:21 ` Phil Turmel
2017-09-30 16:29 ` Roman Mamedov
2017-09-30 23:30 ` John Stoffel