* Failed mdadm RAID array after aborted Grow operation
@ 2022-05-08 13:18 Bob Brand
  2022-05-08 15:32 ` Wols Lists
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 13:18 UTC (permalink / raw)
  To: linux-raid

Hi,

I'm somewhat new to Linux and mdadm, although I've certainly learnt a lot over the last 24 hours.

I have a SuperMicro server running CentOS 7 (3.10.0-1160.11.1.el7.x86_64) with mdadm 4.1 (2018-10-01) that was happily running with 30 8 TB disks in a RAID6 configuration. (It also has boot and root on a RAID1 array; the RAID6 array is solely for data.) It was, however, starting to run out of space, so I investigated adding more drives to the array (the chassis can hold a total of 45 drives).

Since this server is no longer under support, obtaining the same drives as it already contained wasn't an option, and the supplier couldn't guarantee that they could supply compatible drives. We did come to an arrangement where I would try one drive and, if it didn't work, I could return any unopened units. I spent ages ensuring that the ones he'd suggested were as compatible as possible, and I based the specs of the existing drives on the invoice for the entire system. This turned out to be a mistake: the invoice stated they were 512e drives but, as I discovered after the new drives had arrived and I was doing a final check, the existing drives were actually 4Kn (4096-byte sector) drives. Of course the new drives were 512e. Bother!

After a lot more reading I found that it might be possible to reformat the new drives from 512e to 4096-byte sectors using sg_format. I installed the test drive and proceeded to try formatting it with:

sg_format --format --size=4096 /dev/sd<x>

All was proceeding smoothly until my ssh session terminated, a faulty docking station having killed my Ethernet connection.
So I logged onto the console and restarted the sg_format, which completed OK, sort of: it did convert the disk to 4096-byte sectors, but it threw an I/O error or two. They didn't seem too concerning, and I figured that if there was a problem it would show up in the next couple of steps. I've since discovered the dmesg log, and it indicated that there were significantly more I/O errors than I thought.

Anyway, since sg_format appeared to complete OK, I moved on to the next stage, which was to partition the disk with the following commands:

parted -a optimal /dev/sd<x>
(parted) mklabel msdos
(parted) mkpart primary 2048s 100%   (need to check that the start is correct)
(parted) align-check optimal 1       (verify alignment of partition 1)
(parted) set 1 raid on               (set the RAID flag)
(parted) print

Unfortunately, I don't have the results of the print command, as my laptop unexpectedly shut down overnight (it hasn't been a good weekend), but the partitioning appeared to complete without incident.

I then added the new disk to the array:

mdadm --add /dev/md125 /dev/sd<x>

and it completed without any problems. I then proceeded to grow the array:

mdadm --grow --raid-devices=31 --backup-file=/grow_md125.bak /dev/md125

I monitored this with cat /proc/mdstat; it showed that the array was reshaping, but the speed was 0K/sec and the reshape didn't progress from 0%.

# cat /proc/mdstat produced:

Personalities : [raid1] [raid6] [raid5] [raid4]
md125 : active raid6 sdab1[30] sdw1[26] sdc1[6] sdm1[16] sdi1[12] sdz1[29] sdh1[11] sdg1[10] sds1[22] sdf1[9] sdq1[20] sdaa1[1] sdo1[18] sdu1[24] sdb1[5] sdae1[4] sdl1[15] sdj1[13] sdn1[17] sdp1[19] sdv1[25] sde1[8] sdd1[7] sdr1[21] sdt1[23] sdx1[27] sdad1[3] sdac1[2] sdy1[28] sda1[0] sdk1[14]
      218789036032 blocks super 1.2 level 6, 512k chunk, algorithm 2 [31/31] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
      [>....................]  reshape =  0.0% (1/7813894144) finish=328606806584.3min speed=0K/sec
      bitmap: 0/59 pages [0KB], 65536KB chunk

md126 : active raid1 sdaf1[0] sdag1[1]
      100554752 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sdaf3[0] sdag2[1]
      976832 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

# mdadm --detail /dev/md125 produced:

/dev/md125:
           Version : 1.2
     Creation Time : Wed Sep 13 15:09:40 2017
        Raid Level : raid6
        Array Size : 218789036032 (203.76 TiB 224.04 TB)
     Used Dev Size : 7813894144 (7.28 TiB 8.00 TB)
      Raid Devices : 31
     Total Devices : 31
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun May  8 00:47:35 2022
             State : clean, reshaping
    Active Devices : 31
   Working Devices : 31
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 0% complete
     Delta Devices : 1, (30->31)

              Name : localhost.localdomain:SW-RAID6
              UUID : f9b65f55:5f257add:1140ccc0:46ca6c19
            Events : 1053617

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1      65      161        1      active sync   /dev/sdaa1
       2      65      193        2      active sync   /dev/sdac1
       3      65      209        3      active sync   /dev/sdad1
       4      65      225        4      active sync   /dev/sdae1
       5       8       17        5      active sync   /dev/sdb1
       6       8       33        6      active sync   /dev/sdc1
       7       8       49        7      active sync   /dev/sdd1
       8       8       65        8      active sync   /dev/sde1
       9       8       81        9      active sync   /dev/sdf1
      10       8       97       10      active sync   /dev/sdg1
      11       8      113       11      active sync   /dev/sdh1
      12       8      129       12      active sync   /dev/sdi1
      13       8      145       13      active sync   /dev/sdj1
      14       8      161       14      active sync   /dev/sdk1
      15       8      177       15      active sync   /dev/sdl1
      16       8      193       16      active sync   /dev/sdm1
      17       8      209       17      active sync   /dev/sdn1
      18       8      225       18      active sync   /dev/sdo1
      19       8      241       19      active sync   /dev/sdp1
      20      65        1       20      active sync   /dev/sdq1
      21      65       17       21      active sync   /dev/sdr1
      22      65       33       22      active sync   /dev/sds1
      23      65       49       23      active sync   /dev/sdt1
      24      65       65       24      active sync   /dev/sdu1
      25      65       81       25      active sync   /dev/sdv1
      26      65       97       26      active sync   /dev/sdw1
      27      65      113       27      active sync   /dev/sdx1
      28      65      129       28      active sync   /dev/sdy1
      29      65      145       29      active sync   /dev/sdz1
      30      65      177       30      active sync   /dev/sdab1

NOTE: the new disk is /dev/sdab

About 12 hours later, as the reshape hadn't progressed from 0%, I looked at ways of aborting it, such as:

mdadm --stop /dev/md125

which didn't work, so I ended up rebooting the server, and this is where things really went pear-shaped. The server came up in emergency mode, which I found odd given that boot and root should have been OK. I was able to log on as root, but the RAID6 array was stuck in the reshape state.

I then tried:

mdadm --assemble --update=revert-reshape --backup-file=/grow_md125.bak --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19 /dev/md125

and this produced:

mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got <varying numbers>
mdadm: No RAID super block on /dev/sde
.
.
mdadm: /dev/sde1 is identified as a member of /dev/md125, slot 6
.
.
mdadm: /dev/md125 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /grow_md125.back
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.

I've tried different variations on this, including mdadm --assemble --invalid-backup --force, but I won't include all the different commands here because I'm having to type all this by hand, since I can't copy anything off the server while it's in emergency mode. I have also removed the suspect disk, but this hasn't made any difference.

The closest I've come to fixing this is running:

mdadm /dev/md125 --assemble --invalid-backup --backup-file=/grow_md125.bak --verbose /dev/sdc1 /dev/sdd1 ....... /dev/sdaf1

and this produces:

.
.
.
mdadm: /dev/sdaf1 is identified as a member of /dev/md125, slot 4.
mdadm: /dev/md125 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /grow_md125.back
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/sdac1 to /dev/md125 as 1
.
.
.
mdadm: failed to RUN_ARRAY /dev/md125: Invalid argument

dmesg has this information:

md: md125 stopped.
md/raid:md125: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
md: md125 stopped.

If you've stuck with me and read all this way, thank you, and I hope you can help me.

Regards,
Bob Brand

^ permalink raw reply	[flat|nested] 23+ messages in thread
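[Editorial note: for anyone hitting the same 512e vs 4096-byte-sector mix-up, the sector sizes can be read from sysfs before ordering replacement drives rather than trusted from an invoice. A minimal sketch; the `sector_type` helper name is my own, not an mdadm or sg3_utils tool:]

```shell
#!/bin/sh
# Classify a drive from its logical and physical sector sizes.
# 512e drives report 512 logical / 4096 physical; 4Kn report 4096 / 4096.
sector_type() {
    logical=$1
    physical=$2
    if [ "$logical" -eq 4096 ]; then
        echo "4Kn"
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 4096 ]; then
        echo "512e"
    else
        echo "512n"
    fi
}

# On a live system the inputs would come from sysfs, e.g.:
#   logical=$(cat /sys/block/sda/queue/logical_block_size)
#   physical=$(cat /sys/block/sda/queue/physical_block_size)
sector_type 512 4096    # prints: 512e
sector_type 4096 4096   # prints: 4Kn
```

Mixing sector sizes in one md array is what prompted the sg_format reformat in the first place, so checking this up front avoids the whole detour.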
* Re: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 13:18 Failed mdadm RAID array after aborted Grow operation Bob Brand
@ 2022-05-08 15:32 ` Wols Lists
  2022-05-08 22:04   ` Bob Brand
  0 siblings, 1 reply; 23+ messages in thread
From: Wols Lists @ 2022-05-08 15:32 UTC (permalink / raw)
  To: Bob Brand, linux-raid; +Cc: Phil Turmel

On 08/05/2022 14:18, Bob Brand wrote:
> If you've stuck with me and read all this way, thank you and I hope you
> can help me.

https://raid.wiki.kernel.org/index.php/Linux_Raid

Especially
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

What you need to do is revert the reshape. I know what may have happened, and what bothers me is your kernel version, 3.10.

The first thing to try is to boot from up-to-date rescue media and see if an mdadm --revert works from there. If it does, your CentOS should then bring everything back no problem.

(You've currently got what I call a Frankensetup: a very old kernel, a pretty new mdadm, and a whole bunch of patches that do who knows what. You really need a matching kernel and mdadm, and your frankenkernel won't match anything ...)

Let us know how that goes ...

Cheers,
Wol

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 15:32 ` Wols Lists
@ 2022-05-08 22:04   ` Bob Brand
  2022-05-08 22:15     ` Wol
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 22:04 UTC (permalink / raw)
  To: Wols Lists, linux-raid; +Cc: Phil Turmel

Thanks Wol.

Should I use a CentOS 7 disk or a CentOS disk?

Thanks

-----Original Message-----
From: Wols Lists <antlists@youngman.org.uk>
Sent: Monday, 9 May 2022 1:32 AM
To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>
Subject: Re: Failed mdadm RAID array after aborted Grow operation

[...]

CAUTION!!! This E-mail originated from outside of WMA Water. Do not click links or open attachments unless you recognize the sender and know the content is safe.

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 22:04 ` Bob Brand
@ 2022-05-08 22:15   ` Wol
  2022-05-08 22:19     ` Bob Brand
  0 siblings, 1 reply; 23+ messages in thread
From: Wol @ 2022-05-08 22:15 UTC (permalink / raw)
  To: Bob Brand, linux-raid; +Cc: Phil Turmel

How old is CentOS 7? With that kernel I guess it's quite old?

Try and get a CentOS 8.5 disk. At the end of the day, the version of linux doesn't matter. What you need is an up-to-date rescue disk. Distro/whatever is unimportant - what IS important is that you are using the latest mdadm, and a kernel that matches.

The problem you have sounds like a long-standing but now-fixed bug. An original CentOS disk might be okay (with matched kernel and mdadm), but almost certainly has what I consider to be a "dodgy" version of mdadm.

If you can afford the downtime, after you've reverted the reshape, I'd try starting it again with the rescue disk. It'll probably run fine. Let it complete and then your old CentOS 7 will be fine with it.

Cheers,
Wol

On 08/05/2022 23:04, Bob Brand wrote:
> Thanks Wol.
>
> Should I use a CentOS 7 disk or a CentOS disk?
>
> Thanks
>
> [...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
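[Editorial note: Wol's "latest mdadm, and a kernel that matches" check can be scripted when evaluating a candidate rescue ISO. A sketch using GNU sort's version ordering; the `version_ge` helper and the versions compared are illustrative, not an official minimum:]

```shell
#!/bin/sh
# True (exit 0) if version $1 is >= version $2, using GNU sort -V
# so that e.g. 4.10 correctly sorts after 4.2.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

# On the rescue system you would feed in the installed versions, e.g.:
#   mdadm_ver=$(mdadm --version 2>&1)   # "mdadm - v4.1 - 2018-10-01"
#   kernel_ver=$(uname -r)
if version_ge "4.2" "4.1"; then
    echo "rescue media mdadm is newer than the server's 4.1"
fi
```

A rescue disk whose mdadm predates the server's 4.1 would be a step backwards, which is the quick test worth running before booting it.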
* RE: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 22:15 ` Wol
@ 2022-05-08 22:19   ` Bob Brand
  2022-05-08 23:02     ` Bob Brand
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 22:19 UTC (permalink / raw)
  To: Wol, linux-raid; +Cc: Phil Turmel

OK. I've downloaded a CentOS 7-2009 ISO from centos.org - that seems to be the most recent they have.

-----Original Message-----
From: Wol <antlists@youngman.org.uk>
Sent: Monday, 9 May 2022 8:16 AM
To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>
Subject: Re: Failed mdadm RAID array after aborted Grow operation

[...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 22:19 ` Bob Brand
@ 2022-05-08 23:02   ` Bob Brand
  2022-05-08 23:32     ` Bob Brand
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 23:02 UTC (permalink / raw)
  To: Bob Brand, Wol, linux-raid; +Cc: Phil Turmel

Hi Wol,

I've booted to the installation media and I've run the following command (in two variants, with the backup file on the mounted system image and at the path I originally used):

mdadm /dev/md125 --assemble --update=revert-reshape --backup-file=/mnt/sysimage/grow_md125.bak --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19

mdadm /dev/md125 --assemble --update=revert-reshape --backup-file=/grow_md125.bak --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19

But I'm still getting the error:

mdadm: /dev/md125 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.

Should I try the --invalid-backup switch or --force?

Thanks,
Bob

-----Original Message-----
From: Bob Brand <brand@wmawater.com.au>
Sent: Monday, 9 May 2022 8:19 AM
To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>
Subject: RE: Failed mdadm RAID array after aborted Grow operation

OK. I've downloaded a CentOS 7-2009 ISO from centos.org - that seems to be the most recent they have.

[...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 23:02 ` Bob Brand
@ 2022-05-08 23:32   ` Bob Brand
  2022-05-09  0:09     ` Bob Brand
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 23:32 UTC (permalink / raw)
  To: Bob Brand, Wol, linux-raid; +Cc: Phil Turmel

I just tried it again with the --invalid-backup switch, and it's now showing the State as "clean, degraded", and it's showing all the disks except for the suspect one that I removed.

I'm unable to mount it and see the contents. I get the error "mount: /dev/md125: can't read superblock."

Is there more that I need to do?

Thanks

-----Original Message-----
From: Bob Brand <brand@wmawater.com.au>
Sent: Monday, 9 May 2022 9:02 AM
To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>
Subject: RE: Failed mdadm RAID array after aborted Grow operation

[...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
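[Editorial note: the "clean, degraded" state can be cross-checked from /proc/mdstat as well, where each `_` in the `[UU...]` status string marks a slot with no working device. A throwaway parser, my own helper rather than anything shipped with mdadm:]

```shell
#!/bin/sh
# Count missing devices in an mdstat status string such as "[UUU_U]".
# Each '_' marks a slot whose member is failed or removed.
missing_devices() {
    printf '%s' "$1" | tr -cd '_' | wc -c
}

# On the live system the string comes from /proc/mdstat, e.g.:
#   grep -A2 '^md125' /proc/mdstat
# A 31-device array with one slot empty looks like:
missing_devices "[UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU_]"   # prints: 1
```

One missing slot on a RAID6 still leaves one disk of redundancy, which matches the "clean, degraded" state reported above.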
* RE: Failed mdadm RAID array after aborted Grow operation
  2022-05-08 23:32 ` Bob Brand
@ 2022-05-09  0:09   ` Bob Brand
  2022-05-09  6:52     ` Wols Lists
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-09 0:09 UTC (permalink / raw)
  To: Bob Brand, Wol, linux-raid; +Cc: Phil Turmel

Hi Wol,

My apologies for continually bothering you, but I have a couple of questions:

1. How do I overcome the error message "mount: /dev/md125: can't read superblock."? Do I use fsck?

2. The removed disk is showing as "- 0 0 30 removed". Is it safe to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to overcome this?

Thank you!

-----Original Message-----
From: Bob Brand <brand@wmawater.com.au>
Sent: Monday, 9 May 2022 9:33 AM
To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>
Subject: RE: Failed mdadm RAID array after aborted Grow operation

I just tried it again with the --invalid-backup switch, and it's now showing the State as "clean, degraded", and it's showing all the disks except for the suspect one that I removed.

I'm unable to mount it and see the contents. I get the error "mount: /dev/md125: can't read superblock."

Is there more that I need to do?

Thanks

[...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
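[Editorial note: before choosing between fsck and removing the slot, it helps to capture the array state non-destructively. A sketch that pulls the State field out of `mdadm --detail` output; here it is fed a saved sample line, since on a box stuck in emergency mode the output often has to be copied by hand:]

```shell
#!/bin/sh
# Extract the "State :" field from `mdadm --detail` output on stdin.
array_state() {
    awk -F' : ' '/^ *State :/ { print $2; exit }'
}

# Normal use would be: mdadm --detail /dev/md125 | array_state
# Demonstrated on a captured sample of the output format:
printf '             State : clean, degraded\n' | array_state
# prints: clean, degraded
```

Logging this before and after each recovery attempt gives a record of whether a command actually changed anything, which matters when every step is being retyped manually.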
* Re: Failed mdadm RAID array after aborted Grow operation
  2022-05-09  0:09 ` Bob Brand
@ 2022-05-09  6:52   ` Wols Lists
  2022-05-09 13:07     ` Bob Brand
       [not found]       ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
  0 siblings, 2 replies; 23+ messages in thread
From: Wols Lists @ 2022-05-09 6:52 UTC (permalink / raw)
  To: Bob Brand, linux-raid; +Cc: Phil Turmel, NeilBrown

On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of questions:

Did you read the links I sent you?

> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock."? Do I use fsck?
>
> 2. The removed disk is showing as "- 0 0 30 removed". Is it safe
> to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?

I don't know :-( This is getting a bit out of my depth.

But I'm SERIOUSLY concerned you're still futzing about with CentOS 7!!! Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or the latest Fedora? Why didn't you download SUSE SLES 15? Any and all CentOS 7 will come with either an out-of-date mdadm, or a Frankenkernel. NEITHER is a good idea.

Go back to the links I gave you, download and run lsdrv, and post the output here. Hopefully somebody will tell you the next steps. I will do my best.

> Thank you!

Cheers,
Wol

> [...]
This E-mail originated from outside of WMA Water. Do not >> click links or open attachments unless you recognize the sender and >> know the content is safe. ^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation 2022-05-09 6:52 ` Wols Lists @ 2022-05-09 13:07 ` Bob Brand [not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com> 1 sibling, 0 replies; 23+ messages in thread From: Bob Brand @ 2022-05-09 13:07 UTC (permalink / raw) To: Wols Lists, linux-raid; +Cc: Phil Turmel, NeilBrown Hi Wol, I did read the links you sent, actually I'd already trawled through them prior to subscribing to the mailing list. They're how I learned about the mailing list. It seems that the conventional version of CentOS 8.5 is no longer available, there's just the CentOS 8 Streams version and I wasn't sure how it would go with the old style of CentOS. To be honest it didn't occur to me to go with another flavour of Linux, I just figured that I'd use CentOS to repair CentOS. Anyway, I did try using "mdadm /dev/md2 -r detached" and "mdadm /dev/md2 -r failed" to remove the removed disk to no avail. I ended up using "mdadm --grow /dev/md125 --array-size 218789036032 --backup-file=/mnt/sysimage/grow_md125_size_grow.bak --verbose" followed by "mdadm --grow /dev/md125 --raid-devices=30 --backup-file=/mnt/sysimage/grow_md125_grow_disks.bak --verbose" and it seems to be working in that it is reshaping the array although it is apparently going to take around 16,000 minutes (would that be because we've about 200TB of data?). My concern now is whether or not I'll still have the mount issue once it finally completes the reshape. If it does mount OK, does that mean I'm good to reboot it? With regards to your comment about downloading lsdrv, I'll try and do that although I'm having trouble configuring my DNS servers in the running rescue disk OS. I could run lsblk but, from what I see of lsdrv, lsblk doesn't have the detail that lsdrv has. I'll keep working on that and let you know what I get - it looks like I have to edit it to use the older version of Python that this installation has. 
Cheers, Bob -----Original Message----- From: Wols Lists <antlists@youngman.org.uk> Sent: Monday, 9 May 2022 4:52 PM To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org Cc: Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com> Subject: Re: Failed adadm RAID array after aborted Grown operation On 09/05/2022 01:09, Bob Brand wrote: > Hi Wol, > > My apologies for continually bothering you but I have a couple of > questions: Did you read the links I sent you? > > 1. How do I overcome the error message "mount: /dev/md125: can't read > superblock." Do it use fsck? > > 2. The removed disk is showing as " - 0 0 30 removed". Is it > safe > to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to > overcome this? I don't know :-( This is getting a bit out of my depth. But I'm SERIOUSLY concerned you're still futzing about with CentOS 7!!! Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or the latest Fedora? Why didn't you download SUSE SLES 15? Any and all CentOS 7 will come with either an out-of-date mdadm, or a Frankenkernel. NEITHER are a good idea. Go back to the links I gave you, download and run lsdrv, and post the output here. Hopefully somebody will tell you the next steps. I will do my best. > > Thank you! > Cheers, Wol > > -----Original Message----- > From: Bob Brand <brand@wmawater.com.au> > Sent: Monday, 9 May 2022 9:33 AM > To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>; > linux-raid@vger.kernel.org > Cc: Phil Turmel <philip@turmel.org> > Subject: RE: Failed adadm RAID array after aborted Grown operation > > I just tried it again with the --invalid_backup switch and it's now > showing the State as "clean, degraded".and it's showing all the disks > except for the suspect one that I removed. > > I'm unable to mount it and see the contents. I get the error "mount: > /dev/md125: can't read superblock." > > Is there more that I need to do? 
> > Thanks > > > -----Original Message----- > From: Bob Brand <brand@wmawater.com.au> > Sent: Monday, 9 May 2022 9:02 AM > To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>; > linux-raid@vger.kernel.org > Cc: Phil Turmel <philip@turmel.org> > Subject: RE: Failed adadm RAID array after aborted Grown operation > > Hi Wol, > > I've booted to the installation media and I've run the following command: > > mdadm > /dev/md125 --assemble --update=revert-reshape --backup-file=/mnt/sysimage/grow_md125.bak > --verbose --uuid= f9b65f55:5f257add:1140ccc0:46ca6c19 > /dev/md125mdadm --assemble --update=revert-reshape --backup-file=/grow_md125.bak > --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19 > > But I'm still getting the error: > > mdadm: /dev/md125 has an active reshape - checking if critical section > needs to be restored > mdadm: No backup metadata on /mnt/sysimage/grow_md125.back > mdadm: Failed to find backup of critical section > mdadm: Failed to restore critical section for reshape, sorry. > > > Should I try the --invalid_backup switch or --force? > > Thanks, > Bob > > > -----Original Message----- > From: Bob Brand <brand@wmawater.com.au> > Sent: Monday, 9 May 2022 8:19 AM > To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org > Cc: Phil Turmel <philip@turmel.org> > Subject: RE: Failed adadm RAID array after aborted Grown operation > > OK. I've downloaded a Centos 7 - 2009 ISO from centos.org - that > seems to be the most recent they have. > > > -----Original Message----- > From: Wol <antlists@youngman.org.uk> > Sent: Monday, 9 May 2022 8:16 AM > To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org > Cc: Phil Turmel <philip@turmel.org> > Subject: Re: Failed adadm RAID array after aborted Grown operation > > How old is CentOS 7? With that kernel I guess it's quite old? > > Try and get a CentOS 8.5 disk. At the end of the day, the version of > linux doesn't matter. What you need is an up-to-date rescue disk. 
> Distro/whatever is unimportant - what IS important is that you are > using the latest mdadm, and a kernel that matches. > > The problem you have sounds like a long-standing but now-fixed bug. An > original CentOS disk might be okay (with matched kernel and mdadm), > but almost certainly has what I consider to be a "dodgy" version of mdadm. > > If you can afford the downtime, after you've reverted the reshape, I'd > try starting it again with the rescue disk. It'll probably run fine. > Let it complete and then your old CentOS 7 will be fine with it. > > Cheers, > Wol > > On 08/05/2022 23:04, Bob Brand wrote: >> Thank Wol. >> >> Should I use a CentOS 7 disk or a CentOS disk? >> >> Thanks >> >> -----Original Message----- >> From: Wols Lists <antlists@youngman.org.uk> >> Sent: Monday, 9 May 2022 1:32 AM >> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org >> Cc: Phil Turmel <philip@turmel.org> >> Subject: Re: Failed adadm RAID array after aborted Grown operation >> >> On 08/05/2022 14:18, Bob Brand wrote: >>> If you’ve stuck with me and read all this way, thank you and I hope >>> you can help me. >> >> https://raid.wiki.kernel.org/index.php/Linux_Raid >> >> Especially >> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrog >> n >> >> What you need to do is revert the reshape. I know what may have >> happened, and what bothers me is your kernel version, 3.10. >> >> The first thing to try is to boot from up-to-date rescue media and >> see if an mdadm --revert works from there. If it does, your Centos >> should then bring everything back no problem. >> >> (You've currently got what I call a Frankensetup, a very old kernel, >> a pretty new mdadm, and a whole bunch of patches that does who knows >> what. >> You really need a matching kernel and mdadm, and your frankenkernel >> won't match anything ...) >> >> Let us know how that goes ... >> >> Cheers, >> Wol >> >> >> >> CAUTION!!! This E-mail originated from outside of WMA Water. 
Do not >> click links or open attachments unless you recognize the sender and >> know the content is safe. ^ permalink raw reply [flat|nested] 23+ messages in thread
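On the 16,000-minute ETA question raised above: a reshape has to rewrite every stripe of the array, so its duration scales with total array capacity, not with how much data the filesystem holds. Rough arithmetic from the figures quoted in this thread:

```python
# Reshape rewrites every stripe, so time tracks total array size,
# not used space.
array_kib = 218789036032        # array size from /proc/mdstat
eta_min = 16000                 # mdadm's estimate quoted above

agg_mib_s = array_kib / 1024 / (eta_min * 60)
print(round(agg_mib_s))         # 223 -> roughly 220 MiB/s across the array

# Per member disk (~8 TB each must be read and rewritten in place):
per_disk_mb_s = 8e12 / (eta_min * 60) / 1e6
print(round(per_disk_mb_s, 1))  # 8.3 MB/s per disk, far below sequential
# throughput, which is normal: reshape reads and writes the same spindles.
```

So yes, the long ETA follows from the ~200 TB array size itself; an empty array of the same geometry would take about as long.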
[parent not found: <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>]
* RE: Failed adadm RAID array after aborted Grown operation [not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com> @ 2022-05-11 5:39 ` Bob Brand 2022-05-11 12:35 ` Reindl Harald 2022-05-20 15:13 ` Bob Brand 1 sibling, 1 reply; 23+ messages in thread From: Bob Brand @ 2022-05-11 5:39 UTC (permalink / raw) To: Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown Thanks Roger. My apologies for not replying earlier. By the time I read this I already had a reshape underway to reduce the size of the array back to the original 30 disks. So far it seems to be progressing OK although the ETA is around 10 days which is why I didn’t respond sooner – I’ve been busy dealing with the fallout from this. Do I understand that you would recommend upgrading our installation of Linux once the repair is complete, or are you advising downloading and compiling a new kernel as part of the repair? Or are you suggesting that it was the fact that we’re on such an old version of CentOS that caused this mess? I ask because once this is repaired (assuming it does complete successfully), I would like to extend the array to the full 45 drives of which this server is capable. Thanks, Bob From: Roger Heflin <rogerheflin@gmail.com> Sent: Monday, 9 May 2022 9:05 PM To: Wols Lists <antlists@youngman.org.uk> Cc: Bob Brand <brand@wmawater.com.au>; Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com> Subject: Re: Failed adadm RAID array after aborted Grown operation The short term easiest way for a new kernel might be this. Download a Fedora 35 livecd and boot from it. It will allow you to turn on the raid and/or reshape the raid and/or abort the reshape using the fedora 35 kernel and mdadm tools. Though all of this will need to be done manually from either the gui and/or command line, so it will be somewhat of a pain. The other choice is to download/compile/install a current http://kernel.org kernel.
This takes some time (you have to install compiler/header rpms), and follow this (https://docs.rockylinux.org/guides/custom-linux-kernel/)--rockylinux so a redhat clone list of instructions. How long it takes will depend on the number of cpus your machine has and the value after the -j<cpustouse>. The biggest issue with this will likely be dealing with compile errors for missing dependencies you get for this or that tool and/or devel package being missing. And then you would still need to download the newest mdadm and compile and install it. These steps will take longer, but doing this will get your system on a new kernel and new tools, and typically once you know how to do this, this process of compiling/installing a kernel has for the most part not changed in a long time. And I have been doing this on and off for 20+ years and newer kernel on older userspace is widely used by a lot of the kernel developers so is generally well testing and in my experience just works to get you on a new kernel with minimal trouble. On Mon, May 9, 2022 at 5:24 AM Wols Lists <mailto:antlists@youngman.org.uk> wrote: On 09/05/2022 01:09, Bob Brand wrote: > Hi Wol, > > My apologies for continually bothering you but I have a couple of > questions: Did you read the links I sent you? > > 1. How do I overcome the error message "mount: /dev/md125: can't read > superblock." Do it use fsck? > > 2. The removed disk is showing as " - 0 0 30 removed". Is it > safe > to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to > overcome this? I don't know :-( This is getting a bit out of my depth. But I'm SERIOUSLY concerned you're still futzing about with CentOS 7!!! Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or the latest Fedora? Why didn't you download SUSE SLES 15? Any and all CentOS 7 will come with either an out-of-date mdadm, or a Frankenkernel. NEITHER are a good idea. Go back to the links I gave you, download and run lsdrv, and post the output here. 
Hopefully somebody will tell you the next steps. I will do my best. > > Thank you! > Cheers, Wol > > -----Original Message----- > From: Bob Brand <mailto:brand@wmawater.com.au> > Sent: Monday, 9 May 2022 9:33 AM > To: Bob Brand <mailto:brand@wmawater.com.au>; Wol > <mailto:antlists@youngman.org.uk>; > mailto:linux-raid@vger.kernel.org > Cc: Phil Turmel <mailto:philip@turmel.org> > Subject: RE: Failed adadm RAID array after aborted Grown operation > > I just tried it again with the --invalid_backup switch and it's now > showing > the State as "clean, degraded".and it's showing all the disks except for > the > suspect one that I removed. > > I'm unable to mount it and see the contents. I get the error "mount: > /dev/md125: can't read superblock." > > Is there more that I need to do? > > Thanks > > > -----Original Message----- > From: Bob Brand <mailto:brand@wmawater.com.au> > Sent: Monday, 9 May 2022 9:02 AM > To: Bob Brand <mailto:brand@wmawater.com.au>; Wol > <mailto:antlists@youngman.org.uk>; > mailto:linux-raid@vger.kernel.org > Cc: Phil Turmel <mailto:philip@turmel.org> > Subject: RE: Failed adadm RAID array after aborted Grown operation > > Hi Wol, > > I've booted to the installation media and I've run the following command: > > mdadm > /dev/md125 --assemble --update=revert-reshape --backup-file=/mnt/sysimage/grow_md125.bak > --verbose --uuid= f9b65f55:5f257add:1140ccc0:46ca6c19 > /dev/md125mdadm --assemble --update=revert-reshape --backup-file=/grow_md125.bak > --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19 > > But I'm still getting the error: > > mdadm: /dev/md125 has an active reshape - checking if critical section > needs > to be restored > mdadm: No backup metadata on /mnt/sysimage/grow_md125.back > mdadm: Failed to find backup of critical section > mdadm: Failed to restore critical section for reshape, sorry. > > > Should I try the --invalid_backup switch or --force? 
> > Thanks, > Bob > > > -----Original Message----- > From: Bob Brand <mailto:brand@wmawater.com.au> > Sent: Monday, 9 May 2022 8:19 AM > To: Wol <mailto:antlists@youngman.org.uk>; > mailto:linux-raid@vger.kernel.org > Cc: Phil Turmel <mailto:philip@turmel.org> > Subject: RE: Failed adadm RAID array after aborted Grown operation > > OK. I've downloaded a Centos 7 - 2009 ISO from http://centos.org - that > seems to > be the most recent they have. > > > -----Original Message----- > From: Wol <mailto:antlists@youngman.org.uk> > Sent: Monday, 9 May 2022 8:16 AM > To: Bob Brand <mailto:brand@wmawater.com.au>; > mailto:linux-raid@vger.kernel.org > Cc: Phil Turmel <mailto:philip@turmel.org> > Subject: Re: Failed adadm RAID array after aborted Grown operation > > How old is CentOS 7? With that kernel I guess it's quite old? > > Try and get a CentOS 8.5 disk. At the end of the day, the version of linux > doesn't matter. What you need is an up-to-date rescue disk. > Distro/whatever is unimportant - what IS important is that you are using > the > latest mdadm, and a kernel that matches. > > The problem you have sounds like a long-standing but now-fixed bug. An > original CentOS disk might be okay (with matched kernel and mdadm), but > almost certainly has what I consider to be a "dodgy" version of mdadm. > > If you can afford the downtime, after you've reverted the reshape, I'd try > starting it again with the rescue disk. It'll probably run fine. Let it > complete and then your old CentOS 7 will be fine with it. > > Cheers, > Wol > > On 08/05/2022 23:04, Bob Brand wrote: >> Thank Wol. >> >> Should I use a CentOS 7 disk or a CentOS disk? 
>> >> Thanks >> >> -----Original Message----- >> From: Wols Lists <mailto:antlists@youngman.org.uk> >> Sent: Monday, 9 May 2022 1:32 AM >> To: Bob Brand <mailto:brand@wmawater.com.au>; >> mailto:linux-raid@vger.kernel.org >> Cc: Phil Turmel <mailto:philip@turmel.org> >> Subject: Re: Failed adadm RAID array after aborted Grown operation >> >> On 08/05/2022 14:18, Bob Brand wrote: >>> If you’ve stuck with me and read all this way, thank you and I hope >>> you can help me. >> >> https://raid.wiki.kernel.org/index.php/Linux_Raid >> >> Especially >> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn >> >> What you need to do is revert the reshape. I know what may have >> happened, and what bothers me is your kernel version, 3.10. >> >> The first thing to try is to boot from up-to-date rescue media and see >> if an mdadm --revert works from there. If it does, your Centos should >> then bring everything back no problem. >> >> (You've currently got what I call a Frankensetup, a very old kernel, a >> pretty new mdadm, and a whole bunch of patches that does who knows what. >> You really need a matching kernel and mdadm, and your frankenkernel >> won't match anything ...) >> >> Let us know how that goes ... >> >> Cheers, >> Wol >> >> >> >> CAUTION!!! This E-mail originated from outside of WMA Water. Do not >> click links or open attachments unless you recognize the sender and >> know the content is safe. >> >> > > > > CAUTION!!! This E-mail originated from outside of WMA Water. Do not click > links or open attachments unless you recognize the sender and know the > content is safe. > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
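For anyone monitoring a multi-day reshape like the one discussed above, the progress line in /proc/mdstat can be parsed rather than eyeballed. A hedged sketch (the sample line is hypothetical, written in the style of the mdstat output shown earlier in the thread; real output may differ slightly):

```python
import re

def reshape_progress(mdstat_text):
    """Return (percent, finish_minutes, speed_kps) for the first
    reshape progress line found, or None if no progress line matches."""
    m = re.search(
        r"=\s*([\d.]+)%.*?finish=([\d.]+)min\s+speed=(\d+)K/sec",
        mdstat_text)
    if not m:
        return None
    return float(m.group(1)), float(m.group(2)), int(m.group(3))

# Hypothetical sample in the usual md progress format:
sample = ("[>....................]  reshape =  3.4% "
          "(250000000/7813894144) finish=15400.0min speed=8150K/sec")
print(reshape_progress(sample))   # (3.4, 15400.0, 8150)
```

In practice you would feed it `open("/proc/mdstat").read()` from a cron job or watch loop; a stalled reshape shows up as the percent value not changing between samples, exactly the symptom described at the start of this thread.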
* Re: Failed adadm RAID array after aborted Grown operation 2022-05-11 5:39 ` Bob Brand @ 2022-05-11 12:35 ` Reindl Harald 2022-05-11 13:22 ` Bob Brand 0 siblings, 1 reply; 23+ messages in thread From: Reindl Harald @ 2022-05-11 12:35 UTC (permalink / raw) To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown Am 11.05.22 um 07:39 schrieb Bob Brand: > Do I understand that you would recommend upgrading our installation of Linux > once the repair is complete or are advising downloading and compiling a new > kernel as part of the repair? Or are you suggesting that it was the fact > that we’re on such an old version of CentOS that caused this mess? I ask > because once this is repaired (assuming it does complete successfully), I > would like to extend the array to the full 45 drives of which this server is > capable You were advised to do that with a live ISO of whatever distribution has a recent kernel and recent mdadm, and to leave your installed OS alone. ^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation 2022-05-11 12:35 ` Reindl Harald @ 2022-05-11 13:22 ` Bob Brand 2022-05-11 14:56 ` Reindl Harald 0 siblings, 1 reply; 23+ messages in thread From: Bob Brand @ 2022-05-11 13:22 UTC (permalink / raw) To: Reindl Harald, Roger Heflin, Wols Lists Cc: Linux RAID, Phil Turmel, NeilBrown Sorry Reindl. I'm not sure I understand. Are you saying I did or didn't do the right thing in booting from a CentOS rescue disk? At the moment it's running from the rescue disk and, whether or not it was the best distro to have used, I would imagine that I need to keep running from the rescue disk until the reshape is complete, as rebooting in the middle of a reshape is what got me in this mess. Thanks -----Original Message----- From: Reindl Harald <h.reindl@thelounge.net> Sent: Wednesday, 11 May 2022 10:36 PM To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>; Wols Lists <antlists@youngman.org.uk> Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com> Subject: Re: Failed adadm RAID array after aborted Grown operation Am 11.05.22 um 07:39 schrieb Bob Brand: > Do I understand that you would recommend upgrading our installation of > Linux once the repair is complete or are advising downloading and > compiling a new kernel as part of the repair? Or are you suggesting > that it was the fact that we’re on such an old version of CentOS that > caused this mess? I ask because once this is repaired (assuming it > does complete successfully), I would like to extend the array to the > full 45 drives of which this server is capable you where adivised doing thatg with a live-iso of whatever distribution with a recent kernel and recent mdadm and leave your installed os alone
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation 2022-05-11 13:22 ` Bob Brand @ 2022-05-11 14:56 ` Reindl Harald 2022-05-11 14:59 ` Reindl Harald 0 siblings, 1 reply; 23+ messages in thread From: Reindl Harald @ 2022-05-11 14:56 UTC (permalink / raw) To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown Am 11.05.22 um 15:22 schrieb Bob Brand: > Sorry Reindl. I'm not sure I understand. Are you saying I did or didn't do > the right thing in booting from a CentOS rescue disk? At the moment it's > running from the rescue disk and, be it the best distro to have used (or > not), I would imagine that I need to keep running from the rescue disk until > the reshape is complete as rebooting in the middle of a reshape is what got > me in this mess. And I don't understand what you did not understand in the clear response below, which you got days ago! For the reshape you were advised to use whatever rescue/live system has a recent kernel and mdadm, not more and not less, just to avoid probably long-fixed bugs in your old kernel. --------------------- Try and get a CentOS 8.5 disk. At the end of the day, the version of linux doesn't matter. What you need is an up-to-date rescue disk. Distro/whatever is unimportant - what IS important is that you are using the latest mdadm, and a kernel that matches. The problem you have sounds like a long-standing but now-fixed bug. An original CentOS disk might be okay (with matched kernel and mdadm), but almost certainly has what I consider to be a "dodgy" version of mdadm. If you can afford the downtime, after you've reverted the reshape, I'd try starting it again with the rescue disk. It'll probably run fine. Let it complete and then your old CentOS 7 will be fine with it. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation 2022-05-11 14:56 ` Reindl Harald @ 2022-05-11 14:59 ` Reindl Harald 2022-05-13 5:32 ` Bob Brand 0 siblings, 1 reply; 23+ messages in thread From: Reindl Harald @ 2022-05-11 14:59 UTC (permalink / raw) To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown Am 11.05.22 um 16:56 schrieb Reindl Harald: > > > Am 11.05.22 um 15:22 schrieb Bob Brand: >> Sorry Reindl. I'm not sure I understand. Are you saying I did or >> didn't do >> the right thing in booting from a CentOS rescue disk? At the moment it's >> running from the rescue disk and, be it the best distro to have used (or >> not), I would imagine that I need to keep running from the rescue disk >> until >> the reshape is complete as rebooting in the middle of a reshape is >> what got >> me in this mess. And nowhere did I say reboot now. I only responded to your "Do I understand that you would recommend upgrading our installation of Linux once the repair is complete or are advising downloading and compiling a new kernel as part of the repair?" Nobody said that - the only point was to use as recent a kernel as possible for all grow/reshape operations. > and i don't understand what you did not understand in the clear response > below you got days ago! > > due reshape you where advised use whatever rescue/live system with a > recent kernel and mdadm, not more and not less > > just to avoid probaly long fixed bugs in your old kernel > > --------------------- > > Try and get a CentOS 8.5 disk. At the end of the day, the version of > linux doesn't matter. What you need is an up-to-date rescue disk. > Distro/whatever is unimportant - what IS important is that you are using > the latest mdadm, and a kernel that matches. > > The problem you have sounds like a long-standing but now-fixed bug. An > original CentOS disk might be okay (with matched kernel and mdadm), but > almost certainly has what I consider to be a "dodgy" version of mdadm.
> > If you can afford the downtime, after you've reverted the reshape, I'd > try starting it again with the rescue disk. It'll probably run fine. Let > it complete and then your old CentOS 7 will be fine with it ^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation 2022-05-11 14:59 ` Reindl Harald @ 2022-05-13 5:32 ` Bob Brand 2022-05-13 8:18 ` Reindl Harald 0 siblings, 1 reply; 23+ messages in thread From: Bob Brand @ 2022-05-13 5:32 UTC (permalink / raw) To: Reindl Harald, Roger Heflin, Wols Lists Cc: Linux RAID, Phil Turmel, NeilBrown This may not be the forum to ask this, but what exactly is "compiling the kernel"? From what I've been reading, it sounds like a somewhat involved and complex process - is it? Is compiling a new kernel the same as upgrading the OS? I'm getting the impression that it sort of is but sort of isn't. Is it possible to compile a kernel for a rescue CD (from the comments I've read, it is possible)? If I were to compile a new kernel, would I expect the version number for the kernel and mdadm to be the same? Sorry for all the questions but, as I said at the outset, a lot of this is all very new to me. Thank you, Bob -----Original Message----- From: Reindl Harald <h.reindl@thelounge.net> Sent: Thursday, 12 May 2022 12:59 AM To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>; Wols Lists <antlists@youngman.org.uk> Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com> Subject: Re: Failed adadm RAID array after aborted Grown operation Am 11.05.22 um 16:56 schrieb Reindl Harald: > > > Am 11.05.22 um 15:22 schrieb Bob Brand: >> Sorry Reindl. I'm not sure I understand. Are you saying I did or >> didn't do the right thing in booting from a CentOS rescue disk? At >> the moment it's running from the rescue disk and, be it the best >> distro to have used (or not), I would imagine that I need to keep >> running from the rescue disk until the reshape is complete as >> rebooting in the middle of a reshape is what got me in this mess.
and nowhere did i say reboot now and i only responded to your "Do I understand that you would recommend upgrading our installation of Linux once the repair is complete or are advising downloading and compiling a new kernel as part of the repair?" nobody said that - the only point was use a as recent kernel as possible with all rgow/reshape operations > and i don't understand what you did not understand in the clear > response below you got days ago! > > due reshape you where advised use whatever rescue/live system with a > recent kernel and mdadm, not more and not less > > just to avoid probaly long fixed bugs in your old kernel > > --------------------- > > Try and get a CentOS 8.5 disk. At the end of the day, the version of > linux doesn't matter. What you need is an up-to-date rescue disk. > Distro/whatever is unimportant - what IS important is that you are > using the latest mdadm, and a kernel that matches. > > The problem you have sounds like a long-standing but now-fixed bug. An > original CentOS disk might be okay (with matched kernel and mdadm), > but almost certainly has what I consider to be a "dodgy" version of mdadm. > > If you can afford the downtime, after you've reverted the reshape, I'd > try starting it again with the rescue disk. It'll probably run fine. > Let it complete and then your old CentOS 7 will be fine with it CAUTION!!! This E-mail originated from outside of WMA Water. Do not click links or open attachments unless you recognize the sender and know the content is safe. ^ permalink raw reply [flat|nested] 23+ messages in thread
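On the version-number question above: the kernel and mdadm are separate projects, so their version numbers are unrelated (a rescue image might pair, say, a 5.x kernel with mdadm v4.2); what matters is that both are reasonably current. A small sketch for inspecting what the booted system actually provides:

```python
import platform
import shutil
import subprocess

# The kernel release and the mdadm tool are versioned independently.
print("kernel:", platform.release())

if shutil.which("mdadm"):          # mdadm may be absent on this system
    out = subprocess.run(["mdadm", "--version"],
                         capture_output=True, text=True)
    # mdadm historically prints its version banner on stderr
    print("mdadm:", (out.stdout or out.stderr).strip())
else:
    print("mdadm: not installed")
```

Run from a rescue environment, this shows at a glance whether you are on the old 3.10 Frankenkernel discussed earlier or on the live image's newer pairing.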
* Re: Failed adadm RAID array after aborted Grown operation
  2022-05-13  5:32 ` Bob Brand
@ 2022-05-13  8:18 ` Reindl Harald
  0 siblings, 0 replies; 23+ messages in thread
From: Reindl Harald @ 2022-05-13 8:18 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown

Am 13.05.22 um 07:32 schrieb Bob Brand:
> This may not be the forum to ask

it is, because you can also type that sort of question into google, and there isn't a good reason to build your own kernel in 2022 for most usecases

> this but what exactly is "compiling the
> kernel". From what I've been reading, it sounds like a somewhat involved and
> complex process - is it? Is compiling a new kernel the same as upgrading the
> OS? I'm getting the impression that it sort of is but sort of isn't. Is it
> possible to compile a kernel for a rescue CD (from the comments I've read,
> it is possible)? If I were to compile a new kernel, would I expect the
> version number for the kernel and mdadm to be the same? Sorry for all the
> question but, as I said at the outset, a lot of this is all very new to me.

don't get me wrong, but "Is compiling a new kernel the same as upgrading the OS" and "what exactly is 'compiling the kernel'" imply: just use a binary distribution, when it sounds like you don't even know what compiling software from source means

> -----Original Message-----
> From: Reindl Harald <h.reindl@thelounge.net>
> Sent: Thursday, 12 May 2022 12:59 AM
> To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>;
> Wols Lists <antlists@youngman.org.uk>
> Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel
> <philip@turmel.org>; NeilBrown <neilb@suse.com>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> Am 11.05.22 um 16:56 schrieb Reindl Harald:
>>
>> Am 11.05.22 um 15:22 schrieb Bob Brand:
>>> Sorry Reindl. I'm not sure I understand.
Are you saying I did or
>>> didn't do the right thing in booting from a CentOS rescue disk? At
>>> the moment it's running from the rescue disk and, be it the best
>>> distro to have used (or not), I would imagine that I need to keep
>>> running from the rescue disk until the reshape is complete as
>>> rebooting in the middle of a reshape is what got me in this mess.
>
> and nowhere did i say reboot now
>
> and i only responded to your "Do I understand that you would recommend
> upgrading our installation of Linux once the repair is complete or are
> advising downloading and compiling a new kernel as part of the repair?"
>
> nobody said that - the only point was to use as recent a kernel as possible
> with all grow/reshape operations
>
>> and i don't understand what you did not understand in the clear
>> response below you got days ago!
>>
>> due reshape you were advised to use whatever rescue/live system with a
>> recent kernel and mdadm, not more and not less
>>
>> just to avoid probably long-fixed bugs in your old kernel
>>
>> ---------------------
>>
>> Try and get a CentOS 8.5 disk. At the end of the day, the version of
>> linux doesn't matter. What you need is an up-to-date rescue disk.
>> Distro/whatever is unimportant - what IS important is that you are
>> using the latest mdadm, and a kernel that matches.
>>
>> The problem you have sounds like a long-standing but now-fixed bug. An
>> original CentOS disk might be okay (with matched kernel and mdadm),
>> but almost certainly has what I consider to be a "dodgy" version of mdadm.
>>
>> If you can afford the downtime, after you've reverted the reshape, I'd
>> try starting it again with the rescue disk. It'll probably run fine.
>> Let it complete and then your old CentOS 7 will be fine with it
^ permalink raw reply [flat|nested] 23+ messages in thread
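Since "compiling the kernel" keeps coming up in this thread, a brief hedged aside: building a vanilla kernel is a fixed sequence of make targets run inside an unpacked kernel.org source tree, not an OS upgrade. The sketch below only prints the usual plan rather than executing it; the exact targets and config method are common practice, not taken from any mail in this thread.

```shell
# Dry-run sketch of a typical vanilla kernel build. It only prints the
# plan; to build for real, run each step inside the unpacked source tree.
jobs=$(nproc)   # one compile job per available CPU
for step in \
    "make olddefconfig"      \
    "make -j${jobs}"         \
    "make modules_install"   \
    "make install"
do
    echo "would run: ${step}"
done
```

As to Bob's version-number question: mdadm is a separate userspace package with its own release numbering, so a freshly built kernel and mdadm will not (and need not) share a version number.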
* RE: Failed adadm RAID array after aborted Grown operation
  [not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
  2022-05-11  5:39 ` Bob Brand
@ 2022-05-20 15:13 ` Bob Brand
  2022-05-20 15:41 ` Reindl Harald
  1 sibling, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-20 15:13 UTC (permalink / raw)
To: Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown

UPDATE:

The array finally finished the reshape process (after almost two weeks!) and I now have an array that's showing as clean with the original 30 disks. However, when I try to mount it, I get the message "mount: /dev/md125: can't read superblock".

Any suggestions as to what my next step should be? Note: it's still running from the rescue disk.

Thank you,
Bob

From: Roger Heflin <rogerheflin@gmail.com>
Sent: Monday, 9 May 2022 9:05 PM
To: Wols Lists <antlists@youngman.org.uk>
Cc: Bob Brand <brand@wmawater.com.au>; Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation

The short-term easiest way to get a new kernel might be this: download a Fedora 35 live CD and boot from it. It will allow you to turn on the raid and/or reshape the raid and/or abort the reshape using the Fedora 35 kernel and mdadm tools. All of this will need to be done manually from either the GUI and/or command line, so it will be somewhat of a pain.

The other choice is to download/compile/install a current http://kernel.org kernel. This takes some time (you have to install compiler/header rpms); follow these instructions from Rocky Linux, a Red Hat clone: https://docs.rockylinux.org/guides/custom-linux-kernel/. How long it takes will depend on the number of CPUs your machine has and the value after -j<cpustouse>. The biggest issue with this will likely be dealing with compile errors caused by this or that tool and/or devel package being missing.
And then you would still need to download the newest mdadm and compile and install it. These steps will take longer, but doing this will get your system onto a new kernel and new tools. Once you know how to do this, the process of compiling/installing a kernel has for the most part not changed in a long time. I have been doing this on and off for 20+ years, and a newer kernel on older userspace is widely used by a lot of the kernel developers, so it is generally well tested and in my experience just works to get you on a new kernel with minimal trouble.

On Mon, May 9, 2022 at 5:24 AM Wols Lists <antlists@youngman.org.uk> wrote:

On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of
> questions:

Did you read the links I sent you?

> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock." Do I use fsck?
>
> 2. The removed disk is showing as "- 0 0 30 removed". Is it safe
> to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?

I don't know :-( This is getting a bit out of my depth.

But I'm SERIOUSLY concerned you're still futzing about with CentOS 7!!! Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or the latest Fedora? Why didn't you download SUSE SLES 15? Any and all CentOS 7 will come with either an out-of-date mdadm, or a Frankenkernel. NEITHER is a good idea.

Go back to the links I gave you, download and run lsdrv, and post the output here. Hopefully somebody will tell you the next steps. I will do my best.

> Thank you!
> Cheers, Wol
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:33 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> I just tried it again with the --invalid_backup switch and it's now showing
> the State as "clean, degraded" and it's showing all the disks except for the
> suspect one that I removed.
>
> I'm unable to mount it and see the contents. I get the error "mount:
> /dev/md125: can't read superblock."
>
> Is there more that I need to do?
>
> Thanks
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:02 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> Hi Wol,
>
> I've booted to the installation media and I've run the following commands:
>
> mdadm /dev/md125 --assemble --update=revert-reshape
>   --backup-file=/mnt/sysimage/grow_md125.bak --verbose
>   --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> mdadm /dev/md125 --assemble --update=revert-reshape
>   --backup-file=/grow_md125.bak --verbose
>   --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> But I'm still getting the error:
>
> mdadm: /dev/md125 has an active reshape - checking if critical section needs
> to be restored
> mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
> Should I try the --invalid_backup switch or --force?
>
> Thanks,
> Bob
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 8:19 AM
> To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> OK. I've downloaded a CentOS 7 - 2009 ISO from http://centos.org - that
> seems to be the most recent they have.
>
> -----Original Message-----
> From: Wol <antlists@youngman.org.uk>
> Sent: Monday, 9 May 2022 8:16 AM
> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> How old is CentOS 7? With that kernel I guess it's quite old?
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of linux
> doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are using the
> latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm), but
> almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd try
> starting it again with the rescue disk. It'll probably run fine. Let it
> complete and then your old CentOS 7 will be fine with it.
>
> Cheers,
> Wol
>
> On 08/05/2022 23:04, Bob Brand wrote:
>> Thank Wol.
>>
>> Should I use a CentOS 7 disk or a CentOS disk?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Wols Lists <antlists@youngman.org.uk>
>> Sent: Monday, 9 May 2022 1:32 AM
>> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
>> Cc: Phil Turmel <philip@turmel.org>
>> Subject: Re: Failed adadm RAID array after aborted Grown operation
>>
>> On 08/05/2022 14:18, Bob Brand wrote:
>>> If you've stuck with me and read all this way, thank you and I hope
>>> you can help me.
>>
>> https://raid.wiki.kernel.org/index.php/Linux_Raid
>>
>> Especially
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>
>> What you need to do is revert the reshape. I know what may have
>> happened, and what bothers me is your kernel version, 3.10.
>>
>> The first thing to try is to boot from up-to-date rescue media and see
>> if an mdadm --revert works from there. If it does, your CentOS should
>> then bring everything back no problem.
>>
>> (You've currently got what I call a Frankensetup: a very old kernel, a
>> pretty new mdadm, and a whole bunch of patches that do who knows what.
>> You really need a matching kernel and mdadm, and your frankenkernel
>> won't match anything ...)
>>
>> Let us know how that goes ...
>>
>> Cheers,
>> Wol

^ permalink raw reply	[flat|nested] 23+ messages in thread
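For readers trying to follow the revert attempt quoted above, here is a hedged, cleaned-up sketch. The device name and UUID are the ones from the mail; note that recent mdadm man pages spell the backup-override option --invalid-backup (with a hyphen), and whether --force is also warranted depends on the array state, so check mdadm(8) on the rescue system first. The block below only prints the command, so nothing is touched:

```shell
# Prints a hedged reconstruction of the revert-reshape command from the
# thread; run the printed command manually (as root, with the array
# stopped) only after checking mdadm(8) on the rescue system.
array=/dev/md125
uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
cmd="mdadm --assemble ${array} --update=revert-reshape --invalid-backup --verbose --uuid=${uuid}"
echo "${cmd}"
```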
* Re: Failed adadm RAID array after aborted Grown operation
  2022-05-20 15:13 ` Bob Brand
@ 2022-05-20 15:41 ` Reindl Harald
  2022-05-22  4:13 ` Bob Brand
  0 siblings, 1 reply; 23+ messages in thread
From: Reindl Harald @ 2022-05-20 15:41 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown

Am 20.05.22 um 17:13 schrieb Bob Brand:
> UPDATE:
>
> The array finally finished the reshape process (after almost two weeks!) and
> I now have an array that's showing as clean with the original 30 disks.
> However, when I try to mount it, I get the message "mount: /dev/md125: can't
> read superblock".
>
> Any suggestions as to what my next step should be? Note: it's still running
> from the rescue disk

restore from a backup - the array is one thing, the filesystem is a different layer and it seems to be heavily damaged after all the things which happened

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation
  2022-05-20 15:41 ` Reindl Harald
@ 2022-05-22  4:13 ` Bob Brand
  2022-05-22 11:25 ` Reindl Harald
  2022-05-22 13:31 ` Wols Lists
  0 siblings, 2 replies; 23+ messages in thread
From: Bob Brand @ 2022-05-22 4:13 UTC (permalink / raw)
To: Reindl Harald, Roger Heflin, Wols Lists
Cc: Linux RAID, Phil Turmel, NeilBrown

Thanks Reindl. Is xfs_repair an option? And, if it is, do I run it on md125 or the individual sd devices?

Unfortunately, restoring from backup isn't an option - after all, to where do you back up 200TB of data? This storage was originally set up with the understanding that it wasn't backed up, and so no valuable data was supposed to have been stored on it. Unfortunately, people being what they are, valuable data has been stored there and I'm the mug now trying to get it back - it's a system that I've inherited.

So, any help or constructive advice would be appreciated.

Thanks,
Bob

-----Original Message-----
From: Reindl Harald <h.reindl@thelounge.net>
Sent: Saturday, 21 May 2022 1:41 AM
To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>; Wols Lists <antlists@youngman.org.uk>
Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation

Am 20.05.22 um 17:13 schrieb Bob Brand:
> UPDATE:
>
> The array finally finished the reshape process (after almost two
> weeks!) and I now have an array that's showing as clean with the original
> 30 disks.
> However, when I try to mount it, I get the message "mount: /dev/md125:
> can't read superblock".
>
> Any suggestions as to what my next step should be? Note: it's still
> running from the rescue disk

restore from a backup - the array is one thing, the filesystem is a different layer and it seems to be heavily damaged after all the things which happened

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation
  2022-05-22  4:13 ` Bob Brand
@ 2022-05-22 11:25 ` Reindl Harald
  0 siblings, 0 replies; 23+ messages in thread
From: Reindl Harald @ 2022-05-22 11:25 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown

Am 22.05.22 um 06:13 schrieb Bob Brand:
> Is xfs_repair an option?

unlikely, in case the underlying device has all sorts of damage - think of the RAID like a single disk and of how the filesystem reacts if you shoot holes in it

> And, if it is, do I run it on md125 or the
> individual sd devices?

when you think about it for two seconds it's obvious - is the filesystem on top of the single disks or on top of the whole RAID? the filesystem doesn't even know about the individual devices

> Unfortunately, restore from back up isn't an option - after all to where do
> you back up 200TB of data?

on a second machine in a different building - the initial sync is done locally, and then, no matter how large the data, rsync is enough - the daily delta doesn't vary just because the whole dataset is huge

and don't get me wrong, but who starts a reshape on a 200 TB storage knowing that there is no backup?

> This storage was originally set up with the
> understanding that it wasn't backed up and so no valuable data was supposed
> to have been stored on it.

well, then i wouldn't store it at all

> Unfortunately, people being what they are,
> valuable data has been stored there and I'm the mug now trying to get it
> back - it's a system that I've inherited.

^ permalink raw reply	[flat|nested] 23+ messages in thread
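Reindl's layering point can be put concretely: the XFS filesystem was created on the assembled array device, so filesystem tools target /dev/md125 and never the member partitions. A hedged sketch follows; the commands are printed rather than executed here, since they need the real array, and xfs_repair with -n is the standard read-only check from xfsprogs:

```shell
# The filesystem lives on the assembled md device, not on its members,
# so repair tools must point at the array. Printed as a plan only.
fs_dev=/dev/md125        # the filesystem layer sits here
echo "first:  xfs_repair -n ${fs_dev}"   # -n = no-modify mode, report only
echo "then:   xfs_repair ${fs_dev}"      # only after reviewing the -n report
```

One caution: if xfs_repair reports a dirty log it may suggest -L, which zeroes the log and can discard recent transactions; on an array in this state that trade-off deserves careful thought before proceeding.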
* Re: Failed adadm RAID array after aborted Grown operation
  2022-05-22  4:13 ` Bob Brand
  2022-05-22 11:25 ` Reindl Harald
@ 2022-05-22 13:31 ` Wols Lists
  2022-05-22 22:54 ` Bob Brand
  1 sibling, 1 reply; 23+ messages in thread
From: Wols Lists @ 2022-05-22 13:31 UTC (permalink / raw)
To: Bob Brand, Reindl Harald, Roger Heflin; +Cc: Linux RAID, Phil Turmel, NeilBrown

On 22/05/2022 05:13, Bob Brand wrote:
> Unfortunately, restore from back up isn't an option - after all to where do
> you back up 200TB of data? This storage was originally set up with the
> understanding that it wasn't backed up and so no valuable data was supposed
> to have been stored on it. Unfortunately, people being what they are,
> valuable data has been stored there and I'm the mug now trying to get it
> back - it's a system that I've inherited.
>
> So, any help or constructive advice would be appreciated.

Unfortunately, about the only constructive advice I can give you is "live and learn". I made a similar massive cock-up at the start of my career, and I've always been excessively cautious about disks and data ever since.

What your employer needs to take away from this - and no disrespect to yourself - is that if they run a system that was probably supported for about five years, then has been running on duct tape and baling wire for a further ten years, DON'T give it to someone with pretty much NO sysadmin or computer ops experience to carry out a potentially disastrous operation like messing about with a raid array! This is NOT a simple setup, and it seems clear to me that you have little familiarity with the basic concepts. Unfortunately, your employer was playing Russian roulette, and the gun went off.

On a *personal* level, and especially if your employer wants you to continue looking after their systems, they need to give you an (old?) box with a bunch of disk drives. Go back to the raid website and look at the article about building a new system.
Take that system they've given you, and use that article as a guide to build it from scratch. It's actually about the computer being used right now to type this message.

I use(d) gentoo as my distro. It's a great distro, but for a newbie I think it takes "throw them in at the deep end" to extremes. Go find Slackware and start with that. It's not a "hold their hands and do everything for them" distro, but nor is it a "here's the instructions, if they don't work for you then you're on your own" distro.

Once you've got to grips with Slack, have a go at gentoo. And once you've managed to get gentoo working, you should have a pretty decent grasp of what's going on "under the bonnet". CentOS/RedHat/SLES should be a breeze after that.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation
  2022-05-22 13:31 ` Wols Lists
@ 2022-05-22 22:54 ` Bob Brand
  0 siblings, 0 replies; 23+ messages in thread
From: Bob Brand @ 2022-05-22 22:54 UTC (permalink / raw)
To: Wols Lists, Reindl Harald, Roger Heflin
Cc: Linux RAID, Phil Turmel, NeilBrown

Thanks Wol. I can't really disagree with anything you've said, except to mention that I do have a fair bit of experience (20+ years), but it's all been pretty much Microsoft/Windows and hardware RAID. Like I said, this device was never meant to be used for critical data - if nothing else, this has been something of a wake-up call for us.

-----Original Message-----
From: Wols Lists <antlists@youngman.org.uk>
Sent: Sunday, 22 May 2022 11:31 PM
To: Bob Brand <brand@wmawater.com.au>; Reindl Harald <h.reindl@thelounge.net>; Roger Heflin <rogerheflin@gmail.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation

On 22/05/2022 05:13, Bob Brand wrote:
> Unfortunately, restore from back up isn't an option - after all to
> where do you back up 200TB of data? This storage was originally set up
> with the understanding that it wasn't backed up and so no valuable
> data was supposed to have been stored on it. Unfortunately, people
> being what they are, valuable data has been stored there and I'm the
> mug now trying to get it back - it's a system that I've inherited.
>
> So, any help or constructive advice would be appreciated.

Unfortunately, about the only constructive advice I can give you is "live and learn". I made a similar massive cock-up at the start of my career, and I've always been excessively cautious about disks and data ever since.
What your employer needs to take away from this - and no disrespect to yourself - is that if they run a system that was probably supported for about five years, then has been running on duct tape and baling wire for a further ten years, DON'T give it to someone with pretty much NO sysadmin or computer ops experience to carry out a potentially disastrous operation like messing about with a raid array! This is NOT a simple setup, and it seems clear to me that you have little familiarity with the basic concepts. Unfortunately, your employer was playing Russian Roulette, and the gun went off.

On a *personal* level, and especially if your employer wants you to continue looking after their systems, they need to give you an (old?) box with a bunch of disk drives. Go back to the raid website and look at the article about building a new system. Take that system they've given you, and use that article as a guide to build it from scratch. It's actually about the computer being used right now to type this message.

I use(d) gentoo as my distro. It's a great distro, but for a newbie I think it takes "throw them in at the deep end" to extremes. Go find Slackware and start with that. It's not a "hold their hands and do everything for them" distro, but nor is it a "here's the instructions, if they don't work for you then you're on your own" distro.

Once you've got to grips with Slack, have a go at gentoo. And once you've managed to get gentoo working, you should have a pretty decent grasp of what's going on "under the bonnet". CentOS/RedHat/SLES should be a breeze after that.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 23+ messages in thread
end of thread, other threads: [~2022-05-22 22:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-08 13:18 Failed adadm RAID array after aborted Grown operation Bob Brand
2022-05-08 15:32 ` Wols Lists
2022-05-08 22:04 ` Bob Brand
2022-05-08 22:15 ` Wol
2022-05-08 22:19 ` Bob Brand
2022-05-08 23:02 ` Bob Brand
2022-05-08 23:32 ` Bob Brand
2022-05-09  0:09 ` Bob Brand
2022-05-09  6:52 ` Wols Lists
2022-05-09 13:07 ` Bob Brand
[not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
2022-05-11  5:39 ` Bob Brand
2022-05-11 12:35 ` Reindl Harald
2022-05-11 13:22 ` Bob Brand
2022-05-11 14:56 ` Reindl Harald
2022-05-11 14:59 ` Reindl Harald
2022-05-13  5:32 ` Bob Brand
2022-05-13  8:18 ` Reindl Harald
2022-05-20 15:13 ` Bob Brand
2022-05-20 15:41 ` Reindl Harald
2022-05-22  4:13 ` Bob Brand
2022-05-22 11:25 ` Reindl Harald
2022-05-22 13:31 ` Wols Lists
2022-05-22 22:54 ` Bob Brand