* Failed mdadm RAID array after aborted Grow operation
@ 2022-05-08 13:18 Bob Brand
2022-05-08 15:32 ` Wols Lists
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 13:18 UTC (permalink / raw)
To: linux-raid
Hi,
I'm somewhat new to Linux and mdadm, although I've certainly learnt a lot
over the last 24 hours.
I have a SuperMicro server running CentOS 7 (3.10.0-1160.11.1.el7.x86_64)
with mdadm version 4.1 (2018-10-01) that was happily running with
thirty 8 TB disks in a RAID6 configuration. (It also has boot and root on a
RAID1 array; the RAID6 array is solely for data.) It was, however,
starting to run out of space, so I investigated adding more drives to the
array (it can hold a total of 45 drives).
Since this device is no longer under support, obtaining the same drives as
it already contained wasn't an option, and the supplier couldn't guarantee
that they could supply compatible drives. We did come to an arrangement
where I would try one drive and, if it didn't work, I could return any
unopened units.
I spent ages ensuring that the ones he'd suggested were as compatible as
possible, and I based the specs of the existing drives off the invoice for
the entire system. This turned out to be a mistake: the invoice stated
they were 512e drives but, as I discovered after the new drives had
arrived and I was doing a final check, the existing drives were actually
4Kn (4096-byte native sector) drives. Of course the new drives were 512e.
Bother! After a lot more reading I found out that it might be possible to
reformat the new drives from 512e to 4Kn using sg_format.
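For reference, the 512e/4Kn difference can be read straight out of sysfs, which is the check I should have run instead of trusting the invoice. A quick sketch (the sysfs paths are standard; the helper names are just illustrative):

```python
def sector_format(logical: int, physical: int) -> str:
    """Classify a drive from its logical/physical sector sizes.

    512n: 512-byte sectors throughout (legacy).
    512e: 4096-byte physical sectors presented as 512-byte logical ones.
    4Kn:  4096-byte sectors throughout.
    """
    if (logical, physical) == (512, 512):
        return "512n"
    if (logical, physical) == (512, 4096):
        return "512e"
    if (logical, physical) == (4096, 4096):
        return "4Kn"
    return f"unknown ({logical}/{physical})"


def read_sector_sizes(dev: str) -> tuple:
    """Return (logical, physical) sector sizes for e.g. dev='sda'."""
    base = f"/sys/block/{dev}/queue"
    with open(f"{base}/logical_block_size") as f:
        logical = int(f.read())
    with open(f"{base}/physical_block_size") as f:
        physical = int(f.read())
    return logical, physical
```

So a 512e drive reports (512, 4096) while a 4Kn drive reports (4096, 4096).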
I installed the test drive and proceeded to see if it was possible to
format it to 4096-byte sectors using the command
sg_format --format --size=4096 /dev/sd<x>.
All was proceeding smoothly when my ssh session terminated, due to a faulty
docking station killing my Ethernet connection.
So I logged onto the console and restarted the sg_format, which completed
OK, sort of: it did convert the disk to 4096-byte sectors, but it threw an
I/O error or two. They didn't seem too concerning, and I figured that, if
there was a problem, it would show up in the next couple of steps. I've since
discovered the dmesg log, and that indicated that there were significantly
more I/O errors than I thought.
Anyway, since sg_format appeared to complete OK, I moved on to the next
stage, which was to partition the disk with the following commands:
parted -a optimal /dev/sd<x>
(parted) mklabel msdos
(parted) mkpart primary 2048s 100% (need to check that the start is
correct)
(parted) align-check optimal 1 (verify alignment of partition 1)
(parted) set 1 raid on (set the FLAG to RAID)
(parted) print
Unfortunately, I don't have the results of the print command, as my laptop
unexpectedly shut down overnight (it hasn't been a good weekend), but the
partitioning appeared to complete without incident.
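As I understand it, the align-check optimal step boils down to checking that the partition start falls on the device's optimal I/O boundary. A simplified sketch of that arithmetic (real parted also reads alignment_offset and optimal_io_size from sysfs; the 1 MiB default here is an assumption):

```python
def is_optimally_aligned(start_sector, logical_sector_size=512,
                         optimal_io_size=1048576, alignment_offset=0):
    """Simplified version of parted's align-check optimal: True if the
    partition's starting byte offset lands on the optimal I/O boundary."""
    start_bytes = start_sector * logical_sector_size
    return (start_bytes - alignment_offset) % optimal_io_size == 0
```

A 2048-sector start is 1 MiB into a 512-byte-logical disk (and 8 MiB into a 4096-byte one), so it is aligned either way, while the old DOS default of sector 63 is not.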
I then added the new disk to the array:
mdadm --add /dev/md125 /dev/sd<x>1
And it completed without any problems.
I then proceeded to grow the array:
mdadm --grow --raid-devices=31 --backup-file=/grow_md125.bak
/dev/md125
I monitored this with cat /proc/mdstat and it showed that it was reshaping,
but the speed was 0K/sec and the reshape didn't progress from 0%.
# cat /proc/mdstat produced:
Personalities : [raid1] [raid6] [raid5] [raid4]
md125 : active raid6 sdab1[30] sdw1[26] sdc1[6] sdm1[16] sdi1[12]
sdz1[29] sdh1[11] sdg1[10] sds1[22] sdf1[9] sdq1[20] sdaa1[1] sdo1[18]
sdu1[24] sdb1[5] sdae1[4] sdl1[15] sdj1[13] sdn1[17] sdp1[19] sdv1[25]
sde1[8] sdd1[7] sdr1[21] sdt1[23] sdx1[27] sdad1[3] sdac1[2] sdy1[28]
sda1[0] sdk1[14]
218789036032 blocks super 1.2 level 6, 512k chunk, algorithm 2
[31/31] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
[>....................] reshape = 0.0% (1/7813894144)
finish=328606806584.3min speed=0K/sec
bitmap: 0/59 pages [0KB], 65536KB chunk
md126 : active raid1 sdaf1[0] sdag1[1]
100554752 blocks super 1.2 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md127 : active raid1 sdaf3[0] sdag2[1]
976832 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
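Incidentally, I now understand the silly finish estimate above: mdstat just divides the remaining work by the current speed, so with the speed effectively zero the estimate explodes. A back-of-the-envelope sketch (mdstat counts 1 KiB blocks; the function name is mine):

```python
def reshape_eta_minutes(done_blocks, total_blocks, speed_kib_per_sec):
    """Rough reshape ETA the way mdstat computes it: remaining 1 KiB
    blocks divided by the current speed. A stalled reshape (speed ~0)
    yields an absurdly large or infinite estimate."""
    if speed_kib_per_sec <= 0:
        return float("inf")
    return (total_blocks - done_blocks) / speed_kib_per_sec / 60.0
```

Working backwards from the 7813894143 remaining blocks, a finish time of ~328606806584 minutes corresponds to a measured speed well under 1 KiB/s, which mdstat rounds down and displays as 0K/sec.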
# mdadm --detail /dev/md125 produced:
/dev/md125:
Version : 1.2
Creation Time : Wed Sep 13 15:09:40 2017
Raid Level : raid6
Array Size : 218789036032 (203.76 TiB 224.04 TB)
Used Dev Size : 7813894144 (7.28 TiB 8.00 TB)
Raid Devices : 31
Total Devices : 31
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun May 8 00:47:35 2022
State : clean, reshaping
Active Devices : 31
Working Devices : 31
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Reshape Status : 0% complete
Delta Devices : 1, (30->31)
Name : localhost.localdomain:SW-RAID6
UUID : f9b65f55:5f257add:1140ccc0:46ca6c19
Events : 1053617
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 65 161 1 active sync /dev/sdaa1
2 65 193 2 active sync /dev/sdac1
3 65 209 3 active sync /dev/sdad1
4 65 225 4 active sync /dev/sdae1
5 8 17 5 active sync /dev/sdb1
6 8 33 6 active sync /dev/sdc1
7 8 49 7 active sync /dev/sdd1
8 8 65 8 active sync /dev/sde1
9 8 81 9 active sync /dev/sdf1
10 8 97 10 active sync /dev/sdg1
11 8 113 11 active sync /dev/sdh1
12 8 129 12 active sync /dev/sdi1
13 8 145 13 active sync /dev/sdj1
14 8 161 14 active sync /dev/sdk1
15 8 177 15 active sync /dev/sdl1
16 8 193 16 active sync /dev/sdm1
17 8 209 17 active sync /dev/sdn1
18 8 225 18 active sync /dev/sdo1
19 8 241 19 active sync /dev/sdp1
20 65 1 20 active sync /dev/sdq1
21 65 17 21 active sync /dev/sdr1
22 65 33 22 active sync /dev/sds1
23 65 49 23 active sync /dev/sdt1
24 65 65 24 active sync /dev/sdu1
25 65 81 25 active sync /dev/sdv1
26 65 97 26 active sync /dev/sdw1
27 65 113 27 active sync /dev/sdx1
28 65 129 28 active sync /dev/sdy1
29 65 145 29 active sync /dev/sdz1
30 65 177 30 active sync /dev/sdab1
NOTE: the new disk is /dev/sdab
About 12 hours later, as the reshape hadn't progressed from 0%, I looked
at ways of aborting it, such as mdadm --stop /dev/md125, which didn't work,
so I ended up rebooting the server, and this is where things really went
pear-shaped.
The server came up in emergency mode, which I found odd given that the
boot and root should have been OK.
I was able to log on as root OK, but the RAID6 array was stuck in the
reshape state.
I then tried:
mdadm --assemble --update=revert-reshape --backup-file=/grow_md125.bak
--verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19 /dev/md125
and this produced:
mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got
<varying numbers>)
mdadm: No RAID super block on /dev/sde
.
.
mdadm: /dev/sde1 is identified as a member of /dev/md125, slot 6
.
.
mdadm: /dev/md125 has an active reshape - checking if critical
section needs to be restored
mdadm: No backup metadata on /grow_md125.bak
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
I've tried different variations on this, including mdadm --assemble
--invalid-backup --force, but I won't include all the different commands
here because I'm having to type all this since I can't copy anything off
the server while it's in Emergency Mode.
I have also removed the suspect disk but this hasn't made any difference.
But the closest I've come to fixing this is running mdadm /dev/md125
--assemble --invalid-backup --backup-file=/grow_md125.bak --verbose
/dev/sdc1 /dev/sdd1 ....... /dev/sdaf1 and this produces:
.
.
.
mdadm: /dev/sdaf1 is identified as a member of /dev/md125, slot 4.
mdadm: /dev/md125 has an active reshape - checking if critical
section needs to be restored
mdadm: No backup metadata on /grow_md125.bak
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/sdac1 to /dev/md125 as 1
.
.
.
mdadm: failed to RUN_ARRAY /dev/md125: Invalid argument
dmesg has this information:
md: md125 stopped.
md/raid:md125: reshape_position too early for auto-recovery -
aborting.
md: pers->run() failed ...
md: md125 stopped.
If you've stuck with me and read all this way, thank you, and I hope you
can help me.
Regards,
Bob Brand
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed mdadm RAID array after aborted Grow operation
2022-05-08 13:18 Failed mdadm RAID array after aborted Grow operation Bob Brand
@ 2022-05-08 15:32 ` Wols Lists
2022-05-08 22:04 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Wols Lists @ 2022-05-08 15:32 UTC (permalink / raw)
To: Bob Brand, linux-raid; +Cc: Phil Turmel
On 08/05/2022 14:18, Bob Brand wrote:
> If you’ve stuck with me and read all this way, thank you and I hope you
> can help me.
https://raid.wiki.kernel.org/index.php/Linux_Raid
Especially
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
What you need to do is revert the reshape. I know what may have
happened, and what bothers me is your kernel version, 3.10.
The first thing to try is to boot from up-to-date rescue media and see
if an mdadm --revert works from there. If it does, your CentOS should
then bring everything back no problem.
(You've currently got what I call a Frankensetup: a very old kernel, a
pretty new mdadm, and a whole bunch of patches that do who knows what.
You really need a matching kernel and mdadm, and your frankenkernel
won't match anything ...)
Let us know how that goes ...
Cheers,
Wol
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
2022-05-08 15:32 ` Wols Lists
@ 2022-05-08 22:04 ` Bob Brand
2022-05-08 22:15 ` Wol
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 22:04 UTC (permalink / raw)
To: Wols Lists, linux-raid; +Cc: Phil Turmel
Thanks, Wol.
Should I use a CentOS 7 disk or a CentOS 8 disk?
Thanks
-----Original Message-----
From: Wols Lists <antlists@youngman.org.uk>
Sent: Monday, 9 May 2022 1:32 AM
To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>
Subject: Re: Failed mdadm RAID array after aborted Grow operation
On 08/05/2022 14:18, Bob Brand wrote:
> If you’ve stuck with me and read all this way, thank you and I hope
> you can help me.
https://raid.wiki.kernel.org/index.php/Linux_Raid
Especially
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
What you need to do is revert the reshape. I know what may have happened,
and what bothers me is your kernel version, 3.10.
The first thing to try is to boot from up-to-date rescue media and see if an
mdadm --revert works from there. If it does, your Centos should then bring
everything back no problem.
(You've currently got what I call a Frankensetup, a very old kernel, a
pretty new mdadm, and a whole bunch of patches that does who knows what.
You really need a matching kernel and mdadm, and your frankenkernel won't
match anything ...)
Let us know how that goes ...
Cheers,
Wol
CAUTION!!! This E-mail originated from outside of WMA Water. Do not click
links or open attachments unless you recognize the sender and know the
content is safe.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed mdadm RAID array after aborted Grow operation
2022-05-08 22:04 ` Bob Brand
@ 2022-05-08 22:15 ` Wol
2022-05-08 22:19 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Wol @ 2022-05-08 22:15 UTC (permalink / raw)
To: Bob Brand, linux-raid; +Cc: Phil Turmel
How old is CentOS 7? With that kernel I guess it's quite old?
Try and get a CentOS 8.5 disk. At the end of the day, the version of
Linux doesn't matter. What you need is an up-to-date rescue disk.
Distro/whatever is unimportant - what IS important is that you are using
the latest mdadm, and a kernel that matches.
The problem you have sounds like a long-standing but now-fixed bug. An
original CentOS disk might be okay (with matched kernel and mdadm), but
almost certainly has what I consider to be a "dodgy" version of mdadm.
If you can afford the downtime, after you've reverted the reshape, I'd
try starting it again with the rescue disk. It'll probably run fine. Let
it complete and then your old CentOS 7 will be fine with it.
Cheers,
Wol
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
2022-05-08 22:15 ` Wol
@ 2022-05-08 22:19 ` Bob Brand
2022-05-08 23:02 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 22:19 UTC (permalink / raw)
To: Wol, linux-raid; +Cc: Phil Turmel
OK. I've downloaded a CentOS 7 (2009) ISO from centos.org - that seems to
be the most recent they have.
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
2022-05-08 22:19 ` Bob Brand
@ 2022-05-08 23:02 ` Bob Brand
2022-05-08 23:32 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 23:02 UTC (permalink / raw)
To: Bob Brand, Wol, linux-raid; +Cc: Phil Turmel
Hi Wol,
I've booted to the installation media and I've run the following command:
mdadm --assemble /dev/md125 --update=revert-reshape
--backup-file=/mnt/sysimage/grow_md125.bak --verbose
--uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
But I'm still getting the error:
mdadm: /dev/md125 has an active reshape - checking if critical section needs
to be restored
mdadm: No backup metadata on /mnt/sysimage/grow_md125.bak
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
Should I try the --invalid-backup switch or --force?
Thanks,
Bob
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
2022-05-08 23:02 ` Bob Brand
@ 2022-05-08 23:32 ` Bob Brand
2022-05-09 0:09 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-08 23:32 UTC (permalink / raw)
To: Bob Brand, Wol, linux-raid; +Cc: Phil Turmel
I just tried it again with the --invalid-backup switch and it's now showing
the State as "clean, degraded", and it's showing all the disks except for the
suspect one that I removed.
I'm unable to mount it and see the contents. I get the error "mount:
/dev/md125: can't read superblock."
Is there more that I need to do?
Thanks
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed mdadm RAID array after aborted Grow operation
2022-05-08 23:32 ` Bob Brand
@ 2022-05-09 0:09 ` Bob Brand
2022-05-09 6:52 ` Wols Lists
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-09 0:09 UTC (permalink / raw)
To: Bob Brand, Wol, linux-raid; +Cc: Phil Turmel
Hi Wol,
My apologies for continually bothering you, but I have a couple of questions:
1. How do I overcome the error message "mount: /dev/md125: can't read
superblock."? Do I use fsck?
2. The removed disk is showing as " - 0 0 30 removed". Is it safe
to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
overcome this?
Thank you!
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed mdadm RAID array after aborted Grow operation
2022-05-09 0:09 ` Bob Brand
@ 2022-05-09 6:52 ` Wols Lists
2022-05-09 13:07 ` Bob Brand
[not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
0 siblings, 2 replies; 23+ messages in thread
From: Wols Lists @ 2022-05-09 6:52 UTC (permalink / raw)
To: Bob Brand, linux-raid; +Cc: Phil Turmel, NeilBrown
On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of questions:
Did you read the links I sent you?
>
> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock."? Do I use fsck?
>
> 2. The removed disk is showing as " - 0 0 30 removed". Is it safe
> to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?
I don't know :-( This is getting a bit out of my depth. But I'm
SERIOUSLY concerned you're still futzing about with CentOS 7!!!
Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or
the latest Fedora? Why didn't you download SUSE SLES 15?
Any and all CentOS 7 will come with either an out-of-date mdadm, or a
Frankenkernel. NEITHER are a good idea.
Go back to the links I gave you, download and run lsdrv, and post the
output here. Hopefully somebody will tell you the next steps. I will do
my best.
>
> Thank you!
>
Cheers,
Wol
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:33 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> I just tried it again with the --invalid-backup switch and it's now showing
> the State as "clean, degraded" and it's showing all the disks except for the
> suspect one that I removed.
>
> I'm unable to mount it and see the contents. I get the error "mount:
> /dev/md125: can't read superblock."
>
> Is there more that I need to do?
>
> Thanks
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:02 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> Hi Wol,
>
> I've booted to the installation media and I've run the following commands:
>
> mdadm --assemble /dev/md125 --update=revert-reshape
> --backup-file=/mnt/sysimage/grow_md125.bak --verbose
> --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> mdadm --assemble /dev/md125 --update=revert-reshape
> --backup-file=/grow_md125.bak --verbose
> --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> But I'm still getting the error:
>
> mdadm: /dev/md125 has an active reshape - checking if critical section needs
> to be restored
> mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
>
> Should I try the --invalid-backup switch or --force?
>
> Thanks,
> Bob
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 8:19 AM
> To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> OK. I've downloaded a CentOS 7 - 2009 ISO from centos.org - that seems to
> be the most recent they have.
>
>
> -----Original Message-----
> From: Wol <antlists@youngman.org.uk>
> Sent: Monday, 9 May 2022 8:16 AM
> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> How old is CentOS 7? With that kernel I guess it's quite old?
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of linux
> doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are using the
> latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm), but
> almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd try
> starting it again with the rescue disk. It'll probably run fine. Let it
> complete and then your old CentOS 7 will be fine with it.
>
> Cheers,
> Wol
>
> On 08/05/2022 23:04, Bob Brand wrote:
>> Thanks Wol.
>>
>> Should I use a CentOS 7 disk or a CentOS 8 disk?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Wols Lists <antlists@youngman.org.uk>
>> Sent: Monday, 9 May 2022 1:32 AM
>> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
>> Cc: Phil Turmel <philip@turmel.org>
>> Subject: Re: Failed adadm RAID array after aborted Grown operation
>>
>> On 08/05/2022 14:18, Bob Brand wrote:
>>> If you’ve stuck with me and read all this way, thank you and I hope
>>> you can help me.
>>
>> https://raid.wiki.kernel.org/index.php/Linux_Raid
>>
>> Especially
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>
>> What you need to do is revert the reshape. I know what may have
>> happened, and what bothers me is your kernel version, 3.10.
>>
>> The first thing to try is to boot from up-to-date rescue media and see
>> if an mdadm --revert works from there. If it does, your Centos should
>> then bring everything back no problem.
>>
>> (You've currently got what I call a Frankensetup, a very old kernel, a
>> pretty new mdadm, and a whole bunch of patches that does who knows what.
>> You really need a matching kernel and mdadm, and your frankenkernel
>> won't match anything ...)
>>
>> Let us know how that goes ...
>>
>> Cheers,
>> Wol
>>
>>
>>
>>
>>
>
>
>
>
>
>
* RE: Failed adadm RAID array after aborted Grown operation
2022-05-09 6:52 ` Wols Lists
@ 2022-05-09 13:07 ` Bob Brand
[not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
1 sibling, 0 replies; 23+ messages in thread
From: Bob Brand @ 2022-05-09 13:07 UTC (permalink / raw)
To: Wols Lists, linux-raid; +Cc: Phil Turmel, NeilBrown
Hi Wol,
I did read the links you sent; in fact I'd already trawled through them
prior to subscribing to the mailing list. They're how I learned about the
mailing list.
It seems that the conventional version of CentOS 8.5 is no longer available;
there's just the CentOS Stream 8 version, and I wasn't sure how it would go
with the old style of CentOS. To be honest, it didn't occur to me to go with
another flavour of Linux; I just figured that I'd use CentOS to repair
CentOS.
Anyway, I did try using "mdadm /dev/md2 -r detached" and "mdadm /dev/md2 -r
failed" to drop the removed disk, to no avail. I ended up using
"mdadm --grow /dev/md125 --array-size 218789036032
--backup-file=/mnt/sysimage/grow_md125_size_grow.bak --verbose" followed by
"mdadm --grow /dev/md125 --raid-devices=30
--backup-file=/mnt/sysimage/grow_md125_grow_disks.bak --verbose", and it
seems to be working in that it is reshaping the array, although it is
apparently going to take around 16,000 minutes (would that be because we
have about 200TB of data?).
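[For what it's worth, the two figures in those commands can be cross-checked
with a little arithmetic. A minimal sketch follows; the per-member size and
the /proc/mdstat line below are illustrative values chosen to be consistent
with the numbers in this thread, not captured from the actual system.]

```shell
# RAID6 usable size = (members - 2 parity devices) * per-member size.
# 218789036032 KiB / 28 data disks = 7813894144 KiB per member (~7.28 TiB,
# i.e. an "8TB" drive), so the --array-size figure is self-consistent.
per_device_kib=7813894144
data_disks=$((30 - 2))
array_size_kib=$((data_disks * per_device_kib))
echo "array size: ${array_size_kib} KiB"

# The ~16,000-minute ETA is simply what md reports; it can be recomputed
# from a /proc/mdstat progress line (sample values, 1 KiB block units):
mdstat='[>....] reshape = 1.8% (144000000/7812500000) finish=16103.2min speed=7936K/sec'
done_kib=$(echo "$mdstat" | sed -n 's/.*(\([0-9]*\)\/[0-9]*).*/\1/p')
total_kib=$(echo "$mdstat" | sed -n 's/.*([0-9]*\/\([0-9]*\)).*/\1/p')
speed_kib_s=$(echo "$mdstat" | sed -n 's/.*speed=\([0-9]*\)K\/sec.*/\1/p')
eta_min=$(( (total_kib - done_kib) / speed_kib_s / 60 ))
echo "reshape ETA: ~${eta_min} min"
```

[At a few MB/s per device, ~7.3 TiB per member works out to roughly 10-11
days, so an ETA of that order is not by itself a sign anything is wrong.]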
My concern now is whether or not I'll still have the mount issue once it
finally completes the reshape. If it does mount OK, does that mean I'm good
to reboot it?
With regard to your comment about downloading lsdrv, I'll try to do that,
although I'm having trouble configuring my DNS servers in the running rescue
disk OS. I could run lsblk but, from what I can see of lsdrv, lsblk doesn't
have the detail that lsdrv has. I'll keep working on that and let you know
what I get - it looks like I'll have to edit it to use the older version of
Python that this installation has.
Cheers,
Bob
-----Original Message-----
From: Wols Lists <antlists@youngman.org.uk>
Sent: Monday, 9 May 2022 4:52 PM
To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
Cc: Phil Turmel <philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of
> questions:
Did you read the links I sent you?
>
> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock." Do it use fsck?
>
> 2. The removed disk is showing as " - 0 0 30 removed". Is it
> safe
> to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?
I don't know :-( This is getting a bit out of my depth. But I'm SERIOUSLY
concerned you're still futzing about with CentOS 7!!!
Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or the
latest Fedora? Why didn't you download SUSE SLES 15?
Any and all CentOS 7 will come with either an out-of-date mdadm, or a
Frankenkernel. NEITHER are a good idea.
Go back to the links I gave you, download and run lsdrv, and post the output
here. Hopefully somebody will tell you the next steps. I will do my best.
>
> Thank you!
>
Cheers,
Wol
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:33 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> I just tried it again with the --invalid-backup switch and it's now
> showing the State as "clean, degraded" and it's showing all the disks
> except for the suspect one that I removed.
>
> I'm unable to mount it and see the contents. I get the error "mount:
> /dev/md125: can't read superblock."
>
> Is there more that I need to do?
>
> Thanks
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:02 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> Hi Wol,
>
> I've booted to the installation media and I've run the following commands:
>
> mdadm --assemble /dev/md125 --update=revert-reshape
> --backup-file=/mnt/sysimage/grow_md125.bak --verbose
> --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> mdadm --assemble /dev/md125 --update=revert-reshape
> --backup-file=/grow_md125.bak --verbose
> --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> But I'm still getting the error:
>
> mdadm: /dev/md125 has an active reshape - checking if critical section
> needs to be restored
> mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
>
> Should I try the --invalid-backup switch or --force?
>
> Thanks,
> Bob
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 8:19 AM
> To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> OK. I've downloaded a Centos 7 - 2009 ISO from centos.org - that
> seems to be the most recent they have.
>
>
> -----Original Message-----
> From: Wol <antlists@youngman.org.uk>
> Sent: Monday, 9 May 2022 8:16 AM
> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> How old is CentOS 7? With that kernel I guess it's quite old?
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of
> linux doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are
> using the latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm),
> but almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd
> try starting it again with the rescue disk. It'll probably run fine.
> Let it complete and then your old CentOS 7 will be fine with it.
>
> Cheers,
> Wol
>
> On 08/05/2022 23:04, Bob Brand wrote:
>> Thanks Wol.
>>
>> Should I use a CentOS 7 disk or a CentOS 8 disk?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Wols Lists <antlists@youngman.org.uk>
>> Sent: Monday, 9 May 2022 1:32 AM
>> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
>> Cc: Phil Turmel <philip@turmel.org>
>> Subject: Re: Failed adadm RAID array after aborted Grown operation
>>
>> On 08/05/2022 14:18, Bob Brand wrote:
>>> If you’ve stuck with me and read all this way, thank you and I hope
>>> you can help me.
>>
>> https://raid.wiki.kernel.org/index.php/Linux_Raid
>>
>> Especially
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>
>> What you need to do is revert the reshape. I know what may have
>> happened, and what bothers me is your kernel version, 3.10.
>>
>> The first thing to try is to boot from up-to-date rescue media and
>> see if an mdadm --revert works from there. If it does, your Centos
>> should then bring everything back no problem.
>>
>> (You've currently got what I call a Frankensetup, a very old kernel,
>> a pretty new mdadm, and a whole bunch of patches that does who knows
>> what.
>> You really need a matching kernel and mdadm, and your frankenkernel
>> won't match anything ...)
>>
>> Let us know how that goes ...
>>
>> Cheers,
>> Wol
>>
>>
>>
>>
>>
>
>
>
>
>
>
* RE: Failed adadm RAID array after aborted Grown operation
[not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
@ 2022-05-11 5:39 ` Bob Brand
2022-05-11 12:35 ` Reindl Harald
2022-05-20 15:13 ` Bob Brand
1 sibling, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-11 5:39 UTC (permalink / raw)
To: Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Thanks Roger.
My apologies for not replying earlier. By the time I read this I already
had a reshape underway to reduce the size of the array back to the original
30 disks. So far it seems to be progressing OK, although the ETA is around
10 days, which is why I didn't respond sooner - I've been busy dealing with
the fallout from this.
Do I understand that you would recommend upgrading our installation of Linux
once the repair is complete or are advising downloading and compiling a new
kernel as part of the repair? Or are you suggesting that it was the fact
that we're on such an old version of CentOS that caused this mess? I ask
because once this is repaired (assuming it does complete successfully), I
would like to extend the array to the full 45 drives of which this server is
capable.
Thanks,
Bob
From: Roger Heflin <rogerheflin@gmail.com>
Sent: Monday, 9 May 2022 9:05 PM
To: Wols Lists <antlists@youngman.org.uk>
Cc: Bob Brand <brand@wmawater.com.au>; Linux RAID
<linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown
<neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
The short-term easiest way to get a new kernel might be this.
Download a Fedora 35 live CD and boot from it. It will let you start the
raid and/or reshape the raid and/or abort the reshape using the Fedora 35
kernel and mdadm tools. All of this will need to be done manually from the
GUI and/or command line, so it will be somewhat of a pain.
The other choice is to download, compile and install a current kernel.org
kernel. This takes some time (you have to install compiler/header rpms);
follow these instructions
(https://docs.rockylinux.org/guides/custom-linux-kernel/) - Rocky Linux is
a Red Hat clone, so they apply here. How long it takes will depend on the
number of CPUs your machine has and the value after -j<cpustouse>.
The biggest issue will likely be dealing with compile errors from missing
dependencies - this or that tool and/or devel package not being installed.
And then you would still need to download the newest mdadm and compile and
install it. These steps take longer, but doing this will get your system
onto a new kernel and new tools, and once you know how to do it, the
process of compiling and installing a kernel has for the most part not
changed in a long time. I have been doing this on and off for 20+ years; a
newer kernel on older userspace is widely used by a lot of the kernel
developers, so it is generally well tested and in my experience just works
to get you onto a new kernel with minimal trouble.
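[As a rough sketch of that second option: the -j value is normally just the
CPU count, and the build steps follow the Rocky Linux guide linked above.
The package names below are assumptions for a CentOS-style system - verify
against the guide before running anything; the heavy commands are shown as
comments because they need the kernel source tree and take hours.]

```shell
# Parallel-make job count (-j<cpustouse>): one job per CPU is the usual
# starting point. Fall back to 2 if nproc is unavailable.
jobs=$(nproc 2>/dev/null || echo 2)

# The build itself (package names are assumptions; see the guide):
#   yum groupinstall "Development Tools"
#   yum install ncurses-devel bc bison flex elfutils-libelf-devel openssl-devel
#   make olddefconfig      # start from the running kernel's config
#   make -j"${jobs}"       # compile; duration scales with CPU count
#   make modules_install install
echo "make -j${jobs}"
```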
On Mon, May 9, 2022 at 5:24 AM Wols Lists <antlists@youngman.org.uk> wrote:
On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of
> questions:
Did you read the links I sent you?
>
> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock." Do I use fsck?
>
> 2. The removed disk is showing as " - 0 0 30 removed". Is it
> safe
> to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?
I don't know :-( This is getting a bit out of my depth. But I'm
SERIOUSLY concerned you're still futzing about with CentOS 7!!!
Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or
the latest Fedora? Why didn't you download SUSE SLES 15?
Any and all CentOS 7 will come with either an out-of-date mdadm, or a
Frankenkernel. NEITHER are a good idea.
Go back to the links I gave you, download and run lsdrv, and post the
output here. Hopefully somebody will tell you the next steps. I will do
my best.
>
> Thank you!
>
Cheers,
Wol
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:33 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> I just tried it again with the --invalid-backup switch and it's now
> showing the State as "clean, degraded" and it's showing all the disks
> except for the suspect one that I removed.
>
> I'm unable to mount it and see the contents. I get the error "mount:
> /dev/md125: can't read superblock."
>
> Is there more that I need to do?
>
> Thanks
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:02 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> Hi Wol,
>
> I've booted to the installation media and I've run the following commands:
>
> mdadm --assemble /dev/md125 --update=revert-reshape
> --backup-file=/mnt/sysimage/grow_md125.bak --verbose
> --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> mdadm --assemble /dev/md125 --update=revert-reshape
> --backup-file=/grow_md125.bak --verbose
> --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> But I'm still getting the error:
>
> mdadm: /dev/md125 has an active reshape - checking if critical section
> needs
> to be restored
> mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
>
> Should I try the --invalid-backup switch or --force?
>
> Thanks,
> Bob
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 8:19 AM
> To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> OK. I've downloaded a CentOS 7 - 2009 ISO from centos.org - that seems
> to be the most recent they have.
>
>
> -----Original Message-----
> From: Wol <antlists@youngman.org.uk>
> Sent: Monday, 9 May 2022 8:16 AM
> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> How old is CentOS 7? With that kernel I guess it's quite old?
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of linux
> doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are using
> the
> latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm), but
> almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd try
> starting it again with the rescue disk. It'll probably run fine. Let it
> complete and then your old CentOS 7 will be fine with it.
>
> Cheers,
> Wol
>
> On 08/05/2022 23:04, Bob Brand wrote:
>> Thanks Wol.
>>
>> Should I use a CentOS 7 disk or a CentOS 8 disk?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Wols Lists <antlists@youngman.org.uk>
>> Sent: Monday, 9 May 2022 1:32 AM
>> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
>> Cc: Phil Turmel <philip@turmel.org>
>> Subject: Re: Failed adadm RAID array after aborted Grown operation
>>
>> On 08/05/2022 14:18, Bob Brand wrote:
>>> If you’ve stuck with me and read all this way, thank you and I hope
>>> you can help me.
>>
>> https://raid.wiki.kernel.org/index.php/Linux_Raid
>>
>> Especially
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>
>> What you need to do is revert the reshape. I know what may have
>> happened, and what bothers me is your kernel version, 3.10.
>>
>> The first thing to try is to boot from up-to-date rescue media and see
>> if an mdadm --revert works from there. If it does, your Centos should
>> then bring everything back no problem.
>>
>> (You've currently got what I call a Frankensetup, a very old kernel, a
>> pretty new mdadm, and a whole bunch of patches that does who knows what.
>> You really need a matching kernel and mdadm, and your frankenkernel
>> won't match anything ...)
>>
>> Let us know how that goes ...
>>
>> Cheers,
>> Wol
>>
>>
>>
>>
>>
>
>
>
>
>
>
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-11 5:39 ` Bob Brand
@ 2022-05-11 12:35 ` Reindl Harald
2022-05-11 13:22 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Reindl Harald @ 2022-05-11 12:35 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Am 11.05.22 um 07:39 schrieb Bob Brand:
> Do I understand that you would recommend upgrading our installation of Linux
> once the repair is complete or are advising downloading and compiling a new
> kernel as part of the repair? Or are you suggesting that it was the fact
> that we’re on such an old version of CentOS that caused this mess? I ask
> because once this is repaired (assuming it does complete successfully), I
> would like to extend the array to the full 45 drives of which this server is
> capable
you were advised to do that with a live ISO of whatever distribution
with a recent kernel and recent mdadm, and to leave your installed OS alone
* RE: Failed adadm RAID array after aborted Grown operation
2022-05-11 12:35 ` Reindl Harald
@ 2022-05-11 13:22 ` Bob Brand
2022-05-11 14:56 ` Reindl Harald
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-11 13:22 UTC (permalink / raw)
To: Reindl Harald, Roger Heflin, Wols Lists
Cc: Linux RAID, Phil Turmel, NeilBrown
Sorry Reindl. I'm not sure I understand. Are you saying I did or didn't do
the right thing in booting from a CentOS rescue disk? At the moment it's
running from the rescue disk and, whether or not that was the best distro to
have used, I would imagine that I need to keep running from the rescue disk
until the reshape is complete, as rebooting in the middle of a reshape is
what got me into this mess.
Thanks
-----Original Message-----
From: Reindl Harald <h.reindl@thelounge.net>
Sent: Wednesday, 11 May 2022 10:36 PM
To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>;
Wols Lists <antlists@youngman.org.uk>
Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel
<philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
Am 11.05.22 um 07:39 schrieb Bob Brand:
> Do I understand that you would recommend upgrading our installation of
> Linux once the repair is complete or are advising downloading and
> compiling a new kernel as part of the repair? Or are you suggesting
> that it was the fact that we’re on such an old version of CentOS that
> caused this mess? I ask because once this is repaired (assuming it
> does complete successfully), I would like to extend the array to the
> full 45 drives of which this server is capable
you were advised to do that with a live ISO of whatever distribution with
a recent kernel and recent mdadm, and to leave your installed OS alone
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-11 13:22 ` Bob Brand
@ 2022-05-11 14:56 ` Reindl Harald
2022-05-11 14:59 ` Reindl Harald
0 siblings, 1 reply; 23+ messages in thread
From: Reindl Harald @ 2022-05-11 14:56 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Am 11.05.22 um 15:22 schrieb Bob Brand:
> Sorry Reindl. I'm not sure I understand. Are you saying I did or didn't do
> the right thing in booting from a CentOS rescue disk? At the moment it's
> running from the rescue disk and, be it the best distro to have used (or
> not), I would imagine that I need to keep running from the rescue disk until
> the reshape is complete as rebooting in the middle of a reshape is what got
> me in this mess.
and i don't understand what you did not understand in the clear response
below, which you got days ago!
for the reshape you were advised to use whatever rescue/live system with a
recent kernel and mdadm, not more and not less
just to avoid probably long-fixed bugs in your old kernel
---------------------
Try and get a CentOS 8.5 disk. At the end of the day, the version of
linux doesn't matter. What you need is an up-to-date rescue disk.
Distro/whatever is unimportant - what IS important is that you are using
the latest mdadm, and a kernel that matches.
The problem you have sounds like a long-standing but now-fixed bug. An
original CentOS disk might be okay (with matched kernel and mdadm), but
almost certainly has what I consider to be a "dodgy" version of mdadm.
If you can afford the downtime, after you've reverted the reshape, I'd
try starting it again with the rescue disk. It'll probably run fine. Let
it complete and then your old CentOS 7 will be fine with it.
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-11 14:56 ` Reindl Harald
@ 2022-05-11 14:59 ` Reindl Harald
2022-05-13 5:32 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Reindl Harald @ 2022-05-11 14:59 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Am 11.05.22 um 16:56 schrieb Reindl Harald:
>
>
> Am 11.05.22 um 15:22 schrieb Bob Brand:
>> Sorry Reindl. I'm not sure I understand. Are you saying I did or
>> didn't do
>> the right thing in booting from a CentOS rescue disk? At the moment it's
>> running from the rescue disk and, be it the best distro to have used (or
>> not), I would imagine that I need to keep running from the rescue disk
>> until
>> the reshape is complete as rebooting in the middle of a reshape is
>> what got
>> me in this mess.
and nowhere did i say reboot now
and i only responded to your "Do I understand that you would recommend
upgrading our installation of Linux once the repair is complete or are
advising downloading and compiling a new kernel as part of the repair?"
nobody said that - the only point was to use as recent a kernel as possible
for all grow/reshape operations
> and i don't understand what you did not understand in the clear response
> below, which you got days ago!
>
> for the reshape you were advised to use whatever rescue/live system with a
> recent kernel and mdadm, not more and not less
>
> just to avoid probably long-fixed bugs in your old kernel
>
> ---------------------
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of
> linux doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are using
> the latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm), but
> almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd
> try starting it again with the rescue disk. It'll probably run fine. Let
> it complete and then your old CentOS 7 will be fine with it
* RE: Failed adadm RAID array after aborted Grown operation
2022-05-11 14:59 ` Reindl Harald
@ 2022-05-13 5:32 ` Bob Brand
2022-05-13 8:18 ` Reindl Harald
0 siblings, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-13 5:32 UTC (permalink / raw)
To: Reindl Harald, Roger Heflin, Wols Lists
Cc: Linux RAID, Phil Turmel, NeilBrown
This may not be the forum to ask this but what exactly is "compiling the
kernel"? From what I've been reading, it sounds like a somewhat involved and
complex process - is it? Is compiling a new kernel the same as upgrading the
OS? I'm getting the impression that it sort of is but sort of isn't. Is it
possible to compile a kernel for a rescue CD (from the comments I've read,
it is possible)? If I were to compile a new kernel, would I expect the
version numbers for the kernel and mdadm to be the same? Sorry for all the
questions but, as I said at the outset, a lot of this is all very new to me.
Thank you,
Bob
-----Original Message-----
From: Reindl Harald <h.reindl@thelounge.net>
Sent: Thursday, 12 May 2022 12:59 AM
To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>;
Wols Lists <antlists@youngman.org.uk>
Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel
<philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
Am 11.05.22 um 16:56 schrieb Reindl Harald:
>
>
> Am 11.05.22 um 15:22 schrieb Bob Brand:
>> Sorry Reindl. I'm not sure I understand. Are you saying I did or
>> didn't do the right thing in booting from a CentOS rescue disk? At
>> the moment it's running from the rescue disk and, be it the best
>> distro to have used (or not), I would imagine that I need to keep
>> running from the rescue disk until the reshape is complete as
>> rebooting in the middle of a reshape is what got me in this mess.
and nowhere did i say reboot now
and i only responded to your "Do I understand that you would recommend
upgrading our installation of Linux once the repair is complete or are
advising downloading and compiling a new kernel as part of the repair?"
nobody said that - the only point was to use as recent a kernel as possible
for all grow/reshape operations
> and I don't understand what you did not understand in the clear
> response below that you got days ago!
>
> for the reshape you were advised to use whatever rescue/live system has a
> recent kernel and mdadm, no more and no less
>
> just to avoid probably long-fixed bugs in your old kernel
>
> ---------------------
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of
> linux doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are
> using the latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm),
> but almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd
> try starting it again with the rescue disk. It'll probably run fine.
> Let it complete and then your old CentOS 7 will be fine with it
CAUTION!!! This E-mail originated from outside of WMA Water. Do not click
links or open attachments unless you recognize the sender and know the
content is safe.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-13 5:32 ` Bob Brand
@ 2022-05-13 8:18 ` Reindl Harald
0 siblings, 0 replies; 23+ messages in thread
From: Reindl Harald @ 2022-05-13 8:18 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Am 13.05.22 um 07:32 schrieb Bob Brand:
> This may not be the forum to ask
it is, because you can also type that sort of question into Google, and
there isn't a good reason to build your own kernel in 2022 for most use cases
> this, but what exactly is "compiling the kernel"? From what I've been
> reading, it sounds like a somewhat involved and complex process - is it? Is
> compiling a new kernel the same as upgrading the OS? I'm getting the
> impression that it sort of is but sort of isn't. Is it possible to compile
> a kernel for a rescue CD (from the comments I've read, it is possible)? If
> I were to compile a new kernel, would I expect the version numbers for the
> kernel and mdadm to be the same? Sorry for all the questions but, as I said
> at the outset, a lot of this is very new to me.
don't get me wrong, but "Is compiling a new kernel the same as upgrading
the OS" and "what exactly is compiling the kernel" suggest you should just
use a binary distribution, since it sounds like you don't yet know what
compiling software from source means
> -----Original Message-----
> From: Reindl Harald <h.reindl@thelounge.net>
> Sent: Thursday, 12 May 2022 12:59 AM
> To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>;
> Wols Lists <antlists@youngman.org.uk>
> Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel
> <philip@turmel.org>; NeilBrown <neilb@suse.com>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
>
>
> Am 11.05.22 um 16:56 schrieb Reindl Harald:
>>
>>
>> Am 11.05.22 um 15:22 schrieb Bob Brand:
>>> Sorry Reindl. I'm not sure I understand. Are you saying I did or
>>> didn't do the right thing in booting from a CentOS rescue disk? At
>>> the moment it's running from the rescue disk and, be it the best
>>> distro to have used (or not), I would imagine that I need to keep
>>> running from the rescue disk until the reshape is complete as
>>> rebooting in the middle of a reshape is what got me in this mess.
>
> and nowhere did I say "reboot now"
>
> and I only responded to your "Do I understand that you would recommend
> upgrading our installation of Linux once the repair is complete or are
> advising downloading and compiling a new kernel as part of the repair?"
>
> nobody said that - the only point was to use as recent a kernel as possible
> for all grow/reshape operations
>
>> and I don't understand what you did not understand in the clear
>> response below that you got days ago!
>>
>> for the reshape you were advised to use whatever rescue/live system has a
>> recent kernel and mdadm, no more and no less
>>
>> just to avoid probably long-fixed bugs in your old kernel
>>
>> ---------------------
>>
>> Try and get a CentOS 8.5 disk. At the end of the day, the version of
>> linux doesn't matter. What you need is an up-to-date rescue disk.
>> Distro/whatever is unimportant - what IS important is that you are
>> using the latest mdadm, and a kernel that matches.
>>
>> The problem you have sounds like a long-standing but now-fixed bug. An
>> original CentOS disk might be okay (with matched kernel and mdadm),
>> but almost certainly has what I consider to be a "dodgy" version of mdadm.
>>
>> If you can afford the downtime, after you've reverted the reshape, I'd
>> try starting it again with the rescue disk. It'll probably run fine.
>> Let it complete and then your old CentOS 7 will be fine with it
>
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation
[not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
2022-05-11 5:39 ` Bob Brand
@ 2022-05-20 15:13 ` Bob Brand
2022-05-20 15:41 ` Reindl Harald
1 sibling, 1 reply; 23+ messages in thread
From: Bob Brand @ 2022-05-20 15:13 UTC (permalink / raw)
To: Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
UPDATE:
The array finally finished the reshape process (after almost two weeks!) and
I now have an array that's showing as clean with the original 30 disks.
However, when I try to mount it, I get the message "mount: /dev/md125: can't
read superblock".
Any suggestions as to what my next step should be? Note: it's still running
from the rescue disk.
Thank you,
Bob
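[Editor's note: for a "can't read superblock" error like this, a cautious
first step is read-only triage, sketched below. /dev/md125 is the device
name from the thread; the commands are only printed, not executed (set
DRY_RUN=0 to run them, as root, at your own risk).]

```shell
# Read-only triage for "mount: /dev/md125: can't read superblock".
# Commands are printed rather than executed, since they need root and
# the real array; flip DRY_RUN to 0 to run them for real.
DEV=${DEV:-/dev/md125}
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }
run mdadm --detail "$DEV"   # array state, device roles, any reshape position
run cat /proc/mdstat        # the kernel's view of every md array
run xfs_repair -n "$DEV"    # XFS check in no-modify mode; writes nothing
```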
From: Roger Heflin <rogerheflin@gmail.com>
Sent: Monday, 9 May 2022 9:05 PM
To: Wols Lists <antlists@youngman.org.uk>
Cc: Bob Brand <brand@wmawater.com.au>; Linux RAID
<linux-raid@vger.kernel.org>; Phil Turmel <philip@turmel.org>; NeilBrown
<neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
The short-term easiest way to get a new kernel might be this.
Download a Fedora 35 live CD and boot from it. It will allow you to turn on
the RAID and/or reshape the RAID and/or abort the reshape using the Fedora
35 kernel and mdadm tools. All of this will need to be done manually from
the GUI and/or command line, though, so it will be somewhat of a pain.
The other choice is to download, compile, and install a current kernel.org
kernel. This takes some time (you have to install compiler/header rpms);
follow the instructions at
https://docs.rockylinux.org/guides/custom-linux-kernel/ (Rocky Linux is a
Red Hat clone, so they apply here). How long it takes will depend on the
number of CPUs your machine has and the value after -j<cpustouse>.
The biggest issue with this will likely be dealing with compile errors
caused by this or that tool and/or devel package being missing. And then you
would still need to download the newest mdadm and compile and install it.
These steps will take longer, but they will get your system onto a new
kernel and new tools, and the process of compiling and installing a kernel
has for the most part not changed in a long time - I have been doing it on
and off for 20+ years. A newer kernel on older userspace is widely used by a
lot of the kernel developers, so it is generally well tested and in my
experience just works to get you onto a new kernel with minimal trouble.
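[Editor's note: the compile route described above can be sketched as below.
The kernel version (5.17.9) and the package list are illustrative
assumptions, not from the thread; the steps are written to a helper script
rather than executed here, since each needs root and network access.]

```shell
# Sketch of the download/compile/install route on a RHEL-type system.
# Version 5.17.9 and the devel-package list are assumptions for illustration.
cat > build-kernel.sh <<'EOF'
set -e
yum -y groupinstall "Development Tools"
yum -y install ncurses-devel bc bison flex elfutils-libelf-devel openssl-devel
curl -LO https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.17.9.tar.xz
tar xf linux-5.17.9.tar.xz && cd linux-5.17.9
make olddefconfig              # reuse the running kernel's config, new defaults
make -j"$(nproc)"              # parallel build: -j<cpus to use>
make modules_install install   # installs modules and adds a boot entry
EOF
echo "build plan written to build-kernel.sh"
```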
On Mon, May 9, 2022 at 5:24 AM Wols Lists <antlists@youngman.org.uk>
wrote:
On 09/05/2022 01:09, Bob Brand wrote:
> Hi Wol,
>
> My apologies for continually bothering you but I have a couple of
> questions:
Did you read the links I sent you?
>
> 1. How do I overcome the error message "mount: /dev/md125: can't read
> superblock"? Do I use fsck?
>
> 2. The removed disk is showing as " - 0 0 30 removed". Is it
> safe
> to use "mdadm /dev/md2 -r detached" or "mdadm /dev/md2 -r failed" to
> overcome this?
I don't know :-( This is getting a bit out of my depth. But I'm
SERIOUSLY concerned you're still futzing about with CentOS 7!!!
Why didn't you download CentOS 8.5? Why didn't you download RHEL 8.5, or
the latest Fedora? Why didn't you download SUSE SLES 15?
Any and all CentOS 7 will come with either an out-of-date mdadm, or a
Frankenkernel. NEITHER are a good idea.
Go back to the links I gave you, download and run lsdrv, and post the
output here. Hopefully somebody will tell you the next steps. I will do
my best.
>
> Thank you!
>
Cheers,
Wol
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:33 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> I just tried it again with the --invalid-backup switch and it's now
> showing the State as "clean, degraded", and it's showing all the disks
> except for the suspect one that I removed.
>
> I'm unable to mount it and see the contents. I get the error "mount:
> /dev/md125: can't read superblock."
>
> Is there more that I need to do?
>
> Thanks
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 9:02 AM
> To: Bob Brand <brand@wmawater.com.au>; Wol <antlists@youngman.org.uk>;
> linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> Hi Wol,
>
> I've booted to the installation media and I've run the following command:
>
> mdadm --assemble /dev/md125 --update=revert-reshape \
>   --backup-file=/mnt/sysimage/grow_md125.bak --verbose \
>   --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19
>
> But I'm still getting the error:
>
> mdadm: /dev/md125 has an active reshape - checking if critical section
> needs
> to be restored
> mdadm: No backup metadata on /mnt/sysimage/grow_md125.back
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
>
> Should I try the --invalid-backup switch or --force?
>
> Thanks,
> Bob
>
>
> -----Original Message-----
> From: Bob Brand <brand@wmawater.com.au>
> Sent: Monday, 9 May 2022 8:19 AM
> To: Wol <antlists@youngman.org.uk>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: RE: Failed adadm RAID array after aborted Grown operation
>
> OK. I've downloaded a CentOS 7 - 2009 ISO from centos.org - that seems to
> be the most recent they have.
>
>
> -----Original Message-----
> From: Wol <antlists@youngman.org.uk>
> Sent: Monday, 9 May 2022 8:16 AM
> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
> Cc: Phil Turmel <philip@turmel.org>
> Subject: Re: Failed adadm RAID array after aborted Grown operation
>
> How old is CentOS 7? With that kernel I guess it's quite old?
>
> Try and get a CentOS 8.5 disk. At the end of the day, the version of linux
> doesn't matter. What you need is an up-to-date rescue disk.
> Distro/whatever is unimportant - what IS important is that you are using
> the
> latest mdadm, and a kernel that matches.
>
> The problem you have sounds like a long-standing but now-fixed bug. An
> original CentOS disk might be okay (with matched kernel and mdadm), but
> almost certainly has what I consider to be a "dodgy" version of mdadm.
>
> If you can afford the downtime, after you've reverted the reshape, I'd try
> starting it again with the rescue disk. It'll probably run fine. Let it
> complete and then your old CentOS 7 will be fine with it.
>
> Cheers,
> Wol
>
> On 08/05/2022 23:04, Bob Brand wrote:
>> Thank Wol.
>>
>> Should I use a CentOS 7 disk or a CentOS 8 disk?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Wols Lists <antlists@youngman.org.uk>
>> Sent: Monday, 9 May 2022 1:32 AM
>> To: Bob Brand <brand@wmawater.com.au>; linux-raid@vger.kernel.org
>> Cc: Phil Turmel <philip@turmel.org>
>> Subject: Re: Failed adadm RAID array after aborted Grown operation
>>
>> On 08/05/2022 14:18, Bob Brand wrote:
>>> If you’ve stuck with me and read all this way, thank you and I hope
>>> you can help me.
>>
>> https://raid.wiki.kernel.org/index.php/Linux_Raid
>>
>> Especially
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
>>
>> What you need to do is revert the reshape. I know what may have
>> happened, and what bothers me is your kernel version, 3.10.
>>
>> The first thing to try is to boot from up-to-date rescue media and see
>> if an mdadm --revert works from there. If it does, your Centos should
>> then bring everything back no problem.
>>
>> (You've currently got what I call a Frankensetup: a very old kernel, a
>> pretty new mdadm, and a whole bunch of patches that do who knows what.
>> You really need a matching kernel and mdadm, and your frankenkernel
>> won't match anything ...)
>>
>> Let us know how that goes ...
>>
>> Cheers,
>> Wol
>>
>>
>>
>>
>>
>
>
>
>
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-20 15:13 ` Bob Brand
@ 2022-05-20 15:41 ` Reindl Harald
2022-05-22 4:13 ` Bob Brand
0 siblings, 1 reply; 23+ messages in thread
From: Reindl Harald @ 2022-05-20 15:41 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Am 20.05.22 um 17:13 schrieb Bob Brand:
> UPDATE:
>
> The array finally finished the reshape process (after almost two weeks!) and
> I now have an array that's showing as clean with the original 30 disks.
> However, when I try to mount it, I get the message "mount: /dev/md125: can't
> read superblock".
>
> Any suggestions as to what my next step should be? Note: it's still running
> from the rescue disk
restore from a backup - the array is one thing, the filesystem is a
different layer, and it seems to be heavily damaged after all the things
which happened
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation
2022-05-20 15:41 ` Reindl Harald
@ 2022-05-22 4:13 ` Bob Brand
2022-05-22 11:25 ` Reindl Harald
2022-05-22 13:31 ` Wols Lists
0 siblings, 2 replies; 23+ messages in thread
From: Bob Brand @ 2022-05-22 4:13 UTC (permalink / raw)
To: Reindl Harald, Roger Heflin, Wols Lists
Cc: Linux RAID, Phil Turmel, NeilBrown
Thanks Reindl.
Is xfs_repair an option? And, if it is, do I run it on md125 or the
individual sd devices?
Unfortunately, restoring from backup isn't an option - after all, where do
you back up 200TB of data to? This storage was originally set up with the
understanding that it wasn't backed up, and so no valuable data was supposed
to have been stored on it. Unfortunately, people being what they are,
valuable data has been stored there and I'm the mug now trying to get it
back - it's a system that I've inherited.
So, any help or constructive advice would be appreciated.
Thanks,
Bob
-----Original Message-----
From: Reindl Harald <h.reindl@thelounge.net>
Sent: Saturday, 21 May 2022 1:41 AM
To: Bob Brand <brand@wmawater.com.au>; Roger Heflin <rogerheflin@gmail.com>;
Wols Lists <antlists@youngman.org.uk>
Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel
<philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
Am 20.05.22 um 17:13 schrieb Bob Brand:
> UPDATE:
>
> The array finally finished the reshape process (after almost two
> weeks!) and I now have an array that's showing as clean with the original
> 30 disks.
> However, when I try to mount it, I get the message "mount: /dev/md125:
> can't read superblock".
>
> Any suggestions as to what my next step should be? Note: it's still
> running from the rescue disk
restore from a backup - the array is one thing, the filesystem is a
different layer, and it seems to be heavily damaged after all the things
which happened
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-22 4:13 ` Bob Brand
@ 2022-05-22 11:25 ` Reindl Harald
2022-05-22 13:31 ` Wols Lists
1 sibling, 0 replies; 23+ messages in thread
From: Reindl Harald @ 2022-05-22 11:25 UTC (permalink / raw)
To: Bob Brand, Roger Heflin, Wols Lists; +Cc: Linux RAID, Phil Turmel, NeilBrown
Am 22.05.22 um 06:13 schrieb Bob Brand:
> Is xfs_repair an option?
unlikely if the underlying device has all sorts of damage - think of the
RAID as a single disk, and consider how the filesystem reacts if you shoot
holes in it
> And, if it is, do I run it on md125 or the
> individual sd devices?
think about it for two seconds and it's obvious - is the filesystem on top
of the individual disks or on top of the whole RAID? The filesystem doesn't
even know about the individual devices
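[Editor's note: the layering point can be made concrete as below. The
command lines are echoed rather than executed, since a real repair needs
root and the stopped array; /dev/md125 is the device from the thread and
/dev/sda stands in for any member disk.]

```shell
# The filesystem lives on the assembled array, so repair tools point at the
# md device, never at member disks. Echoed only, for safety.
ARRAY=/dev/md125
echo "xfs_repair -n $ARRAY    # dry run: report damage, change nothing"
echo "xfs_repair $ARRAY       # real repair, only after reviewing the -n pass"
echo "xfs_repair -n /dev/sda  # WRONG: a member disk holds no filesystem"
```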
> Unfortunately, restore from back up isn't an option - after all to where do
> you back up 200TB of data?
on a second machine in a different building - the initial sync is done
locally, and then no matter how large the data, rsync is enough - the daily
delta doesn't vary just because the whole dataset is huge
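[Editor's note: the seed-locally-then-ship-deltas scheme described above
can be sketched as below. The host name and paths are invented for
illustration; the rsync command is echoed rather than run.]

```shell
# Seed the copy once locally, then a nightly job ships only changed files.
SRC=/mnt/array/                      # trailing slash: sync contents, not dir
DEST=backup-host:/srv/array-mirror/  # second machine, different building
# -a archive, -H hardlinks, -A ACLs, -X xattrs; --partial resumes big files.
echo rsync -aHAX --delete --partial "$SRC" "$DEST"
```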
and don't get me wrong, but who starts a reshape on 200 TB of storage
knowing that there is no backup?
> This storage was originally set up with the
> understanding that it wasn't backed up and so no valuable data was supposed
> to have been stored on it.
well, then I wouldn't store it there at all
> Unfortunately, people being what they are,
> valuable data has been stored there and I'm the mug now trying to get it
> back - it's a system that I've inherited.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed adadm RAID array after aborted Grown operation
2022-05-22 4:13 ` Bob Brand
2022-05-22 11:25 ` Reindl Harald
@ 2022-05-22 13:31 ` Wols Lists
2022-05-22 22:54 ` Bob Brand
1 sibling, 1 reply; 23+ messages in thread
From: Wols Lists @ 2022-05-22 13:31 UTC (permalink / raw)
To: Bob Brand, Reindl Harald, Roger Heflin; +Cc: Linux RAID, Phil Turmel, NeilBrown
On 22/05/2022 05:13, Bob Brand wrote:
> Unfortunately, restore from back up isn't an option - after all to where do
> you back up 200TB of data? This storage was originally set up with the
> understanding that it wasn't backed up and so no valuable data was supposed
> to have been stored on it. Unfortunately, people being what they are,
> valuable data has been stored there and I'm the mug now trying to get it
> back - it's a system that I've inherited.
>
> So, any help or constructive advice would be appreciated.
Unfortunately, about the only constructive advice I can give you is
"live and learn". I made a similar massive cock-up at the start of my
career, and I've always been excessively cautious about disks and data
ever since.
What your employer needs to take away from this - and no disrespect to
yourself - is that if they run a system that was probably supported for
about five years, then has been running on duct tape and baling wire for
a further ten years, DON'T give it to someone with pretty much NO
sysadmin or computer ops experience to carry out a potentially
disastrous operation like messing about with a raid array!
This is NOT a simple setup, and it seems clear to me that you have
little familiarity with the basic concepts. Unfortunately, your employer
was playing Russian Roulette, and the gun went off.
On a *personal* level, and especially if your employer wants you to
continue looking after their systems, they need to give you an (old?)
box with a bunch of disk drives. Go back to the raid website and look at
the article about building a new system. Take that system they've given
you, and use that article as a guide to build it from scratch. (The article
is actually about the computer I'm using right now to type this message.)
I use(d) gentoo as my distro. It's a great distro, but for a newbie I
think it takes "throw them in at the deep end" to extremes. Go find
Slackware and start with that. It's not a "hold their hands and do
everything for them" distro, but nor is it a "here's the instructions,
if they don't work for you then you're on your own" distro. Once you've
got to grips with Slack, have a go at gentoo. And once you've managed to
get gentoo working, you should have a pretty decent grasp of what's
going "under the bonnet". CentOS/RedHat/SLES should be a breeze after that.
Cheers,
Wol
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: Failed adadm RAID array after aborted Grown operation
2022-05-22 13:31 ` Wols Lists
@ 2022-05-22 22:54 ` Bob Brand
0 siblings, 0 replies; 23+ messages in thread
From: Bob Brand @ 2022-05-22 22:54 UTC (permalink / raw)
To: Wols Lists, Reindl Harald, Roger Heflin
Cc: Linux RAID, Phil Turmel, NeilBrown
Thanks Wol.
I can't really disagree with anything you've said except to mention that I
do have a fair bit of experience (20+ years) but it's all been pretty much
Microsoft/Windows and hardware RAID.
Like I said, this device was never meant to be used for critical data - if
nothing else, this has been something of a wake-up call for us.
-----Original Message-----
From: Wols Lists <antlists@youngman.org.uk>
Sent: Sunday, 22 May 2022 11:31 PM
To: Bob Brand <brand@wmawater.com.au>; Reindl Harald
<h.reindl@thelounge.net>; Roger Heflin <rogerheflin@gmail.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>; Phil Turmel
<philip@turmel.org>; NeilBrown <neilb@suse.com>
Subject: Re: Failed adadm RAID array after aborted Grown operation
On 22/05/2022 05:13, Bob Brand wrote:
> Unfortunately, restore from back up isn't an option - after all to
> where do you back up 200TB of data? This storage was originally set up
> with the understanding that it wasn't backed up and so no valuable
> data was supposed to have been stored on it. Unfortunately, people
> being what they are, valuable data has been stored there and I'm the
> mug now trying to get it back - it's a system that I've inherited.
>
> So, any help or constructive advice would be appreciated.
Unfortunately, about the only constructive advice I can give you is "live
and learn". I made a similar massive cock-up at the start of my career, and
I've always been excessively cautious about disks and data ever since.
What your employer needs to take away from this - and no disrespect to
yourself - is that if they run a system that was probably supported for
about five years, then has been running on duct tape and baling wire for a
further ten years, DON'T give it to someone with pretty much NO sysadmin or
computer ops experience to carry out a potentially disastrous operation like
messing about with a raid array!
This is NOT a simple setup, and it seems clear to me that you have little
familiarity with the basic concepts. Unfortunately, your employer was
playing Russian Roulette, and the gun went off.
On a *personal* level, and especially if your employer wants you to continue
looking after their systems, they need to give you an (old?) box with a
bunch of disk drives. Go back to the raid website and look at the article
about building a new system. Take that system they've given you, and use
that article as a guide to build it from scratch. (The article is actually
about the computer I'm using right now to type this message.)
I use(d) gentoo as my distro. It's a great distro, but for a newbie I think
it takes "throw them in at the deep end" to extremes. Go find Slackware and
start with that. It's not a "hold their hands and do everything for them"
distro, but nor is it a "here's the instructions, if they don't work for you
then you're on your own" distro. Once you've got to grips with Slack, have a
go at gentoo. And once you've managed to get gentoo working, you should have
a pretty decent grasp of what's going "under the bonnet". CentOS/RedHat/SLES
should be a breeze after that.
Cheers,
Wol
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2022-05-22 22:54 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-08 13:18 Failed adadm RAID array after aborted Grown operation Bob Brand
2022-05-08 15:32 ` Wols Lists
2022-05-08 22:04 ` Bob Brand
2022-05-08 22:15 ` Wol
2022-05-08 22:19 ` Bob Brand
2022-05-08 23:02 ` Bob Brand
2022-05-08 23:32 ` Bob Brand
2022-05-09 0:09 ` Bob Brand
2022-05-09 6:52 ` Wols Lists
2022-05-09 13:07 ` Bob Brand
[not found] ` <CAAMCDecTb69YY+jGzq9HVqx4xZmdVGiRa54BD55Amcz5yaZo1Q@mail.gmail.com>
2022-05-11 5:39 ` Bob Brand
2022-05-11 12:35 ` Reindl Harald
2022-05-11 13:22 ` Bob Brand
2022-05-11 14:56 ` Reindl Harald
2022-05-11 14:59 ` Reindl Harald
2022-05-13 5:32 ` Bob Brand
2022-05-13 8:18 ` Reindl Harald
2022-05-20 15:13 ` Bob Brand
2022-05-20 15:41 ` Reindl Harald
2022-05-22 4:13 ` Bob Brand
2022-05-22 11:25 ` Reindl Harald
2022-05-22 13:31 ` Wols Lists
2022-05-22 22:54 ` Bob Brand