* Raid-6 cannot reshape
@ 2020-04-01 10:16 Alexander Shenkin
       [not found] ` <6b9b6d37-6325-6515-f693-0ff3b641a67a@shenkin.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Shenkin @ 2020-04-01 10:16 UTC (permalink / raw)
  To: Linux-RAID


Hi all,

I had a problem that caused my Ubuntu Server 14 system to go down, and
it now will not boot.  I added a drive to my raid1 (/dev/md0 -> /boot)
+ raid6 (/dev/md2 -> /) setup.  I was previously running with /dev/md0
(raid1) assembled from /dev/sd[a-f]1, and /dev/md2 (raid6) assembled
from /dev/sd[a-f]3.  Both partitions from the new drive were
successfully added to the two md arrays, and the raid1 (the smaller of
the two) seemed to resync successfully (I think "reshape" is actually
the right term here, meaning that mdadm redistributes the data amongst
the drives once an array has been grown).  When reshaping the larger
raid6, however, the sync speed was quite slow (a few KB/s), and got
slower and slower (7 KB/s... 5 KB/s... 3 KB/s... 1 KB/s...) until the
system halted entirely and I eventually turned the power off.  It now
will not boot.

Thanks to Roger Heflin's help, I've booted into a live-CD environment
(Ubuntu Server 18) with all the necessary raid personalities available.

When trying to --assemble md127 (raid6), I get the following error:
"Failed to restore critical section for reshape, sorry.  Possibly you
needed to specify the --backup-file".  Needless to say, I didn't save a
backup file when adding the drive.

I have attached some diagnostic output here.  It seems that my raid6
array is not being recognized as such.  I'm not sure what the next steps
are - do I need to figure out how to get the reshape up and running
again?  Any help would be greatly appreciated.

Many thanks,

Allie


[-- Attachment #2: mdstat.txt --]

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md127 : inactive sdd3[5](S) sdf3[8](S) sdc3[2](S) sdg3[9](S) sde3[7](S) sdb3[4](S) sda3[6](S)
      20441322496 blocks super 1.2
       
md126 : active (auto-read-only) raid1 sdf1[8] sde1[7] sdg1[9] sda1[6] sdd1[5] sdc1[2] sdb1[4]
      1950656 blocks super 1.2 [7/7] [UUUUUUU]
      
unused devices: <none>

[-- Attachment #3: fdisk.txt --]

Disk /dev/loop0: 265.3 MiB, 278147072 bytes, 543256 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop1: 70.1 MiB, 73531392 bytes, 143616 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop2: 49.7 MiB, 52060160 bytes, 101680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop3: 36 MiB, 37707776 bytes, 73648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop4: 27.4 MiB, 28717056 bytes, 56088 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop5: 89.1 MiB, 93417472 bytes, 182456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop6: 51.9 MiB, 54358016 bytes, 106168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop7: 51.9 MiB, 54407168 bytes, 106264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D607CD2D-05F4-42C6-914C-B99CE56E934F

Device          Start        End    Sectors  Size Type
/dev/sda1        2048    3905535    3903488  1.9G Linux RAID
/dev/sda2     3905536    3907583       2048    1M BIOS boot
/dev/sda3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sda4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D96D7513-3D74-435B-8B2F-26C2A32B0586

Device          Start        End    Sectors  Size Type
/dev/sdb1        2048    3905535    3903488  1.9G Linux RAID
/dev/sdb2     3905536    3907583       2048    1M BIOS boot
/dev/sdb3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sdb4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4B356AFA-8F48-4227-86F0-329565146D7A

Device          Start        End    Sectors  Size Type
/dev/sdc1        2048    3905535    3903488  1.9G Linux RAID
/dev/sdc2     3905536    3907583       2048    1M BIOS boot
/dev/sdc3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sdc4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D4D1E2EB-8520-4B3E-8263-AB5B5BD2D7E2

Device          Start        End    Sectors  Size Type
/dev/sdd1        2048    3905535    3903488  1.9G Linux RAID
/dev/sdd2     3905536    3907583       2048    1M BIOS boot
/dev/sdd3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sdd4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/sde: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 52F044AB-9D50-4363-BF60-651A87159A17

Device          Start        End    Sectors  Size Type
/dev/sde1        2048    3905535    3903488  1.9G Linux RAID
/dev/sde2     3905536    3907583       2048    1M BIOS boot
/dev/sde3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sde4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/sdg: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: BE2DDABE-9106-4436-8ABC-6A919EBFB2E6

Device          Start        End    Sectors  Size Type
/dev/sdg1        2048    3905535    3903488  1.9G Linux RAID
/dev/sdg2     3905536    3907583       2048    1M BIOS boot
/dev/sdg3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sdg4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/sdf: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 36615F7D-974F-4A0B-B79B-165D872EF418

Device          Start        End    Sectors  Size Type
/dev/sdf1        2048    3905535    3903488  1.9G Linux RAID
/dev/sdf2     3905536    3907583       2048    1M BIOS boot
/dev/sdf3     3907584 5844547583 5840640000  2.7T Linux RAID
/dev/sdf4  5844547584 5860532223   15984640  7.6G Linux filesystem


Disk /dev/md126: 1.9 GiB, 1997471744 bytes, 3901312 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdh: 29.2 GiB, 31376707072 bytes, 61282631 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00f674a6

Device     Boot Start      End  Sectors  Size Id Type
/dev/sdh1  *     2048 61282630 61280583 29.2G  c W95 FAT32 (LBA)


Disk /dev/sdi: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00e19bdc

Device     Boot Start      End  Sectors Size Id Type
/dev/sdi1  *     2048 31457279 31455232  15G  c W95 FAT32 (LBA)


* Re: Raid-6 cannot reshape
       [not found] ` <6b9b6d37-6325-6515-f693-0ff3b641a67a@shenkin.org>
@ 2020-04-06 15:27   ` Alexander Shenkin
  2020-04-06 16:12     ` Roger Heflin
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Shenkin @ 2020-04-06 15:27 UTC (permalink / raw)
  To: Linux-RAID



On 4/4/2020 9:19 AM, Alexander Shenkin wrote:
> On 4/1/2020 11:16 AM, Alexander Shenkin wrote:
>> [snip - original report, quoted at the top of the thread]
> 
> Hello again,
> 
> Reading through https://raid.wiki.kernel.org/index.php/Assemble_Run and
> https://raid.wiki.kernel.org/index.php/RAID_Recovery, the advice seems
> to be to force assemble the array given that my event counts are all the
> same.  However, I don't want to do that until some experts have chimed
> in.  I've seen some other threads about using overlays to avoid data
> loss (a sketch of that overlay setup follows the scan output below)...
> not sure if that is still a recommended method.
> 
> I'm including the dmesg and mdadm --examine output here...  any advice
> much appreciated!
> 
> Thanks,
> Allie
> 
> (note below - array members are on /dev/sd[a-g])
> 
> root@ubuntu-server:/home/ubuntu-server# mdadm -A --scan --verbose
> mdadm: looking for devices for further assembly
> mdadm: Cannot assemble mbr metadata on /dev/sdh1
> mdadm: Cannot assemble mbr metadata on /dev/sdh
> mdadm: no recogniseable superblock on /dev/md/bobbiflekman:0
> mdadm: no recogniseable superblock on /dev/sdf4
> mdadm: /dev/sdf3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sdf2
> mdadm: /dev/sdf1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sdf
> mdadm: no recogniseable superblock on /dev/sdg4
> mdadm: /dev/sdg3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sdg2
> mdadm: /dev/sdg1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sdg
> mdadm: no recogniseable superblock on /dev/sde4
> mdadm: /dev/sde3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sde2
> mdadm: /dev/sde1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sde
> mdadm: cannot open device /dev/sr1: No medium found
> mdadm: cannot open device /dev/sr0: No medium found
> mdadm: no recogniseable superblock on /dev/sdd4
> mdadm: /dev/sdd3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sdd2
> mdadm: /dev/sdd1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sdd
> mdadm: no recogniseable superblock on /dev/sdc4
> mdadm: /dev/sdc3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sdc2
> mdadm: /dev/sdc1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sdc
> mdadm: no recogniseable superblock on /dev/sdb4
> mdadm: /dev/sdb3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sdb2
> mdadm: /dev/sdb1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sdb
> mdadm: no recogniseable superblock on /dev/sda4
> mdadm: /dev/sda3 is busy - skipping
> mdadm: no recogniseable superblock on /dev/sda2
> mdadm: /dev/sda1 is busy - skipping
> mdadm: Cannot assemble mbr metadata on /dev/sda
> mdadm: no recogniseable superblock on /dev/loop7
> mdadm: no recogniseable superblock on /dev/loop6
> mdadm: no recogniseable superblock on /dev/loop5
> mdadm: no recogniseable superblock on /dev/loop4
> mdadm: no recogniseable superblock on /dev/loop3
> mdadm: no recogniseable superblock on /dev/loop2
> mdadm: no recogniseable superblock on /dev/loop1
> mdadm: no recogniseable superblock on /dev/loop0
> mdadm: No arrays found in config file or automatically
> 
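[Re the overlay method mentioned above: a minimal sketch of the
dm-snapshot overlay setup described on the raid wiki, shown here for a
single member /dev/sda3 with a hypothetical scratch file path and an
arbitrary 50G sparse size.  Reads come from the real member, writes
land in the overlay, so experiments stay non-destructive:

  truncate -s 50G /tmp/sda3.ovl             # sparse copy-on-write store (hypothetical path)
  loop=$(losetup -f --show /tmp/sda3.ovl)   # attach it to a loop device
  size=$(blockdev --getsz /dev/sda3)        # member size in 512-byte sectors
  dmsetup create sda3_ovl --table "0 $size snapshot /dev/sda3 $loop P 8"

Repeat per member, then assemble against /dev/mapper/sda3_ovl and its
siblings instead of the real partitions.]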

Hi again all,

Apologies for all the self-replies.  I realized my previous --examine
was run without stopping the array in question.  When the array is
stopped and re-examined, the critical bit seems to be this:

mdadm: /dev/sdf3 is identified as a member of /dev/md/ubuntu:2, slot 4.
mdadm: /dev/sdg3 is identified as a member of /dev/md/ubuntu:2, slot 6.
mdadm: /dev/sde3 is identified as a member of /dev/md/ubuntu:2, slot 5.
mdadm: /dev/sdd3 is identified as a member of /dev/md/ubuntu:2, slot 3.
mdadm: /dev/sdc3 is identified as a member of /dev/md/ubuntu:2, slot 2.
mdadm: /dev/sdb3 is identified as a member of /dev/md/ubuntu:2, slot 1.
mdadm: /dev/sda3 is identified as a member of /dev/md/ubuntu:2, slot 0.
mdadm: /dev/md/ubuntu:2 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on device-6
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
       Possibly you needed to specify the --backup-file

That is, it thinks there is an active reshape.

Thanks,
Allie


root@ubuntu-server:/home/ubuntu-server# mdadm -A --scan --verbose
mdadm: looking for devices for further assembly
mdadm: Cannot assemble mbr metadata on /dev/sdh1
mdadm: Cannot assemble mbr metadata on /dev/sdh
mdadm: no recogniseable superblock on /dev/md/bobbiflekman:0
mdadm: no recogniseable superblock on /dev/sdf4
mdadm: No super block found on /dev/sdf2 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdf2
mdadm: /dev/sdf1 is busy - skipping
mdadm: No super block found on /dev/sdf (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdf
mdadm: No super block found on /dev/sdg4 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdg4
mdadm: No super block found on /dev/sdg2 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdg2
mdadm: /dev/sdg1 is busy - skipping
mdadm: No super block found on /dev/sdg (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdg
mdadm: No super block found on /dev/sde4 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde4
mdadm: No super block found on /dev/sde2 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde2
mdadm: /dev/sde1 is busy - skipping
mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde
mdadm: cannot open device /dev/sr1: No medium found
mdadm: cannot open device /dev/sr0: No medium found
mdadm: No super block found on /dev/sdd4 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd4
mdadm: No super block found on /dev/sdd2 (Expected magic a92b4efc, got 9c6196f1)
mdadm: no RAID superblock on /dev/sdd2
mdadm: /dev/sdd1 is busy - skipping
mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd
mdadm: No super block found on /dev/sdc4 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdc4
mdadm: No super block found on /dev/sdc2 (Expected magic a92b4efc, got 9c6196f1)
mdadm: no RAID superblock on /dev/sdc2
mdadm: /dev/sdc1 is busy - skipping
mdadm: No super block found on /dev/sdc (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdc
mdadm: No super block found on /dev/sdb4 (Expected magic a92b4efc, got 53425553)
mdadm: no RAID superblock on /dev/sdb4
mdadm: No super block found on /dev/sdb2 (Expected magic a92b4efc, got 9c6196f1)
mdadm: no RAID superblock on /dev/sdb2
mdadm: /dev/sdb1 is busy - skipping
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sda4 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda4
mdadm: No super block found on /dev/sda2 (Expected magic a92b4efc, got 9c6196f1)
mdadm: no RAID superblock on /dev/sda2
mdadm: /dev/sda1 is busy - skipping
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: No super block found on /dev/loop7 (Expected magic a92b4efc, got 72756769)
mdadm: no RAID superblock on /dev/loop7
mdadm: No super block found on /dev/loop6 (Expected magic a92b4efc, got 69622f21)
mdadm: no RAID superblock on /dev/loop6
mdadm: No super block found on /dev/loop5 (Expected magic a92b4efc, got 14ea0a05)
mdadm: no RAID superblock on /dev/loop5
mdadm: No super block found on /dev/loop4 (Expected magic a92b4efc, got 1824ef5d)
mdadm: no RAID superblock on /dev/loop4
mdadm: No super block found on /dev/loop3 (Expected magic a92b4efc, got 1e993ae9)
mdadm: no RAID superblock on /dev/loop3
mdadm: No super block found on /dev/loop2 (Expected magic a92b4efc, got cb4d8c8e)
mdadm: no RAID superblock on /dev/loop2
mdadm: No super block found on /dev/loop1 (Expected magic a92b4efc, got d2964063)
mdadm: no RAID superblock on /dev/loop1
mdadm: No super block found on /dev/loop0 (Expected magic a92b4efc, got e7e108a6)
mdadm: no RAID superblock on /dev/loop0
mdadm: /dev/sdf3 is identified as a member of /dev/md/ubuntu:2, slot 4.
mdadm: /dev/sdg3 is identified as a member of /dev/md/ubuntu:2, slot 6.
mdadm: /dev/sde3 is identified as a member of /dev/md/ubuntu:2, slot 5.
mdadm: /dev/sdd3 is identified as a member of /dev/md/ubuntu:2, slot 3.
mdadm: /dev/sdc3 is identified as a member of /dev/md/ubuntu:2, slot 2.
mdadm: /dev/sdb3 is identified as a member of /dev/md/ubuntu:2, slot 1.
mdadm: /dev/sda3 is identified as a member of /dev/md/ubuntu:2, slot 0.
mdadm: /dev/md/ubuntu:2 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on device-6
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
       Possibly you needed to specify the --backup-file
mdadm: looking for devices for further assembly
mdadm: /dev/sdf1 is busy - skipping
mdadm: /dev/sdg1 is busy - skipping
mdadm: /dev/sde1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdb1 is busy - skipping
mdadm: /dev/sda1 is busy - skipping
mdadm: No arrays found in config file or automatically


* Re: Raid-6 cannot reshape
  2020-04-06 15:27   ` Alexander Shenkin
@ 2020-04-06 16:12     ` Roger Heflin
  2020-04-06 16:27       ` Wols Lists
  0 siblings, 1 reply; 13+ messages in thread
From: Roger Heflin @ 2020-04-06 16:12 UTC (permalink / raw)
  To: Alexander Shenkin; +Cc: Linux-RAID

When I looked at the detailed files you sent a few days ago, all of
the reshapes (on all disks) indicated that they were at position 0, so
it appears the reshape never actually started: it hung immediately,
which is probably why mdadm cannot find the critical section - the
hang happened before the critical-section backup was ever written.
I'm not entirely sure how to undo a reshape that failed like this.
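
(For reference, the per-device reshape state can be read straight off
the member superblocks; a minimal check, assuming the array is stopped,
would be something like:

  for d in /dev/sd[a-g]3; do
      echo "== $d =="
      mdadm --examine "$d" | grep -E 'Reshape|Delta Devices|Events'
  done

where members of a mid-reshape array should show lines like
"Reshape pos'n : 0" and "Delta Devices : 1 (6->7)".)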

On Mon, Apr 6, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote:
> [snip - full quote of the previous messages trimmed]


* Re: Raid-6 cannot reshape
  2020-04-06 16:12     ` Roger Heflin
@ 2020-04-06 16:27       ` Wols Lists
  2020-04-06 20:34         ` Phil Turmel
  0 siblings, 1 reply; 13+ messages in thread
From: Wols Lists @ 2020-04-06 16:27 UTC (permalink / raw)
  To: Roger Heflin, Alexander Shenkin; +Cc: Linux-RAID

On 06/04/20 17:12, Roger Heflin wrote:
> When I looked at the detailed files you sent a few days ago, all of
> the reshapes (on all disks) indicated that they were at position 0, so
> it appears the reshape never actually started: it hung immediately,
> which is probably why mdadm cannot find the critical section - the
> hang happened before the critical-section backup was ever written.
> I'm not entirely sure how to undo a reshape that failed like this.

This seems quite common. Search the archives - it's probably something
like --assemble --revert-reshape.

Cheers,
Wol


* Re: Raid-6 cannot reshape
  2020-04-06 16:27       ` Wols Lists
@ 2020-04-06 20:34         ` Phil Turmel
  2020-04-07 10:25           ` Alexander Shenkin
  0 siblings, 1 reply; 13+ messages in thread
From: Phil Turmel @ 2020-04-06 20:34 UTC (permalink / raw)
  To: Wols Lists, Roger Heflin, Alexander Shenkin; +Cc: Linux-RAID

On 4/6/20 12:27 PM, Wols Lists wrote:
> On 06/04/20 17:12, Roger Heflin wrote:
>> When I looked at the detailed files you sent a few days ago, all of
>> the reshapes (on all disks) indicated that they were at position 0,
>> so it appears the reshape never actually started: it hung
>> immediately, which is probably why mdadm cannot find the critical
>> section - the hang happened before the critical-section backup was
>> ever written.  I'm not entirely sure how to undo a reshape that
>> failed like this.
> 
> This seems quite common. Search the archives - it's probably something
> like --assemble --revert-reshape.

Ah, yes.  I recall cases where mdmon wouldn't start or wouldn't open the 
array to start moving the stripes, so the kernel wouldn't advance. 
SystemD was one of the culprits, I believe, back then.


* Re: Raid-6 cannot reshape
  2020-04-06 20:34         ` Phil Turmel
@ 2020-04-07 10:25           ` Alexander Shenkin
  2020-04-07 11:28             ` Phil Turmel
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Shenkin @ 2020-04-07 10:25 UTC (permalink / raw)
  To: Phil Turmel, Wols Lists, Roger Heflin; +Cc: Linux-RAID



On 4/6/2020 9:34 PM, Phil Turmel wrote:
> On 4/6/20 12:27 PM, Wols Lists wrote:
>> On 06/04/20 17:12, Roger Heflin wrote:
>>> [snip - Roger's analysis, quoted above]
>>
>> This seems quite common. Search the archives - it's probably something
>> like --assemble --revert-reshape.
> 
> Ah, yes.  I recall cases where mdmon wouldn't start or wouldn't open the
> array to start moving the stripes, so the kernel wouldn't advance.
> SystemD was one of the culprits, I believe, back then.

Thanks all.

So, is the following safe to run, and a good idea to try?

mdadm --assemble --update=revert-reshape /dev/md127 /dev/sd[a-g]3

And if that doesn't work, add a force?

mdadm --assemble --force --update=revert-reshape /dev/md127 /dev/sd[a-g]3

And adding --invalid-backup if it complains about backup files?

Thanks,
Allie


* Re: Raid-6 cannot reshape
  2020-04-07 10:25           ` Alexander Shenkin
@ 2020-04-07 11:28             ` Phil Turmel
  2020-04-07 11:46               ` Alexander Shenkin
  0 siblings, 1 reply; 13+ messages in thread
From: Phil Turmel @ 2020-04-07 11:28 UTC (permalink / raw)
  To: Alexander Shenkin, Wols Lists, Roger Heflin; +Cc: Linux-RAID

Hi Allie,

On 4/7/20 6:25 AM, Alexander Shenkin wrote:
> [snip]
> 
> Thanks all.
> 
> So, is the following safe to run, and a good idea to try?
> 
> mdadm --assemble --update=revert-reshape /dev/md127 /dev/sd[a-g]3

Yes.

> And if that doesn't work, add a force?
>
> mdadm --assemble --force --update=revert-reshape /dev/md127 /dev/sd[a-g]3

Yes.

> And adding --invalid-backup if it complains about backup files?

Yes.

> Thanks,
> Allie

Phil


* Re: Raid-6 cannot reshape
  2020-04-07 11:28             ` Phil Turmel
@ 2020-04-07 11:46               ` Alexander Shenkin
  2020-04-07 12:28                 ` Phil Turmel
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Shenkin @ 2020-04-07 11:46 UTC (permalink / raw)
  To: Phil Turmel, Wols Lists, Roger Heflin; +Cc: Linux-RAID



On 4/7/2020 12:28 PM, Phil Turmel wrote:
> Hi Allie,
> 
> On 4/7/20 6:25 AM, Alexander Shenkin wrote:
>> [snip]
>>
>> So, is the following safe to run, and a good idea to try?
>>
>> mdadm --assemble --update=revert-reshape /dev/md127 /dev/sd[a-g]3
> 
> Yes.
> 
>> And if that doesn't work, add a force?
>>
>> mdadm --assemble --force --update=revert-reshape /dev/md127 /dev/sd[a-g]3
> 
> Yes.
> 
>> And adding --invalid-backup if it complains about backup files?
> 
> Yes.
> 
>> Thanks,
>> Allie
> 
> Phil
> 

Thanks Phil,

The --invalid-backup parameter was necessary to get this up and running.
It's now up with the 7th disk as a spare.  Shall I run fsck now, or can
I just try to grow again?

proposed grow operation:
> mdadm --grow --raid-devices=7 --backup-file=/dev/usb/grow_md127.bak /dev/md127
> mdadm --stop /dev/md127
> umount /dev/md127 # not sure if this is necessary
> resize2fs /dev/md127

Thanks,
Allie

assemble operation results:

root@ubuntu-server:/home/ubuntu-server# mdadm --assemble --invalid-backup --update=revert-reshape /dev/md127 /dev/sd[a-g]3
mdadm: device 12 in /dev/md127 has wrong state in superblock, but /dev/sdg3 seems ok
mdadm: /dev/md127 has been started with 6 drives and 1 spare.

root@ubuntu-server:/home/ubuntu-server# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md127 : active raid6 sda3[6] sdg3[9](S) sde3[7] sdf3[8] sdd3[5] sdc3[2] sdb3[4]
      11680755712 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdf1[8] sde1[7] sdg1[9] sda1[6] sdd1[5] sdc1[2] sdb1[4]
      1950656 blocks super 1.2 [7/7] [UUUUUUU]

unused devices: <none>


* Re: Raid-6 cannot reshape
  2020-04-07 11:46               ` Alexander Shenkin
@ 2020-04-07 12:28                 ` Phil Turmel
  2020-04-07 12:31                   ` Phil Turmel
  0 siblings, 1 reply; 13+ messages in thread
From: Phil Turmel @ 2020-04-07 12:28 UTC (permalink / raw)
  To: Alexander Shenkin, Wols Lists, Roger Heflin; +Cc: Linux-RAID

> 
> Thanks Phil,
> 
> The --invalid-backup parameter was necessary to get this up and running.
>   It's now up with the 7th disk as a spare.  Shall I run fsck now, or can
> I just try to grow again?
> 
> proposed grow operation:
>> mdadm --grow --raid-devices=7 --backup-file=/dev/usb/grow_md127.bak /dev/md127
>> mdadm --stop /dev/md127
>> umount /dev/md127 # not sure if this is necessary
>> resize2fs /dev/md127

An fsck could help, if any blocks did get moved.

I would not attempt a grow again until you find out why the previous 
attempt didn't make progress.  Check if mdmon is running, and/or compile 
a fresh copy of mdadm from source.  If you don't figure it out, you'll 
just end up in the same spot again.
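
(A quick sanity check before retrying, assuming a standard live
environment:

  pgrep -a mdmon    # mdmon only manages external-metadata (IMSM/DDF) arrays
  pgrep -a mdadm    # a backup-file reshape keeps a forked mdadm running
  cat /proc/mdstat  # an advancing reshape shows a "reshape = x.x%" line

for a native 1.2-superblock array like this one, the kernel's md thread
moves the data and mdmon is not involved.)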

Phil


* Re: Raid-6 cannot reshape
  2020-04-07 12:28                 ` Phil Turmel
@ 2020-04-07 12:31                   ` Phil Turmel
  2020-04-07 13:19                     ` Alexander Shenkin
  0 siblings, 1 reply; 13+ messages in thread
From: Phil Turmel @ 2020-04-07 12:31 UTC (permalink / raw)
  To: Alexander Shenkin, Wols Lists, Roger Heflin; +Cc: Linux-RAID

On 4/7/20 8:28 AM, Phil Turmel wrote:
>>
>> Thanks Phil,
>>
>> The --invalid-backup parameter was necessary to get this up and running.
>>   It's now up with the 7th disk as a spare.  Shall I run fsck now, or can
>> I just try to grow again?
>>
>> proposed grow operation:
>>> mdadm --grow -raid-devices=7 --backup-file=/dev/usb/grow_md127.bak
>> /dev/md127
>>> mdadm --stop /dev/md127
>>> umount /dev/md127 # not sure if this is necessary
>>> resize2fs /dev/md127
> 
> An fsck could help, if any blocks did get moved.
> 
> I would not attempt a grow again until you find out why the previous 
> attempt didn't make progress.  Check if mdmon is running, and/or compile 
> a fresh copy of mdadm from source.  If you don't figure it out, you'll 
> just end up in the same spot again.


Oh, one more point:  Don't use a backup file.  Let mdadm shift the data 
offsets to get the temporary space needed.  (It'll run faster, too.)
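
(With a recent mdadm, the grow is then simply - a sketch for this
array:

  mdadm --grow --raid-devices=7 /dev/md127

mdadm relocates each member's data offset into the unused space
reserved around the data area - visible as "Unused Space : before=..."
in mdadm --examine output - instead of backing up live stripes through
a file.)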

(I don't see any mdadm --examine reports in the list thread.  Did you do 
them and keep the complete output?)

Phil


* Re: Raid-6 cannot reshape
  2020-04-07 12:31                   ` Phil Turmel
@ 2020-04-07 13:19                     ` Alexander Shenkin
  2020-04-07 15:08                       ` antlists
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Shenkin @ 2020-04-07 13:19 UTC (permalink / raw)
  To: Phil Turmel, Wols Lists, Roger Heflin; +Cc: Linux-RAID


On 4/7/2020 1:31 PM, Phil Turmel wrote:
> On 4/7/20 8:28 AM, Phil Turmel wrote:
>>> [snip]
>>
>> An fsck could help, if any blocks did get moved.
>>
>> I would not attempt a grow again until you find out why the previous
>> attempt didn't make progress.  Check if mdmon is running, and/or
>> compile a fresh copy of mdadm from source.  If you don't figure it
>> out, you'll just end up in the same spot again.
> 
> 
> Oh, one more point:  Don't use a backup file.  Let mdadm shift the data
> offsets to get the temporary space needed.  (It'll run faster, too.)
> 
> (I don't see any mdadm --examine reports in the list thread.  Did you do
> them and keep the complete output?)
> 
> Phil
> 

Thanks Phil,

fsck is finding lots and lots of problems.  Figure I'll just run fsck -p
and see what happens... not sure what choice I have...

examine output attached here...

Re re-growing, I was hoping that running on a newer mdadm (4.1) might
fix the problem, and if I still encountered it, perhaps running the
following might unstick it:

echo frozen > /sys/block/md0/md/sync_action
echo reshape > /sys/block/md0/md/sync_action
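
(If that does unstick it, progress should show up in /proc/mdstat, or -
assuming the usual md sysfs interface - via:

  cat /sys/block/md0/md/reshape_position

which reports the current reshape position in sectors, or "none" when
no reshape is active.)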

But, I personally have no idea what happened (really), nor why... :-(

thanks,
allie

[-- Attachment #2: examine.txt --]

root@ubuntu-server:/home/ubuntu-server# mdadm --examine /dev/sd[a-g]3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 10bdbed5:cb70c8a9:566c384d:ec4c926e

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 13:14:11 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 307cff15 - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : cf70dad5:0c9ff5f6:ede689f2:ccee2eb0

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 13:14:11 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 64b3fd8b - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : f8839952:eaba2e9c:c2c401d4:3e0592a5

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 13:14:11 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 5d8720d6 - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 875a0dbd:965a9986:1b78eb3d:e15fee50

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 13:14:11 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : c7aba50f - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : dc0bda8c:2457fb4c:f87a4bec:8d5b58ed

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 13:14:11 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : a8a4517e - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : dc842dc3:09c910c7:c351c307:e2383d13

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 13:14:11 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9a69f083 - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : c7303f62:d848d424:269581c8:83a045ec
           Name : ubuntu:2
  Creation Time : Sun Feb  5 23:39:58 2017
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB)
     Array Size : 11680755712 (11139.64 GiB 11961.09 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 635ef71b:e4add925:30ae4f0a:f6b46611

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  7 11:37:49 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 52b270d0 - correct
         Events : 316498

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)


* Re: Raid-6 cannot reshape
  2020-04-07 13:19                     ` Alexander Shenkin
@ 2020-04-07 15:08                       ` antlists
  2020-04-07 17:04                         ` Alexander Shenkin
  0 siblings, 1 reply; 13+ messages in thread
From: antlists @ 2020-04-07 15:08 UTC (permalink / raw)
  To: Alexander Shenkin, Phil Turmel, Roger Heflin; +Cc: Linux-RAID

On 07/04/2020 14:19, Alexander Shenkin wrote:
> Re re-growing, I was hoping that running on a newer mdadm (4.1) might
> fix the problem, and if i still encountered it, perhaps running the
> following might unstick it:
> 
> echo frozen > /sys/block/md0/md/sync_action
> echo reshape > /sys/block/md0/md/sync_action
> 
> But, I personally have no idea what happened (really), nor why...:-(

iirc pretty much all these reports come from oldish Ubuntu systems ...

What happened *could* be that you have an updated franken-kernel, plus 
an old mdadm, and the mess needs an Igor to stitch it all together...

If you ARE going to try the grow again, I'd use an up-to-date recovery 
system to run the grow, and then reboot back in to the old Ubuntu once 
your system is back.

And seriously think about upgrading your distro to the latest LTS.

Cheers,
Wol


* Re: Raid-6 cannot reshape
  2020-04-07 15:08                       ` antlists
@ 2020-04-07 17:04                         ` Alexander Shenkin
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Shenkin @ 2020-04-07 17:04 UTC (permalink / raw)
  To: antlists, Phil Turmel, Roger Heflin; +Cc: Linux-RAID

Thanks all,

My file system was a total mess, with everything dropped into
lost+found.  I've put everything back into what I think are the right
directory structures, and it's nice to not have all my data lost (I
think)... but I may just do a clean install of a new OS once I have it
up, running, and grown again... many thanks to all... (and btw, it's
growing nicely with Ubuntu 18 at the moment...)

allie

On 4/7/2020 4:08 PM, antlists wrote:
> [snip - previous message, quoted in full above]

