linux-raid.vger.kernel.org archive mirror
From: Peter Neuwirth <reddunur@online.de>
To: linux-raid@vger.kernel.org
Subject: linux mdadm assembly error: md: cannot handle concurrent replacement and reshape. (reboot while reshaping)
Date: Thu, 27 Apr 2023 23:09:40 +0200
Message-ID: <e2f96772-bfbc-f43b-6da1-f520e5164536@online.de>

Hello linux-raid group.

I have an issue with my Linux RAID setup, and I hope somebody here
can help me get my RAID active again without data loss.

I have a Debian 11 system with one RAID array (6x 1 TB HDDs, RAID level 5)
that was up and running until today, when I added two more 1 TB HDDs
and also changed the RAID level to 6.

Note, for completeness:

My RAID setup from months ago was:

mdadm --create --verbose /dev/md0 -c 256K --level=5 --raid-devices=6  /dev/sdd /dev/sdc /dev/sdb /dev/sda /dev/sdg /dev/sdf

mkfs.xfs -d su=254k,sw=6 -l version=2,su=256k -s size=4k /dev/md0

mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf

update-initramfs -u

echo '/dev/md0 /mnt/data ext4 defaults,nofail,discard 0 0' | sudo tee -a /etc/fstab
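(Side note: the ARRAY line that "mdadm --detail --scan" appends to
mdadm.conf can later be checked against the "Array UUID" that
"mdadm --examine" reports. A small sketch of extracting the UUID; the
ARRAY line below is a hypothetical sample, not copied from my config:)

```shell
# Hypothetical mdadm.conf ARRAY line (sample only); on a real system,
# read it from /etc/mdadm/mdadm.conf instead.
array_line='ARRAY /dev/md0 metadata=1.2 UUID=1a87479e:7513dd65:37c61ca1:43184f65 name=srv11:0'

# Extract the UUID so it can be compared with the "Array UUID" field
# printed by "mdadm --examine /dev/sdX".
conf_uuid=$(printf '%s\n' "$array_line" | sed -n 's/.*UUID=\([0-9a-f:]*\).*/\1/p')
echo "$conf_uuid"
```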


Today I did:

mdadm --add /dev/md0 /dev/sdg /dev/sdh

sudo mdadm --grow /dev/md0 --level=6


This started the grow (reshape) process, which I could observe with
watch -n 1 cat /proc/mdstat
and md0 remained usable all day.
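(Side note: the reshape percentage and ETA that watch displays can also be
pulled out of a captured /proc/mdstat line with a bit of awk. A sketch;
the mdstat line below is a hypothetical sample, not my real output:)

```shell
# Hypothetical /proc/mdstat reshape line (sample only); on a live system
# you would read it with:  grep reshape /proc/mdstat
mdstat_line='      [==========>..........]  reshape = 50.0% (488315136/976630272) finish=123.4min speed=65432K/sec'

# Extract the percentage and the estimated time to finish.
echo "$mdstat_line" | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "reshape") pct = $(i + 2)    # field after "reshape ="
        if ($i ~ /^finish=/) eta = substr($i, 8)
    }
    print "reshape at " pct ", ETA " eta
}'
```

With the sample line above this prints "reshape at 50.0%, ETA 123.4min".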
Because I needed fast file access, I paused the grow/reshape
process today at about 50% by issuing

echo "frozen" > /sys/block/md0/md/sync_action


After the file access was done, I restarted the
process with

echo reshape > /sys/block/md0/md/sync_action


but I saw in mdstat that it had started again from scratch.
After about 5 minutes I noticed that the /dev/md0 mount was gone, with
an input/output error in syslog, and I rebooted the computer to see whether
the kernel would reassemble md0 correctly. Maybe that was a mistake,
because md0 was still reshaping; I do not know.

(For some reason the sdX drive letter ordering changed after the reboot,
but I did not change any devices.)

After the reboot, md0 was not reassembled, and I couldn't assemble
it manually. When I try to assemble it, I always get an error:

mdadm --assemble --run --force --update=resync /dev/md0
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

Due to the interruption while growing, I am now not sure
  1) whether my RAID set is actually still a raid5 or already a raid6,
  2) whether I have to assemble the first 6 devices or all 8,
  3) how to resolve the issue that shows up in all the logs:

  mdadm --assemble  /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdf /dev/sdi /dev/sdj /dev/sdg /dev/sdh
  mdadm: /dev/md0 assembled from 7 drives - need 8 to start (use --run to insist).
  mdadm: failed to start array /dev/md0: Input/output error
  [ 3415.023097] md: cannot handle concurrent replacement and reshape.
  [ 3415.023551] md/raid:md0: failed to run raid set.
  [ 3415.023553] md: pers->run() failed ...

srv11:~# dmesg | tail
[ 3393.321837]  sdf:
[ 3415.020629] md/raid:md0: not clean -- starting background reconstruction
[ 3415.020771] md/raid:md0: device sdj operational as raid disk 4
[ 3415.020773] md/raid:md0: device sdi operational as raid disk 5
[ 3415.020774] md/raid:md0: device sdf operational as raid disk 0
[ 3415.020775] md/raid:md0: device sdc operational as raid disk 2
[ 3415.020776] md/raid:md0: device sdb operational as raid disk 1
[ 3415.023097] md: cannot handle concurrent replacement and reshape.
[ 3415.023551] md/raid:md0: failed to run raid set.
[ 3415.023553] md: pers->run() failed ...


I have no idea how to resolve the "cannot handle concurrent replacement and reshape" issue.
I have seen that two (probably the new) drives have "Events : 0", while the other 6 all
have "Events : 4700" in mdadm --examine.
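(Side note: to see the mismatch at a glance, the Events count and Device
Role can be pulled out of the --examine output for all members. A sketch;
the heredoc is a shortened, hypothetical stand-in for the real output
quoted further below:)

```shell
# Shortened, hypothetical stand-in for the real "mdadm --examine" output;
# on the real system you would use:
#   for d in /dev/sd[abcfghij]; do mdadm --examine "$d"; done
examine=$(cat <<'EOF'
/dev/sda:
          Events : 0
     Device Role : spare
/dev/sdb:
          Events : 4700
     Device Role : Active device 1
/dev/sdg:
          Events : 4700
     Device Role : Replacement device 4
EOF
)

# Print one summary line per device: name, event count, role.
printf '%s\n' "$examine" | awk '
    /^\/dev\//    { dev = $1; sub(":", "", dev) }
    /Events/      { ev = $3 }
    /Device Role/ { role = $4
                    for (i = 5; i <= NF; i++) role = role " " $i
                    print dev, ev, role }'
```

Run against the real devices, this should make the two "Events : 0" spares
and the replacement device stand out immediately.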

Could anyone tell me what is going on here, and how I could successfully
finish the reshape process, or at least save my data from the broken RAID?

Any help is appreciated!

Peter

------------------------------------------------------------------------------------------------------------------------
Some Logs:
------------------------------------------------------------------------------------------------------------------------

uname -a ; mdadm --version
Linux srv11 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
mdadm - v4.1 - 2018-10-01

srv11:~# mdadm -D /dev/md0
/dev/md0:
            Version : 1.2
      Creation Time : Mon Mar  6 18:17:30 2023
         Raid Level : raid6
      Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
       Raid Devices : 7
      Total Devices : 6
        Persistence : Superblock is persistent

        Update Time : Thu Apr 27 17:36:15 2023
              State : active, FAILED, Not Started
     Active Devices : 5
    Working Devices : 6
     Failed Devices : 0
      Spare Devices : 1

             Layout : left-symmetric-6
         Chunk Size : 256K

Consistency Policy : unknown

         New Layout : left-symmetric

               Name : solidsrv11:0  (local to host solidsrv11)
               UUID : 1a87479e:7513dd65:37c61ca1:43184f65
             Events : 4700

     Number   Major   Minor   RaidDevice State
        -       0        0        0      removed
        -       0        0        1      removed
        -       0        0        2      removed
        -       0        0        3      removed
        -       0        0        4      removed
        -       0        0        5      removed
        -       0        0        6      removed

        -       8       32        2      sync   /dev/sdc
        -       8      144        4      sync   /dev/sdj
        -       8       80        0      sync   /dev/sdf
        -       8       16        1      sync   /dev/sdb
        -       8      128        5      sync   /dev/sdi
        -       8       96        4      spare rebuilding   /dev/sdg


---------------------

srv11:~# mdadm --examine /dev/sda
/dev/sda:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x45
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : beaa5492:b64439e2:cd410543:beaca0a5

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors
        Checksum : 450df7f - correct
          Events : 0

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : spare
    Array State : AAAARA. ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdb
/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x45
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : f8eeac89:aa546f6e:ae563ef7:9dcda35a

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors
        Checksum : f11a7bc8 - correct
          Events : 4700

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : Active device 1
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdc
/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x45
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 8f4bbd11:d5584577:6bacfab3:5f656678

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors
        Checksum : 5c7fca0a - correct
          Events : 4700

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : Active device 2
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdf
/dev/sdf:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x45
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : cdc63596:11a6e7c4:02196efa:86698b3f

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors
        Checksum : 3c330341 - correct
          Events : 4700

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : Active device 0
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdg
/dev/sdg:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x57
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
Recovery Offset : 167950336 sectors
           State : active
     Device UUID : f7b30510:b2a723a1:223718d1:3dcd2148

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors
        Checksum : 7b822bfb - correct
          Events : 4700

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : Replacement device 4
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdh
/dev/sdh:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x45
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 18748f5e:8a9f9407:7886862a:837c33ec

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors
        Checksum : 23fa1824 - correct
          Events : 0

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : spare
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdi
/dev/sdi:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4d
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : d3aa54c6:d1eba4e1:4fe2c3bd:407daa90

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
        Checksum : 9d840ca4 - correct
          Events : 4700

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : Active device 5
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)
srv11:~# mdadm --examine /dev/sdj
/dev/sdj:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4d
      Array UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Name : srv11:0  (local to host srv11)
   Creation Time : Mon Mar  6 18:17:30 2023
      Raid Level : raid6
    Raid Devices : 7

  Avail Dev Size : 1953260976 (931.39 GiB 1000.07 GB)
      Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
   Used Dev Size : 1953260544 (931.39 GiB 1000.07 GB)
     Data Offset : 264192 sectors
      New Offset : 133120 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : abf71cb0:3f4d01f1:8603e298:6603d6d7

Internal Bitmap : 8 sectors from superblock
   Reshape pos'n : 419875840 (400.42 GiB 429.95 GB)
      New Layout : left-symmetric

     Update Time : Thu Apr 27 17:36:15 2023
   Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
        Checksum : b8f2676d - correct
          Events : 4700

          Layout : left-symmetric-6
      Chunk Size : 256K

    Device Role : Active device 4
    Array State : AAAARAA ('A' == active, '.' == missing, 'R' == replacing)

------------------------------------------------

Bootup syslog snip:

Apr 27 17:37:05 kernel: sde: sde1 sde2 sde3 sde4
Apr 27 17:37:05 kernel: sdj:
Apr 27 17:37:05 kernel: sdg: sdg1 sdg2 sdg3
Apr 27 17:37:05 kernel: sdc:
Apr 27 17:37:05 kernel: sda:
Apr 27 17:37:05 kernel: sdi:
Apr 27 17:37:05 kernel: sdh: sdh1 sdh2 sdh3
Apr 27 17:37:05 kernel: sdf:
Apr 27 17:37:05 kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 3:0:0:0: [sdg] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 6:0:0:0: [sdj] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 0:0:1:0: [sdc] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 0:0:2:0: [sdb] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 2:0:0:0: [sde] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 5:0:0:0: [sdi] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 0:0:3:0: [sdf] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 4:0:0:0: [sdh] Attached SCSI disk
Apr 27 17:37:05 kernel: sd 1:0:0:0: [sdd] Attached SCSI disk
Apr 27 17:37:05 kernel: raid6: sse2x4 gen() 12392 MB/s
Apr 27 17:37:05 kernel: raid6: sse2x4 xor() 7042 MB/s
Apr 27 17:37:05 kernel: raid6: sse2x2 gen() 11331 MB/s
Apr 27 17:37:05 kernel: raid6: sse2x2 xor() 7148 MB/s
Apr 27 17:37:05 kernel: raid6: sse2x1 gen() 10382 MB/s
Apr 27 17:37:05 kernel: raid6: sse2x1 xor() 6645 MB/s
Apr 27 17:37:05 kernel: raid6: using algorithm sse2x4 gen() 12392 MB/s
Apr 27 17:37:05 kernel: raid6: .... xor() 7042 MB/s, rmw enabled
Apr 27 17:37:05 kernel: raid6: using ssse3x2 recovery algorithm
Apr 27 17:37:05 kernel: xor: automatically using best checksumming function avx
Apr 27 17:37:05 kernel: async_tx: api initialized (async)
Apr 27 17:37:05 kernel: md/raid:md0: device sdf operational as raid disk 0
Apr 27 17:37:05 kernel: md/raid:md0: device sdb operational as raid disk 1
Apr 27 17:37:05 kernel: md/raid:md0: device sda operational as raid disk 3
Apr 27 17:37:05 kernel: md/raid:md0: device sdi operational as raid disk 5
Apr 27 17:37:05 kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 27 17:37:05 kernel: md/raid:md0: device sdj operational as raid disk 4


Thread overview: 10+ messages
2023-04-27 21:09 Peter Neuwirth [this message]
2023-04-28  2:01 ` linux mdadm assembly error: md: cannot handle concurrent replacement and reshape. (reboot while reshaping) Yu Kuai
2023-05-04  8:16 ` Yu Kuai
2023-05-02 11:30 Peter Neuwirth
2023-05-04  1:57 ` Yu Kuai
2023-05-04  2:10   ` Yu Kuai
2023-05-04  8:36 Peter Neuwirth
2023-05-04  9:08 ` Yu Kuai
2023-05-04  9:43 Peter Neuwirth
2023-05-04 10:49 Peter Neuwirth
