From: "Patrik Dahlström" <risca@powerlamerz.org>
To: linux-raid@vger.kernel.org
Subject: Re: I trashed my superblocks after reshape from raid5 to raid6 stalled - need help recovering
Date: Wed, 10 Feb 2021 23:10:16 +0100 [thread overview]
Message-ID: <0d0eb6c1-39db-1258-4b81-cac8e7f5c78d@powerlamerz.org> (raw)
In-Reply-To: <1e6c1c18-38c3-3419-505c-b310e9f55dd6@powerlamerz.org>
On 1/29/21 1:37 AM, Patrik Dahlström wrote:
> Hello,
>
> Logs and disk information are located at the end of this email. Please
> note that I also have a USB stick plugged into this computer that
> sometimes comes up as sda and sometimes sdi, which means that some of
> the collected data might be off-by-one (sda -> sdb, etc.).
>
> I will try to be as thorough as possible to explain what has happened
> and not waste your time. The short version first:
>
> * Start reshape of raid5 with 7 disks to raid6 with 8 disks
> * Reshape stalls
> * Panic
> * Fail to create overlays
> * Become overconfident
> * Overwrite superblock (wrongly) without overlays
> * Realize mistake
> * Stop
> * Get overlays working
> * Much hard thinking and experimenting with device mapper
> * Successfully mount raid volume by combining 2 overlay sets
> * Need help restoring array
>
> If this is enough information, please skip to "Where I am now" below.
> For details on what I've written to my superblock, see "Frakking up".
>
> Long version
> ============
> This story begins with a perfectly healthy raid5 array with 7 x 10 TB
> drives. Well, mostly healthy. I had started to see these lines pop up
> in my syslog:
>
> Jan 21 18:01:06 rack-server-1 smartd[1586]: Device: /dev/sdb [SAT], 16 Currently unreadable (pending) sectors
>
> Because of this, I started to become paranoid that I would lose data
> when replacing the bad drive. I decided I should add another 10 TB to
> the array and convert to raid6. These are the commands I used to kick
> off that conversion:
>
> (mdadm 4.1 and Linux 4.15.0-132-generic)
>
> $ sudo mdadm --add /dev/md0 /dev/sdg
> $ sudo mdadm --grow /dev/md0 --level=6 --raid-disk=8
>
> This kicked off the reshape process successfully. A few days later, I
> started to notice I/O issues. More precisely: timeouts. It looks like
> the reshape process had stalled, and any kind of I/O to the raid mount
> point would also stall until some timeout error occurred. This was most
> likely caused by the bad block list (BBL), but I didn't know that at
> the time. At
> this point these messages started to show up in my kernel log:
>
> Jan 20 21:55:06 rack-server-1 kernel: INFO: task md0_reshape:29278 blocked for more than 120 seconds.
> Jan 20 21:55:06 rack-server-1 kernel: Tainted: G OE 4.15.0-132-generic #136-Ubuntu
> Jan 20 21:55:06 rack-server-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 20 21:55:06 rack-server-1 kernel: md0_reshape D 0 29278 2 0x80000000
> Jan 20 21:55:06 rack-server-1 kernel: Call Trace:
> Jan 20 21:55:06 rack-server-1 kernel: __schedule+0x24e/0x880
> Jan 20 21:55:06 rack-server-1 kernel: schedule+0x2c/0x80
> Jan 20 21:55:06 rack-server-1 kernel: md_do_sync+0xdf1/0xfa0
> Jan 20 21:55:06 rack-server-1 kernel: ? wait_woken+0x80/0x80
> Jan 20 21:55:06 rack-server-1 kernel: ? __switch_to_asm+0x35/0x70
> Jan 20 21:55:06 rack-server-1 kernel: md_thread+0x129/0x170
> Jan 20 21:55:06 rack-server-1 kernel: ? md_seq_next+0x90/0x90
> Jan 20 21:55:06 rack-server-1 kernel: ? md_thread+0x129/0x170
> Jan 20 21:55:06 rack-server-1 kernel: kthread+0x121/0x140
> Jan 20 21:55:06 rack-server-1 kernel: ? find_pers+0x70/0x70
> Jan 20 21:55:06 rack-server-1 kernel: ? kthread_create_worker_on_cpu+0x70/0x70
> Jan 20 21:55:06 rack-server-1 kernel: ret_from_fork+0x35/0x40
>
> Other tasks and user processes also became blocked. Almost anything I
> did would hang because it would access this mount point and stall. If
> I rebooted the server, it would stall during boot,
> when assembling the raid.
>
> By removing all the drives, I was able to at least boot the server. I
> decided to update to Ubuntu 20.04 and try again - no dice. I still got
> blocked. I did notice that the reshape progressed a little bit every
> time I booted.
>
> I figured I would revert the reshape and start from scratch, and I
> found out that there is something called "--assemble --update=revert-reshape":
>
> (mdadm v4.1 and Linux-5.4.0-64-generic, USB stick is sda)
>
> $ sudo mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Sat Apr 29 16:21:11 2017
> Raid Level : raid6
> Array Size : 58597880832 (55883.29 GiB 60004.23 GB)
> Used Dev Size : 9766313472 (9313.88 GiB 10000.70 GB)
> Raid Devices : 8
> Total Devices : 8
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Thu Jan 21 20:32:24 2021
> State : clean, degraded, reshaping
> Active Devices : 7
> Working Devices : 8
> Failed Devices : 0
> Spare Devices : 1
>
> Layout : left-symmetric-6
> Chunk Size : 512K
>
> Consistency Policy : bitmap
>
> Reshape Status : 59% complete
> New Layout : left-symmetric
>
> Name : rack-server-1:1 (local to host rack-server-1)
> UUID : 7f289c7a:570e2f7e:2ac6f909:03b3970f
> Events : 728221
>
> Number Major Minor RaidDevice State
> 8 8 48 0 active sync /dev/sdd
> 13 8 64 1 active sync /dev/sde
> 12 8 96 2 active sync /dev/sdg
> 7 8 128 3 active sync /dev/sdi
> 10 8 80 4 active sync /dev/sdf
> 9 8 16 5 active sync /dev/sdb
> 11 8 32 6 active sync /dev/sdc
> 14 8 112 7 spare rebuilding /dev/sdh
> $ sudo mdadm --stop /dev/md0
> $ sudo mdadm --assemble --update=revert-reshape /dev/md0
>
> This did not do what I expected. Unfortunately, I forgot to save the
> output of "mdadm --detail /dev/md0" after the last command, but if I
> remember correctly it marked all my drives, except sdh, as faulty. I
> expected it to start going backwards in the reshape progress.
>
> At this point, I saved away the output of these commands:
>
> (mdadm v4.1 and Linux-5.4.0-64-generic, USB stick is sda)
>
> $ sudo mdadm --examine /dev/sdb
> $ sudo mdadm --examine /dev/sdc
> $ sudo mdadm --examine /dev/sdd
> $ sudo mdadm --examine /dev/sde
> $ sudo mdadm --examine /dev/sdf
> $ sudo mdadm --examine /dev/sdg
> $ sudo mdadm --examine /dev/sdh
> $ sudo mdadm --examine /dev/sdi
>
> (output located at the end)
>
> Fail to create overlays
> =======================
> I realized that I needed to start using overlays before I messed up even
> more. However, that was easier said than done. No matter what I did, I
> always got this error as a result of "dmsetup create":
>
> Jan 21 21:19:10 rack-server-1 kernel: device-mapper: table: 253:1: snapshot: Cannot get origin device
> Jan 21 21:19:10 rack-server-1 kernel: device-mapper: ioctl: error adding target to table
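(For reference, the overlay setup being attempted here is the usual linux-raid
wiki recipe: back each member disk with a loop-mounted sparse file and a
device-mapper "snapshot" target, so all writes land in the overlay file instead
of on the disk. A minimal sketch follows; the device name, file path, and size
are placeholders, and the commented-out steps need root and real devices:)

```shell
# Hedged sketch of a wiki-style overlay; /dev/sdX and the size below are
# placeholders, not the actual devices from this mail.
DEV=/dev/sdX
OVL=/tmp/overlay-sdX.img
SIZE_BYTES=10000831348736           # in reality: blockdev --getsize64 "$DEV"
SECTORS=$(( SIZE_BYTES / 512 ))     # dm tables count 512-byte sectors

# The real steps (need root, so commented out here):
#   truncate -s 4G "$OVL"
#   LOOP=$(losetup -f --show "$OVL")
#   echo "0 $SECTORS snapshot $DEV $LOOP P 8" | dmsetup create sdX-overlay

# The table line that dmsetup create consumes:
echo "0 $SECTORS snapshot $DEV /dev/loop0 P 8"
```

The snapshot table format is `start length snapshot <origin> <COW-device> P
<chunksize>`; the "Cannot get origin device" error above typically means the
origin path in that line could not be opened (wrong path, or held busy by an
earlier, half-assembled array).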
>
> Frakking up
> ===========
> Now, what would be the sane thing to do when you can't create overlays?
>
> Stop. Ask for help.
>
> If this was a test on how I perform under pressure, I failed. After all,
> this wasn't my first time recovering from a failed reshape. Just search
> the mailing list for my name. I was confident in my abilities, and flew
> straight into the sun:
>
> (mdadm v4.1 and Linux-5.4.0-64-generic, USB stick is sdi)
>
> $ sudo mdadm --create --level=6 --raid-devices=8 --size=4883156736 /dev/md0 /dev/sdc /dev/sdd /dev/sdf /dev/sdh /dev/sde /dev/sda /dev/sdb missing
>
> Notice the lack of "--assume-clean" and the wrong "--size" parameter,
> not to mention the missing "--data-offset" since it was "non-default".
>
> This kicked off a rebuild of disk sdb (the last non-missing device).
> Fortunately, I realized my mistake within a few seconds - 39 seconds in
> fact, if my command history can be trusted - and stopped the array.
>
> What follows is a series of more attempts at re-creating the superblock
> with different parameters to "mdadm --create --assume-clean". The last
> one (I think) being:
>
> (mdadm v4.1 and Linux-5.4.0-64-generic, USB stick is sdi)
>
> $ sudo mdadm --create --assume-clean --level=6 --raid-devices=8 --size=9766313472 --data-offset=61440 /dev/md0 /dev/sdc /dev/sdd /dev/sdf /dev/sdh /dev/sde /dev/sda /dev/sdb missing
>
> Running "fsck.ext4 -n /dev/md0" on this array would at least start.
> However, it would eventually reach a point where it would start spewing
> a ton of errors. My guess is that this is the point beyond which the
> array had not yet been reshaped.
>
> Getting overlays to work again
> ==============================
>
> Although my command history has no memory of it, "journalctl" tells me
> that I rebooted my server one more time after I failed to create
> overlays. After that, the "overlay_create" and "overlay_remove"
> functions just worked. Every. <censored>. Time.
>
> Once overlays were working, I got to work at thinking hard and
> experimenting. Some experiments quickly grew the overlay files
> and my storage space for them was only ~80 GB. I decided to
> scrap the newly added disk and re-use it as storage space for
> overlay files. In hindsight, I realize that I could have used
> the other 10 TB drive I had laying on the shelf below...
>
> Where I am now
> ==============
>
> I am able to mount my raid volume by creating 2 separate sets of overlay
> files, creating an array on each set, and then using device mapper in
> linear mode to "stitch together" the 2 arrays at the exact reshape position:
>
> (mdadm v4.1 and Linux-5.4.0-64-generic)
>
> $ sudo mdadm --create --assume-clean --level=6 --raid-devices=8 --size=9766313472 --layout=left-symmetric --data-offset=61440 /dev/md0 /dev/dm-3 /dev/dm-4 /dev/dm-6 /dev/dm-7 /dev/dm-5 /dev/dm-1 /dev/dm-2 missing
> $ sudo mdadm --create --assume-clean --level=6 --raid-devices=8 --size=9766313472 --layout=left-symmetric-6 --data-offset=123392 /dev/md1 /dev/dm-10 /dev/dm-11 /dev/dm-13 /dev/dm-14 /dev/dm-12 /dev/dm-8 /dev/dm-9 missing
> $ echo "0 69529589760 linear /dev/md0 0
> 69529589760 47666171904 linear /dev/md1 69529589760" | sudo dmsetup create joined
> $ sudo mount -o ro /dev/dm-15 /storage
>
> The numbers are taken from the "mdadm -E <dev>" commands I ran earlier,
> only recalculated into the 512-byte sectors that device mapper expects.
> The last drive in the array has been re-purposed as overlay storage.
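(As a sanity check on those table numbers: mdadm reports sizes and the reshape
position in 1 KiB blocks, while dmsetup tables are in 512-byte sectors, so
everything gets multiplied by 2. A quick sketch, with the KiB figures copied
from the mdadm output above and the reshape position taken as the 59% point:)

```shell
# Reproduce the dmsetup linear table from the mdadm figures.
ARRAY_KIB=58597880832       # "Array Size" from mdadm --detail, in KiB
RESHAPE_KIB=34764794880     # reshape position from mdadm -E, in KiB
DONE=$(( RESHAPE_KIB * 2 ))                  # sectors already reshaped (md0)
REMAIN=$(( (ARRAY_KIB - RESHAPE_KIB) * 2 ))  # sectors still on old layout (md1)
echo "0 $DONE linear /dev/md0 0"
echo "$DONE $REMAIN linear /dev/md1 $DONE"
```

The two echoed lines match the table fed to "dmsetup create joined" above, and
the segments sum to the full 117195761664-sector array.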
>
> What now?
> =========
>
> This is where I need some more help:
> * How can I resume the reshape or otherwise fix my array?
> * Is resuming a reshape something that would be a useful feature?
> If so, I could look into adding support for it. Maybe used like this?
>
> # mdadm --create --assume-clean /dev/md0 <array definition>
> # mdadm --manage /dev/md0 --grow --reshape-pos=<number> <grow params>
>
> * Does wiping or overwriting the superblock also clear the BBL?
> * Is there any information missing?
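Partially answering my own BBL question: as far as I can tell (please check
the mdadm man page before trusting this), stock mdadm can already inspect a
member's bad block list and drop it at assembly time, and re-creating the
superblock starts over with a fresh, empty BBL. A sketch with a placeholder
device name; the commented commands need root and a real md member:

```shell
# /dev/sdX is a placeholder. The real commands, commented out:
#   mdadm --examine-badblocks /dev/sdX                 # list recorded BBL entries
#   mdadm --stop /dev/md0
#   mdadm --assemble --update=force-no-bbl /dev/md0 /dev/sd[b-h]   # drop BBLs
#
# Harmless probe: does this device have an examinable BBL at all?
STATUS=$(mdadm --examine-badblocks /dev/sdX >/dev/null 2>&1 && echo yes || echo no)
echo "BBL examinable: $STATUS"
```

("--update=no-bbl" refuses to remove a non-empty list; "force-no-bbl" removes
it regardless.)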
Update! I've fully recovered from my mishaps and successfully reshaped
my array from raid5 to raid6!

I modified mdadm so that I could set the proper bits and values in the
superblocks when creating my array. These were my final commands to get
my array running again:

$ sudo ./mdadm --create --assume-clean --level=6 --raid-devices=8 --data-offset=61440 --layout=left-symmetric --size=9766313472 --reshape-position=34764794880 --new-data-offset=246784 --new-layout=left-symmetric-6 /dev/md0 /dev/sdc /dev/sdd /dev/sdf /dev/sdh /dev/sde /dev/sda /dev/sdb missing
$ sudo mdadm --stop /dev/md0
$ sudo ./mdadm --assemble --update="revert-reshape" /dev/md0 /dev/sdc /dev/sdd /dev/sdf /dev/sdh /dev/sde /dev/sda /dev/sdb
$ sudo mount -o ro /dev/md0 /storage

This resumed my array reshape from raid5 to raid6. Once that had
completed, I added the final 8th disk and let it rebuild. The added
flags are "--reshape-position", "--new-data-offset", and "--new-layout".

Are these flags something that would be considered useful for mdadm?
If so, I could clean up the patches a bit and post them.

// Patrik