From: George Rapp <george.rapp@gmail.com>
To: Shaohua Li <shli@kernel.org>
Cc: Linux-RAID <linux-raid@vger.kernel.org>,
	Matthew Krumwiede <matt.krumwiede@me.com>,
	NeilBrown <neilb@suse.com>,
	Jes.Sorensen@gmail.com
Subject: Re: Reshape stalled at first badblock location (was: RAID 5 --assemble doesn't recognize all overlays as component devices)
Date: Tue, 21 Feb 2017 20:12:14 -0500
Message-ID: <CAF-KpgZ3tZQy93PwUFk0RZRfv1w0_WBRhU+FQ9C4=Hhh44H7KQ@mail.gmail.com>
In-Reply-To: <20170221175801.wt64t2tzcvg3sfmc@kernel.org>

> On Mon, Feb 20, 2017 at 05:18:46PM -0500, George Rapp wrote:
>> On Sat, Feb 11, 2017 at 7:32 PM, George Rapp <george.rapp@gmail.com> wrote:
>> [...snip...]
>>
>> When I try to assemble the RAID 5 array, though, the process gets
>> stuck at the location of the first bad block. The assemble command is:
>>
>> [...snip...]
>>
>> The md4_raid5 process immediately spikes to 100% CPU utilization, and
>> the reshape stops at 1901225472 KiB (which is exactly half of the
>> first bad sector value, 3802454640):
>>
> [...snip...]
On Tue, Feb 21, 2017 at 4:51 AM, Tomasz Majchrzak
<tomasz.majchrzak@intel.com> wrote:
> As long as you're sure the data on the disk is valid, I believe clearing
> bad block list manually in metadata (no easy way to do it) would allow
> reshape to complete.
>
> Tomek
On Tue, Feb 21, 2017 at 12:58 PM, Shaohua Li <shli@kernel.org> wrote:
>
> Add Neil and Jes.
>
> Yes, there were similar reports before. When reshape finds badblocks, the
> reshape will do an infinite loop without any progress. I think there are two
> things we need to do:
>
> - Make reshape more robust. Maybe reshape should bail out if badblocks found.
> - Add an option in mdadm to force reset badblocks

OK, I examined the structure of the superblock and the badblocks
array. My first attempt was to zero out the bblog_offset and
bblog_size in the md superblock using dd, but that makes the computed
checksum differ from the sb_csum stored in the superblock, so mdadm
--assemble fails. I didn't want to research how to recalculate the
checksum unless I really, really have to.  8^)
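
For reference, here is a minimal sketch of how the version-1 superblock
checksum could be recomputed after editing a field. It is not taken from
this thread. It assumes the algorithm matches the kernel's
calc_sb_1_csum() as I understand it: sum the little-endian 32-bit words
over 256 + 2*max_dev bytes with sb_csum treated as zero, then fold the
carry. The field offsets (sb_csum at byte 216, max_dev at byte 220) and
the function name are assumptions that should be checked against md_p.h
before touching a real device.

```python
import struct

# Assumed byte offsets inside the v1.x superblock; verify against md_p.h.
SB_CSUM_OFFSET = 216
MAX_DEV_OFFSET = 220


def md_sb1_checksum(sb: bytes) -> int:
    """Return the checksum a v1.x md superblock is expected to carry.

    `sb` holds the raw superblock bytes, starting at the superblock itself.
    """
    (max_dev,) = struct.unpack_from("<I", sb, MAX_DEV_OFFSET)
    size = 256 + 2 * max_dev  # fixed header plus one 16-bit role per device
    if len(sb) < size:
        raise ValueError("buffer is shorter than the checksummed region")

    buf = bytearray(sb[:size])
    # The stored checksum is treated as zero while summing.
    struct.pack_into("<I", buf, SB_CSUM_OFFSET, 0)

    n_words = size // 4
    total = sum(struct.unpack_from(f"<{n_words}I", buf, 0))
    if size % 4 == 2:
        # An odd number of 16-bit roles leaves two trailing bytes.
        total += struct.unpack_from("<H", buf, n_words * 4)[0]

    # Fold the carry out of the low 32 bits back in, truncating to 32 bits.
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF
```

The result would be written back little-endian at the sb_csum offset. Try
it on an overlay or a copy of the superblock first, never on the only
copy of the metadata.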

Running mdadm under gdb, I determined that my bblog_offset was 72
sectors from the start of the md superblock, and filled that space
with 0xff bytes in my overlay file:

# dd if=/dev/mapper/sdg4 bs=512 count=1 skip=73 of=ffffffff
# dd if=ffffffff of=/dev/mapper/sdg4 bs=512 count=1 seek=72

That convinced mdadm that I have a badblocks list, but it's empty:

# mdadm --examine-badblocks /dev/mapper/sdg4
Bad-blocks on /dev/mapper/sdg4:
#
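
As a rough illustration of why a sector full of 0xff reads as an empty
list, here is a sketch of how the log entries appear to be decoded. It
assumes the on-disk layout I believe the kernel and mdadm use: 64-bit
little-endian entries with the sector in the upper 54 bits and the
length in the low 10 bits, both scaled by bblog_shift, and an all-ones
entry ending the list. The function name and the sample length of 8
sectors are hypothetical.

```python
import struct

EMPTY_ENTRY = 0xFFFFFFFFFFFFFFFF  # all-ones entry: assumed end-of-list marker


def parse_bblog(raw: bytes, bblog_shift: int = 0):
    """Decode a raw bad-block log into (sector, count) tuples."""
    usable = len(raw) - len(raw) % 8  # ignore any trailing partial entry
    entries = []
    for (bb,) in struct.iter_unpack("<Q", raw[:usable]):
        if bb == EMPTY_ENTRY:
            break  # nothing valid after the first empty slot
        sector = (bb >> 10) << bblog_shift
        count = (bb & 0x3FF) << bblog_shift
        entries.append((sector, count))
    return entries


# A sector of 0xff, like the one copied in with dd above, decodes to no entries.
empty_log = parse_bblog(b"\xff" * 512)

# One entry at the first bad sector quoted earlier in the thread,
# with a made-up length of 8 sectors, followed by 0xff padding.
sample = struct.pack("<Q", (3802454640 << 10) | 8) + b"\xff" * 504
sample_log = parse_bblog(sample)
```

Under these assumptions the first slot decides everything: once it holds
all ones, whatever follows in the log is never consulted.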

Once I did that, I restarted the array with my overlay files:

# mdadm --assemble --force /dev/md4
--backup-file=/home/gwr/2017/2017-01/md4_backup__2017-01-25
/dev/mapper/sde4 /dev/mapper/sdf4 /dev/mapper/sdh4 /dev/mapper/sdl4
/dev/mapper/sdg4 /dev/mapper/sdk4 /dev/mapper/sdi4 /dev/mapper/sdj4
/dev/mapper/sdb4
mdadm: accepting backup with timestamp 1485366772 for array with
timestamp 1487645030
mdadm: /dev/md4 has been started with 9 drives (out of 10).
#

The reshape operation got past the two positions where it had frozen
earlier, and didn't throw any obvious errors to /var/log/messages, so
Tomek's suggestion to clear the badblocks seems to have worked.
However, this was in the overlay files, not the actual devices.

Before I proceed for real, does clearing the badblocks log and
assembling the array seem like my best option?

-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)
