All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neil Brown <nfbrown@novell.com>
To: Austin S Hemmelgarn <ahferroin7@gmail.com>,
	Linux-Kernel mailing list <linux-kernel@vger.kernel.org>,
	linux-raid@vger.kernel.org,
	device-mapper development <dm-devel@redhat.com>
Subject: Re: Possible bug in DM-RAID.
Date: Wed, 21 Oct 2015 12:39:32 +1100	[thread overview]
Message-ID: <87y4ex0zl7.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <562659DD.1000505@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3457 bytes --]


Added dm-devel, which is probably the more appropriate list for dm
things.

NeilBrown

Austin S Hemmelgarn <ahferroin7@gmail.com> writes:

> I think I've stumbled upon a bug in DM-RAID.  The primary symptom is that when
> creating a new DM-RAID based device (using either LVM or dmsetup) in a RAID1
> configuration, it very quickly claims one by one that all of the disks failed
> except the first, and goes degraded.  When this happens on a given system, the
> disks always 'fail' in the reverse of the order of the mirror numbers.  All of
> the other RAID profiles work just fine.  Curiously, it also only seems to
> happen for 'big' devices (I haven't been able to determine exactly what the
> minimum size is, but I see it 100% of the time with 32G devices, never with 16G
> ones, and only intermittently with 24G).
>
> Here's what I got from dmesg when creating a 32G LVM volume that exhibited
> this issue:
> [66318.401295] device-mapper: raid: Superblocks created for new array
> [66318.450452] md/raid1:mdX: active with 2 out of 2 mirrors
> [66318.450467] Choosing daemon_sleep default (5 sec)
> [66318.450482] created bitmap (32 pages) for device mdX
> [66318.450495] attempt to access beyond end of device
> [66318.450501] dm-91: rw=13329, want=0, limit=8192
> [66318.450506] md: super_written gets error=-5, uptodate=0
> [66318.450513] md/raid1:mdX: Disk failure on dm-92, disabling device.
>                md/raid1:mdX: Operation continuing on 1 devices.
> [66318.459815] attempt to access beyond end of device
> [66318.459819] dm-89: rw=13329, want=0, limit=8192
> [66318.459822] md: super_written gets error=-5, uptodate=0
> [66318.492852] attempt to access beyond end of device
> [66318.492862] dm-89: rw=13329, want=0, limit=8192
> [66318.492868] md: super_written gets error=-5, uptodate=0
> [66318.627183] mdX: bitmap file is out of date, doing full recovery
> [66318.714107] mdX: bitmap initialized from disk: read 3 pages, set 65536 of 65536 bits
> [66318.782045] RAID1 conf printout:
> [66318.782054]  --- wd:1 rd:2
> [66318.782061]  disk 0, wo:0, o:1, dev:dm-90
> [66318.782068]  disk 1, wo:1, o:0, dev:dm-92
> [66318.836598] RAID1 conf printout:
> [66318.836607]  --- wd:1 rd:2
> [66318.836614]  disk 0, wo:0, o:1, dev:dm-90
>
> And here's output for a 24G LVM volume that didn't display the issue.
> [66343.407954] device-mapper: raid: Superblocks created for new array
> [66343.479065] md/raid1:mdX: active with 2 out of 2 mirrors
> [66343.479078] Choosing daemon_sleep default (5 sec)
> [66343.479101] created bitmap (24 pages) for device mdX
> [66343.629329] mdX: bitmap file is out of date, doing full recovery
> [66343.677374] mdX: bitmap initialized from disk: read 2 pages, set 49152 of 49152 bits
>
> I'm using a lightly patched version of 4.2.3
> (the source can be found at https://github.com/ferroin/linux)
> but none of the patches I'm using come anywhere near anything in the block layer,
> let alone the DM/MD code.
>
> I've attempted to bisect this, although it got kind of complicated.  So far I've
> determined that the first commit that I see this issue on is d3b178a: md: Skip cluster setup for dm-raid
> Prior to that commit, I can't initialize any dm-raid devices due to the bug it fixes.
> I have not tested anything prior to d51e4fe (the merge commit that pulled in the md-cluster code),
> but I do distinctly remember that I did not see this issue in 3.19.
>
> I'll be happy to provide more info if needed.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

  reply	other threads:[~2015-10-21  1:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-20 15:12 Possible bug in DM-RAID Austin S Hemmelgarn
2015-10-21  1:39 ` Neil Brown [this message]
2015-10-21 14:11   ` Heinz Mauelshagen
2015-10-21 15:08     ` [dm-devel] " Austin S Hemmelgarn
2015-10-21 13:19 ` Austin S Hemmelgarn
2015-10-21 13:24   ` Austin S Hemmelgarn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y4ex0zl7.fsf@notabene.neil.brown.name \
    --to=nfbrown@novell.com \
    --cc=ahferroin7@gmail.com \
    --cc=dm-devel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.