From: Dave Chinner <david@fromorbit.com>
To: Tom <storm9c1@skymagik.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS appears to cause strange hang with md raid1 on reboot
Date: Thu, 7 Feb 2013 10:51:37 +1100
Message-ID: <20130206235137.GV2667@dastard>
In-Reply-To: <55720.75.149.17.233.1360123732.squirrel@secure.skymagik.net>

On Tue, Feb 05, 2013 at 11:08:52PM -0500, Tom wrote:
> In a previous message, Dave Chinner wrote:
> >
> > Find out if the unmount is returning an error first. If there is no
> > error, then you need to find what is doing bind mounts on your
> > system and make sure they are unmounted properly before the final
> > unmount is done. If lazy unmount is being done, make it a normal
> > unmount and see where the unmount is getting stuck or taking time to
> > complete by using sysrq-w if it gets delayed for any length of time.
> 
> OK, here is what I did tonight.  I added debug toward the end of
> /etc/rc.d/rc6.d/S01reboot  ...where the umounts are normally handled.

> DEBUG: remounting '/' as read-only using 'mount -n -o ro,remount'
> DEBUG: remounting '/proc' as read-only using 'mount -n -o ro,remount'
> mdadm: failed to set readonly for /dev/md3: Device or resource busy

EBUSY means one of two possibilities:

	1. there's a file still open for write. => lsof
	2. there's an unlinked but still open file => lsof
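lsof is the tool for both cases. To make the same check from inside a script, the sketch below approximates it by walking `/proc`. It assumes Linux `/proc/PID/fd` symlinks and `/proc/PID/fdinfo/N` files; the latter exist only on kernels new enough to provide them. It flags descriptors on the filesystem holding a given mount point in two cases:

* The descriptor is open for write, meaning the access mode in the octal `flags:` field is `O_WRONLY` or `O_RDWR`.
* The link target ends in ` (deleted)`, which is how `/proc` shows an unlinked but still open file.

`classify_fd` and `busy_files` are hypothetical names. Run it as root to see every process.

```python
import os

# Linux access-mode values as they appear in the octal "flags:" field of /proc/PID/fdinfo/N.
O_ACCMODE = 0o3
O_WRONLY = 0o1
O_RDWR = 0o2

def classify_fd(target, flags):
    """Return the EBUSY-style reasons that apply to one open file descriptor."""
    reasons = []
    if (flags & O_ACCMODE) in (O_WRONLY, O_RDWR):
        reasons.append("open for write")
    if target.endswith(" (deleted)"):
        reasons.append("unlinked but still open")
    return reasons

def busy_files(mountpoint):
    """Yield (pid, fd, target, reasons) for fds on the filesystem holding mountpoint."""
    dev = os.stat(mountpoint).st_dev
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = "/proc/%s/fd" % pid
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            path = os.path.join(fd_dir, fd)
            try:
                # stat() follows the fd link to the open inode, even if it was unlinked.
                if os.stat(path).st_dev != dev:
                    continue
                target = os.readlink(path)
                with open("/proc/%s/fdinfo/%s" % (pid, fd)) as info:
                    flags = next(int(line.split()[1], 8)
                                 for line in info if line.startswith("flags:"))
            except (OSError, StopIteration):
                continue  # fd went away, or no fdinfo on this kernel
            reasons = classify_fd(target, flags)
            if reasons:
                yield pid, fd, target, reasons

if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    for pid, fd, target, reasons in busy_files("/"):
        print(pid, fd, target, "; ".join(reasons))
```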

But I don't think that's the problem at all.

> Please stand by while rebooting the system...
> md: stopping all md devices.
> md: md2 switched to read-only mode.
> md: md1 switched to read-only mode.
> (hang)
> 
> Just for kicks, I get the same output with the 308 kernel, with the
> addition of this:
> 
> md: md3 still in use.

Which implies that the problem is a change in behaviour in the md
layer or below. i.e. previously md just saw that it was busy and
did not try to tear down the device. Now it is trying to tear down
the device with a filesystem that is still active on it.

> But the same system happily reboots just fine with the 308 kernel even
> after producing that "still in use" message that 348 does not produce.

Right, because it correctly detects the filesystem is still in use
and doesn't try to tear down the device.

> I did some more experiments with mdadm and I can't get any underlying
> md device to go into read-only mode even if the fs is mounted read-only.
> The only way I could get that to work is if the fs is completely unmounted.
> Whether it is XFS or ext3.  Yet a system on ext3 reboots fine.

And that will be because ext3 won't be issuing any IO on the sync
that is triggered when tearing down the MD device. XFS is writing
the superblock, and that's where the MD device is hanging on itself.

> Is there more specific information that I can gather that may help?

No need - I can tell you the exact commit in the RHEL 5.9 tree that
caused this regression:

11ff4073: [md] Fix reboot stall with raid on megaraid_sas controller

The result is that the final shutdown of md devices now uses a
"force readonly" method, which means it ignores the fact that a
filesystem may still be active on top of it and rips the device out
from under the filesystem. This really only affects root devices,
and given that XFS is not supported as a root device on RHEL, it
isn't in the QE test matrix and so the problem was never noticed.
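The before/after difference can be made concrete with a deliberately simplified Python model of the shutdown sequence described above. It is a hypothetical toy, not the md driver code or the RHEL patch. `MdDevice`, `stop_all_md` and the two boolean fields are invented for illustration; only the console strings are copied from the output quoted in this thread. The model covers three cases:

* **Without force-readonly:** a device with a mounted filesystem is skipped with "still in use".
* **With force-readonly, XFS on top:** the device is switched read-only regardless. Because the filesystem issues IO during the sync (XFS writing its superblock), the shutdown stalls.
* **With force-readonly, ext3 on top:** the filesystem issues no IO at that point, as in Tom's ext3 test, so the shutdown gets through.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MdDevice:
    name: str
    fs_active: bool          # a filesystem is still mounted on top (e.g. the root fs)
    fs_writes_on_sync: bool  # that fs issues IO when the device is synced (XFS superblock write)

def stop_all_md(devices: List[MdDevice], force_readonly: bool) -> List[str]:
    """Toy model of the final md shutdown; returns console-style lines."""
    log = ["md: stopping all md devices."]
    for dev in devices:
        if dev.fs_active and not force_readonly:
            # Old behaviour: notice the device is busy and leave it alone.
            log.append("md: %s still in use." % dev.name)
            continue
        if dev.fs_active and dev.fs_writes_on_sync:
            # Forced teardown under an active fs that writes on sync:
            # the IO waits on the device being torn down, so nothing completes.
            log.append("(hang)")
            return log
        log.append("md: %s switched to read-only mode." % dev.name)
    return log
```

With md2 and md1 idle and md3 carrying the still-mounted XFS root, the two settings behave differently:

* `stop_all_md(devs, force_readonly=True)` ends with "(hang)" after the md2 and md1 lines, matching the 348 output.
* `force_readonly=False` ends with "md: md3 still in use." instead, matching the 308 output.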

Feel free to report this all to the RH bugzilla - depending on the
implications of the regression for supported configurations, it may
need to be fixed in RHEL anyway.

But now you know the problem, you can probably fix it yourself
rather than have to wait for RHEL/CentOS product cycle updates...

Cheers,

Dave.

PS: has the fact I quoted a RHEL5.9 commit id triggered a lightbulb
moment for you yet?  Hint: my other email address is
dchinner@redhat.com - this XFS community support effort was brought
to you by Red Hat.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
