From: Nix <nix@esperi.org.uk>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Eric Sandeen" <sandeen@redhat.com>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	"Bryan Schumaker" <bjschuma@netapp.com>,
	"Peng Tao" <bergwolf@gmail.com>,
	Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org,
	"Toralf Förster" <toralf.foerster@gmx.de>
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)
Date: Thu, 25 Oct 2012 00:27:02 +0100
Message-ID: <87y5iv78op.fsf_-_@spindle.srvr.nix>
In-Reply-To: <20121024210819.GA5484@thunk.org> (Theodore Ts'o's message of "Wed, 24 Oct 2012 17:08:19 -0400")

On 24 Oct 2012, Theodore Ts'o verbalised:

> On Wed, Oct 24, 2012 at 09:45:47PM +0100, Nix wrote:
>> 
>> It occurs to me that it is possible that this bug hits only those
>> filesystems for which a umount has started but been unable to complete.
>> If so, this is a relatively rare and unimportant bug which probably hits
>> only me and users of slow removable filesystems in the whole world...
>
> Can you verify this?  Does the bug show up if you just hit the power
> switch while the system is booted?

Verified! You do indeed need to do passing strange things to trigger
this bug -- not surprising, really, or everyone and his dog would have
reported it by now. As it is, I'm sorry this hit slashdot, because it
reflects unnecessarily badly on a filesystem that is experiencing
problems only when people do rather insane things to it.

> How about changing the "sleep 2" to "sleep 0.5"?

I tried the following:

 - /sbin/reboot -f of running system
   -> Journal replay, no problems other than the expected free block
      count problems. This is not such a severe problem after all!

 - Normal shutdown, but a 60 second pause after lazy umount, more than
   long enough for all umounts to proceed to termination
   -> no corruption, but curiously /home experienced a journal replay
      before being fscked, even though a cat of /proc/mounts after
      umounting revealed that the only mounted filesystem was /,
      read-only, so /home should have been clean

 - Normal shutdown, a 60 second pause after lazy umount of everything
   other than /var, and then a umount of /var the instant before
   reboot, no sleep at all
   -> massive corruption just as seen before.
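
For reference, the third test case can be written down as an explicit
command sequence. This is only a sketch under assumptions: the list of
lazily unmounted filesystems (/home here) is illustrative, and whether
the final umount of /var was itself lazy is not stated above, so a plain
umount is assumed. The code only builds and prints the plan; it does not
execute anything.

```python
import shlex


def build_plan(lazy_targets, final_target, pause_seconds=60):
    """Return the third test case as a list of argv lists.

    Nothing is executed here; the caller decides what to do with the plan.
    """
    # Lazy umount of everything except the final target.
    plan = [["umount", "-l", target] for target in lazy_targets]
    # Long pause so the lazy umounts can run to completion.
    plan.append(["sleep", str(pause_seconds)])
    # Assumption: a plain (non-lazy) umount of the last filesystem.
    plan.append(["umount", final_target])
    # Forced reboot immediately afterwards, with no sleep in between.
    plan.append(["/sbin/reboot", "-f"])
    return plan


def render(plan):
    """Format the plan as shell-quoted lines, one command per line."""
    return "\n".join(shlex.join(argv) for argv in plan)


if __name__ == "__main__":
    print(render(build_plan(["/home"], "/var")))
```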

Unfortunately, the massive corruption in the last testcase was seen in
3.6.1 as well as 3.6.3: it appears that the only effect that superblock
change had in 3.6.3 was to make this problem easier to hit, and that the
bug itself was introduced probably somewhere between 3.5 and 3.6 (though
I only rebooted 3.5.x twice, and it's rare enough before 3.6.[23], at
~1/20 boots, that it may have been present for longer and I never
noticed).

So the problem is caused by rebooting or powering off or disconnecting
the device *while* umounting a filesystem with a dirty journal, and
might have been introduced by I/O scheduler changes or who knows what
other changes, not just ext4 changes, since the order of block writes by
umount is clearly at issue.

Even though my own system relies on the possibility of rebooting during
umount to reboot reliably, I'd be inclined to say 'not a bug, don't do
that then' -- except that this renders it unreliable to use umount -l to
unmount all the filesystems you can, skipping those that are not
reachable due to having unresponsive servers in the way. As far as I can
tell, there is *no* way to tell when a lazy umount has completed, except
perhaps for polling /proc/mounts: and there is no way at all to tell
when a lazy umount switches from 'waiting for the last process to stop
using me, you can reboot without incident' to 'doing umount, rebooting
is disastrous'. And unfortunately I want to reboot if we're in the
former state, but not in the latter.
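
A minimal sketch of the /proc/mounts polling idea, in Python. The helper
names (mount_points, wait_until_gone) and the timeout values are made up
for illustration. Note the limitation already visible in the second test
case above: a filesystem can vanish from /proc/mounts and still need a
journal replay afterwards, so "no longer listed" is at best a hint that
the umount has finished, not proof.

```python
import re
import time

# /proc/mounts escapes whitespace in paths as octal, e.g. "\040" for a space.
_OCTAL = re.compile(r"\\([0-7]{3})")


def mount_points(table_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    points = set()
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Field 2 is the mount point; undo the octal escaping.
            points.add(_OCTAL.sub(lambda m: chr(int(m.group(1), 8)), fields[1]))
    return points


def read_proc_mounts():
    with open("/proc/mounts") as f:
        return f.read()


def wait_until_gone(targets, read_table=read_proc_mounts, timeout=60.0,
                    interval=0.5, sleep=time.sleep, clock=time.monotonic):
    """Poll until none of `targets` is listed, or until `timeout` expires.

    Returns True if every target disappeared from the table, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        remaining = set(targets) & mount_points(read_table())
        if not remaining:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A shutdown script could call wait_until_gone(["/home", "/var"]) before
rebooting, but given the caveat above it would still want a fallback
delay rather than trusting the result outright.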

(It also makes it unreliable to use ext4 on devices like USB sticks that
might suddenly get disconnected during a umount.)


Further, it seems to me that this makes it dangerous to ever use umount
-l at all, even during normal system operation, since the real umount
might only start when all processes are killed at system shutdown, and
the reboot could well kick in before the umount has finished.

It also appears impossible for me to reliably shut my system down,
though a 60s timeout after lazy umount and before reboot is likely to
work in all but the most pathological of cases (where a downed NFS
server comes up at just the wrong instant): it is clear that the
previous 5s timeout eventually became insufficient simply because of the
amount of time it can take to do a umount on today's larger filesystems.

Truly, my joy is unbounded :(

> 0) Make sure the reliable repro does _not_ work with 3.6.1 booted

Oh dear. Sorry :((( As noted above, the reliable repro does work with
3.6.1 booted.

I can try to bisect this and track down which kernel release it appeared
in -- if it isn't expected behaviour, of course, which is perfectly
possible: rebooting during a umount is at best questionable. But I can't
do anything that lengthy before the weekend, I'm afraid.

-- 
NULL && (void)
