linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nix <nix@esperi.org.uk>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Eric Sandeen" <sandeen@redhat.com>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	"Bryan Schumaker" <bjschuma@netapp.com>,
	"Peng Tao" <bergwolf@gmail.com>,
	Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org,
	"Toralf Förster" <toralf.foerster@gmx.de>
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)
Date: Thu, 25 Oct 2012 02:45:57 +0100	[thread overview]
Message-ID: <87y5iv5noq.fsf@spindle.srvr.nix> (raw)
In-Reply-To: <20121025011056.GC4559@thunk.org> (Theodore Ts'o's message of "Wed, 24 Oct 2012 21:10:56 -0400")

On 25 Oct 2012, Theodore Ts'o stated:

> On Thu, Oct 25, 2012 at 12:27:02AM +0100, Nix wrote:
>>
>>  - /sbin/reboot -f of running system
>>    -> Journal replay, no problems other than the expected free block
>>       count problems. This is not such a severe problem after all!
>> 
>>  - Normal shutdown, but a 60 second pause after lazy umount, more than
>>    long enough for all umounts to proceed to termination
>>    -> no corruption, but curiously /home experienced a journal replay
>>       before being fscked, even though a cat of /proc/mounts after
>>       umounting revealed that the only mounted filesystem was /,
>>       read-only, so /home should have been clean
>
> Question: how are you doing the journal replay?  Is it happening as
> part of running e2fsck, or are you mounting the file system and
> letting kernel do the journal replay?

This most recent instance was e2fsck. Normally, it's mount. Both
seem able to yield the same corruption.

> Also, can you reproduce the problem with the nobarrier and
> journal_async_commit options *removed*?  Yes, I know you have battery
> backup, but it would be interesting to see if the problem shows up in
> the default configuration with none of the more specialist options.
> (So it would probably be good to test with journal_checksum removed as
> well.)

I'll try that, hopefully tomorrow sometime. It's 2:30am now and probably
time to sleep.

>> Unfortunately, the massive corruption in the last testcase was seen in
>> 3.6.1 as well as 3.6.3: it appears that the only effect that superblock
>> change had in 3.6.3 was to make this problem easier to hit, and that the
>> bug itself was introduced probably somewhere between 3.5 and 3.6 (though
>> I only rebooted 3.5.x twice, and it's rare enough before 3.6.[23], at
>> ~1/20 boots, that it may have been present for longer and I never
>> noticed).
>
> Hmm.... ok.  Can you tell whether or not the 2nd patch I posted on
> this thread made any difference to how frequently it happened?  The

Well, I had a couple of reboots without corruption with that patch
applied, and /home was only ever corrupted with it not applied -- but
that could perfectly well be chance, since I only had two or three
instances of /home corruption so far, thank goodness.

> When you say it's rare before 3.6.[23], how rare is it?  How reliably
> can you trigger it under 3.6.1?  One in 3?  One in 5?  One in 20?

I've rebooted out of 3.6.1 about fifteen times so far. I've seen once
instance of corruption. I've never seen it before 3.6, but I only
rebooted 3.5.x or 3.4.x once or twice in total, so that too could be
chance.

> As far as bisecting, one experiment that I'd really appreciate your
> doing is to check and see whether you can reproduce the problem using
> the 3.4 kernel, and if you can, to see if it reproduces under the 3.3
> kernel.

Will try. It might be the weekend before I can find the time though :(

> The reason why I ask this is there were not any major changes between
> 3.5 and 3.6, or between 3.4 and 3.5.  There *were* however, some
> fairly major changes made by Jan Kara that were introduced between 3.3
> and 3.4.  Among other things, this is where we started using FUA
> (Force Unit Attention) writes to update the journal superblock instead
> of just using REQ_FLUSH.  This is in fact the most likely place where
> we might have introduced the regression, since it wouldn't surprise me
> if Jan didn't test the case of using nobarrier with a storage array
> with battery backup (I certainly didn't, since I don't have easy
> access to such fancy toys :-).

Hm. At boot, I see this for both volumes on the Areca controller:

[    0.855376] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.855465] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

So it looks to me like FUA changes themselves could have little effect.

(btw, the controller cost only about £150... if it was particularly
fancy I certainly couldn't have afforded it.)

>> It also appears impossible for me to reliably shut my system down,
>> though a 60s timeout after lazy umount and before reboot is likely to
>> work in all but the most pathological of cases (where a downed NFS
>> server comes up at just the wrong instant): it is clear that the
>> previous 5s timeout eventually became insufficient simply because of the
>> amount of time it can take to do a umount on today's larger filesystems.
>
> Something that you might want to consider trying is after you kill all
> of the processes, remount all of the local disk file systems
> read-only, then kick off the unmount of the NFS file systems (just to
> be nice to the NFS servers, so they are notified of the unmount), and

Actually I umount NFS first of all, because if I kill the processes
first, this causes trouble with the NFS unmounts, particularly if I'm
doing self-mounting (which I do sometimes, though not at the moment).

I will certainly try a readonly remount instead.

> force a flush on an unmount or remount r/o, regardless of whether
> nobarrier is specified, just to make sure everything is written out
> before the poweroff, battery backup or no.)

I'm rather surprised that doesn't happen anyway. I always thought it
did.

-- 
NULL && (void)

  reply	other threads:[~2012-10-25  1:46 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-22 16:17 Heads-up: 3.6.2 / 3.6.3 NFS server panic: 3.6.2+ regression? Nix
2012-10-23  1:33 ` J. Bruce Fields
2012-10-23 14:07   ` Nix
2012-10-23 14:30     ` J. Bruce Fields
2012-10-23 16:32       ` Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug) Nix
2012-10-23 16:46         ` J. Bruce Fields
2012-10-23 16:54           ` J. Bruce Fields
2012-10-23 16:56           ` Myklebust, Trond
2012-10-23 17:05             ` Nix
2012-10-23 17:36               ` Nix
2012-10-23 17:43                 ` J. Bruce Fields
2012-10-23 17:44                 ` Myklebust, Trond
2012-10-23 17:57                   ` Myklebust, Trond
     [not found]                   ` <1351015039.4622.23.camel@lade.trondhjem.org>
2012-10-23 18:23                     ` Myklebust, Trond
2012-10-23 19:49                       ` Nix
2012-10-24 10:18                         ` [PATCH] lockd: fix races in per-net NSM client handling Stanislav Kinsbursky
2012-10-23 20:57         ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Nix
2012-10-23 22:19           ` Theodore Ts'o
2012-10-23 22:47             ` Nix
2012-10-23 23:16               ` Theodore Ts'o
2012-10-23 23:06             ` Nix
2012-10-23 23:28               ` Theodore Ts'o
2012-10-23 23:34                 ` Nix
2012-10-24  0:57             ` Eric Sandeen
2012-10-24 20:17               ` Jan Kara
2012-10-26 15:25                 ` Eric Sandeen
2012-10-24 19:13             ` Jannis Achstetter
2012-10-24 21:31               ` Theodore Ts'o
2012-10-24 22:05                 ` Jannis Achstetter
2012-10-24 23:47                 ` Nix
2012-10-25 17:02                 ` Felipe Contreras
2012-10-24 21:04             ` Jannis Achstetter
2012-10-24  1:13           ` Eric Sandeen
2012-10-24  4:15             ` Nix
2012-10-24  4:27               ` Eric Sandeen
2012-10-24  5:23                 ` Theodore Ts'o
2012-10-24  7:00                   ` Hugh Dickins
2012-10-24 11:46                     ` Nix
2012-10-24 11:45                   ` Nix
2012-10-24 17:22                   ` Eric Sandeen
2012-10-24 19:49                   ` Nix
2012-10-24 19:54                     ` Nix
2012-10-24 20:30                     ` Eric Sandeen
2012-10-24 20:34                       ` Nix
2012-10-24 20:45                     ` Nix
2012-10-24 21:08                     ` Theodore Ts'o
2012-10-24 23:27                       ` Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) Nix
2012-10-24 23:42                         ` Nix
2012-10-25  1:10                         ` Theodore Ts'o
2012-10-25  1:45                           ` Nix [this message]
2012-10-25 14:12                             ` Theodore Ts'o
2012-10-25 14:15                               ` Nix
2012-10-25 17:39                                 ` Nix
2012-10-25 11:06                           ` Nix
2012-10-26  0:22                           ` Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) (possibly blockdev / arcmsr at fault??) Nix
2012-10-26  0:11               ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Ric Wheeler
2012-10-26  0:43                 ` Theodore Ts'o
2012-10-26 12:12                   ` Nix
2012-10-26 20:35           ` Eric Sandeen
2012-10-26 20:37             ` Nix
2012-10-26 20:56               ` Theodore Ts'o
2012-10-26 20:59                 ` Nix
2012-10-26 21:15                   ` Theodore Ts'o
2012-10-26 21:19                     ` Nix
2012-10-27  0:22                       ` Theodore Ts'o
2012-10-27 12:45                         ` Nix
2012-10-27 17:55                           ` Theodore Ts'o
2012-10-27 18:47                             ` Nix
2012-10-27 21:19                               ` Eric Sandeen
2012-10-27 22:42                                 ` Eric Sandeen
2012-10-29  1:00                                   ` Theodore Ts'o
2012-10-29  1:04                                     ` Nix
2012-10-29  2:24                                     ` Eric Sandeen
2012-10-29  2:34                                       ` Theodore Ts'o
2012-10-29  2:35                                         ` Eric Sandeen
2012-10-29  2:42                                           ` Theodore Ts'o
2012-10-27 18:30                           ` Eric Sandeen
2012-10-27  3:11                     ` Jim Rees
2012-10-28  4:23           ` [PATCH] ext4: fix unjournaled inode bitmap modification Eric Sandeen
2012-10-28 13:59             ` Nix
2012-10-29  2:30             ` [PATCH -v3] " Theodore Ts'o
2012-10-29  3:24               ` Eric Sandeen
2012-10-29 17:08               ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y5iv5noq.fsf@spindle.srvr.nix \
    --to=nix@esperi.org.uk \
    --cc=Trond.Myklebust@netapp.com \
    --cc=bergwolf@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=bjschuma@netapp.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=toralf.foerster@gmx.de \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).