From: "Theodore Ts'o" <tytso@mit.edu>
To: Nix <nix@esperi.org.uk>
Cc: Eric Sandeen <sandeen@redhat.com>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
"J. Bruce Fields" <bfields@fieldses.org>,
Bryan Schumaker <bjschuma@netapp.com>,
Peng Tao <bergwolf@gmail.com>,
Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org,
linux-nfs@vger.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)
Date: Fri, 26 Oct 2012 16:56:18 -0400
Message-ID: <20121026205618.GC8614@thunk.org>
In-Reply-To: <87wqydx957.fsf@spindle.srvr.nix>

On Fri, Oct 26, 2012 at 09:37:08PM +0100, Nix wrote:
>
> I can reproduce this on a small filesystem and stick the image somewhere
> if that would be of any use to anyone. (If I'm very lucky, merely making
> this offer will make the problem go away. :} )

I'm not sure the image is going to be that useful. What we really
need to do is to get a reliable reproduction of what _you_ are seeing.

It's clear from Eric's experiments that journal_checksum is dangerous.
In fact, I will likely put it under an #ifdef EXT4_EXPERIMENTAL to try
to discourage people from using it in the future. There are things
I've been planning on doing to make it be safer, but there's a very
good *reason* that both journal_checksum and journal_async_commit are
not on by default.

That's why one of the things I asked you to do when you had time was
to see if you could reproduce the problem you are seeing w/o
nobarrier,journal_checksum,journal_async_commit.
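
Before re-running a reproduction, it may help to confirm that none of
the three options is still in effect on any ext4 mount. What follows is
only a rough sketch in Python, not a vetted tool. The function name
risky_ext4_mounts is made up for illustration, the input is assumed to
be in the usual /proc/mounts field layout, and the option spellings are
assumed to match the mount options named above exactly (a kernel that
reports the barrier setting in another form, such as barrier=0, would
not be caught).

```python
# Options named in the message above. Assumption: they appear with
# exactly these spellings in the options field of the mount table.
RISKY_OPTIONS = {"nobarrier", "journal_checksum", "journal_async_commit"}


def risky_ext4_mounts(mounts_text):
    """Map each ext4 mount point to the risky options it is mounted with.

    mounts_text is expected to look like /proc/mounts: one mount per
    line, with whitespace-separated fields in the order
    device, mount point, filesystem type, options, dump, pass.
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        # The options field is a comma-separated list.
        risky = sorted(RISKY_OPTIONS.intersection(fields[3].split(",")))
        if risky:
            found[fields[1]] = risky
    return found
```

Feeding it the contents of /proc/mounts and getting back an empty
dictionary would suggest the filesystems are mounted without those
options, subject to the spelling caveat above.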

The other experiment that would be really useful, if you could do it,
is to apply these two patches which I sent earlier this week:

    [PATCH 1/2] ext4: revert "jbd2: don't write superblock when if its empty
    [PATCH 2/2] ext4: fix I/O error when unmounting an ro file system

... and see if they make a difference.

If they don't make a difference, I don't want to apply patches just
for placebo/PR reasons. And for Eric at least, he can reproduce the
journal checksum error followed by fairly significant corruption
reported by e2fsck with journal_checksum, and the presence or absence
of these patches makes no difference for him. So I really don't want
to push these patches to Linus until I get confirmation that they make
a difference to *somebody*.

Regards,

- Ted