From: Eric Sandeen <sandeen@redhat.com>
To: Nix <nix@esperi.org.uk>
Cc: "Ted Ts'o" <tytso@mit.edu>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
"J. Bruce Fields" <bfields@fieldses.org>,
Bryan Schumaker <bjschuma@netapp.com>,
Peng Tao <bergwolf@gmail.com>,
Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org,
linux-nfs@vger.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)
Date: Tue, 23 Oct 2012 20:13:22 -0500 [thread overview]
Message-ID: <508740B2.2030401@redhat.com> (raw)
In-Reply-To: <87pq48nbyz.fsf_-_@spindle.srvr.nix>
On 10/23/12 3:57 PM, Nix wrote:
<snip>
> (I'd provide more sample errors, but this bug has been eating
> newly-written logs in /var all day, so not much has survived.)
>
> I rebooted into 3.6.1 rescue mode and fscked everything: lots of
> orphans, block group corruption and cross-linked files. The problems did
> not recur upon booting from 3.6.1 into 3.6.1 again. It is quite clear
> that metadata changes made in 3.6.3 are not making it to disk reliably,
> thus leading to corrupted filesystems marked clean on reboot into other
> kernels: pretty much every file appended to in 3.6.3 loses some or all
> of its appended data, and newly allocated blocks often end up
> cross-linked between multiple files.
>
> The curious thing is this doesn't affect every filesystem: for a while
> it affected only /var, and now it's affecting only /var and /home. The
> massive writes to the ext4 filesystem mounted on /usr/src seem to have
> gone off without incident: fsck reports no problems.
>
>
> The only unusual thing about the filesystems on this machine are that
> they have hardware RAID-5 (using the Areca driver), so I'm mounting with
> 'nobarrier':
I should have read more. :( More questions follow:
* Does the Areca have a battery backed write cache?
* Are you crashing or rebooting cleanly?
* Do you see log recovery messages in the logs for this filesystem?
> the full set of options for all my ext4 filesystems are:
>
> rw,nosuid,nodev,relatime,journal_checksum,journal_async_commit,nobarrier,quota,
> usrquota,grpquota,commit=30,stripe=16,data=ordered,usrquota,grpquota
ok journal_async_commit is off the reservation a bit; that's really not
tested, and Jan had serious reservations about its safety.
* Can you reproduce this w/o journal_async_commit?
-Eric
> If there's anything I can do to help, I'm happy to do it, once I've
> restored my home directory from backup :(
>
>
> tune2fs output for one of the afflicted filesystems (after fscking):
>
> tune2fs 1.42.2 (9-Apr-2012)
> Filesystem volume name: home
> Last mounted on: /home
> Filesystem UUID: 95bd22c2-253c-456f-8e36-b6cfb9ecd4ef
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 3276800
> Block count: 13107200
> Reserved block count: 655360
> Free blocks: 5134852
> Free inodes: 3174777
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 20
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 8192
> Inode blocks per group: 512
> RAID stripe width: 16
> Flex block group size: 64
> Filesystem created: Tue May 26 21:29:41 2009
> Last mount time: Tue Oct 23 21:32:07 2012
> Last write time: Tue Oct 23 21:32:07 2012
> Mount count: 2
> Maximum mount count: 20
> Last checked: Tue Oct 23 21:22:16 2012
> Check interval: 15552000 (6 months)
> Next check after: Sun Apr 21 21:22:16 2013
> Lifetime writes: 1092 GB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> First orphan inode: 1572907
> Default directory hash: half_md4
> Directory Hash Seed: a201983d-d8a3-460b-93ca-eb7804b62c23
> Journal backup: inode blocks
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2012-10-24 1:13 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-22 16:17 Heads-up: 3.6.2 / 3.6.3 NFS server panic: 3.6.2+ regression? Nix
2012-10-23 1:33 ` J. Bruce Fields
2012-10-23 14:07 ` Nix
2012-10-23 14:30 ` J. Bruce Fields
2012-10-23 16:32 ` Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug) Nix
2012-10-23 16:46 ` J. Bruce Fields
2012-10-23 16:54 ` J. Bruce Fields
2012-10-23 16:56 ` Myklebust, Trond
2012-10-23 17:05 ` Nix
2012-10-23 17:36 ` Nix
2012-10-23 17:43 ` J. Bruce Fields
2012-10-23 17:44 ` Myklebust, Trond
2012-10-23 17:57 ` Myklebust, Trond
[not found] ` <1351015039.4622.23.camel@lade.trondhjem.org>
2012-10-23 18:23 ` Myklebust, Trond
2012-10-23 19:49 ` Nix
2012-10-24 10:18 ` [PATCH] lockd: fix races in per-net NSM client handling Stanislav Kinsbursky
2012-10-23 20:57 ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Nix
2012-10-23 22:19 ` Theodore Ts'o
2012-10-23 22:47 ` Nix
2012-10-23 23:16 ` Theodore Ts'o
2012-10-23 23:06 ` Nix
2012-10-23 23:28 ` Theodore Ts'o
2012-10-23 23:34 ` Nix
2012-10-24 0:57 ` Eric Sandeen
2012-10-24 20:17 ` Jan Kara
2012-10-26 15:25 ` Eric Sandeen
2012-10-24 19:13 ` Jannis Achstetter
2012-10-24 21:31 ` Theodore Ts'o
2012-10-24 22:05 ` Jannis Achstetter
2012-10-24 23:47 ` Nix
2012-10-25 17:02 ` Felipe Contreras
2012-10-24 21:04 ` Jannis Achstetter
2012-10-24 1:13 ` Eric Sandeen [this message]
2012-10-24 4:15 ` Nix
2012-10-24 4:27 ` Eric Sandeen
2012-10-24 5:23 ` Theodore Ts'o
2012-10-24 7:00 ` Hugh Dickins
2012-10-24 11:46 ` Nix
2012-10-24 11:45 ` Nix
2012-10-24 17:22 ` Eric Sandeen
2012-10-24 19:49 ` Nix
2012-10-24 19:54 ` Nix
2012-10-24 20:30 ` Eric Sandeen
2012-10-24 20:34 ` Nix
2012-10-24 20:45 ` Nix
2012-10-24 21:08 ` Theodore Ts'o
2012-10-24 23:27 ` Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) Nix
2012-10-24 23:42 ` Nix
2012-10-25 1:10 ` Theodore Ts'o
2012-10-25 1:45 ` Nix
2012-10-25 14:12 ` Theodore Ts'o
2012-10-25 14:15 ` Nix
2012-10-25 17:39 ` Nix
2012-10-25 11:06 ` Nix
2012-10-26 0:22 ` Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) (possibly blockdev / arcmsr at fault??) Nix
2012-10-26 0:11 ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Ric Wheeler
2012-10-26 0:43 ` Theodore Ts'o
2012-10-26 12:12 ` Nix
2012-10-26 20:35 ` Eric Sandeen
2012-10-26 20:37 ` Nix
2012-10-26 20:56 ` Theodore Ts'o
2012-10-26 20:59 ` Nix
2012-10-26 21:15 ` Theodore Ts'o
2012-10-26 21:19 ` Nix
2012-10-27 0:22 ` Theodore Ts'o
2012-10-27 12:45 ` Nix
2012-10-27 17:55 ` Theodore Ts'o
2012-10-27 18:47 ` Nix
2012-10-27 21:19 ` Eric Sandeen
2012-10-27 22:42 ` Eric Sandeen
2012-10-29 1:00 ` Theodore Ts'o
2012-10-29 1:04 ` Nix
2012-10-29 2:24 ` Eric Sandeen
2012-10-29 2:34 ` Theodore Ts'o
2012-10-29 2:35 ` Eric Sandeen
2012-10-29 2:42 ` Theodore Ts'o
2012-10-27 18:30 ` Eric Sandeen
2012-10-27 3:11 ` Jim Rees
2012-10-28 4:23 ` [PATCH] ext4: fix unjournaled inode bitmap modification Eric Sandeen
2012-10-28 13:59 ` Nix
2012-10-29 2:30 ` [PATCH -v3] " Theodore Ts'o
2012-10-29 3:24 ` Eric Sandeen
2012-10-29 17:08 ` Darrick J. Wong
[not found] <jXsTo-5lW-13@gated-at.bofh.it>
[not found] ` <jXBDk-7vn-13@gated-at.bofh.it>
[not found] ` <jXNl8-5m5-13@gated-at.bofh.it>
[not found] ` <jXNOa-5MR-23@gated-at.bofh.it>
[not found] ` <jXPGh-87s-5@gated-at.bofh.it>
[not found] ` <jXTJW-4CH-55@gated-at.bofh.it>
[not found] ` <jXUZj-6mo-13@gated-at.bofh.it>
[not found] ` <jXVLH-7kO-5@gated-at.bofh.it>
[not found] ` <jXW53-7CC-5@gated-at.bofh.it>
[not found] ` <jXWeJ-7Lk-1@gated-at.bofh.it>
2012-10-24 17:38 ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Martin
2012-10-26 20:13 ` Martin
2012-10-26 20:24 ` Nix
2012-10-26 20:44 ` Martin
2012-10-26 20:47 ` Nix
2012-10-26 21:10 ` Theodore Ts'o
2012-10-26 23:15 ` Martin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=508740B2.2030401@redhat.com \
--to=sandeen@redhat.com \
--cc=Trond.Myklebust@netapp.com \
--cc=bergwolf@gmail.com \
--cc=bfields@fieldses.org \
--cc=bjschuma@netapp.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=nix@esperi.org.uk \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).