From: Nix <nix@esperi.org.uk>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	"Bryan Schumaker" <bjschuma@netapp.com>,
	"Peng Tao" <bergwolf@gmail.com>,
	Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org,
	"Toralf Förster" <toralf.foerster@gmx.de>,
	"Eric Sandeen" <sandeen@redhat.com>,
	stable@vger.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)
Date: Tue, 23 Oct 2012 23:47:27 +0100
Message-ID: <87k3ugn6v4.fsf@spindle.srvr.nix>
In-Reply-To: <20121023221913.GC28626@thunk.org> (Theodore Ts'o's message of "Tue, 23 Oct 2012 18:19:13 -0400")

On 23 Oct 2012, Theodore Ts'o said:

> The reason why the problem happens rarely is that the effect of the
> buggy commit is that if the journal's starting block is zero, we fail
> to truncate the journal when we unmount the file system.

Oh dear oh dear.

>                                                           This can
> happen if we mount and then unmount the file system fairly quickly,
> before the log has a chance to wrap.

... which is quite likely if you're rebooting frequently to try to track
down some other kernel bug.

>                                       After the first time this has
> happened, it's not a disaster, since when we replay the journal, we'll
> just replay some extra transactions.  But if this happens twice, the
> oldest valid transaction will still not have gotten updated, but some
> of the newer transactions from the last mount session will have gotten
> overwritten by the very latest transactions, and when we then try to do
> the extra transaction replays, the metadata blocks can end up getting
> very scrambled indeed.

Ow. OK, it's a good thing I rebooted fast. :) And only filesystems that
got written to, but not too much, will see this. Hence my /usr/src
stayed intact: it had lots of updates to lots of tiny files, more than
enough to make the journal wrap over and over again, even journalling
only metadata. But /home doesn't see so many updates, and neither does
/var...

This seems to explain everything.
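
To convince myself of the "once is harmless, twice is not" part, here is
a toy Python model. It is only a sketch under loud assumptions, not
jbd2's code or on-disk format: each transaction is assumed to carry full
images of the blocks it touches, and replay is assumed to apply every
slot in the log in order (real jbd2 also checks sequence numbers, which
this ignores). The names replay, t1..t4 and the block labels are made up
for illustration.

```python
def replay(disk, log):
    """Return a copy of disk after applying each logged transaction in order.

    Each transaction is a dict mapping a (hypothetical) metadata block name
    to the full contents that block has once the transaction has committed.
    """
    state = dict(disk)
    for txn in log:
        state.update(txn)
    return state


# Session 1: three transactions fill journal slots 0, 1 and 2.
t1 = {"bitmap": "b1", "inode_table": "i1"}
t2 = {"inode_table": "i2"}
t3 = {"dir_block": "d3"}
journal = [t1, t2, t3]
disk = replay({}, journal)

# First bad unmount: the journal is not marked empty, so the next mount
# replays all of session 1 again. Same transactions, same order: no harm.
after_first = replay(disk, journal)

# Session 2: one new transaction reuses slot 0; slots 1 and 2 still hold
# stale session-1 transactions.
t4 = {"bitmap": "b4", "inode_table": "i4"}
journal[0] = t4
disk = replay(disk, [t4])
consistent = dict(disk)

# Second bad unmount: replay walks the log again and applies the stale
# t2 and t3 on top of t4, rolling inode_table back while bitmap stays new.
after_second = replay(disk, journal)

print(after_first)
print(consistent)
print(after_second)
```

In this toy, replaying session 1 a second time leaves the disk exactly
as it was, but the second replay ends with the new bitmap sitting next
to the old inode table, which is the sort of mismatch fsck then has to
untangle.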

It looks like fscking everything will fix it (it'll replay the buggered
journal, mangling the metadata, but then fix up the scrambled metadata
and fix the journal's starting block). So I probably don't need to worry
about latent corruption hiding waiting to pounce. Phew.

> *Sigh*.  My apologies for not catching this when I reviewed this
> patch.  I believe the following patch should fix the bug; once it's
> reviewed by other ext4 developers, I'll push this to Linus ASAP.

No problem. This is my first data-corruption bug in more than seventeen
years of ext* use (it even survived horribly faulty RAM). I call that a
good record. And it happened one day after a full backup, and was
immediately highlighted by corruption of .bash_history and input/output
errors logging in -- and fsck pretty much fixed the problem, with only a
few missing files, one file full of garbage, and one high-ASCII filename
in a temporary directory to show for it. I call that luckier than I have
any right to be.

Plus, my faith in the amazingly fast bugfixing talents of ext4 devs is
undimmed! :)

> 						- Ted
>
> commit 26de1ba5acc39f0ab57ce1ed523cb128e4ad73a4
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Tue Oct 23 18:15:22 2012 -0400
>
>     jbd2: fix a potential fs corrupting bug in jbd2_mark_journal_empty

I'll apply this tomorrow (enough fun with filesystem restoration for
today) and see what happens. (What could *possibly* go wrong?)

But I might not upgrade to stable kernels quite so often in future. :(
You know what they say: once burnt, twice not upgrading before doing a
full backup!

-- 
NULL && (void)
