From: "Theodore Ts'o" <tytso@mit.edu>
To: Jannis Achstetter <jannis_achstetter@web.de>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)
Date: Wed, 24 Oct 2012 17:31:29 -0400
Message-ID: <20121024213129.GB5484@thunk.org>
In-Reply-To: <k69ejs$vt2$1@ger.gmane.org>

On Wed, Oct 24, 2012 at 09:13:01PM +0200, Jannis Achstetter wrote:
> 
> As a "normal linux user" I'm interested in the practical things to do
> now to avoid data loss. I'm running several systems with 3.6.2 and ext4.
> Fearing loss of data:
> - Is there a way to see whether the journal of a specific partition has
> been wrapped (since mounting) so that umounting and mounting (or doing a
> reboot to downgrade the kernel) is safe?

My initial analysis of what had been causing the problem now looks
incorrect (or at least incomplete).  Both Eric and I have been unable
to reproduce the failure based on my initial theory of what had been
going on.  So the best information at this point is that it's probably
not related to the file system getting unmounted before the journal
has wrapped.

(Keep in mind this is why commercial software corporations like
Microsoft or Apple generally don't make their discussions public while
they are trying to root-cause a problem; sometimes the initial theories
can be incorrect, and it's unfortunate when misinformation ends up on
Phoronix or Slashdot, leading people to panic...  but this is open
source, so that means we do everything in the open, since that way we
can all work towards finding the best answer.)

At the *moment* it looks like it might be related to an unclean
shutdown (i.e., a forced reset or power failure while the file system
is mounted or is in the process of being unmounted).  That being said,
a simple kill -9 of kvm running a test kernel while the file system is
mounted but otherwise quiescent doesn't trigger the problem (I was
trying that last night).

It's a little bit too early for this meme:

    http://memegenerator.net/instance/28936247

But do please note that Fedora 17 users have been using 3.6.2 for
a while, so if this were an easily triggered bug, (a) Eric and I would
have managed to reproduce it by now, and (b) lots of people would be
complaining, since the symptoms of the bug are not subtle.

That's not to say we aren't treating this seriously; but people
shouldn't panic unduly... (and if you are using a critical
enterprise/production server on bleeding edge kernels, may I suggest
that this might not be such a good idea; there is a *reason* why
enterprise Linux distros spend 6-9 months or more just stabilizing the
kernel, and being super paranoid about making changes afterwards for
years, and it's not because they enjoy backporting patches and working
with trailing edge kernel sources.  :-)

Regards,

						- Ted
