From: "Theodore Ts'o" <tytso@mit.edu>
To: Pavel Machek <pavel@ucw.cz>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Subject: Re: ext4: media error but where?
Date: Fri, 4 Jul 2014 08:11:19 -0400
Message-ID: <20140704121119.GB10514@thunk.org>
In-Reply-To: <20140704102307.GA19252@amd.pavel.ucw.cz>

On Fri, Jul 04, 2014 at 12:23:07PM +0200, Pavel Machek wrote:
> 
> pavel@duo:~$ uname -a
> Linux duo 3.15.0-rc8+ #365 SMP Mon Jun 9 09:18:29 CEST 2014 i686
> GNU/Linux
> 
> EXT4-fs (sda3): error count: 11
> EXT4-fs (sda3): initial error at 1401714179: ext4_mb_generate_buddy:756
> EXT4-fs (sda3): last error at 1401714179: ext4_reserve_inode_write:4877
> 
> That sounds like media error to me?

If you search your system logs since the last fsck, you should find 11
instances of the "EXT4-fs error" message, which means that some
file system inconsistencies were detected.  The first error was detected at:

% date -d @1401714179
Mon Jun  2 09:02:59 EDT 2014

... which means that you haven't rebooted in a month, or your boot
scripts aren't automatically running fsck, or your clock is
incorrect.
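
The summary lines quoted above can also be decoded mechanically.  The
following is a minimal sketch in Python, assuming the message layout is
exactly as shown in the quoted kernel output.  The names SUMMARY_RE and
decode_summary are made up for this illustration and are not part of
any ext4 tool.

```python
import re
from datetime import datetime, timezone

# Matches the "initial error" / "last error" summary lines quoted above.
# Assumption: the layout is
#   EXT4-fs (<dev>): <initial|last> error at <epoch>: <function>:<line>
SUMMARY_RE = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): (?P<which>initial|last) error at "
    r"(?P<epoch>\d+): (?P<func>\w+):(?P<line>\d+)"
)


def decode_summary(text):
    """Return one dict per error summary line found in text."""
    results = []
    for m in SUMMARY_RE.finditer(text):
        # The kernel reports seconds since the Unix epoch; convert to UTC.
        when = datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc)
        results.append({
            "device": m.group("dev"),
            "which": m.group("which"),
            "when_utc": when,
            "function": m.group("func"),
            "line": int(m.group("line")),
        })
    return results


SAMPLE = """\
EXT4-fs (sda3): error count: 11
EXT4-fs (sda3): initial error at 1401714179: ext4_mb_generate_buddy:756
EXT4-fs (sda3): last error at 1401714179: ext4_reserve_inode_write:4877
"""

for entry in decode_summary(SAMPLE):
    print(entry["which"], entry["when_utc"].isoformat(),
          entry["function"], entry["line"])
```

Both entries decode to 2014-06-02T13:02:59+00:00, which is the same
instant as the EDT time printed by date(1) above.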

The first inconsistency was detected in the function
ext4_mb_generate_buddy(), in line 756.  This means there's an
inconsistency between the number of blocks marked as in use in a block
allocation bitmap, and summary statistics in the block group
descriptor.  This can be caused by a hardware hiccup, or some kind of
kernel bug.
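
To make the nature of this check concrete, here is a toy model in
Python.  It is not the kernel's code: the function names and the
in-memory layout (a bytes bitmap with one bit per block, where a set
bit means "in use", and a separately stored free count standing in for
the group descriptor) are assumptions made for illustration.  The toy
compares free counts, which is equivalent to comparing in-use counts
because in use = total - free.

```python
def count_free(bitmap, blocks_in_group):
    """Count clear (free) bits among the first blocks_in_group bits."""
    free = 0
    for block in range(blocks_in_group):
        byte = bitmap[block // 8]
        if not (byte >> (block % 8)) & 1:
            free += 1
    return free


def check_group(bitmap, blocks_in_group, descriptor_free):
    """Return None if the two counts agree, else describe the mismatch."""
    bitmap_free = count_free(bitmap, blocks_in_group)
    if bitmap_free == descriptor_free:
        return None
    return f"{bitmap_free} free in bitmap, {descriptor_free} in descriptor"


# 16 blocks: 0x0F marks blocks 0-3 in use, 0x01 marks block 8 in use,
# so 5 blocks are in use and 11 are free.
bitmap = bytes([0x0F, 0x01])
print(check_group(bitmap, 16, 11))  # summary count agrees with the bitmap
print(check_group(bitmap, 16, 12))  # summary count is stale
```

The second call reports a mismatch, which is the kind of disagreement
between bitmap and summary statistics described above.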

People have been reporting an increased incidence rate of this bug
since 3.15, so it's something we're trying to track down.  There have
been some reports of eMMC bugs in 3.15 (see one such report at:
https://lkml.org/lkml/2014/6/12/19).  But other people are reporting
this on SSDs such as the Samsung 840 PRO, which is a SATA-attached
device.  (See some of the messages on the ext4 list with the subject
line: "ext4: journal has aborted".)

At this point I suspect we have multiple causes, all of which appeared
at about the same time and result in the same symptom, which has made
tracking down the root cause(s) very difficult.

It does seem to happen more often after an unclean shutdown, and there
does seem to be a very high correlation with eMMC devices.  It's
possible there is a jbd2 bug that got introduced recently, where ext4
is modifying some field outside of a journal transaction.  But I
haven't been able to reproduce this yet in controlled circumstances.

What I need from people reporting problems: 

* What is the HDD/SSD/eMMC device involved?

* What kernel version were you running?

* What distribution are you running?  (This is mostly so I know what
  the init scripts might or might not have been doing vis-a-vis
  running fsck after a crash.)

* Was there an unclean shutdown / power drop / hard reset involved?
  If so, did the HDD/SSD/eMMC lose power, or was the reset button hit
  on the machine?

* What sort of workload / application / test program was running
  before the crash, if any?
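
One way to keep track of which of these questions a report has
answered is a small record type.  This is only a sketch in Python: the
class name CorruptionReport and its field names are hypothetical, and
the device string below is just an example value.  Only the kernel
release is filled in automatically, and it describes the currently
running kernel, which may differ from the one that logged the errors.

```python
from dataclasses import dataclass, fields
from typing import Optional
import platform


@dataclass
class CorruptionReport:
    """One field per question in the list above (hypothetical layout)."""
    storage_device: Optional[str] = None    # HDD/SSD/eMMC model
    kernel_version: Optional[str] = None
    distribution: Optional[str] = None
    unclean_shutdown: Optional[str] = None  # power loss, reset button, or "none"
    workload: Optional[str] = None

    def missing(self):
        """Names of the questions that have not been answered yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Release string of the kernel running right now.
report = CorruptionReport(kernel_version=platform.uname().release)
report.storage_device = "Samsung 840 PRO (SATA)"  # example value only
print(report.missing())
```

Here the remaining unanswered items are the distribution, the
shutdown details, and the workload.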


I really need all of this information, especially since at this point
I suspect there may be more than one cause with similar symptoms.  So
it's important that folks not assume, just because someone else has
reported a similar symptom along with one set of hardware / software
details, that it's the same problem as theirs and that they don't need
to report any more info.  I need as many data points as possible at
this point.

						- Ted
