From: David Jander <david@protonic.nl>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>,
	Matteo Croce <technoboy85@gmail.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Fri, 4 Jul 2014 15:45:59 +0200	[thread overview]
Message-ID: <20140704154559.026331ec@archvile> (raw)
In-Reply-To: <20140704122022.GC10514@thunk.org>


Hi Ted, Dmitry,

On Fri, 4 Jul 2014 08:20:22 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote:
> > 
> > Here is the output I am getting... AFAICS no problems on the raw device. Is
> > this sufficient testing, Ted?
> 
> I'm not sure what theory Dmitry was trying to pursue when he requested
> that you run the fio test.  Dmitry?
> 
> 
> Please note that at this point there may be multiple causes with
> similar symptoms that are showing up.  So just because one person
> reports one set of data points, such as someone claiming they've seen
> this without a power drop to the storage device, that doesn't mean that
> all of the problems were caused by flaky I/O to the device.
> 
> Right now, there are multiple theories floating around --- and it may
> be that more than one of them are true (i.e., there may be multiple
> bugs here).  Some of the possibilities, which again, may not be
> mutually exclusive:
> 
> 1) Some kind of eMMC driver bug, which is possibly causing the CACHE
> FLUSH command not to be sent.

How can I investigate this? Based on the fio tests I ran and Dmitry's
explanation, I conclude that a CACHE FLUSH command not being sent correctly
is the only thing left to rule out on the eMMC driver front, right?
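One rough way to start is to trace the block layer while the workload runs and count how many flush requests are queued (Q) and dispatched to the driver (D). The sketch below assumes blkparse's default line layout (device, CPU, sequence, timestamp, PID, action, RWBS, ...) and that a leading "F" in the RWBS field marks a flush, as in recent blktrace versions; check blkparse(1) for your version. It would be fed text captured with something like `blktrace -d /dev/mmcblk1 -o - | blkparse -i - > parsed.txt` (blktrace typically needs debugfs mounted). Note the limitation: a flush that reaches the driver does not prove the driver actually sent the cache-flush command to the eMMC part.

```python
import sys
from collections import Counter


def count_flushes(lines):
    """Return a Counter of flush requests keyed by blktrace action code.

    Assumes blkparse's default layout:
      dev cpu seq timestamp pid action rwbs ...
    and that a leading 'F' in the RWBS field marks a flush request.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or "," not in fields[0]:
            continue  # skip per-CPU summaries and other non-event lines
        action, rwbs = fields[5], fields[6]
        if rwbs.startswith("F"):
            counts[action] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_flushes.py parsed.txt
    with open(sys.argv[1]) as f:
        for action, n in sorted(count_flushes(f).items()):
            print(f"{action}: {n} flush request(s)")
```

If Q counts are non-zero but D counts stay at zero during "sync", flushes are being lost above the driver; if both are present, the question moves down into the MMC driver or the device itself.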

> 2) Some kind of hardware problem involving flash translation layers
> not having durable transactions of their flash metadata across power
> failures.

That would be like blaming Micron (the eMMC part manufacturer) for faulty
firmware... could be, but how can we test this?
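A common way to test durability across power loss is to write sequence-numbered, checksummed records, fsync each one, and report the sequence number to another machine only after fsync() returns. After the power cut, every reported record must be intact. The sketch below is a hypothetical illustration of that idea (the record layout, `write_records`, `check_log` and the `report` callback are all made up here). Run against a file on ext4 it tests the whole stack; pointed at a spare scratch partition it should take ext4 out of the picture and exercise only the driver and the eMMC.

```python
import os
import struct
import zlib

RECORD_SIZE = 4096                # assumption: one record per 4 KiB block
HEADER = struct.Struct("<QI")     # sequence number + CRC32 of the payload


def make_record(seq):
    """Build one fixed-size record: header followed by a filler payload."""
    body_len = RECORD_SIZE - HEADER.size
    payload = (f"seq={seq};".encode() * RECORD_SIZE)[:body_len]
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def write_records(path, count, report):
    """Write records, calling report(seq) only after fsync() has returned.

    report() should send seq to another machine (serial console, network),
    because anything stored locally is exactly what the power cut may lose.
    """
    with open(path, "wb") as f:
        for seq in range(count):
            f.write(make_record(seq))
            f.flush()
            os.fsync(f.fileno())
            report(seq)


def check_log(data, last_acked):
    """Return the first acknowledged seq that is missing or corrupt, else None."""
    for seq in range(last_acked + 1):
        rec = data[seq * RECORD_SIZE:(seq + 1) * RECORD_SIZE]
        if len(rec) < RECORD_SIZE:
            return seq
        got_seq, crc = HEADER.unpack_from(rec)
        if got_seq != seq or zlib.crc32(rec[HEADER.size:]) != crc:
            return seq
    return None
```

If `check_log()` ever returns a sequence number at or below the last one reported before the power cut, a write that fsync() claimed was durable was lost, which would point below the filesystem.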

> 3) Some kind of ext4/jbd2 bug, recently introduced, where we are
> modifying some ext4 metadata (either the block allocation bitmap or
> block group summary statistics) outside of a valid transaction handle.

I think I have some more evidence to support this case:

Until now, I had never run fsck at all! I know that this is not a good idea
in a production environment, but I am only testing right now, and in
theory it should not be necessary, right?

What I did this time was to run fsck.ext3 or fsck.ext4 (depending on the FS
format, of course) once every one or two power cycles.

So effectively, what I did amounts to this:

CASE 1: fsck on every power-cycle:

1.- Boot from clean filesystem
2.- Run the following command line:
$ cp -a /usr . & bonnie\+\+ -r 32 -u 100:100 & bonnie\+\+ -r 32 -u 102:102

3.- Hit CTRL+Z (to stop the second bonnie++ process)
4.- Execute "sync"
5.- While "sync" is still running, cut off the power supply.
6.- Turn on power and boot from an external medium
7.- Run fsck.ext3/4 on eMMC device
8.- Repeat
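Step 7 can be scripted so that the outcome of each cycle is recorded the same way every time. The sketch below is a hypothetical helper (`check_device` and `decode_e2fsck_status` are names made up for illustration); it relies only on the exit-status bits documented in e2fsck(8), which are OR-ed together, and on running `fsck.ext4 -p` as in the transcripts here. It assumes Python 3.7+ for `capture_output`.

```python
import subprocess

# Exit-status bits documented in e2fsck(8); the status is a bitwise OR of these.
E2FSCK_BITS = {
    1: "errors corrected",
    2: "errors corrected, reboot needed",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user",
    128: "shared library error",
}


def decode_e2fsck_status(code):
    """Translate an e2fsck exit status into a list of readable flags."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_BITS.items() if code & bit]


def check_device(device):
    """Run a preen-mode check (step 7) and return (flags, combined output)."""
    result = subprocess.run(["fsck.ext4", "-p", device],
                            capture_output=True, text=True)
    return decode_e2fsck_status(result.returncode), result.stdout + result.stderr
```

Logging the decoded flags together with the cycle number makes it easier to compare CASE 1 and CASE 2 runs, or ext3 and ext4, over many iterations.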

In this case, there was a minor difference between the fsck output of the
two filesystems:

EXT4 was always something like this:

# fsck.ext4 /dev/mmcblk1p2
e2fsck 1.42.5 (29-Jul-2012)
rootfs: recovering journal
Setting free inodes count to 37692 (was 37695)
Setting free blocks count to 136285 (was 136291)
rootfs: clean, 7140/44832 files, 42915/179200 blocks

While for EXT3 the output did not contain the "Setting free * count..."
messages:

# fsck.ext3 -p /dev/mmcblk1p2
rootfs: recovering journal
rootfs: clean, 4895/44832 files, 36473/179200 blocks
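When this cycle is repeated many times, it helps to extract the "Setting free ... count" corrections from each run so the deltas can be compared. The sketch below assumes the message wording shown by e2fsck 1.42.5 above; other e2fsprogs versions may word or suppress these messages differently. `free_count_fixes` is a hypothetical name.

```python
import re

# Wording taken from the e2fsck 1.42.5 output quoted above.
SETTING_FREE = re.compile(
    r"Setting free (inodes|blocks) count to (\d+) \(was (\d+)\)")


def free_count_fixes(fsck_output):
    """Map 'inodes'/'blocks' to (new, old, new - old) for each count fix."""
    fixes = {}
    for m in SETTING_FREE.finditer(fsck_output):
        kind, new, old = m.group(1), int(m.group(2)), int(m.group(3))
        fixes[kind] = (new, old, new - old)
    return fixes
```

For the ext4 output above this yields a delta of -3 for inodes and -6 for blocks; for the ext3 output it yields an empty result.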



CASE 2: fsck on every other power-cycle:

Same as CASE 1 steps 1...5 and then:
6.- Turn on power and boot again from the dirty internal eMMC without running fsck.
7.- Repeat steps 2...5 one more time
8.- Perform steps 6...8 from CASE 1.

With this test, the following difference became apparent:

With EXT3: fsck.ext3 did the same as in CASE 1

With EXT4: I get a long list of errors that are being fixed.
It starts like this:


# fsck.ext4 /dev/mmcblk1p2
e2fsck 1.42.5 (29-Jul-2012)
rootfs: recovering journal
rootfs contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 4591, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4594, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4595, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4596, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4597, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4598, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4599, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4600, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4601, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4602, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4603, i_blocks is 16, should be 8.  Fix<y>? yes
...
...
Eventually I pressed CTRL+C and restarted fsck with the option "-p", because
this list was getting a little long.
...
...

# fsck.ext4 -p /dev/mmcblk1p2
rootfs contains a file system with errors, check forced.
rootfs: Inode 5391, i_blocks is 32, should be 16.  FIXED.
rootfs: Inode 5392, i_blocks is 16, should be 8.  FIXED.
rootfs: Inode 5393, i_blocks is 48, should be 24.  FIXED.
rootfs: Inode 5394, i_blocks is 32, should be 16.  FIXED.
rootfs: Inode 5395, i_blocks is 16, should be 8.  FIXED.
...
...
rootfs: Inode 5854, i_blocks is 240, should be 120.  FIXED.
rootfs: Inode 5857, i_blocks is 576, should be 288.  FIXED.
rootfs: Inode 5860, i_blocks is 512, should be 256.  FIXED.
rootfs: Inode 5863, i_blocks is 656, should be 328.  FIXED.
rootfs: Inode 5866, i_blocks is 480, should be 240.  FIXED.
rootfs: Inode 5869, i_blocks is 176, should be 88.  FIXED.
rootfs: Inode 5872, i_blocks is 336, should be 168.  FIXED.
rootfs: 11379/44832 files (0.1% non-contiguous), 70010/179200 blocks
#
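Every i_blocks line in the excerpts above shows the recorded value at exactly twice the expected one (i_blocks is normally counted in 512-byte units, so 16 vs 8 means 8 KiB accounted where 4 KiB was expected). One might read that as each affected file's blocks being charged twice, but that is an inference, not a diagnosis. The hypothetical helper below extracts the ratio from both the interactive ("Fix<y>?") and the preen ("FIXED.") forms, so that a full log can be checked for any exception to the pattern.

```python
import re

# Matches both "Inode N, i_blocks is X, should be Y.  Fix<y>? yes"
# and "rootfs: Inode N, i_blocks is X, should be Y.  FIXED."
I_BLOCKS = re.compile(r"Inode (\d+), i_blocks is (\d+), should be (\d+)")


def i_blocks_ratios(fsck_output):
    """Map inode number -> recorded i_blocks / expected i_blocks."""
    ratios = {}
    for m in I_BLOCKS.finditer(fsck_output):
        inode, found, expected = (int(g) for g in m.groups())
        if expected:  # skip "should be 0" lines to avoid dividing by zero
            ratios[inode] = found / expected
    return ratios
```

Running `set(i_blocks_ratios(log).values())` over a complete log shows at a glance whether any inode deviates from the factor of two seen in the lines quoted here.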

> 4) Some other kind of hard-to-reproduce race or wild pointer which is
> sometimes corrupting fs data structures.

I don't have such a hard time reproducing it... but it does take quite some
time (booting several times, re-installing, testing, etc...)

> If someone has an easy to reproduce failure case, the first step is to
> do a very rough bisection test.  Does the easy-to-reproduce failure go
> away if you use 3.14?  3.12?  Also, if you can describe in great
> detail your hardware and software configuration, and under what
> circumstances the problem reproduces, and when it doesn't, that would
> also be critical.  Whether you are just doing a reset or a power cycle,
> if an unclean shutdown is involved, might also be important.

Until now, I have always done a power cycle, but I can try to check whether I
am able to reproduce the problem with just a "shutdown -f" (AFAIK, this does
NOT sync filesystems, right?)

I will try to check 3.14, and 3.12 if 3.14 still seems buggy. It could take
quite a while until I have results... certainly not before Monday.
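Since every test point here costs many install and power-cycle rounds, the rough bisection is worth doing as a binary search over releases rather than stepping back one version at a time; `git bisect` applies the same idea at commit granularity once a good/bad pair of releases is known. The sketch below is illustrative only: the version list and the `is_bad` callback (standing for "corruption reproduces within N power cycles") are hypothetical, and it assumes the oldest version is good, the newest is bad, and the bug stays once introduced. Because the failure is intermittent, a "good" verdict can be a false negative, so each test point needs enough cycles to be convincing.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad(version) is True.

    Assumes versions are ordered oldest to newest, versions[0] is good,
    versions[-1] is bad, and the bug persists once introduced.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

With six candidate releases this needs at most three test points instead of up to five when checking them one by one.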

> And at this point, because I'm getting very suspicious that there may
> be more than one root cause, we should try to keep the debugging of
> one person's reproduction, such as David's, separate from another's,
> such as Matteo's.  It may be that they ultimately have the same root
> cause, and so if one person is able to get an interesting reproduction
> result, it would be great for the other person to try running the same
> experiment on their hardware/software configuration.  But what we must
> not do is assume that one person's experiment is automatically
> applicable to other circumstances.

I agree.

Best regards,

-- 
David Jander
Protonic Holland.
