From: Dmitry Monakhov <dmonakhov@openvz.org>
To: David Jander <david@protonic.nl>, Theodore Ts'o <tytso@mit.edu>
Cc: Matteo Croce <technoboy85@gmail.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Thu, 03 Jul 2014 18:57:18 +0400
Message-ID: <87vbreze0h.fsf@openvz.org>
In-Reply-To: <20140703161551.5fd13245@archvile>

On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote:
> 
> Hi Ted,
> 
> On Thu, 3 Jul 2014 09:43:38 -0400
> "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
> > On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote:
> > > 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>:
> > > 
> > > I have a Samsung SSD 840 PRO
> > 
> > Matteo,
> > 
> > For you, you said you were seeing these problems on 3.15.  Was it
> > *not* happening for you when you used an older kernel?  If so, that
> > would help us try to provide the basis of trying to do a bisection
> > search.
> 
> I also tested with 3.15, and there too I see the same problem.
> 
> > Using the kvm-xfstests infrastructure, I've been trying to reproduce
> > the problem as follows:
> > 
> > ./kvm-xfstests  --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k 
> > 
> > xfstests generic/075 runs fsx which does a fair amount of block
> > allocations and deallocations, and then after the test finishes, it first
> > replays the journal (e2fsck -p) and then forces a fsck run on the
> > test disk that I use for the run.
> > 
> > After I launch this, in a separate window, I do this:
> > 
> > 	sleep 60  ; killall qemu-system-x86_64 
> > 
> > This kills the qemu process midway through the fsx test, and then I
> > see if I can find a problem.  I haven't had a chance to automate this
> > yet, and it is my intention to try to set this up where I can run this
> > on a ramdisk or a SSD, so I can more closely approximate what people
> > are reporting on flash-based media.
> > 
> > So far, I haven't been able to reproduce the problem.  If, after trying
> > a large number of times, it can't be reproduced (especially if it
> > can't be reproduced on an SSD), then it would lead us to believe that
> > one of two things is the cause.  (a) The CACHE FLUSH command isn't
> > properly getting sent to the device in some cases, or (b) there really
> > is a hardware problem with the flash device in question.
> 
> Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral
> driver? Can you explain why I don't see any problems with EXT3?
> 
> I can't discard the possibility of (b) because I cannot prove it, but I will
> try to see if I can do the same test on a SSD which I happen to have on that
> platform. That should be able to rule out problems with the eMMC chip and
> -driver, right?
> 
> Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)?
> 
> I left the system running (it started from a dirty EXT4 partition), and I am
> seeing the following error pop up after a few minutes. The system is not doing
> much (some syslog activity maybe, but not much more):
> 
> [  303.072983] EXT4-fs (mmcblk1p2): error count: 4
> [  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
> [  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757
> 
> What does that mean?
This means that ext4 found previously recorded errors in its internal
error log. That is expected, because your filesystem was already
corrupted earlier. It is reasonable to recreate the filesystem from
scratch.
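
As an illustration of how to read those three log lines: the numbers
after "error at" are seconds since the Unix epoch, followed by the
function and source line that reported the error. A minimal Python
sketch that decodes them is below. The regular expression and the
function name decode_error_log are illustrative assumptions, not part
of any ext4 tool.

```python
import re
from datetime import datetime, timezone

# Log lines copied from the report above.
LOG = """\
[  303.072983] EXT4-fs (mmcblk1p2): error count: 4
[  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
[  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757
"""

# Hypothetical pattern for the "initial/last error at" lines only.
PATTERN = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): (?P<which>initial|last) error at "
    r"(?P<ts>\d+): (?P<func>\w+):(?P<line>\d+)"
)


def decode_error_log(text):
    """Return (which, device, UTC time, function, line) for each match."""
    records = []
    for m in PATTERN.finditer(text):
        when = datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc)
        records.append((
            m["which"],
            m["dev"],
            when.strftime("%Y-%m-%d %H:%M:%S UTC"),
            m["func"],
            int(m["line"]),
        ))
    return records


for record in decode_error_log(LOG):
    print(record)
```

Run on the lines above, this reports the initial error at
2014-07-01 12:13:58 UTC and the last error at 2014-07-03 12:02:49 UTC,
both in ext4_mb_generate_buddy.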

To find out whether this is a regression in the eMMC driver, it is
reasonable to run an integrity test on the device itself. You can run
any integrity test you like. For example, run the fio job
 "fio disk-verify2.fio" (see attachment). IMPORTANT: this job will
 destroy the data on the test partition. If it fails with errors like
 "verify: bad magic header XXX", then it is definitely a driver issue.
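
To show the idea behind such a write-then-verify test, here is a
minimal Python sketch. It is not the attached fio job and does not
reproduce fio's on-disk verify format: the block layout, the MAGIC
value, and the function names are all assumptions made for
illustration. It also runs against a scratch file, so the read-back
may be served from the page cache; a real device test would need to
bypass the cache (for example with direct I/O) or verify after a
power cycle.

```python
import hashlib
import os
import struct
import tempfile

BLOCK_SIZE = 4096
MAGIC = 0xACCA                      # hypothetical magic value
HEADER = struct.Struct("<IQ32s")    # magic, block index, SHA-256 of payload


def make_block(index, seed=0):
    """Build one block: header followed by a deterministic payload."""
    payload_len = BLOCK_SIZE - HEADER.size
    payload = hashlib.shake_128(f"{seed}:{index}".encode()).digest(payload_len)
    digest = hashlib.sha256(payload).digest()
    return HEADER.pack(MAGIC, index, digest) + payload


def write_blocks(path, count, seed=0):
    """Write `count` self-describing blocks and flush them to storage."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(make_block(i, seed))
        f.flush()
        os.fsync(f.fileno())


def verify_blocks(path, count):
    """Read the blocks back; return a list of (block index, problem)."""
    errors = []
    with open(path, "rb") as f:
        for i in range(count):
            block = f.read(BLOCK_SIZE)
            if len(block) != BLOCK_SIZE:
                errors.append((i, "short read"))
                continue
            magic, index, digest = HEADER.unpack_from(block)
            payload = block[HEADER.size:]
            if magic != MAGIC:
                errors.append((i, "bad magic header"))
            elif index != i:
                errors.append((i, "wrong block index"))
            elif hashlib.sha256(payload).digest() != digest:
                errors.append((i, "checksum mismatch"))
    return errors


# Demo on a scratch file; never point this at a partition holding data.
with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "verify.img")
    write_blocks(target, 16)
    print(verify_blocks(target, 16))
```

An empty list means every block came back with the header and
checksum that were written. Any entry means the data read back
differs from the data written, which is the kind of failure that
points below the filesystem.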

If my theory is correct and this is a storage driver issue, then JBD
complains simply because it cares about its data (it does integrity
checks). Can you also create a btrfs filesystem on that partition,
perform some I/O activity, and run fsck after that? You will likely see
similar corruption.

> 
> Best regards,
> 
> -- 
> David Jander
> Protonic Holland.
