From: Dmitry Monakhov <dmonakhov@openvz.org>
To: David Jander <david@protonic.nl>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Matteo Croce <technoboy85@gmail.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Fri, 04 Jul 2014 14:17:21 +0400
Message-ID: <87r421zavi.fsf@openvz.org>
In-Reply-To: <20140704114031.2915161a@archvile>

On Fri, 4 Jul 2014 11:40:31 +0200, David Jander <david@protonic.nl> wrote:
> 
> Hi Dmitry,
> 
> On Thu, 03 Jul 2014 18:58:48 +0400
> Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> 
> > On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote:
> > > 
> > > Hi Ted,
> > > 
> > > On Thu, 3 Jul 2014 09:43:38 -0400
> > > "Theodore Ts'o" <tytso@mit.edu> wrote:
> > > 
> > > > On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote:
> > > > > 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>:
> > > > > 
> > > > > I have a Samsung SSD 840 PRO
> > > > 
> > > > Matteo,
> > > > 
> > > > For you, you said you were seeing these problems on 3.15.  Was it
> > > > *not* happening for you when you used an older kernel?  If so, that
> > > > would help us try to provide the basis of trying to do a bisection
> > > > search.
> > > 
> > > I also tested with 3.15, and there too I see the same problem.
> > > 
> > > > Using the kvm-xfstests infrastructure, I've been trying to reproduce
> > > > the problem as follows:
> > > > 
> > > > ./kvm-xfstests  --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k 
> > > > 
> > > > xfstests generic/075 runs fsx which does a fair amount of block
> > > > allocation and deallocation, and then after the test finishes, it first
> > > > replays the journal (e2fsck -p) and then forces a fsck run on the
> > > > test disk that I use for the run.
> > > > 
> > > > After I launch this, in a separate window, I do this:
> > > > 
> > > > 	sleep 60  ; killall qemu-system-x86_64 
> > > > 
> > > > This kills the qemu process midway through the fsx test, and then I
> > > > see if I can find a problem.  I haven't had a chance to automate this
> > > > yet, and it is my intention to try to set this up where I can run this
> > > > on a ramdisk or a SSD, so I can more closely approximate what people
> > > > are reporting on flash-based media.
> > > > 
> > > > So far, I haven't been able to reproduce the problem.  If after doing
> > > > this a large number of times, it can't be reproduced (especially if it
> > > > can't be reproduced on an SSD), then it would lead us to believe that
> > > > one of two things is the cause.  (a) The CACHE FLUSH command isn't
> > > > properly getting sent to the device in some cases, or (b) there really
> > > > is a hardware problem with the flash device in question.
> > > 
> > > Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral
> > > driver? Can you explain why I don't see any problems with EXT3?
> > > 
> > > I can't discard the possibility of (b) because I cannot prove it, but I will
> > > try to see if I can do the same test on a SSD which I happen to have on that
> > > platform. That should be able to rule out problems with the eMMC chip and
> > > -driver, right?
> > > 
> > > Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)?
> > > 
> > > I left the system running (it started from a dirty EXT4 partition), and I am
> > > seeing the following error pop up after a few minutes. The system is not doing
> > > much (some syslog activity maybe, but not much more):
> > > 
> > > [  303.072983] EXT4-fs (mmcblk1p2): error count: 4
> > > [  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
> > > [  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757
> > > 
> > > What does that mean?
> > This means that it found a previous error in ext4's internal error log, which
> > is normal because your fs was corrupted before. It would be reasonable to
> > recreate the filesystem from scratch.
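
The two "error at" values in the log above are seconds since the Unix epoch.
A minimal sketch for turning them into readable dates, assuming GNU date is
available (the helper name epoch_to_utc is made up for this note):

```shell
#!/bin/sh
# Convert epoch-second timestamps from the EXT4-fs error summary to UTC.
# Assumption: GNU date, which accepts the "-d @SECONDS" form.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1404216838   # "initial error at" value from the log
epoch_to_utc 1404388969   # "last error at" value from the log
```

Both values fall in early July 2014 (1 July and 3 July), which fits the
explanation that these are errors recorded earlier rather than new ones.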
> > 
> > In order to understand whether it is a regression in the eMMC driver, it is
> > reasonable to run an integrity test on the device itself. You can run
> > any integrity test you like. For example, just run the fio job
> >  "fio disk-verify2.fio" (see attachment). IMPORTANT: this script will
> >  destroy data on the test partition. If it fails with errors like
> >  "verify: bad magic header XXX", then it is definitely a driver issue.
> 
> I have been trying to run fio on my board with your configuration file, but I
> am having problems, and since I am not familiar with fio at all, I can't
> really figure out what's wrong. My eMMC device is only 916MiB in size, so I
> edited the last part to be:
> 
> offset_increment=100M
> size=100M
> 
> Is that ok?
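
As a quick sanity check of the edited values: with offset_increment, each fio
job starts that much further into the device than the previous one, so the
last job ends at (jobs - 1) * offset_increment + size. A rough sketch of that
arithmetic, assuming 4 jobs (taken from "Starting 4 processes" in the output
below) and that the "M" suffix means MiB. The function name fio_layout_fits
is hypothetical:

```shell
#!/bin/sh
# Check that JOBS fio jobs, each covering SIZE MiB and each starting
# INCR MiB after the previous one, stay inside a target of DEV MiB.
# Usage: fio_layout_fits DEV_MIB JOBS INCR_MIB SIZE_MIB
fio_layout_fits() {
    dev_mib=$1
    jobs=$2
    incr_mib=$3
    size_mib=$4
    # End of the region used by the last job.
    end_mib=$(( (jobs - 1) * incr_mib + size_mib ))
    echo "last job ends at ${end_mib} MiB of ${dev_mib} MiB"
    [ "$end_mib" -le "$dev_mib" ]
}

# Values from the message: 916MiB device, offset_increment=100M, size=100M.
fio_layout_fits 916 4 100 100
```

This only checks that the regions fit; it says nothing about the
"blocksize too large" message. If the test target is a partition rather
than the whole device, the partition size is the number to pass in.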
> 
> I still get error messages complaining about blocksize though. Here is the
> output I get (can't really make sense of it):
> 
> # ./fio ../disk-verify2.fio 
> Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures.
> /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
> ...
> fio-2.1.10-49-gf302
> Starting 4 processes
> fio: blocksize too large for data set
> fio: blocksize too large for data set
> fio: blocksize too large for data set
> fio: io_u.c:1315: __get_io_u: Assertion `io_u->flags & IO_U_F_FREE' failed.
> fio: pid=7612, got signal=6
> 
> /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=7612: Fri Jul  4 09:31:15 2014
>     lat (msec) : 4=0.19%, 10=0.19%, 20=0.19%, 50=0.85%, 100=1.23%
>     lat (msec) : 250=56.01%, 500=37.18%, 750=1.14%
>   cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
>   IO depths    : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=97.1%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=33/w=1024/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
> 
> Run status group 0 (all jobs):
> 
> Disk stats (read/write):
>   mmcblk1: ios=11/1025, merge=0/0, ticks=94/6671, in_queue=7121, util=96.12%
> fio: file hash not empty on exit
> 
> 
> This assertion bugs me. Is it due to the previous errors ("blocksize too large
> for data set") or is it because my eMMC drive/kernel is seriously screwed?
> 
> Help please!
Ohhh. Actually this is axboe's crap. Recent fio versions are broken.
Please use the good old commit ffa93ca9d8d37ef:
git clone git://git.kernel.dk/fio.git
cd fio
git checkout -b b2.0.13 ffa93ca9d8d37ef
make -j4
./fio disk-verify2.fio
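
To make the pass/fail criterion from earlier in the thread easy to check,
here is a small sketch that scans a saved fio log for "verify: bad magic
header" lines. The function name and the fio.log file name are made up for
illustration:

```shell
#!/bin/sh
# Return success (0) if the given fio log contains verification failures
# of the kind quoted in this thread ("verify: bad magic header ...").
fio_log_has_verify_errors() {
    grep -q 'verify: bad magic header' "$1"
}

# Possible usage, assuming the fio output was saved to fio.log first:
#   ./fio disk-verify2.fio > fio.log 2>&1
#   if fio_log_has_verify_errors fio.log; then
#       echo "verify errors found: suspect the storage driver"
#   fi
```

A clean log does not prove the driver is fine; it only means this
particular run did not hit a verification failure.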
> 
> > If my theory is true and it is a storage driver issue, then JBD complains
> > simply because it cares about its data (it does integrity checks).
> > Can you also create btrfs on that partition, perform some I/O
> > activity, and run fsck after that? You will likely see similar corruption.
> 
> Best regards,
> 
> -- 
> David Jander
> Protonic Holland.
