From: Dan Rue <dan.rue@linaro.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Naresh Kamboju <naresh.kamboju@linaro.org>,
	open list <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Guenter Roeck <linux@roeck-us.net>, Shuah Khan <shuah@kernel.org>,
	patches@kernelci.org,
	Ben Hutchings <ben.hutchings@codethink.co.uk>,
	lkft-triage@lists.linaro.org,
	linux-stable <stable@vger.kernel.org>,
	linux-ext4@vger.kernel.org,
	Arthur Marsh <arthur.marsh@internode.on.net>,
	Richard Weinberger <richard.weinberger@gmail.com>
Subject: Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
Date: Tue, 21 May 2019 10:02:53 -0500
Message-ID: <20190521150253.coawunplbqjqf4n3@xps.therub.org>
In-Reply-To: <20190521093849.GA9806@kroah.com>

On Tue, May 21, 2019 at 11:38:49AM +0200, Greg Kroah-Hartman wrote:
> On Tue, May 21, 2019 at 02:58:58PM +0530, Naresh Kamboju wrote:
> > On Tue, 21 May 2019 at 14:30, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
> > > > On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.19.45 release.
> > > > > There are 105 patches in this series, all will be posted as a response
> > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > >
> > > > > Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> > > > > Anything received after that time might be too late.
> > > >
> > > > We're seeing an ext4 issue previously reported at
> > > > https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
> > > >
> > > > [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> > > > [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
> > > > [ 1916.081071] Aborting journal on device sda-8.
> > > > [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
> > > > [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> > > >
> > > > This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
> > > > for 5.1-rc yet, which is presumably also affected in this RC round.
> > > >
> > > > We only see this on x86_64 and i386 devices - though our hardware setups
> > > > vary so it could be coincidence.
> > > >
> > > > I have to run out now, but I'll come back and work on a reproducer and
> > > > bisection later tonight and tomorrow.
> > > >
> > > > Here is an example test run; link goes to the spot in the ltp syscalls
> > > > test where the disk goes into read-only mode.
> > > > https://lkft.validation.linaro.org/scheduler/job/735468#L8081
> > >
> > > Odd, I keep hearing rumors of ext4 issues right now, but nothing
> > > actually solid that I can point to.  Any help you can provide here would
> > > be great.
> > >
> > 
> > git bisect helped me to land on this commit,
> > 
> > # git bisect bad
> > e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 is the first bad commit
> > commit e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2
> > Author: Theodore Ts'o <tytso@mit.edu>
> > Date:   Tue Apr 9 23:37:08 2019 -0400
> > 
> >     ext4: protect journal inode's blocks using block_validity
> > 
> >     commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream.
> > 
> >     Add the blocks which belong to the journal inode to block_validity's
> >     system zone so attempts to deallocate or overwrite the journal due a
> >     corrupted file system where the journal blocks are also claimed by
> >     another inode.
> > 
> >     Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
> >     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> >     Cc: stable@kernel.org
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > :040000 040000 b8b6ce2577d60c65021e5cc1c3a38b32e0cbb2ff
> > 747c67b159b33e4e1da414b1d33567a5da9ae125 M fs
> 
> Ah, many thanks for this bisection.
> 
> Ted, any ideas here?  Should I drop this from the stable trees, and you
> revert it from Linus's?  Or something else?
> 
> Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for
> journal inodes using indirect blocks") in the trees, which was supposed
> to fix the problem with this patch, am I missing another one as well?
> 
> (side note, it was mean not to mark 170417c8c7bb for stable, when the
> patch it was fixing was marked for stable, I'm lucky I caught it...)

My independent bisection agrees that e8fd3c9a5415 ("ext4: protect
journal inode's blocks using block_validity") is the root cause.
Reverting it along with 18b3c1c2827c ("ext4: unsigned int compared
against zero") on 4.19 makes the issue go away.

I tested the same revert on mainline v5.2-rc1 and it fixed the issue
there as well (git revert fbbbbd2f28ae 345c0dbf3a30).

The problem reproduces in our environment 100% of the time, but a
minimal reproducer is hard to isolate: it happens while running LTP
syscalls and requires some combination of syscall tests to run. So far,
we've been able to reduce it to the following ltp runfile:
https://gist.github.com/danrue/61c663e1dc50dc7c13a232f0a062bdc6

LTP is run using '/opt/ltp/runltp -d /scratch -f syscalls', where the
syscalls file has been replaced with the version in the gist, and
/scratch is an ext4 filesystem on a SATA drive. The filesystem is
created using 'mkfs -t ext4
/dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGKWF' and mounted at
/scratch.
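
For anyone trying this elsewhere, the setup above can be sketched as a
small dry-run helper that only prints the commands. The mkfs and runltp
invocations are the ones quoted above; the plain 'mount DEVICE /scratch'
call and the function name repro_commands are assumptions for
illustration, and the device path has to be replaced with a disposable
disk of your own, since mkfs destroys its contents.

```python
import shlex

# Values taken from the description above; substitute your own scratch disk.
DEVICE = "/dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGKWF"
SCRATCH = "/scratch"


def repro_commands(device=DEVICE, scratch=SCRATCH, runfile="syscalls"):
    """Return the setup and test commands as argument lists (hypothetical helper)."""
    return [
        # Create a fresh ext4 filesystem (destroys existing data on the device).
        ["mkfs", "-t", "ext4", device],
        # Assumption: a plain mount with default options.
        ["mount", device, scratch],
        # Run LTP with the reduced 'syscalls' runfile from the gist.
        ["/opt/ltp/runltp", "-d", scratch, "-f", runfile],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in repro_commands():
        print(shlex.join(cmd))
```

Printing rather than executing is deliberate; the commands need root and
wipe the target device, so they are best reviewed before being run.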

I'll update the gist as we reduce it further.
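
While reducing the runfile, it helps to decide automatically whether a
given run hit the failure. A minimal sketch, assuming the kernel log is
piped in on stdin (for example from dmesg) and that the messages match
the ones quoted earlier in this thread; the names PATTERNS, scan and
reproduced are hypothetical:

```python
import re
import sys

# Signatures copied from the kernel log quoted earlier in this thread.
PATTERNS = {
    "extent_error": re.compile(r"EXT4-fs error \(device (\S+)\): ext4_find_extent"),
    "journal_abort": re.compile(r"Aborting journal on device (\S+)\."),
    "remount_ro": re.compile(r"EXT4-fs \((\S+)\): Remounting filesystem read-only"),
}


def scan(lines):
    """Return (signature name, device) pairs in the order they appear."""
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((name, match.group(1)))
    return hits


def reproduced(lines):
    """Treat a read-only remount as the sign that the issue was hit."""
    return any(name == "remount_ro" for name, _ in scan(lines))


if __name__ == "__main__":
    # Exit status 0 means the failure was seen, 1 means it was not.
    sys.exit(0 if reproduced(sys.stdin) else 1)
```

Keying on the read-only remount is a design choice: it is the last
message in the sequence and the point where the LTP run starts failing.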

Dan

-- 
Linaro - Kernel Validation
