linux-ext4.vger.kernel.org archive mirror
* Re: [PATCH 4.19 000/105] 4.19.45-stable review
       [not found] <20190520115247.060821231@linuxfoundation.org>
@ 2019-05-20 22:23 ` Dan Rue
  2019-05-21  8:59   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Rue @ 2019-05-20 22:23 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable, linux-ext4, Arthur Marsh,
	Richard Weinberger, Theodore Ts'o

On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.45 release.
> There are 105 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> Anything received after that time might be too late.

We're seeing an ext4 issue previously reported at
https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.

[ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
[ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
[ 1916.081071] Aborting journal on device sda-8.
[ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
[ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
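
As a reading aid for the first log line: it reports magic f30a, 8 entries, and a capacity of 340, all of which look like a well-formed leaf header, so the complaint appears to be about the extent entries rather than the header itself. The sketch below is a userspace illustration, not kernel code. It assumes a 4096-byte block size (the log does not state it, but 340 is consistent with it), ignores the on-disk little-endian encoding, and uses the hypothetical helper names `ext_space_block` and `header_looks_sane`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified copies of the on-disk layouts. Field names follow ext4,
 * but this is a userspace illustration only. */
struct extent_header {
	uint16_t eh_magic;      /* 0xF30A for a valid extent block */
	uint16_t eh_entries;    /* entries currently in use */
	uint16_t eh_max;        /* capacity of this block */
	uint16_t eh_depth;      /* 0 means a leaf block */
	uint32_t eh_generation;
};

struct extent {
	uint32_t ee_block;      /* first logical block covered */
	uint16_t ee_len;        /* number of blocks covered */
	uint16_t ee_start_hi;   /* high 16 bits of the physical block */
	uint32_t ee_start_lo;   /* low 32 bits of the physical block */
};

_Static_assert(sizeof(struct extent_header) == 12, "header is 12 bytes");
_Static_assert(sizeof(struct extent) == 12, "extent is 12 bytes");

#define EXT_MAGIC 0xF30A

/* Capacity of a non-root extent block for a given block size. */
unsigned int ext_space_block(unsigned int blocksize)
{
	assert(blocksize >= sizeof(struct extent_header));
	return (blocksize - sizeof(struct extent_header)) / sizeof(struct extent);
}

/* Header-only sanity check: magic, depth, capacity, and entry count.
 * It deliberately does not look at the extent entries themselves. */
bool header_looks_sane(const struct extent_header *eh,
		       unsigned int blocksize, unsigned int depth)
{
	return eh->eh_magic == EXT_MAGIC &&
	       eh->eh_depth == depth &&
	       eh->eh_max != 0 &&
	       eh->eh_max <= ext_space_block(blocksize) &&
	       eh->eh_entries <= eh->eh_max;
}
```

With a 4096-byte block, `ext_space_block(4096)` evaluates (4096 - 12) / 12 and truncates to 340, which matches "max 340(340)" in the log.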

This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
for 5.1-rc yet, which is presumably also affected in this RC round.

We only see this on x86_64 and i386 devices - though our hardware setups
vary so it could be coincidence.

I have to run out now, but I'll come back and work on a reproducer and
bisection later tonight and tomorrow.

Here is an example test run; link goes to the spot in the ltp syscalls
test where the disk goes into read-only mode.
https://lkft.validation.linaro.org/scheduler/job/735468#L8081

Dan

-- 
Linaro - Kernel Validation


* Re: [PATCH 4.19 000/105] 4.19.45-stable review
  2019-05-20 22:23 ` [PATCH 4.19 000/105] 4.19.45-stable review Dan Rue
@ 2019-05-21  8:59   ` Greg Kroah-Hartman
  2019-05-21  9:28     ` Naresh Kamboju
  0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2019-05-21  8:59 UTC (permalink / raw)
  To: linux-kernel, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable, linux-ext4, Arthur Marsh,
	Richard Weinberger, Theodore Ts'o

On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
> On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.19.45 release.
> > There are 105 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> > Anything received after that time might be too late.
> 
> We're seeing an ext4 issue previously reported at
> https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
> 
> [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
> [ 1916.081071] Aborting journal on device sda-8.
> [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
> [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> 
> This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
> for 5.1-rc yet, which is presumably also affected in this RC round.
> 
> We only see this on x86_64 and i386 devices - though our hardware setups
> vary so it could be coincidence.
> 
> I have to run out now, but I'll come back and work on a reproducer and
> bisection later tonight and tomorrow.
> 
> Here is an example test run; link goes to the spot in the ltp syscalls
> test where the disk goes into read-only mode.
> https://lkft.validation.linaro.org/scheduler/job/735468#L8081

Odd, I keep hearing rumors of ext4 issues right now, but nothing
actually solid that I can point to.  Any help you can provide here would
be great.

thanks,

greg k-h


* Re: [PATCH 4.19 000/105] 4.19.45-stable review
  2019-05-21  8:59   ` Greg Kroah-Hartman
@ 2019-05-21  9:28     ` Naresh Kamboju
  2019-05-21  9:38       ` ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review) Greg Kroah-Hartman
  0 siblings, 1 reply; 12+ messages in thread
From: Naresh Kamboju @ 2019-05-21  9:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: open list, Linus Torvalds, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, linux- stable,
	linux-ext4, Arthur Marsh, Richard Weinberger, Theodore Ts'o

On Tue, 21 May 2019 at 14:30, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
> > On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.19.45 release.
> > > There are 105 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> > > Anything received after that time might be too late.
> >
> > We're seeing an ext4 issue previously reported at
> > https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
> >
> > [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> > [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
> > [ 1916.081071] Aborting journal on device sda-8.
> > [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
> > [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> >
> > This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
> > for 5.1-rc yet, which is presumably also affected in this RC round.
> >
> > We only see this on x86_64 and i386 devices - though our hardware setups
> > vary so it could be coincidence.
> >
> > I have to run out now, but I'll come back and work on a reproducer and
> > bisection later tonight and tomorrow.
> >
> > Here is an example test run; link goes to the spot in the ltp syscalls
> > test where the disk goes into read-only mode.
> > https://lkft.validation.linaro.org/scheduler/job/735468#L8081
>
> Odd, I keep hearing rumors of ext4 issues right now, but nothing
> actually solid that I can point to.  Any help you can provide here would
> be great.
>

git bisect helped me to land on this commit,

# git bisect bad
e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 is the first bad commit
commit e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue Apr 9 23:37:08 2019 -0400

    ext4: protect journal inode's blocks using block_validity

    commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream.

    Add the blocks which belong to the journal inode to block_validity's
    system zone so attempts to deallocate or overwrite the journal due a
    corrupted file system where the journal blocks are also claimed by
    another inode.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 b8b6ce2577d60c65021e5cc1c3a38b32e0cbb2ff
747c67b159b33e4e1da414b1d33567a5da9ae125 M fs

- Naresh

> thanks,
>
> greg k-h


* ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21  9:28     ` Naresh Kamboju
@ 2019-05-21  9:38       ` Greg Kroah-Hartman
  2019-05-21 10:28         ` Naresh Kamboju
  2019-05-21 15:02         ` Dan Rue
  0 siblings, 2 replies; 12+ messages in thread
From: Greg Kroah-Hartman @ 2019-05-21  9:38 UTC (permalink / raw)
  To: Theodore Ts'o, Naresh Kamboju
  Cc: open list, Linus Torvalds, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, linux- stable,
	linux-ext4, Arthur Marsh, Richard Weinberger, Theodore Ts'o

On Tue, May 21, 2019 at 02:58:58PM +0530, Naresh Kamboju wrote:
> On Tue, 21 May 2019 at 14:30, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
> > > On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> > > > This is the start of the stable review cycle for the 4.19.45 release.
> > > > There are 105 patches in this series, all will be posted as a response
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> > > > Anything received after that time might be too late.
> > >
> > > We're seeing an ext4 issue previously reported at
> > > https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
> > >
> > > [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> > > [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
> > > [ 1916.081071] Aborting journal on device sda-8.
> > > [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
> > > [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> > >
> > > This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
> > > for 5.1-rc yet, which is presumably also affected in this RC round.
> > >
> > > We only see this on x86_64 and i386 devices - though our hardware setups
> > > vary so it could be coincidence.
> > >
> > > I have to run out now, but I'll come back and work on a reproducer and
> > > bisection later tonight and tomorrow.
> > >
> > > Here is an example test run; link goes to the spot in the ltp syscalls
> > > test where the disk goes into read-only mode.
> > > https://lkft.validation.linaro.org/scheduler/job/735468#L8081
> >
> > Odd, I keep hearing rumors of ext4 issues right now, but nothing
> > actually solid that I can point to.  Any help you can provide here would
> > be great.
> >
> 
> git bisect helped me to land on this commit,
> 
> # git bisect bad
> e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 is the first bad commit
> commit e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Tue Apr 9 23:37:08 2019 -0400
> 
>     ext4: protect journal inode's blocks using block_validity
> 
>     commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream.
> 
>     Add the blocks which belong to the journal inode to block_validity's
>     system zone so attempts to deallocate or overwrite the journal due a
>     corrupted file system where the journal blocks are also claimed by
>     another inode.
> 
>     Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
>     Cc: stable@kernel.org
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> :040000 040000 b8b6ce2577d60c65021e5cc1c3a38b32e0cbb2ff
> 747c67b159b33e4e1da414b1d33567a5da9ae125 M fs

Ah, many thanks for this bisection.

Ted, any ideas here?  Should I drop this from the stable trees, and you
revert it from Linus's?  Or something else?

Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for
journal inodes using indirect blocks") in the trees, which was supposed
to fix the problem with this patch, am I missing another one as well?

(side note, it was mean not to mark 170417c8c7bb for stable, when the
patch it was fixing was marked for stable, I'm lucky I caught it...)

thanks,

greg k-h


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21  9:38       ` ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review) Greg Kroah-Hartman
@ 2019-05-21 10:28         ` Naresh Kamboju
  2019-05-21 16:21           ` Theodore Ts'o
  2019-05-21 15:02         ` Dan Rue
  1 sibling, 1 reply; 12+ messages in thread
From: Naresh Kamboju @ 2019-05-21 10:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Theodore Ts'o, open list, Linus Torvalds, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	linux- stable, linux-ext4, Arthur Marsh, Richard Weinberger

On Tue, 21 May 2019 at 15:08, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, May 21, 2019 at 02:58:58PM +0530, Naresh Kamboju wrote:
> > On Tue, 21 May 2019 at 14:30, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
> > > > On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.19.45 release.
> > > > > There are 105 patches in this series, all will be posted as a response
> > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > >
> > > > > Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> > > > > Anything received after that time might be too late.
> > > >
> > > > We're seeing an ext4 issue previously reported at
> > > > https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
> > > >
> > > > [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> > > > [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
> > > > [ 1916.081071] Aborting journal on device sda-8.
> > > > [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
> > > > [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> > > >
> > > > This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
> > > > for 5.1-rc yet, which is presumably also affected in this RC round.
> > > >
> > > > We only see this on x86_64 and i386 devices - though our hardware setups
> > > > vary so it could be coincidence.
> > > >
> > > > I have to run out now, but I'll come back and work on a reproducer and
> > > > bisection later tonight and tomorrow.
> > > >
> > > > Here is an example test run; link goes to the spot in the ltp syscalls
> > > > test where the disk goes into read-only mode.
> > > > https://lkft.validation.linaro.org/scheduler/job/735468#L8081
> > >
> > > Odd, I keep hearing rumors of ext4 issues right now, but nothing
> > > actually solid that I can point to.  Any help you can provide here would
> > > be great.
> > >
> >
> > git bisect helped me to land on this commit,
> >
> > # git bisect bad
> > e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 is the first bad commit
> > commit e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2
> > Author: Theodore Ts'o <tytso@mit.edu>
> > Date:   Tue Apr 9 23:37:08 2019 -0400
> >
> >     ext4: protect journal inode's blocks using block_validity
> >
> >     commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream.
> >
> >     Add the blocks which belong to the journal inode to block_validity's
> >     system zone so attempts to deallocate or overwrite the journal due a
> >     corrupted file system where the journal blocks are also claimed by
> >     another inode.
> >
> >     Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
> >     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> >     Cc: stable@kernel.org
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >
> > :040000 040000 b8b6ce2577d60c65021e5cc1c3a38b32e0cbb2ff
> > 747c67b159b33e4e1da414b1d33567a5da9ae125 M fs
>
> Ah, many thanks for this bisection.
>
> Ted, any ideas here?  Should I drop this from the stable trees, and you
> revert it from Linus's?  Or something else?
>
> Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for
> journal inodes using indirect blocks") in the trees, which was supposed
> to fix the problem with this patch, am I missing another one as well?

FYI,
I applied the fix patch 170417c8c7bb ("ext4: fix block validity checks for
journal inodes using indirect blocks"), but it did not fix this problem.

>
> (side note, it was mean not to mark 170417c8c7bb for stable, when the
> patch it was fixing was marked for stable, I'm lucky I caught it...)
>

This problem occurs on the stable-rc 4.19, 5.0, and 5.1 branches, and
also on the master branch of the mainline and -next trees.


- Naresh


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21  9:38       ` ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review) Greg Kroah-Hartman
  2019-05-21 10:28         ` Naresh Kamboju
@ 2019-05-21 15:02         ` Dan Rue
  1 sibling, 0 replies; 12+ messages in thread
From: Dan Rue @ 2019-05-21 15:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Theodore Ts'o, Naresh Kamboju, open list, Linus Torvalds,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, linux- stable, linux-ext4, Arthur Marsh,
	Richard Weinberger

On Tue, May 21, 2019 at 11:38:49AM +0200, Greg Kroah-Hartman wrote:
> On Tue, May 21, 2019 at 02:58:58PM +0530, Naresh Kamboju wrote:
> > On Tue, 21 May 2019 at 14:30, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
> > > > On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.19.45 release.
> > > > > There are 105 patches in this series, all will be posted as a response
> > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > >
> > > > > Responses should be made by Wed 22 May 2019 11:50:49 AM UTC.
> > > > > Anything received after that time might be too late.
> > > >
> > > > We're seeing an ext4 issue previously reported at
> > > > https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
> > > >
> > > > [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> > > > [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8
> > > > [ 1916.081071] Aborting journal on device sda-8.
> > > > [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
> > > > [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> > > >
> > > > This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data
> > > > for 5.1-rc yet, which is presumably also affected in this RC round.
> > > >
> > > > We only see this on x86_64 and i386 devices - though our hardware setups
> > > > vary so it could be coincidence.
> > > >
> > > > I have to run out now, but I'll come back and work on a reproducer and
> > > > bisection later tonight and tomorrow.
> > > >
> > > > Here is an example test run; link goes to the spot in the ltp syscalls
> > > > test where the disk goes into read-only mode.
> > > > https://lkft.validation.linaro.org/scheduler/job/735468#L8081
> > >
> > > Odd, I keep hearing rumors of ext4 issues right now, but nothing
> > > actually solid that I can point to.  Any help you can provide here would
> > > be great.
> > >
> > 
> > git bisect helped me to land on this commit,
> > 
> > # git bisect bad
> > e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 is the first bad commit
> > commit e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2
> > Author: Theodore Ts'o <tytso@mit.edu>
> > Date:   Tue Apr 9 23:37:08 2019 -0400
> > 
> >     ext4: protect journal inode's blocks using block_validity
> > 
> >     commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream.
> > 
> >     Add the blocks which belong to the journal inode to block_validity's
> >     system zone so attempts to deallocate or overwrite the journal due a
> >     corrupted file system where the journal blocks are also claimed by
> >     another inode.
> > 
> >     Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
> >     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> >     Cc: stable@kernel.org
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > :040000 040000 b8b6ce2577d60c65021e5cc1c3a38b32e0cbb2ff
> > 747c67b159b33e4e1da414b1d33567a5da9ae125 M fs
> 
> Ah, many thanks for this bisection.
> 
> Ted, any ideas here?  Should I drop this from the stable trees, and you
> revert it from Linus's?  Or something else?
> 
> Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for
> journal inodes using indirect blocks") in the trees, which was supposed
> to fix the problem with this patch, am I missing another one as well?
> 
> (side note, it was mean not to mark 170417c8c7bb for stable, when the
> patch it was fixing was marked for stable, I'm lucky I caught it...)

My independent bisection agrees that e8fd3c9a5415 ("ext4: protect
journal inode's blocks using block_validity") is the root cause. I was
able to revert it along with 18b3c1c2827c ("ext4: unsigned int compared
against zero") on 4.19 and then the issue went away.

I tested the same revert on mainline v5.2-rc1 and it fixed the issue
there as well (git revert fbbbbd2f28ae 345c0dbf3a30).

The problem reproduces in our environment 100% of the time, but creating
a reproducer is troublesome; it happens while running LTP syscalls, and
requires some combination of syscall tests to happen. So far, we've been
able to reduce it to the following ltp runfile:
https://gist.github.com/danrue/61c663e1dc50dc7c13a232f0a062bdc6

LTP is run using '/opt/ltp/runltp -d /scratch -f syscalls', where the
syscalls file has been replaced with the version in the gist, and
/scratch is an ext4 SATA drive. /scratch is created using 'mkfs -t ext4
/dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGKWF' and mounted to
/scratch.

I'll update the gist as we reduce it further.

Dan

-- 
Linaro - Kernel Validation


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21 10:28         ` Naresh Kamboju
@ 2019-05-21 16:21           ` Theodore Ts'o
  2019-05-21 16:30             ` Greg Kroah-Hartman
  2019-05-21 17:57             ` Naresh Kamboju
  0 siblings, 2 replies; 12+ messages in thread
From: Theodore Ts'o @ 2019-05-21 16:21 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Greg Kroah-Hartman, open list, Linus Torvalds, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	linux- stable, linux-ext4, Arthur Marsh, Richard Weinberger

On Tue, May 21, 2019 at 03:58:15PM +0530, Naresh Kamboju wrote:
> > Ted, any ideas here?  Should I drop this from the stable trees, and you
> > revert it from Linus's?  Or something else?

It's safe to drop this from the stable trees while we investigate.  It
was always borderline for stable anyway.  (See below).

> >
> > Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for
> > journal inodes using indirect blocks") in the trees, which was supposed
> > to fix the problem with this patch, am I missing another one as well?
> 
> FYI,
> I have applied fix patch 170417c8c7bb ("ext4: fix block validity checks for
>  journal inodes using indirect blocks") but did not fix this problem.

Hmm... are you _sure_?  This bug was reported to me against mainline,
and the person who reported it confirmed that the patch did fix the
problem he was seeing, and the symptoms are identical to yours.  Can
you double check, please?  I can't reproduce it either with that patch applied.

> > (side note, it was mean not to mark 170417c8c7bb for stable, when the
> > patch it was fixing was marked for stable, I'm lucky I caught it...)

Sorry, I had forgotten that I had marked 345c0dbf3a30 for stable;
that's why I didn't mark 170417c8c7bb for stable.  345c0dbf3a30 fixes
a crash triggered by a specially crafted (corrupted) file system, and
I had thought I had decided it wasn't important enough for stable; I
think what happened is I shrugged and said, "oh well, Sasha's
automated ML system is going to pick it for stable anyway, so I might
as well mark it for stable myself" --- and I forgot I had landed that
way.

						- Ted


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21 16:21           ` Theodore Ts'o
@ 2019-05-21 16:30             ` Greg Kroah-Hartman
  2019-05-21 16:44               ` Greg Kroah-Hartman
  2019-05-21 17:57             ` Naresh Kamboju
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2019-05-21 16:30 UTC (permalink / raw)
  To: Theodore Ts'o, Naresh Kamboju, open list, Linus Torvalds,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, linux- stable, linux-ext4, Arthur Marsh,
	Richard Weinberger

On Tue, May 21, 2019 at 12:21:42PM -0400, Theodore Ts'o wrote:
> On Tue, May 21, 2019 at 03:58:15PM +0530, Naresh Kamboju wrote:
> > > Ted, any ideas here?  Should I drop this from the stable trees, and you
> > > revert it from Linus's?  Or something else?
> 
> It's safe to drop this from the stable trees while we investigate.  It
> was always borderline for stable anyway.  (See below).

Ok, will go drop both of these now, thanks.

greg k-h


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21 16:30             ` Greg Kroah-Hartman
@ 2019-05-21 16:44               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 12+ messages in thread
From: Greg Kroah-Hartman @ 2019-05-21 16:44 UTC (permalink / raw)
  To: Theodore Ts'o, Naresh Kamboju, open list, Linus Torvalds,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, linux- stable, linux-ext4, Arthur Marsh,
	Richard Weinberger

On Tue, May 21, 2019 at 06:30:12PM +0200, Greg Kroah-Hartman wrote:
> On Tue, May 21, 2019 at 12:21:42PM -0400, Theodore Ts'o wrote:
> > On Tue, May 21, 2019 at 03:58:15PM +0530, Naresh Kamboju wrote:
> > > > Ted, any ideas here?  Should I drop this from the stable trees, and you
> > > > revert it from Linus's?  Or something else?
> > 
> > It's safe to drop this from the stable trees while we investigate.  It
> > was always borderline for stable anyway.  (See below).
> 
> Ok, will go drop both of these now, thanks.

I have now pushed out -rc2 releases for 5.1, 5.0, and 4.19 with 3 ext4
patches dropped from each series as there was the original patch here,
and then 2 others on top of that.

thanks,

greg k-h


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21 16:21           ` Theodore Ts'o
  2019-05-21 16:30             ` Greg Kroah-Hartman
@ 2019-05-21 17:57             ` Naresh Kamboju
  2019-05-22  5:05               ` Theodore Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: Naresh Kamboju @ 2019-05-21 17:57 UTC (permalink / raw)
  To: Theodore Ts'o, Naresh Kamboju, Greg Kroah-Hartman, open list,
	Linus Torvalds, Andrew Morton, Guenter Roeck, Shuah Khan,
	patches, Ben Hutchings, lkft-triage, linux- stable, linux-ext4,
	Arthur Marsh, Richard Weinberger
  Cc: ltp, Jan Stancek

On Tue, 21 May 2019 at 21:52, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Tue, May 21, 2019 at 03:58:15PM +0530, Naresh Kamboju wrote:
> > > Ted, any ideas here?  Should I drop this from the stable trees, and you
> > > revert it from Linus's?  Or something else?
>
> It's safe to drop this from the stable trees while we investigate.  It
> was always borderline for stable anyway.  (See below).
>
> > >
> > > Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for
> > > journal inodes using indirect blocks") in the trees, which was supposed
> > > to fix the problem with this patch, am I missing another one as well?
> >
> > FYI,
> > I have applied fix patch 170417c8c7bb ("ext4: fix block validity checks for
> >  journal inodes using indirect blocks") but did not fix this problem.
>
> Hmm... are you _sure_?  This bug was reported to me versus the
> mainline, and the person who reported it confirmed that it did fix the
> problem, he was seeing, and the symptoms are identical to yours.  Can
> you double check, please?  I can't reproduce it either with that patch applied.

This bug is specific to x86_64 and i386.

Steps to reproduce:
run these three LTP test cases in sequence on an x86 device.
# cd ltp/runtest
# cat syscalls    (only three test cases)
open12 open12
madvise06 madvise06
poll02 poll02
#

As Dan described:

LTP is run using '/opt/ltp/runltp -d /scratch -f syscalls', where the
syscalls file has been replaced with three test case names, and
/scratch is an ext4 SATA drive. /scratch is created using 'mkfs -t ext4
/dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGKWF' and mounted to
/scratch.

Please find the full test log here:
https://lkft.validation.linaro.org/scheduler/job/738661#L1356

Note the dmesg log:
[   53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
[   53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8
[   53.938480] Aborting journal on device sda-8.
[   55.431382] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
[   55.439947] EXT4-fs (sda): Remounting filesystem read-only

- Naresh


* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-21 17:57             ` Naresh Kamboju
@ 2019-05-22  5:05               ` Theodore Ts'o
  2019-05-22 10:20                 ` Naresh Kamboju
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2019-05-22  5:05 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Greg Kroah-Hartman, open list, Linus Torvalds, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	linux- stable, linux-ext4, Arthur Marsh, Richard Weinberger, ltp,
	Jan Stancek

On Tue, May 21, 2019 at 11:27:21PM +0530, Naresh Kamboju wrote:
> Steps to reproduce is,
> running LTP three test cases in sequence on x86 device.
> # cd ltp/runtest
> # cat syscalls ( only three test case)
> open12 open12
> madvise06 madvise06
> poll02 poll02
> #
> 
> as Dan referring to,
> 
> LTP is run using '/opt/ltp/runltp -d /scratch -f syscalls', where the
> syscalls file has been replaced with three test case names, and
> /scratch is an ext4 SATA drive. /scratch is created using 'mkfs -t ext4
> /dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGKWF' and mounted to
> /scratch.

I'm still having trouble reproducing the problem.  I've followed the
above exactly, and it doesn't trigger on my system.  I think I know
what is happening, but even given my theory, I'm still not able to
trigger it.  So, I'm not 100% sure this is the appropriate fix.  If
you can reproduce it, can you see if this patch, applied on top of
Linus's tip, fixes the problem for you?

					- Ted

commit 3ad7621bfff343b16d59ed418f6d4420d4ec3e63
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue May 21 17:01:01 2019 -0400

    ext4: don't perform block validity checks on the journal inode
    
    Since the journal inode is already checked when we added it to the
    block validity's system zone, if we check it again, we'll just trigger
    a failure.
    
    This was causing failures like this:
    
    [   53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
    [   53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8
    [   53.938480] Aborting journal on device sda-8.
    
    ... but only if the system was under enough memory pressure that
    logical->physical mapping for the journal inode gets pushed out of the
    extent cache.  (This is why it wasn't noticed earlier.)
    
    Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
    Reported-by: Dan Rue <dan.rue@linaro.org>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f2c62e2a0c98..d40ed940001e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -518,10 +518,14 @@ __read_extent_tree_block(const char *function, unsigned int line,
 	}
 	if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE))
 		return bh;
-	err = __ext4_ext_check(function, line, inode,
-			       ext_block_hdr(bh), depth, pblk);
-	if (err)
-		goto errout;
+	if (!ext4_has_feature_journal(inode->i_sb) ||
+	    (inode->i_ino !=
+	     le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) {
+		err = __ext4_ext_check(function, line, inode,
+				       ext_block_hdr(bh), depth, pblk);
+		if (err)
+			goto errout;
+	}
 	set_buffer_verified(bh);
 	/*
 	 * If this is a leaf block, cache all of its entries
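
Editor's note for readers following the thread: a minimal standalone sketch, not kernel code, of why re-checking the journal inode fails, as I read the commit message above. It assumes the journal's blocks have been registered as a protected "system zone" and that the extent check rejects any extent overlapping such a zone. All names (toy_zone, toy_data_block_valid, toy_should_check) are hypothetical, and the block numbers used below are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one "system zone": a run of blocks that
 * ordinary file extents must never point into. */
struct toy_zone {
	uint64_t start;
	uint64_t count;
};

/* Return true if [pblk, pblk + len) stays clear of every zone. */
static bool toy_data_block_valid(const struct toy_zone *zones, size_t nr,
				 uint64_t pblk, uint64_t len)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t zend = zones[i].start + zones[i].count;

		if (pblk < zend && zones[i].start < pblk + len)
			return false;	/* overlaps a protected zone */
	}
	return true;
}

/* Mirrors the condition added by the patch: the check is skipped only
 * when the filesystem has a journal and this is the journal inode. */
static bool toy_should_check(bool has_journal, uint32_t ino,
			     uint32_t journal_inum)
{
	return !has_journal || ino != journal_inum;
}
```

With a single zone covering blocks 1000..1999 (the pretend journal), toy_data_block_valid() returns false for the journal's own extent (start 1000, length 1000) and true for an unrelated extent (start 5000, length 16). toy_should_check(true, 8, 8) returns false, so the journal inode is no longer run through a check it cannot pass, while every other inode is still checked.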

* Re: ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review)
  2019-05-22  5:05               ` Theodore Ts'o
@ 2019-05-22 10:20                 ` Naresh Kamboju
  0 siblings, 0 replies; 12+ messages in thread
From: Naresh Kamboju @ 2019-05-22 10:20 UTC (permalink / raw)
  To: Theodore Ts'o, Naresh Kamboju, Greg Kroah-Hartman, open list,
	Linus Torvalds, Andrew Morton, Guenter Roeck, Shuah Khan,
	patches, Ben Hutchings, lkft-triage, linux- stable, linux-ext4,
	Arthur Marsh, Richard Weinberger, ltp, Jan Stancek

On Wed, 22 May 2019 at 10:36, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Tue, May 21, 2019 at 11:27:21PM +0530, Naresh Kamboju wrote:
> > Steps to reproduce is,
> > running LTP three test cases in sequence on x86 device.
> > # cd ltp/runtest
> > # cat syscalls ( only three test case)
> > open12 open12
> > madvise06 madvise06
> > poll02 poll02
> > #
> >
> > as Dan referring to,
> >
> > LTP is run using '/opt/ltp/runltp -d /scratch -f syscalls', where the
> > syscalls file has been replaced with three test case names, and
> > /scratch is an ext4 SATA drive. /scratch is created using 'mkfs -t ext4
> > /dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGKWF' and mounted to
> > /scratch.
>
> I'm still having trouble reproducing the problem.  I've followed the
> above exactly, and it doesn't trigger on my system.  I think I know
> what is happening, but even given my theory, I'm still not able to
> trigger it.  So, I'm not 100% sure this is the appropriate fix.  If
> you can reproduce it, can you see if this patch, applied on top of
> Linus's tip, fixes the problem for you?

I applied your patch on the mainline master branch, tested on x86_64,
and can confirm that the reported problem is fixed.

Thanks for the fix.

Full LTP syscalls test output log:
https://lkft.validation.linaro.org/scheduler/job/739075

---
Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f2c62e2a0c98..d40ed940001e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -518,10 +518,14 @@ __read_extent_tree_block(const char *function, unsigned int line,
        }
        if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE))
                return bh;
-       err = __ext4_ext_check(function, line, inode,
-                              ext_block_hdr(bh), depth, pblk);
-       if (err)
-               goto errout;
+       if (!ext4_has_feature_journal(inode->i_sb) ||
+           (inode->i_ino !=
+            le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) {
+               err = __ext4_ext_check(function, line, inode,
+                                      ext_block_hdr(bh), depth, pblk);
+               if (err)
+                       goto errout;
+       }
        set_buffer_verified(bh);
        /*
         * If this is a leaf block, cache all of its entries


- Naresh

Thread overview: 12+ messages
     [not found] <20190520115247.060821231@linuxfoundation.org>
2019-05-20 22:23 ` [PATCH 4.19 000/105] 4.19.45-stable review Dan Rue
2019-05-21  8:59   ` Greg Kroah-Hartman
2019-05-21  9:28     ` Naresh Kamboju
2019-05-21  9:38       ` ext4 regression (was Re: [PATCH 4.19 000/105] 4.19.45-stable review) Greg Kroah-Hartman
2019-05-21 10:28         ` Naresh Kamboju
2019-05-21 16:21           ` Theodore Ts'o
2019-05-21 16:30             ` Greg Kroah-Hartman
2019-05-21 16:44               ` Greg Kroah-Hartman
2019-05-21 17:57             ` Naresh Kamboju
2019-05-22  5:05               ` Theodore Ts'o
2019-05-22 10:20                 ` Naresh Kamboju
2019-05-21 15:02         ` Dan Rue