linux-kernel.vger.kernel.org archive mirror
* ext4 file system corruption with v4.19.3 / v4.19.4
       [not found] <065643a0-f9aa-a361-715a-03ca978d9228@roeck-us.net>
@ 2018-11-27 14:32 ` Guenter Roeck
  2018-11-27 14:48   ` Marek Habersack
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Guenter Roeck @ 2018-11-27 14:32 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel

[trying again, this time with correct kernel.org address]

Hi,

I have seen the following and similar problems several times,
with both v4.19.3 and v4.19.4:

Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
...

Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
...

Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only


The same systems running v4.18.6 never experienced a problem.

Has anyone else seen similar problems? Is there anything I can do
to help track down the problem?

Thanks,
Guenter
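
When several machines log errors like the ones above, it can help to tally them by device and by the ext4 function that reported them. The following is a minimal sketch, assuming the log text comes from something like `dmesg` or `journalctl -k` and that the lines follow the `EXT4-fs error (device X): function:line:` layout shown in this report. The helper name `ext4_error_summary` is made up for illustration.

```shell
# Tally EXT4-fs error lines by device and reporting function.
# Reads kernel log text on stdin; prints "<count> <device> <function>".
# (Hypothetical helper; the pattern only covers the message layout seen above.)
ext4_error_summary() {
    sed -n 's/.*EXT4-fs error (device \([^)]*\)): \([A-Za-z0-9_]*\):[0-9]*: .*/\1 \2/p' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Example: dmesg | ext4_error_summary
```

Lines that do not match the pattern (for example "Aborting journal") are simply ignored, so the whole log can be piped in unfiltered.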


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 14:32 ` ext4 file system corruption with v4.19.3 / v4.19.4 Guenter Roeck
@ 2018-11-27 14:48   ` Marek Habersack
  2018-11-27 17:31     ` Guenter Roeck
  2018-11-27 18:55     ` Rainer Fiebig
  2018-11-27 15:50   ` Rainer Fiebig
  2018-11-28  0:16   ` Andrey Jr. Melnikov
  2 siblings, 2 replies; 27+ messages in thread
From: Marek Habersack @ 2018-11-27 14:48 UTC (permalink / raw)
  To: Guenter Roeck, Theodore Ts'o, Andreas Dilger; +Cc: linux-ext4, linux-kernel

On 27/11/2018 15:32, Guenter Roeck wrote:
Hi,

You might check whether you have CONFIG_SCSI_MQ_DEFAULT=y in your kernel config. Starting with 4.19.1 it somehow
interferes with ext4 and causes problems similar to the ones you list below. Ever since I disabled MQ (either rebuild
your kernel with the option off or add `scsi_mod.use_blk_mq=0` to the kernel command line) none of those errors have come back.
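
Both the compile-time default and the value actually in effect can be checked before and after such a change. This is a minimal sketch under stated assumptions: the config file lives at `/boot/config-$(uname -r)` (a Debian/Ubuntu convention; other systems may expose `/proc/config.gz` instead), and `scsi_mod` exposes its `use_blk_mq` parameter under `/sys/module/scsi_mod/parameters/`. The function name `scsi_mq_state` is hypothetical.

```shell
# Report the compile-time default and the runtime state of SCSI blk-mq.
# Usage: scsi_mq_state [config-file] [sysfs-parameter-file]
# Both default paths are assumptions about a typical 4.19 system.
scsi_mq_state() {
    config=${1:-/boot/config-$(uname -r)}
    param=${2:-/sys/module/scsi_mod/parameters/use_blk_mq}

    if [ -r "$config" ]; then
        # Matches both "CONFIG_SCSI_MQ_DEFAULT=y" and "# CONFIG_SCSI_MQ_DEFAULT is not set".
        grep -E '^(# )?CONFIG_SCSI_MQ_DEFAULT[= ]' "$config" ||
            echo "CONFIG_SCSI_MQ_DEFAULT: not found in $config"
    else
        echo "config: $config not readable"
    fi

    if [ -r "$param" ]; then
        echo "runtime use_blk_mq: $(cat "$param")"
    else
        echo "runtime use_blk_mq: $param not present"
    fi
}

# Example: scsi_mq_state
```

The runtime value is the one that matters after booting with `scsi_mod.use_blk_mq=0`, since the command-line parameter overrides the compiled-in default.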

hope it helps,

marek
> [trying again, this time with correct kernel.org address]
> 
> Hi,
> 
> I have seen the following and similar problems several times,
> with both v4.19.3 and v4.19.4:
> 
> Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad
> extra_isize 33661 (inode size 256)
> Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad
> extra_isize 33685 (inode size 256)
> ...
> 
> Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm
> updatedb.mlocat: deleted inode referenced: 238160407
> Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> ...
> 
> Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm
> nfsd: deleted inode referenced: 52043796
> Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> 
> 
> The same systems running v4.18.6 never experienced a problem.
> 
> Has anyone else seen similar problems ? Is there anything I can do
> to help tracking down the problem ?
> 
> Thanks,
> Guenter
> 



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 14:32 ` ext4 file system corruption with v4.19.3 / v4.19.4 Guenter Roeck
  2018-11-27 14:48   ` Marek Habersack
@ 2018-11-27 15:50   ` Rainer Fiebig
  2018-11-28  0:16   ` Andrey Jr. Melnikov
  2 siblings, 0 replies; 27+ messages in thread
From: Rainer Fiebig @ 2018-11-27 15:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Guenter Roeck, Theodore Ts'o, Andreas Dilger, linux-ext4


On Tuesday, 27 November 2018, 06:32:08, Guenter Roeck wrote:
> [trying again, this time with correct kernel.org address]
> 
> Hi,
> 
> I have seen the following and similar problems several times,
> with both v4.19.3 and v4.19.4:
> 
> Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
> Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
> ...
> 
> Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
> Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> ...
> 
> Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
> Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> 
> 
> The same systems running v4.18.6 never experienced a problem.
> 
> Has anyone else seen similar problems ? Is there anything I can do
> to help tracking down the problem ?
> 
> Thanks,
> Guenter

This is already being discussed here: 
https://bugzilla.kernel.org/show_bug.cgi?id=201685

I guess bisecting between 4.18.0 and 4.19.0 by someone who is proficient in 
this and is affected by the problem would be immensely valuable.

I'm not affected and am a bit reluctant to expose my production system to 
this. ;)
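
For anyone who is affected and willing to try, the bisect itself is mechanically simple; the hard part is that each step needs a kernel build, a reboot, and enough run time to decide whether corruption shows up. This is a minimal sketch, assuming a clone of the mainline tree that carries the `v4.18` and `v4.19` tags. The helper name `start_bisect` is made up for illustration.

```shell
# Start a bisect between a known-good and a known-bad tag in a kernel tree.
# Usage: start_bisect <tree> [good] [bad]
# (Hypothetical helper; defaults follow the 4.18.0 .. 4.19.0 range suggested above.)
start_bisect() {
    tree=$1
    good=${2:-v4.18}
    bad=${3:-v4.19}
    # "git bisect start <bad> <good>" marks both ends and checks out a midpoint.
    git -C "$tree" bisect start "$bad" "$good"
}

# After building and running each candidate kernel, mark the result by hand:
#   git -C <tree> bisect good     # no ext4 errors seen after a soak period
#   git -C <tree> bisect bad      # EXT4-fs errors appeared
#   git -C <tree> bisect reset    # when finished, return to the original HEAD
```

Because a "good" verdict can only ever mean "no errors seen yet", a generous soak period per step is the safer choice; a wrong "good" sends the bisect down the wrong half of the history.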

So long!

Rainer Fiebig 

-- 
The truth always turns out to be simpler than you thought.
Richard Feynman



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 14:48   ` Marek Habersack
@ 2018-11-27 17:31     ` Guenter Roeck
  2018-11-27 18:55     ` Rainer Fiebig
  1 sibling, 0 replies; 27+ messages in thread
From: Guenter Roeck @ 2018-11-27 17:31 UTC (permalink / raw)
  To: Marek Habersack
  Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel

On Tue, Nov 27, 2018 at 03:48:19PM +0100, Marek Habersack wrote:
> On 27/11/2018 15:32, Guenter Roeck wrote:
> Hi,
> 
> You might try to see if you have CONFIG_SCSI_MQ_DEFAULT=yes in your kernel config. Starting with 4.19.1 it somehow

Yes, it is enabled in my configuration.

> interferes with ext4 and causes problems similar to the ones you list below. Ever since I disabled MQ (either recompile
> your kernel or add `scsi_mod.use_blk_mq=0` to the kernel command line) none of those errors came back.
> 
I'll try that. Thanks a lot for the hint!

Guenter

> hope it helps,
> 
> marek
> > [trying again, this time with correct kernel.org address]
> > 
> > Hi,
> > 
> > I have seen the following and similar problems several times,
> > with both v4.19.3 and v4.19.4:
> > 
> > Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad
> > extra_isize 33661 (inode size 256)
> > Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> > Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> > Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad
> > extra_isize 33685 (inode size 256)
> > ...
> > 
> > Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm
> > updatedb.mlocat: deleted inode referenced: 238160407
> > Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> > Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> > ...
> > 
> > Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm
> > nfsd: deleted inode referenced: 52043796
> > Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> > Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> > 
> > 
> > The same systems running v4.18.6 never experienced a problem.
> > 
> > Has anyone else seen similar problems ? Is there anything I can do
> > to help tracking down the problem ?
> > 
> > Thanks,
> > Guenter
> > 
> 


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 14:48   ` Marek Habersack
  2018-11-27 17:31     ` Guenter Roeck
@ 2018-11-27 18:55     ` Rainer Fiebig
  2018-11-27 21:22       ` Guenter Roeck
  1 sibling, 1 reply; 27+ messages in thread
From: Rainer Fiebig @ 2018-11-27 18:55 UTC (permalink / raw)
  To: linux-kernel, grendel
  Cc: Guenter Roeck, Theodore Ts'o, Andreas Dilger, linux-ext4


On Tuesday, 27 November 2018, 15:48:19, Marek Habersack wrote:
> On 27/11/2018 15:32, Guenter Roeck wrote:
> Hi,
> 
> You might try to see if you have CONFIG_SCSI_MQ_DEFAULT=yes in your kernel
> config. Starting with 4.19.1 it somehow interferes with ext4 and causes
> problems similar to the ones you list below. Ever since I disabled MQ
> (either recompile your kernel or add `scsi_mod.use_blk_mq=0` to the kernel
> command line) none of those errors came back.
> 
> hope it helps,
> 
> marek

Unfortunately, this doesn't seem to work in every case: 
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c54

And I'm using a defconfig-4.19.3 (meaning: CONFIG_SCSI_MQ_DEFAULT=y) in a VM 
and I'm not seeing those errors there. OK, it's a VM - but anyway.

The definite cause of this can only be found by bisecting, IMO. And it needs 
to be pinned down because otherwise some feeling of insecurity will remain.

So long!

Rainer Fiebig

> 
> > [trying again, this time with correct kernel.org address]
> > 
> > Hi,
> > 
> > I have seen the following and similar problems several times,
> > with both v4.19.3 and v4.19.4:
> > 
> > Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
> > Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> > Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> > Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
> > ...
> > 
> > Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
> > Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> > Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> > ...
> > 
> > Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
> > Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> > Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> > 
> > 
> > The same systems running v4.18.6 never experienced a problem.
> > 
> > Has anyone else seen similar problems ? Is there anything I can do
> > to help tracking down the problem ?
> > 
> > Thanks,
> > Guenter

-- 
The truth always turns out to be simpler than you thought.
Richard Feynman



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 18:55     ` Rainer Fiebig
@ 2018-11-27 21:22       ` Guenter Roeck
  2018-11-28  1:57         ` Vito Caputo
  2018-11-28  9:56         ` Rainer Fiebig
  0 siblings, 2 replies; 27+ messages in thread
From: Guenter Roeck @ 2018-11-27 21:22 UTC (permalink / raw)
  To: Rainer Fiebig
  Cc: linux-kernel, grendel, Theodore Ts'o, Andreas Dilger, linux-ext4

On Tue, Nov 27, 2018 at 07:55:01PM +0100, Rainer Fiebig wrote:
> On Tuesday, 27 November 2018, 15:48:19, Marek Habersack wrote:
> > On 27/11/2018 15:32, Guenter Roeck wrote:
> > Hi,
> > 
> > You might try to see if you have CONFIG_SCSI_MQ_DEFAULT=yes in your kernel
> > config. Starting with 4.19.1 it somehow interferes with ext4 and causes
> > problems similar to the ones you list below. Ever since I disabled MQ
> > (either recompile your kernel or add `scsi_mod.use_blk_mq=0` to the kernel
> > command line) none of those errors came back.
> > 
> > hope it helps,
> > 
> > marek
> 
> Unfortunately, this doesn't seem to work in every case: 
> https://bugzilla.kernel.org/show_bug.cgi?id=201685#c54
> 
> And I'm using a defconfig-4.19.3 (meaning: CONFIG_SCSI_MQ_DEFAULT=yes) in a VM 
> and I'm not seeing those errors there. OK, it's a VM - but anyway.
> 

Agreed. I disabled CONFIG_SCSI_MQ_DEFAULT, but the problem is still seen
at least on one of my servers, so disabling it does not help, at least not
in my case.

If the problem is somehow related to CONFIG_SCSI_MQ_DEFAULT, you might
have to explicitly use a scsi drive (virtio-scsi-pci or similar) to
trigger its use in a VM.
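
One way to get a SCSI-attached disk in a QEMU guest is sketched below. It only prints a command line for inspection rather than launching anything. The memory size, the raw image format, and the ids `scsi0` and `hd0` are assumptions for illustration, and `qemu_scsi_cmd` is a made-up helper name.

```shell
# Print a QEMU command line that attaches a disk image through virtio-scsi,
# so the guest sees it as a SCSI disk (sdX) instead of a virtio-blk disk (vdX).
# Usage: qemu_scsi_cmd <disk-image>
# (Hypothetical helper; adjust memory, image format and ids to the actual setup.)
qemu_scsi_cmd() {
    img=$1
    printf '%s\n' "qemu-system-x86_64 -m 2048 \
-device virtio-scsi-pci,id=scsi0 \
-drive file=$img,if=none,id=hd0,format=raw \
-device scsi-hd,drive=hd0,bus=scsi0.0"
}

# Example: qemu_scsi_cmd test.img
```

Inside such a guest the disk goes through the SCSI midlayer, so the `scsi_mod.use_blk_mq` setting applies to it.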

Guenter

> The definite cause of this can only be found by bisecting, IMO. And it needs 
> to be pinned down because else some feeling of insecurity will remain.
> 
> So long!
> 
> Rainer Fiebig
> 
> > 
> > > [trying again, this time with correct kernel.org address]
> > > 
> > > Hi,
> > > 
> > > I have seen the following and similar problems several times,
> > > with both v4.19.3 and v4.19.4:
> > > 
> > > Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
> > > Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> > > Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> > > Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
> > > ...
> > > 
> > > Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
> > > Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> > > Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> > > ...
> > > 
> > > Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
> > > Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> > > Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> > > 
> > > 
> > > The same systems running v4.18.6 never experienced a problem.
> > > 
> > > Has anyone else seen similar problems ? Is there anything I can do
> > > to help tracking down the problem ?
> > > 
> > > Thanks,
> > > Guenter
> 
> -- 
> The truth always turns out to be simpler than you thought.
> Richard Feynman




* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 14:32 ` ext4 file system corruption with v4.19.3 / v4.19.4 Guenter Roeck
  2018-11-27 14:48   ` Marek Habersack
  2018-11-27 15:50   ` Rainer Fiebig
@ 2018-11-28  0:16   ` Andrey Jr. Melnikov
  2018-11-28  4:15     ` Theodore Y. Ts'o
  2 siblings, 1 reply; 27+ messages in thread
From: Andrey Jr. Melnikov @ 2018-11-28  0:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4

In gmane.comp.file-systems.ext4 Guenter Roeck <linux@roeck-us.net> wrote:
> [trying again, this time with correct kernel.org address]

> Hi,

> I have seen the following and similar problems several times,
> with both v4.19.3 and v4.19.4:

> Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
> Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
> ...

> Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
> Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> ...

> Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
> Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only


> The same systems running v4.18.6 never experienced a problem.

> Has anyone else seen similar problems ? Is there anything I can do
> to help tracking down the problem ?

I see this problem on 4.19.1, 4.19.2 and 4.19.5 (I skipped 4.19.3 and 4.19.4).

The kernel always reports:
ext4_iget:4851: inode #XXXXXXX: comm updatedb.mlocat: checksum invalid

/dev/sde2 is mounted as ext4 with the relatime,errors=remount-ro options.

The last one (from 4.19.5):
[  355.926146] EXT4-fs error (device sde2): ext4_iget:4851: inode #63184919: comm updatedb.mlocat: checksum invalid
[  355.966810] EXT4-fs (sde2): Remounting filesystem read-only
[  355.987887] EXT4-fs error (device sde2): ext4_journal_check_start:61: Detected aborted journal

# debugfs -n /dev/sde2
debugfs 1.44.4 (18-Aug-2018)
debugfs:  ncheck 63184919
Inode   Pathname
63184919        /home/lynxchaus/openwrt/lede/build_dir/target-mipsel_24kc_musl/linux-ramips_mt7621/linux-4.15/drivers/gpu
debugfs:  ls /home/lynxchaus/openwrt/lede/build_dir/target-mipsel_24kc_musl/linux-ramips_mt7621/linux-4.15/drivers/gpu

/home/lynxchaus/openwrt/lede/build_dir/target-mipsel_24kc_musl/linux-ramips_mt7621/linux-4.15/drivers/gpu: Ext2 inode is not a directory
debugfs:  stat /home/lynxchaus/openwrt/lede/build_dir/target-mipsel_24kc_musl/linux-ramips_mt7621/linux-4.15/drivers/gpu
Inode: 63184919   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 1951786937    Version: 0x00000000:00000001
User:  1000   Group:  1000   Project:     0   Size: 6013
File ACL: 0
Links: 1   Blockcount: 16
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x59bfa71f:bee151ac -- Mon Sep 18 13:59:43 2017
 atime: 0x59bfa71f:bee151ac -- Mon Sep 18 13:59:43 2017
 mtime: 0x59b9a16d:00000000 -- Thu Sep 14 00:21:49 2017
crtime: 0x59bfa71f:bee151ac -- Mon Sep 18 13:59:43 2017
Size of extra inode fields: 32
Inode checksum: 0x39c8526f
EXTENTS:  
(0-1):214480636-214480637

After the repair, /home/lynxchaus/openwrt/lede/build_dir/target-mipsel_24kc_musl/linux-ramips_mt7621/linux-4.15/drivers/gpu was deleted
and all files in it were connected to lost+found.

The corrupted inodes are always directories that have not been written to for at least a year. Is something going wrong when updating atime?
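
The inode-to-path lookup shown above can be repeated for every inode the kernel complained about. This is a minimal sketch, assuming the errors use the `inode #N:` layout from the log excerpt. The helper name `ext4_error_inodes` is hypothetical, and the device name in the usage comment is simply the one from this report.

```shell
# Print the unique inode numbers named in EXT4-fs error lines read from stdin.
# (Hypothetical helper; only matches the "function:line: inode #N:" layout.)
ext4_error_inodes() {
    sed -n 's/.*EXT4-fs error (device [^)]*): [^:]*:[0-9]*: inode #\([0-9]*\):.*/\1/p' |
        sort -un
}

# Example (debugfs opens the device read-only unless -w is given):
#   for ino in $(dmesg | ext4_error_inodes); do
#       debugfs -R "ncheck $ino" /dev/sde2
#   done
```

Collecting the paths for all affected inodes makes it easier to check whether they really share a pattern, such as being old and rarely written.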



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 21:22       ` Guenter Roeck
@ 2018-11-28  1:57         ` Vito Caputo
  2018-11-28  9:56         ` Rainer Fiebig
  1 sibling, 0 replies; 27+ messages in thread
From: Vito Caputo @ 2018-11-28  1:57 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rainer Fiebig, linux-kernel, grendel, Theodore Ts'o,
	Andreas Dilger, linux-ext4

On Tue, Nov 27, 2018 at 01:22:55PM -0800, Guenter Roeck wrote:
> On Tue, Nov 27, 2018 at 07:55:01PM +0100, Rainer Fiebig wrote:
> > On Tuesday, 27 November 2018, 15:48:19, Marek Habersack wrote:
> > > On 27/11/2018 15:32, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > You might try to see if you have CONFIG_SCSI_MQ_DEFAULT=yes in your kernel
> > > config. Starting with 4.19.1 it somehow interferes with ext4 and causes
> > > problems similar to the ones you list below. Ever since I disabled MQ
> > > (either recompile your kernel or add `scsi_mod.use_blk_mq=0` to the kernel
> > > command line) none of those errors came back.
> > > 
> > > hope it helps,
> > > 
> > > marek
> > 
> > Unfortunately, this doesn't seem to work in every case: 
> > https://bugzilla.kernel.org/show_bug.cgi?id=201685#c54
> > 
> > And I'm using a defconfig-4.19.3 (meaning: CONFIG_SCSI_MQ_DEFAULT=yes) in a VM 
> > and I'm not seeing those errors there. OK, it's a VM - but anyway.
> > 
> 
> Agreed. I disabled CONFIG_SCSI_MQ_DEFAULT, but the problem is still seen
> at least on one of my servers, so disabling it does not help, at least not
> in my case.
> 
> If the problem is somehow related to CONFIG_SCSI_MQ_DEFAULT, you might
> have to explicitly use a scsi drive (virtio-scsi-pci or similar) to
> trigger its use in a VM.
> 
> Guenter
> 
> > The definite cause of this can only be found by bisecting, IMO. And it needs 
> > to be pinned down because else some feeling of insecurity will remain.
> > 
> > So long!
> > 
> > Rainer Fiebig
> > 
> > > 
> > > > [trying again, this time with correct kernel.org address]
> > > > 
> > > > Hi,
> > > > 
> > > > I have seen the following and similar problems several times,
> > > > with both v4.19.3 and v4.19.4:
> > > > 
> > > > Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
> > > > Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> > > > Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> > > > Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
> > > > ...
> > > > 
> > > > Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
> > > > Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> > > > Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> > > > ...
> > > > 
> > > > Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
> > > > Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> > > > Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> > > > 
> > > > 
> > > > The same systems running v4.18.6 never experienced a problem.
> > > > 
> > > > Has anyone else seen similar problems ? Is there anything I can do
> > > > to help tracking down the problem ?
> > > > 
> > > > Thanks,
> > > > Guenter
> > 


Not sure how relevant this is, but I had emailed the list earlier in the
month reporting totally bogus fs/SATA errors following an fstrim in
4.19.  I didn't have much information to add, as the logs were all lost,
and I didn't have any interest in trying to reproduce it on my daily
driven laptop.

I've just been running 4.17 since then (4.18 has some annoying i915 drm
bugs), and things have been perfectly fine in the storage/filesystem
department.

What I had noticed as being suspect back then was the following:

$ git tag --contains 744889b7cbb56a6
v4.19
v4.19.1
v4.19.2
v4.19.3
v4.19.4
v4.19.5
v4.20-rc1
v4.20-rc2
v4.20-rc3
v4.20-rc4
$ git tag --contains 1adfc5e4136f5967
v4.20-rc2
v4.20-rc3
v4.20-rc4
$

Since the 744889b7 commit message talks specifically about discard, and
1adfc5e4 claims to fix 744889b7, I assumed it was probably responsible
considering which tags contain each commit, but did not try to understand
the commits or bisect.
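
The same question ("does this release contain that commit?") can be asked for a single release with `git merge-base --is-ancestor`, which is convenient in scripts. This is a minimal sketch, assuming a kernel tree with the relevant tags fetched. The helper name `contains_commit` is made up. Note that stable releases cherry-pick fixes under new commit ids, so a backported fix will not be found by its mainline hash.

```shell
# Succeeds (exit status 0) if <commit> is an ancestor of <rev>,
# i.e. if <rev> contains <commit>.
# Usage: contains_commit <tree> <commit> <rev>
# (Hypothetical helper around a standard git command.)
contains_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Example, using the ids and tags from the output above:
#   contains_commit . 744889b7cbb56a6 v4.19.5 && echo "has the discard change"
#   contains_commit . 1adfc5e4136f5967 v4.19.5 || echo "fix not found under its mainline id"
```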

FYI the machine I observed this on is an X61s with a SATA-attached SSD
(Samsung 840 EVO 250G).  I only run fstrim manually, but of course with
discard enabled all the way down the lvm+dmcrypt stack.

Maybe that's of use in hunting down this bug.  If nobody else bisects in
the coming weeks I'll have to reconsider the rigamarole of backups,
repro, and attempting a bisect.

Regards,
Vito Caputo


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28  0:16   ` Andrey Jr. Melnikov
@ 2018-11-28  4:15     ` Theodore Y. Ts'o
  2018-11-28  8:02       ` Marek Habersack
                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-28  4:15 UTC (permalink / raw)
  To: Andrey Jr. Melnikov; +Cc: linux-kernel, linux-ext4

On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> Corrupted inodes - always directory, not touched at least year or
> more for writing. Something wrong when updating atime?

We're not sure.  The frustrating thing is that it's not reproducing
for me.  I run extensive regression tests, and I'm using 4.19 on my
development laptop without noticing any problems.  If I could reproduce
it, I could debug it, but since I can't, I need to rely on those who
are seeing the problem to help pinpoint the problem.

I'm trying to figure out common factors from those people who are
reporting problems.

(a) What distribution are you running (it appears that many people
reporting problems are running Ubuntu, but this may be a sampling
issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
Testing.)

(b) What hardware are you using?  (SSD?  SATA-attached?
NVMe-attached?)

(c) Are you using LVM?  LUKS (e.g., disk encrypted)?

(d) are you using discard?  One theory is a recent discard change may
be in play.   How do you use discard?   (mount option, fstrim, etc.)
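
Most of the answers to (a) through (d) can be gathered in one go. This is a minimal sketch, assuming a Linux system with util-linux's `lsblk` and, optionally, systemd; the helper names `ext4_discard_mounts` and `storage_report` are hypothetical. In the `lsblk` output, the TYPE column shows `crypt` and `lvm` for device-mapper layers, ROTA distinguishes SSDs from spinning disks, and TRAN shows the transport (sata, nvme, ...).

```shell
# Print "<device> <mountpoint>" for ext4 mounts that use the "discard" option.
# Reads /proc/mounts-format text from the file given as $1 (default /proc/mounts).
ext4_discard_mounts() {
    awk '$3 == "ext4" && $4 ~ /(^|,)discard(,|$)/ { print $1, $2 }' "${1:-/proc/mounts}"
}

# Collect distribution, block-device layout and discard usage in one report.
storage_report() {
    echo "kernel: $(uname -r)"
    if [ -r /etc/os-release ]; then
        grep '^PRETTY_NAME=' /etc/os-release
    fi
    if command -v lsblk >/dev/null 2>&1; then
        lsblk -o NAME,TYPE,ROTA,TRAN,MODEL,DISC-GRAN
    fi
    echo "ext4 mounts using online discard:"
    ext4_discard_mounts
    if command -v systemctl >/dev/null 2>&1; then
        echo "fstrim.timer: $(systemctl is-enabled fstrim.timer 2>/dev/null || true)"
    fi
}

# Example: storage_report
```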

      	      	     	     		- Ted


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28  4:15     ` Theodore Y. Ts'o
@ 2018-11-28  8:02       ` Marek Habersack
  2018-11-28 10:02       ` Andrey Jr. Melnikov
  2018-11-28 13:28       ` Andrey Melnikov
  2 siblings, 0 replies; 27+ messages in thread
From: Marek Habersack @ 2018-11-28  8:02 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Andrey Jr. Melnikov, linux-kernel, linux-ext4

On 28/11/2018 05:15, Theodore Y. Ts'o wrote:
> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
>> Corrupted inodes - always directory, not touched at least year or
>> more for writing. Something wrong when updating atime?
I've just seen the errors come back despite having MQ off :( However, this time it took 5 days for them to come back, so
MQ must play a role here. Also, indeed, they happened after fstrim ran and this time *only* on the SSD disks reported
below, another clue? This time the errors were "just" orphaned inodes + invalid free inode counts, all repaired without
issues by fsck.

> 
> We're not sure.  The frustrating thing is that it's not reproducing
> for me.  I run extensive regression tests, and I'm using 4.19 on my
> development laptop without notcing any problems.  If I could reproduce
> it, I could debug it, but since I can't, I need to rely on those who
> are seeing the problem to help pinpoint the problem.
> 
> I'm trying to figure out common factors from those people who are
> reporting problems.
> 
> (a) What distribution are you running (it appears that many people
> reporting problems are running Ubuntu, but this may be a sampling
> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> Testing.)
Ubuntu 18.10 here

> 
> (b) What hardware are you using?  (SSD?  SATA-attached?
> NVMe-attached?)
The errors occurred on both SSDs:
  - Samsung SSD 850 EVO 1TB, firmware rev EMT03B6Q
  - OCZ-AGILITY3, firmware rev 2.25

and spinning rust:
  - Seagate ST2000DX001-1CM164, firmware revision CC43

> 
> (c) Are you using LVM?  LUKS (e.g., disk encrypted)?
LUKS. Both the Samsung and the Seagate use DM for encryption.

> (d) are you using discard?  One theory is a recent discard change may
> be in play.   How do you use discard?   (mount option, fstrim, etc.)
fstrim runs weekly and the Samsung SSD is mounted with

   rw,nosuid,nodev,noatime,discard,helper=crypt

marek
> 
>       	      	     	     		- Ted
> 



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-27 21:22       ` Guenter Roeck
  2018-11-28  1:57         ` Vito Caputo
@ 2018-11-28  9:56         ` Rainer Fiebig
  1 sibling, 0 replies; 27+ messages in thread
From: Rainer Fiebig @ 2018-11-28  9:56 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, grendel, Theodore Ts'o, Andreas Dilger, linux-ext4



On 27.11.18 at 22:22, Guenter Roeck wrote:
> On Tue, Nov 27, 2018 at 07:55:01PM +0100, Rainer Fiebig wrote:
>> On Tuesday, 27 November 2018, 15:48:19, Marek Habersack wrote:
>>> On 27/11/2018 15:32, Guenter Roeck wrote:
>>> Hi,
>>>
>>> You might try to see if you have CONFIG_SCSI_MQ_DEFAULT=yes in your kernel
>>> config. Starting with 4.19.1 it somehow interferes with ext4 and causes
>>> problems similar to the ones you list below. Ever since I disabled MQ
>>> (either recompile your kernel or add `scsi_mod.use_blk_mq=0` to the kernel
>>> command line) none of those errors came back.
>>>
>>> hope it helps,
>>>
>>> marek
>>
>> Unfortunately, this doesn't seem to work in every case: 
>> https://bugzilla.kernel.org/show_bug.cgi?id=201685#c54
>>
>> And I'm using a defconfig-4.19.3 (meaning: CONFIG_SCSI_MQ_DEFAULT=yes) in a VM 
>> and I'm not seeing those errors there. OK, it's a VM - but anyway.
>>
> 
> Agreed. I disabled CONFIG_SCSI_MQ_DEFAULT, but the problem is still seen
> at least on one of my servers, so disabling it does not help, at least not
> in my case.
> 
> If the problem is somehow related to CONFIG_SCSI_MQ_DEFAULT, you might
> have to explicitly use a scsi drive (virtio-scsi-pci or similar) to
> trigger its use in a VM.

It seems more likely to me now that it may have nothing to do with the
SCSI settings. Perhaps it has to do with some other config option that's
not set in a defconfig.
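
One way to narrow down "some other config option" is to list what an
affected .config sets that a defconfig does not. A minimal sketch, assuming
both files are at hand; the function name and the file names in the usage
line are made up for illustration:

```shell
#!/bin/sh
# Sketch: print the CONFIG_ lines that appear in the first .config
# but not in the second one. Whole lines are compared.
config_only_in_first() {
    first=$1
    second=$2
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # Only lines that actually set an option; "is not set" comments are skipped.
    grep '^CONFIG_' "$first"  | sort > "$a"
    grep '^CONFIG_' "$second" | sort > "$b"
    # comm -23 keeps the lines that are unique to the first sorted input.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Usage would be something like
"config_only_in_first config.affected config.defconfig" (hypothetical file
names). Options that differ only in value (=y versus =m) show up as well,
because the comparison is line by line.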

I had hoped the problem would show up in the VM, so I could have safely
bisected it there. But tough luck.

So long!

Rainer Fiebig


> 
> Guenter
> 
>> The definite cause of this can only be found by bisecting, IMO. And it needs 
>> to be pinned down because else some feeling of insecurity will remain.
>>
>> So long!
>>
>> Rainer Fiebig
>>
>>>
>>>> [trying again, this time with correct kernel.org address]
>>>>
>>>> Hi,
>>>>
>>>> I have seen the following and similar problems several times,
>>>> with both v4.19.3 and v4.19.4:
>>>>
>>>> Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
>>>> Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
>>>> Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
>>>> Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
>>>> ...
>>>>
>>>> Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
>>>> Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
>>>> Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
>>>> ...
>>>>
>>>> Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
>>>> Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
>>>> Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
>>>>
>>>>
>>>> The same systems running v4.18.6 never experienced a problem.
>>>>
>>>> Has anyone else seen similar problems ? Is there anything I can do
>>>> to help tracking down the problem ?
>>>>
>>>> Thanks,
>>>> Guenter
>>
>> -- 
>> The truth always turns out to be simpler than you thought.
>> Richard Feynman
> 
> 




* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28  4:15     ` Theodore Y. Ts'o
  2018-11-28  8:02       ` Marek Habersack
@ 2018-11-28 10:02       ` Andrey Jr. Melnikov
  2018-11-28 15:56         ` Rainer Fiebig
  2018-11-28 13:28       ` Andrey Melnikov
  2 siblings, 1 reply; 27+ messages in thread
From: Andrey Jr. Melnikov @ 2018-11-28 10:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4

In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> > Corrupted inodes - always directory, not touched at least year or
> > more for writing. Something wrong when updating atime?

> We're not sure.  The frustrating thing is that it's not reproducing
> for me.  I run extensive regression tests, and I'm using 4.19 on my
> development laptop without noticing any problems.  If I could reproduce
> it, I could debug it, but since I can't, I need to rely on those who
> are seeing the problem to help pinpoint the problem.

My workstation hits this bug every time after boot. If you have an idea, I
can test it.

> I'm trying to figure out common factors from those people who are
> reporting problems.

> (a) What distribution are you running (it appears that many people
> reporting problems are running Ubuntu, but this may be a sampling
> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> Testing.)
Debian sid, but with a self-built kernel from the Ubuntu mainline PPA.

> (b) What hardware are you using?  (SSD?  SATA-attached?
> NVMe-attached?)
SATA HDD WDC WD20EZRZ-00Z5HB0.

> (c) Are you using LVM?  LUKS (e.g., disk encrypted)?
No and no. Plain ext4.
-- cut --
debugfs:  features 
Filesystem features: has_journal ext_attr resize_inode dir_index filetype
needs_recovery extent 64bit flex_bg sparse_super large_file huge_file
dir_nlink extra_isize metadata_csum
-- cut --

> (d) are you using discard?  One theory is a recent discard change may
> be in play.   How do you use discard?   (mount option, fstrim, etc.)

no



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28  4:15     ` Theodore Y. Ts'o
  2018-11-28  8:02       ` Marek Habersack
  2018-11-28 10:02       ` Andrey Jr. Melnikov
@ 2018-11-28 13:28       ` Andrey Melnikov
  2 siblings, 0 replies; 27+ messages in thread
From: Andrey Melnikov @ 2018-11-28 13:28 UTC (permalink / raw)
  To: Theodore Ts'o, linux-kernel, linux-ext4

On Wed, 28 Nov 2018 at 07:15, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> > Corrupted inodes - always directory, not touched at least year or
> > more for writing. Something wrong when updating atime?
>
> We're not sure.  The frustrating thing is that it's not reproducing
> for me.  I run extensive regression tests, and I'm using 4.19 on my
> development laptop without noticing any problems.  If I could reproduce
> it, I could debug it, but since I can't, I need to rely on those who
> are seeing the problem to help pinpoint the problem.
>
> I'm trying to figure out common factors from those people who are
> reporting problems.

For me, switching off CONFIG_EXT4_ENCRYPTION helps: the machine survives
running updatedb and a kernel compile. Switching it back on makes it crash again:

[  118.204547] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155777: comm updatedb: bad extra_isize 13898 (inode size 256)
[  118.248928] Aborting journal on device sde2-8.
[  118.272239] EXT4-fs (sde2): Remounting filesystem read-only
[  118.293364] EXT4-fs error (device sde2):
ext4_journal_check_start:61: Detected aborted journal
[  118.315146] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155778: comm updatedb: bad extra_isize 38198 (inode size 256)
[  118.326760] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155780: comm updatedb: bad extra_isize 5023 (inode size 256)
[  118.337916] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155782: comm updatedb: bad extra_isize 44080 (inode size 256)
[  118.349149] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155784: comm updatedb: bad extra_isize 56093 (inode size 256)
[  118.360336] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155786: comm updatedb: bad extra_isize 13308 (inode size 256)
[  118.387113] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155788: comm updatedb: bad extra_isize 38399 (inode size 256)
[  118.422779] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155790: comm updatedb: bad extra_isize 24014 (inode size 256)
[  118.438124] EXT4-fs error (device sde2): ext4_iget:4831: inode
#14155792: comm updatedb: bad extra_isize 23262 (inode size 256)
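
For reference, all of these values are far outside what a 256-byte inode
allows. As far as I understand the check in ext4_iget(), extra_isize has to
fit into the space after the 128-byte base inode and has to be a multiple
of 4. A rough sketch of that test applied to log lines; the function names
are made up here, and the constant 128 is my reading of
EXT4_GOOD_OLD_INODE_SIZE:

```shell
#!/bin/sh
# Assumed size of the original ext2/ext3 on-disk inode in bytes.
GOOD_OLD_INODE_SIZE=128

# Succeeds if an extra_isize value is plausible for the given inode size:
# it must fit behind the base inode and be 4-byte aligned.
extra_isize_ok() {
    extra=$1
    inode_size=$2
    [ $((GOOD_OLD_INODE_SIZE + extra)) -le "$inode_size" ] &&
        [ $((extra % 4)) -eq 0 ]
}

# Reads kernel log text on stdin and classifies every
# "bad extra_isize N (inode size M)" occurrence it finds.
check_log() {
    sed -n 's/.*bad extra_isize \([0-9]*\) (inode size \([0-9]*\)).*/\1 \2/p' |
    while read -r extra size; do
        if extra_isize_ok "$extra" "$size"; then
            echo "$extra: plausible for inode size $size"
        else
            echo "$extra: invalid for inode size $size"
        fi
    done
}
```

With 256-byte inodes only multiples of 4 up to 128 pass, so "dmesg |
check_log" would flag every entry above. That fits inode contents being
overwritten with unrelated data rather than a single flipped bit.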

CONFIG_SCSI_MQ_DEFAULT=y in both cases.

Any ideas?


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 10:02       ` Andrey Jr. Melnikov
@ 2018-11-28 15:56         ` Rainer Fiebig
  2018-11-28 16:10           ` Theodore Y. Ts'o
  2018-11-28 21:13           ` Andrey Melnikov
  0 siblings, 2 replies; 27+ messages in thread
From: Rainer Fiebig @ 2018-11-28 15:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrey Jr. Melnikov, linux-ext4


On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
> In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
> > On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> > > Corrupted inodes - always directory, not touched at least year or
> > > more for writing. Something wrong when updating atime?
> > 
> > We're not sure.  The frustrating thing is that it's not reproducing
> > for me.  I run extensive regression tests, and I'm using 4.19 on my
> > development laptop without noticing any problems.  If I could reproduce
> > it, I could debug it, but since I can't, I need to rely on those who
> > are seeing the problem to help pinpoint the problem.
> 
> My workstation hit this bug every time after boot. If you have an idea - I
> may test it.
> 
> > I'm trying to figure out common factors from those people who are
> > reporting problems.
> > 
> > (a) What distribution are you running (it appears that many people
> > reporting problems are running Ubuntu, but this may be a sampling
> > issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> > Testing.)
> 
> Debian sid but self-build kernel from ubuntu mainline-ppa.

You could try a vanilla 4.19.5 from https://www.kernel.org/
and compile it with your current .config.

If you still see the errors, at least the Ubuntu-kernel could be ruled out.

In addition, if you still see the errors:

- backup your .config in a *different* folder (so that you can later re-use 
it)
- do a "make mrproper" (deletes the .config, see above)
- do a "make defconfig"
- and compile the kernel with that new .config
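
The steps above can be sketched as a small shell function. This is only an
outline: the function name, the two directory arguments and the MAKE
override are assumptions for illustration, not part of any kernel tooling.

```shell
#!/bin/sh
# Sketch: save the current .config outside the tree, then build with defconfig.
#   $1 = kernel source directory, $2 = backup directory (both hypothetical)
# MAKE may be set to replace the make command, e.g. for a dry run.
rebuild_with_defconfig() {
    ksrc=$1
    backup_dir=$2
    make_cmd=${MAKE:-make}

    mkdir -p "$backup_dir" || return 1
    # Copy first: "make mrproper" removes .config from the source tree.
    cp "$ksrc/.config" "$backup_dir/config.backup" || return 1

    $make_cmd -C "$ksrc" mrproper || return 1
    $make_cmd -C "$ksrc" defconfig || return 1
    # Build with one job per available CPU.
    $make_cmd -C "$ksrc" -j"$(nproc)" || return 1
}
```

To go back to the old configuration later, copy config.backup to .config in
the source tree again and run "make olddefconfig".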

If you still have the problem after that, you may want to learn how to bisect. 
;)

So long!

Rainer Fiebig


> 
> > (b) What hardware are you using?  (SSD?  SATA-attached?
> > NVMe-attached?)
> 
> SATA HDD WDC WD20EZRZ-00Z5HB0.
> 
> > (c) Are you using LVM?  LUKS (e.g., disk encrypted)?
> 
> No and no. Plain ext4.
> -- cut --
> debugfs:  features
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype
> needs_recovery extent 64bit flex_bg sparse_super large_file huge_file
> dir_nlink extra_isize metadata_csum
> -- cut --
> 
> > (d) are you using discard?  One theory is a recent discard change may
> > be in play.   How do you use discard?   (mount option, fstrim, etc.)
> 
> no

-- 
The truth always turns out to be simpler than you thought.
Richard Feynman


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 15:56         ` Rainer Fiebig
@ 2018-11-28 16:10           ` Theodore Y. Ts'o
  2018-11-28 16:18             ` Marek Habersack
  2018-11-28 17:01             ` Rainer Fiebig
  2018-11-28 21:13           ` Andrey Melnikov
  1 sibling, 2 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-28 16:10 UTC (permalink / raw)
  To: Rainer Fiebig; +Cc: linux-kernel, Andrey Jr. Melnikov, linux-ext4

On Wed, Nov 28, 2018 at 04:56:51PM +0100, Rainer Fiebig wrote:
> 
> If you still see the errors, at least the Ubuntu-kernel could be ruled out.

My impression is that some of the people reporting problems have been
using stock upstream kernels, so I wasn't really worried about the
Ubuntu kernel (although it could be something about the default
configs that Ubuntu sets up).  What I was more wondering was whether
there was something about userspace or default configs of Ubuntu.
This isn't necessarily a *problem* per se; for example, not that long
ago some users were getting surprised when a problem showed up with an
older version of the LVM2 userspace with newer upstream kernels.
After a while, you learn to get super paranoid about making sure to
rule out all possibilities when trying to debug problems that are only
hitting a set of users.

						- Ted


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 16:10           ` Theodore Y. Ts'o
@ 2018-11-28 16:18             ` Marek Habersack
  2018-11-28 17:01             ` Rainer Fiebig
  1 sibling, 0 replies; 27+ messages in thread
From: Marek Habersack @ 2018-11-28 16:18 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Rainer Fiebig, linux-kernel,
	Andrey Jr. Melnikov, linux-ext4

On 28/11/2018 17:10, Theodore Y. Ts'o wrote:
> On Wed, Nov 28, 2018 at 04:56:51PM +0100, Rainer Fiebig wrote:
>>
>> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
> 
> My impression is that some of the people reporting problems have been
> using stock upstream kernels, so I wasn't really worried about the
Also, the Ubuntu mainline kernel doesn't patch the kernel code; it merely
uses Ubuntu configs to build the stock kernel (you can find the patches at
e.g. http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19.5/ at the top of
the directory).

> Ubuntu kernel (although it could be something about the default
> configs that Ubuntu sets up).  What I was more wondering was whether
> there was something about userspace or default configs of Ubuntu.
> This isn't necessarily a *problem* per se; for example, not that long
> ago some users were getting surprised when a problem showed up with an
> older version of the LVM2 userspace with newer upstream kernels.
> After a while, you learn to get super paranoid about making sure to
> rule out all possibilities when trying to debug problems that are only
> hitting a set of users.
> 
> 						- Ted
> 

marek


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 16:10           ` Theodore Y. Ts'o
  2018-11-28 16:18             ` Marek Habersack
@ 2018-11-28 17:01             ` Rainer Fiebig
  1 sibling, 0 replies; 27+ messages in thread
From: Rainer Fiebig @ 2018-11-28 17:01 UTC (permalink / raw)
  To: Theodore Y. Ts'o, linux-kernel, Andrey Jr. Melnikov, linux-ext4



On 28.11.18 at 17:10, Theodore Y. Ts'o wrote:
> On Wed, Nov 28, 2018 at 04:56:51PM +0100, Rainer Fiebig wrote:
>>
>> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
> 
> My impression is that some of the people reporting problems have been
> using stock upstream kernels, so I wasn't really worried about the
> Ubuntu kernel (although it could be something about the default
> configs that Ubuntu sets up).  What I was more wondering was whether
> there was something about userspace or default configs of Ubuntu.
> This isn't necessarily a *problem* per se; for example, not that long
> ago some users were getting surprised when a problem showed up with an
> older version of the LVM2 userspace with newer upstream kernels.
> After a while, you learn to get super paranoid about making sure to
> rule out all possibilities when trying to debug problems that are only
> hitting a set of users.
> 
> 						- Ted
> 

OK, thanks. Perhaps Andrey can tell us then what impact the default
.config had on the problem.

Rainer Fiebig



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 15:56         ` Rainer Fiebig
  2018-11-28 16:10           ` Theodore Y. Ts'o
@ 2018-11-28 21:13           ` Andrey Melnikov
  2018-11-28 22:09             ` Rainer Fiebig
  1 sibling, 1 reply; 27+ messages in thread
From: Andrey Melnikov @ 2018-11-28 21:13 UTC (permalink / raw)
  To: jrf; +Cc: linux-kernel, linux-ext4

On Wed, 28 Nov 2018 at 18:55, Rainer Fiebig <jrf@mailbox.org> wrote:
>
> On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
> > In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
> > > On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> > > > Corrupted inodes - always directory, not touched at least year or
> > > > more for writing. Something wrong when updating atime?
> > >
> > > We're not sure.  The frustrating thing is that it's not reproducing
> > > for me.  I run extensive regression tests, and I'm using 4.19 on my
> > > > development laptop without noticing any problems.  If I could reproduce
> > > it, I could debug it, but since I can't, I need to rely on those who
> > > are seeing the problem to help pinpoint the problem.
> >
> > My workstation hit this bug every time after boot. If you have an idea - I
> > may test it.
> >
> > > I'm trying to figure out common factors from those people who are
> > > reporting problems.
> > >
> > > (a) What distribution are you running (it appears that many people
> > > reporting problems are running Ubuntu, but this may be a sampling
> > > issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> > > Testing.)
> >
> > Debian sid but self-build kernel from ubuntu mainline-ppa.
>
> You could try a vanilla 4.19.5 from https://www.kernel.org/
> and compile it with your current .config.

The mainline PPA uses the vanilla kernel. Its patches only add
Debian-specific build infrastructure.

> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
>
> In addition, if you still see the errors:
>
> - backup your .config in a *different* folder (so that you can later re-use
> it)
> - do a "make mrproper" (deletes the .config, see above)
> - do a "make defconfig"
> - and compile the kernel with that new .config

defconfig is great, for abstract hardware in a vacuum.

> If you still have the problem after that, you may want to learn how to bisect.
> ;)
I already know how to bisect. Since the kernel 2.0 era. Without git ;)

This problem is simply not bisectable when the same kernel corrupts the FS
on my workstation but works normally on other servers.
And now the FS is corrupted again with CONFIG_EXT4_ENCRYPTION disabled. Great.

> So long!
>
> Rainer Fiebig
>
>
> >
> > > (b) What hardware are you using?  (SSD?  SATA-attached?
> > > NVMe-attached?)
> >
> > SATA HDD WDC WD20EZRZ-00Z5HB0.
> >
> > > (c) Are you using LVM?  LUKS (e.g., disk encrypted)?
> >
> > No and no. Plain ext4.
> > -- cut --
> > debugfs:  features
> > Filesystem features: has_journal ext_attr resize_inode dir_index filetype
> > needs_recovery extent 64bit flex_bg sparse_super large_file huge_file
> > dir_nlink extra_isize metadata_csum
> > -- cut --
> >
> > > (d) are you using discard?  One theory is a recent discard change may
> > > be in play.   How do you use discard?   (mount option, fstrim, etc.)
> >
> > no
>
> --
> The truth always turns out to be simpler than you thought.
> Richard Feynman


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 21:13           ` Andrey Melnikov
@ 2018-11-28 22:09             ` Rainer Fiebig
  2018-12-02 20:19               ` Andrey Melnikov
  0 siblings, 1 reply; 27+ messages in thread
From: Rainer Fiebig @ 2018-11-28 22:09 UTC (permalink / raw)
  To: Andrey Melnikov; +Cc: linux-kernel, linux-ext4



On 28.11.18 at 22:13, Andrey Melnikov wrote:
> On Wed, 28 Nov 2018 at 18:55, Rainer Fiebig <jrf@mailbox.org> wrote:
>>
>> On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
>>> In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
>>>> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
>>>>> Corrupted inodes - always directory, not touched at least year or
>>>>> more for writing. Something wrong when updating atime?
>>>>
>>>> We're not sure.  The frustrating thing is that it's not reproducing
>>>> for me.  I run extensive regression tests, and I'm using 4.19 on my
>>>> development laptop without noticing any problems.  If I could reproduce
>>>> it, I could debug it, but since I can't, I need to rely on those who
>>>> are seeing the problem to help pinpoint the problem.
>>>
>>> My workstation hit this bug every time after boot. If you have an idea - I
>>> may test it.
>>>
>>>> I'm trying to figure out common factors from those people who are
>>>> reporting problems.
>>>>
>>>> (a) What distribution are you running (it appears that many people
>>>> reporting problems are running Ubuntu, but this may be a sampling
>>>> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
>>>> Testing.)
>>>
>>> Debian sid but self-build kernel from ubuntu mainline-ppa.
>>
>> You could try a vanilla 4.19.5 from https://www.kernel.org/
>> and compile it with your current .config.
> 
> mainline-ppa use vanilla kernel. Patches only adds debian specific
> build infrastructure.
> 
>> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
>>
>> In addition, if you still see the errors:
>>
>> - backup your .config in a *different* folder (so that you can later re-use
>> it)
>> - do a "make mrproper" (deletes the .config, see above)
>> - do a "make defconfig"
>> - and compile the kernel with that new .config
> 
> defconfig is great - for abstract hardware in vacuum.
> 
>> If you still have the problem after that, you may want to learn how to bisect.
>> ;)
> I'm already know how-to bisect. From kernel 2.0 era. Without git ;)
> 
> This problem simply non-bisectable, when same kernel corrupt FS on my
> workstation but normally working on other servers.
> And now - FS corrupted again with disabled CONFIG_EXT4_ENCRYPTION. Great.

OK, and now we are looking forward to *your* ideas on how to solve this.

> 
>> So long!
>>
>> Rainer Fiebig
>>
>>
>>>
>>>> (b) What hardware are you using?  (SSD?  SATA-attached?
>>>> NVMe-attached?)
>>>
>>> SATA HDD WDC WD20EZRZ-00Z5HB0.
>>>
>>>> (c) Are you using LVM?  LUKS (e.g., disk encrypted)?
>>>
>>> No and no. Plain ext4.
>>> -- cut --
>>> debugfs:  features
>>> Filesystem features: has_journal ext_attr resize_inode dir_index filetype
>>> needs_recovery extent 64bit flex_bg sparse_super large_file huge_file
>>> dir_nlink extra_isize metadata_csum
>>> -- cut --
>>>
>>>> (d) are you using discard?  One theory is a recent discard change may
>>>> be in play.   How do you use discard?   (mount option, fstrim, etc.)
>>>
>>> no
>>
>> --
>> The truth always turns out to be simpler than you thought.
>> Richard Feynman




* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-11-28 22:09             ` Rainer Fiebig
@ 2018-12-02 20:19               ` Andrey Melnikov
  2018-12-02 22:13                 ` Rainer Fiebig
  0 siblings, 1 reply; 27+ messages in thread
From: Andrey Melnikov @ 2018-12-02 20:19 UTC (permalink / raw)
  To: jrf, Theodore Ts'o; +Cc: linux-kernel, linux-ext4

On Thu, 29 Nov 2018 at 01:08, Rainer Fiebig <jrf@mailbox.org> wrote:
>
> On 28.11.18 at 22:13, Andrey Melnikov wrote:
> > On Wed, 28 Nov 2018 at 18:55, Rainer Fiebig <jrf@mailbox.org> wrote:
> >>
> >> On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
> >>> In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >>>> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> >>>>> Corrupted inodes - always directory, not touched at least year or
> >>>>> more for writing. Something wrong when updating atime?
> >>>>
> >>>> We're not sure.  The frustrating thing is that it's not reproducing
> >>>> for me.  I run extensive regression tests, and I'm using 4.19 on my
> >>>> development laptop without noticing any problems.  If I could reproduce
> >>>> it, I could debug it, but since I can't, I need to rely on those who
> >>>> are seeing the problem to help pinpoint the problem.
> >>>
> >>> My workstation hit this bug every time after boot. If you have an idea - I
> >>> may test it.
> >>>
> >>>> I'm trying to figure out common factors from those people who are
> >>>> reporting problems.
> >>>>
> >>>> (a) What distribution are you running (it appears that many people
> >>>> reporting problems are running Ubuntu, but this may be a sampling
> >>>> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> >>>> Testing.)
> >>>
> >>> Debian sid but self-build kernel from ubuntu mainline-ppa.
> >>
> >> You could try a vanilla 4.19.5 from https://www.kernel.org/
> >> and compile it with your current .config.
> >
> > mainline-ppa use vanilla kernel. Patches only adds debian specific
> > build infrastructure.
> >
> >> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
> >>
> >> In addition, if you still see the errors:
> >>
> >> - backup your .config in a *different* folder (so that you can later re-use
> >> it)
> >> - do a "make mrproper" (deletes the .config, see above)
> >> - do a "make defconfig"
> >> - and compile the kernel with that new .config
> >
> > defconfig is great - for abstract hardware in vacuum.
> >
> >> If you still have the problem after that, you may want to learn how to bisect.
> >> ;)
> > I'm already know how-to bisect. From kernel 2.0 era. Without git ;)
> >
> > This problem simply non-bisectable, when same kernel corrupt FS on my
> > workstation but normally working on other servers.
> > And now - FS corrupted again with disabled CONFIG_EXT4_ENCRYPTION. Great.
>
> OK, - and now we are looking forward to *your* ideas how to solve this.

After four days of playing games with git bisect, the real winner is
Debian's gcc-8.2.0-9. After upgrading it to 8.2.0-10, or using version
7.3.0-30, the same kernel + config does not exhibit ext4 corruption.

I think I hit https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87859
with version 8.2.0-9.
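
To see which compiler built the running kernel, the version string in
/proc/version can be checked. A small sketch, assuming the "gcc version
X.Y.Z" wording that kernels of this era print; the function name is made up
for illustration:

```shell
#!/bin/sh
# Reads a /proc/version-style line on stdin and prints the GCC version
# number found after the words "gcc version". Prints nothing if the
# line uses a different wording.
kernel_gcc_version() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}
```

Run it as "kernel_gcc_version < /proc/version". The Debian package revision
(the part after the dash, such as -9 or -10) is not part of that number, so
it has to be read from the rest of the line or from the package manager.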


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-12-02 20:19               ` Andrey Melnikov
@ 2018-12-02 22:13                 ` Rainer Fiebig
  2018-12-05 12:58                   ` Andrey Melnikov
  0 siblings, 1 reply; 27+ messages in thread
From: Rainer Fiebig @ 2018-12-02 22:13 UTC (permalink / raw)
  To: Andrey Melnikov, Theodore Ts'o; +Cc: linux-kernel, linux-ext4



On 02.12.18 at 21:19, Andrey Melnikov wrote:
> On Thu, 29 Nov 2018 at 01:08, Rainer Fiebig <jrf@mailbox.org> wrote:
>>
>> On 28.11.18 at 22:13, Andrey Melnikov wrote:
>>> On Wed, 28 Nov 2018 at 18:55, Rainer Fiebig <jrf@mailbox.org> wrote:
>>>>
>>>> On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
>>>>> In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
>>>>>> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
>>>>>>> Corrupted inodes - always directory, not touched at least year or
>>>>>>> more for writing. Something wrong when updating atime?
>>>>>>
>>>>>> We're not sure.  The frustrating thing is that it's not reproducing
>>>>>> for me.  I run extensive regression tests, and I'm using 4.19 on my
>>>>>> development laptop without noticing any problems.  If I could reproduce
>>>>>> it, I could debug it, but since I can't, I need to rely on those who
>>>>>> are seeing the problem to help pinpoint the problem.
>>>>>
>>>>> My workstation hit this bug every time after boot. If you have an idea - I
>>>>> may test it.
>>>>>
>>>>>> I'm trying to figure out common factors from those people who are
>>>>>> reporting problems.
>>>>>>
>>>>>> (a) What distribution are you running (it appears that many people
>>>>>> reporting problems are running Ubuntu, but this may be a sampling
>>>>>> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
>>>>>> Testing.)
>>>>>
>>>>> Debian sid but self-build kernel from ubuntu mainline-ppa.
>>>>
>>>> You could try a vanilla 4.19.5 from https://www.kernel.org/
>>>> and compile it with your current .config.
>>>
>>> mainline-ppa use vanilla kernel. Patches only adds debian specific
>>> build infrastructure.
>>>
>>>> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
>>>>
>>>> In addition, if you still see the errors:
>>>>
>>>> - backup your .config in a *different* folder (so that you can later re-use
>>>> it)
>>>> - do a "make mrproper" (deletes the .config, see above)
>>>> - do a "make defconfig"
>>>> - and compile the kernel with that new .config
>>>
>>> defconfig is great - for abstract hardware in vacuum.
>>>
>>>> If you still have the problem after that, you may want to learn how to bisect.
>>>> ;)
>>> I'm already know how-to bisect. From kernel 2.0 era. Without git ;)
>>>
>>> This problem simply non-bisectable, when same kernel corrupt FS on my
>>> workstation but normally working on other servers.
>>> And now - FS corrupted again with disabled CONFIG_EXT4_ENCRYPTION. Great.
>>
>> OK, - and now we are looking forward to *your* ideas how to solve this.
> 
> After four days playing games around git bisect - real winner is
> debian gcc-8.2.0-9. Upgrade it to 8.2.0-10 or use 7.3.0-30 version for
> same kernel + config - does not exhibit ext4 corruption.
> 
> I think I hit this https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87859
> with 8.2.0-9 version.
> 
Good that it works for you. But others used gcc 5.4.0 or 6.3.0 and were
hit anyway: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c165



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-12-02 22:13                 ` Rainer Fiebig
@ 2018-12-05 12:58                   ` Andrey Melnikov
  2018-12-11  0:11                     ` Pavel Machek
  0 siblings, 1 reply; 27+ messages in thread
From: Andrey Melnikov @ 2018-12-05 12:58 UTC (permalink / raw)
  To: jrf; +Cc: Theodore Ts'o, linux-kernel, linux-ext4

On Mon, 3 Dec 2018 at 01:11, Rainer Fiebig <jrf@mailbox.org> wrote:
>
> On 02.12.18 at 21:19, Andrey Melnikov wrote:
> > On Thu, 29 Nov 2018 at 01:08, Rainer Fiebig <jrf@mailbox.org> wrote:
> >>
> >> On 28.11.18 at 22:13, Andrey Melnikov wrote:
> >>> On Wed, 28 Nov 2018 at 18:55, Rainer Fiebig <jrf@mailbox.org> wrote:
> >>>>
> >>>> On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
> >>>>> In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >>>>>> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> >>>>>>> Corrupted inodes - always directory, not touched at least year or
> >>>>>>> more for writing. Something wrong when updating atime?
> >>>>>>
> >>>>>> We're not sure.  The frustrating thing is that it's not reproducing
> >>>>>> for me.  I run extensive regression tests, and I'm using 4.19 on my
> >>>>>> development laptop without noticing any problems.  If I could reproduce
> >>>>>> it, I could debug it, but since I can't, I need to rely on those who
> >>>>>> are seeing the problem to help pinpoint the problem.
> >>>>>
> >>>>> My workstation hit this bug every time after boot. If you have an idea - I
> >>>>> may test it.
> >>>>>
> >>>>>> I'm trying to figure out common factors from those people who are
> >>>>>> reporting problems.
> >>>>>>
> >>>>>> (a) What distribution are you running (it appears that many people
> >>>>>> reporting problems are running Ubuntu, but this may be a sampling
> >>>>>> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> >>>>>> Testing.)
> >>>>>
> >>>>> Debian sid but self-build kernel from ubuntu mainline-ppa.
> >>>>
> >>>> You could try a vanilla 4.19.5 from https://www.kernel.org/
> >>>> and compile it with your current .config.
> >>>
> >>> mainline-ppa use vanilla kernel. Patches only adds debian specific
> >>> build infrastructure.
> >>>
> >>>> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
> >>>>
> >>>> In addition, if you still see the errors:
> >>>>
> >>>> - backup your .config in a *different* folder (so that you can later re-use
> >>>> it)
> >>>> - do a "make mrproper" (deletes the .config, see above)
> >>>> - do a "make defconfig"
> >>>> - and compile the kernel with that new .config
> >>>
> >>> defconfig is great, for abstract hardware in a vacuum.
> >>>
> >>>> If you still have the problem after that, you may want to learn how to bisect.
> >>>> ;)
> >>> I already know how to bisect. Since the kernel 2.0 era. Without git ;)
> >>>
> >>> This problem is simply not bisectable when the same kernel corrupts
> >>> the FS on my workstation but works normally on other servers.
> >>> And now the FS got corrupted again with CONFIG_EXT4_ENCRYPTION
> >>> disabled. Great.
> >>
> >> OK, - and now we are looking forward to *your* ideas how to solve this.
> >
> > After four days of playing games with git bisect, the real winner is
> > Debian's gcc-8.2.0-9. Upgrading it to 8.2.0-10, or using version
> > 7.3.0-30 with the same kernel + config, does not exhibit the ext4
> > corruption.
> >
> > I think I hit this https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87859
> > with 8.2.0-9 version.
> >
> Good that it works for you. But others used gcc 5.4.0 or 6.3.0 and were
> hit anyway: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c165

It depends on the workload pattern. 4.19.5 built with 8.2.0-10 and with
7.3.0-30 crashed after 4 hours of usage (the previous build crashed
within 5 min). So my assumption about a broken gcc was wrong.


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-12-05 12:58                   ` Andrey Melnikov
@ 2018-12-11  0:11                     ` Pavel Machek
  2018-12-13 10:38                       ` Andrey Jr. Melnikov
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2018-12-11  0:11 UTC (permalink / raw)
  To: Andrey Melnikov; +Cc: jrf, Theodore Ts'o, linux-kernel, linux-ext4


Hi!

> > >> OK, - and now we are looking forward to *your* ideas how to solve this.
> > >
> > > After four days of playing games with git bisect, the real winner is
> > > Debian's gcc-8.2.0-9. Upgrading it to 8.2.0-10, or using version
> > > 7.3.0-30 with the same kernel + config, does not exhibit the ext4
> > > corruption.
> > >
> > > I think I hit this https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87859
> > > with 8.2.0-9 version.
> > >
> > Good that it works for you. But others used gcc 5.4.0 or 6.3.0 and were
> > hit anyway: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c165
> 
> It depends on the workload pattern. 4.19.5 built with 8.2.0-10 and with
> 7.3.0-30 crashed after 4 hours of usage (the previous build crashed
> within 5 min). So my assumption about a broken gcc was wrong.

Would it be possible to try vanilla 4.19? (Not stable?)

I test vanilla and -next kernels every week or two, and did not have
ext4 problems recently. I guess many kernel developers test mainline
but not stable...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
  2018-12-11  0:11                     ` Pavel Machek
@ 2018-12-13 10:38                       ` Andrey Jr. Melnikov
  0 siblings, 0 replies; 27+ messages in thread
From: Andrey Jr. Melnikov @ 2018-12-13 10:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4

In gmane.comp.file-systems.ext4 Pavel Machek <pavel@ucw.cz> wrote:
> [-- text/plain, encoding quoted-printable, charset: us-ascii, 32 lines --]

> Hi!

> > > >> OK, - and now we are looking forward to *your* ideas how to solve this.
> > > >
> > > > After four days of playing games with git bisect, the real winner is
> > > > Debian's gcc-8.2.0-9. Upgrading it to 8.2.0-10, or using version
> > > > 7.3.0-30 with the same kernel + config, does not exhibit the ext4
> > > > corruption.
> > > >
> > > > I think I hit this https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87859
> > > > with 8.2.0-9 version.
> > > >
> > > Good that it works for you. But others used gcc 5.4.0 or 6.3.0 and were
> > > hit anyway: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c165
> > 
> > It depends on the workload pattern. 4.19.5 built with 8.2.0-10 and with
> > 7.3.0-30 crashed after 4 hours of usage (the previous build crashed
> > within 5 min). So my assumption about a broken gcc was wrong.

> Would it be possible to try vanilla 4.19? (Not stable?)

> I test vanilla and -next kernels every week or two, and did not have
> ext4 problems recently. I guess many kernel developers test mainline
> but not stable...

The fix is already committed to the stable/vanilla branches:
https://bugzilla.kernel.org/show_bug.cgi?id=201685#c314



* Re: ext4 file system corruption with v4.19.3 / v4.19.4
@ 2018-12-04  7:33 Gunter Königsmann
  0 siblings, 0 replies; 27+ messages in thread
From: Gunter Königsmann @ 2018-12-04  7:33 UTC (permalink / raw)
  To: linux-kernel

After upgrading my kernel to 4.19 I got a corruption on nearly every
reboot or resume from suspend on my Acer s7-391 [UEFI boot].

Going to my UEFI setup and changing IDE mode from IDE to ATA seems to
have resolved the issue for me.

Don't know, though, if that is a valid data point or if it was a mere
accident (tested only on one computer) or just avoids the Bad Timing by
a few nanoseconds....




* Re: ext4 file system corruption with v4.19.3 / v4.19.4
@ 2018-12-03 21:47 Michael Hennig
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Hennig @ 2018-12-03 21:47 UTC (permalink / raw)
  To: linux-kernel

I'm suffering from ext4 corruption too.
(Ubuntu 18.10, 4.19.6-041906-generic, 64 bit)

22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484759: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484759: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484754: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484755: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4851: inode
#42484768: comm rm: checksum invalid
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484767: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484763: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4851: inode
#42484766: comm rm: checksum invalid
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484761: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484764: comm rm: bad extra_isize 58853 (inode size 256)
22:00:01 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484765: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484759: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484754: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484755: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4851: inode
#42484768: comm rm: checksum invalid
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484767: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484763: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4851: inode
#42484766: comm rm: checksum invalid
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484761: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484764: comm rm: bad extra_isize 58853 (inode size 256)
21:00:48 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484765: comm rm: bad extra_isize 58853 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484762: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484761: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484760: comm rsync: bad extra_isize 54976 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484759: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484758: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484757: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484756: comm rsync: bad extra_isize 63008 (inode size 256)
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484755: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:31 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484754: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:31 kernel: EXT4-fs error (device sda4): ext4_iget:4851: inode
#42484753: comm rsync: checksum invalid
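
A log excerpt like the one above is easier to compare across reports once
it is tallied per inode and per error kind. The following is only a minimal
sketch in Python, not something used in this thread: the names EXT4_ERROR
and summarize are hypothetical, and the regular expression assumes the
message layout seen in the lines quoted above (including the mail line wrap
before the inode number).

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted above. The \s+ after
# "inode" tolerates the line wrap inserted before the inode number.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>\w+)\): (?P<func>\w+):(?P<line>\d+): "
    r"inode\s+#(?P<inode>\d+): comm (?P<comm>[\w.]+): (?P<msg>[^\n]+)"
)


def summarize(log_text):
    """Return (error count per inode, error count per message kind)."""
    per_inode = Counter()
    per_kind = Counter()
    for m in EXT4_ERROR.finditer(log_text):
        per_inode[int(m["inode"])] += 1
        msg = m["msg"].strip()
        # Group all "bad extra_isize N (...)" messages under one key,
        # whatever the reported value N is.
        kind = "bad extra_isize" if msg.startswith("bad extra_isize") else msg
        per_kind[kind] += 1
    return per_inode, per_kind


# Three entries copied from the report above.
sample = """\
21:00:32 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484756: comm rsync: bad extra_isize 63008 (inode size 256)
21:00:31 kernel: EXT4-fs error (device sda4): ext4_iget:4831: inode
#42484754: comm rsync: bad extra_isize 58853 (inode size 256)
21:00:31 kernel: EXT4-fs error (device sda4): ext4_iget:4851: inode
#42484753: comm rsync: checksum invalid
"""

per_inode, per_kind = summarize(sample)
print(per_inode)
print(per_kind)
```

Fed the full journal output instead of the three sample entries, the same
function would show whether the errors cluster in a narrow inode range, as
they appear to do here (42484753 to 42484768).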


I checked some of the defective inodes and found only files that
timeshift had backed up through rsync:

debugfs 1.44.4 (18-Aug-2018)
debugfs:  ncheck 42484759
Inode	Pathname
42484759
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxcb1:i386.symbols
42484759
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxcb1:i386.symbols
42484759
/timeshift/snapshots/2018-12-02_21-00-01/localhost/var/lib/dpkg/info/libxcb1:i386.symbols
42484759
/timeshift/snapshots/2018-12-01_21-00-01/localhost/var/lib/dpkg/info/libxcb1:i386.symbols
debugfs:  ncheck 42484754
Inode	Pathname
debugfs:  ncheck 42484755
Inode	Pathname
42484755
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxcb1:amd64.triggers
42484755
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxcb1:amd64.triggers
42484755
/timeshift/snapshots/2018-12-02_21-00-01/localhost/var/lib/dpkg/info/libxcb1:amd64.triggers
42484755
/timeshift/snapshots/2018-12-01_21-00-01/localhost/var/lib/dpkg/info/libxcb1:amd64.triggers
debugfs:  ncheck 42484768
Inode	Pathname
42484768
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxft2:amd64.list
42484768
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxft2:amd64.list
42484768
/timeshift/snapshots/2018-12-02_21-00-01/localhost/var/lib/dpkg/info/libxft2:amd64.list
42484768
/timeshift/snapshots/2018-12-01_21-00-01/localhost/var/lib/dpkg/info/libxft2:amd64.list
debugfs:  ncheck 42484767
Inode	Pathname
42484767
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxft-dev:amd64.md5sums
42484767
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxft-dev:amd64.md5sums
42484767
/timeshift/snapshots/2018-12-02_21-00-01/localhost/var/lib/dpkg/info/libxft-dev:amd64.md5sums
42484767
/timeshift/snapshots/2018-12-01_21-00-01/localhost/var/lib/dpkg/info/libxft-dev:amd64.md5sums
debugfs:  ncheck 42484763
Inode	Pathname
42484763
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxdo3:amd64.shlibs
42484763
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxdo3:amd64.shlibs
42484763
/timeshift/snapshots/2018-12-02_21-00-01/localhost/var/lib/dpkg/info/libxdo3:amd64.shlibs
42484763
/timeshift/snapshots/2018-12-01_21-00-01/localhost/var/lib/dpkg/info/libxdo3:amd64.shlibs
debugfs:  ncheck 42484753
Inode	Pathname
42484753
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxcb1:amd64.shlibs
42484753
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxcb1:amd64.shlibs
42484753
/timeshift/snapshots/2018-12-02_21-00-01/localhost/var/lib/dpkg/info/libxcb1:amd64.shlibs
42484753
/timeshift/snapshots/2018-12-01_21-00-01/localhost/var/lib/dpkg/info/libxcb1:amd64.shlibs
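
The ncheck output above can be grouped by inode to see at a glance how many
snapshot paths share each damaged inode. This is a rough sketch only: the
function name group_ncheck is hypothetical, and the parser assumes the
output layout shown above, where the mail wrapped each inode number and its
pathname onto separate lines (unwrapped "inode<TAB>pathname" lines are
accepted as well).

```python
def group_ncheck(output):
    """Map each queried inode number to the list of pathnames reported."""
    paths = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("debugfs:"):
            words = line.split()
            # "debugfs:  ncheck 42484754": remember the queried inode even
            # when no pathname follows it.
            if len(words) == 3 and words[1] == "ncheck" and words[2].isdigit():
                paths.setdefault(int(words[2]), [])
            current = None
            continue
        # Skip blank lines, the version banner and the column header.
        if not line or line.startswith(("debugfs", "Inode")):
            continue
        parts = line.split(None, 1)
        if parts[0].isdigit():
            current = int(parts[0])
            paths.setdefault(current, [])
            if len(parts) == 2:  # unwrapped "inode<TAB>pathname" form
                paths[current].append(parts[1])
        elif current is not None:
            # Wrapped form: the pathname sits on the line after the inode.
            paths[current].append(line)
    return paths


# Shortened excerpt of the debugfs session above.
sample = """debugfs:  ncheck 42484759
Inode\tPathname
42484759
/timeshift/snapshots/2018-11-28_20-00-02/localhost/var/lib/dpkg/info/libxcb1:i386.symbols
42484759
/timeshift/snapshots/2018-11-30_18-00-01/localhost/var/lib/dpkg/info/libxcb1:i386.symbols
debugfs:  ncheck 42484754
Inode\tPathname
"""

grouped = group_ncheck(sample)
for inode, names in grouped.items():
    print(inode, len(names))
```

In the session above every inode that resolves at all resolves to the same
file in four snapshots, while 42484754 resolves to no pathname.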

Maybe timeshift's rsync is related to this?


Kind regards
Michael Hennig


* Re: ext4 file system corruption with v4.19.3 / v4.19.4
@ 2018-12-01 15:47 Huang Yan
  0 siblings, 0 replies; 27+ messages in thread
From: Huang Yan @ 2018-12-01 15:47 UTC (permalink / raw)
  To: linux-kernel

On Tue, 27 Nov 2018 23:15:55 -0500, Theodore Y. Ts'o wrote:
> I'm trying to figure out common factors from those people who are
> reporting problems.

Hello, I experienced the ext4-randomly-switching-to-read-only issue
with Linux kernel 4.19.5 (from
http://kernel.ubuntu.com/~kernel-ppa/mainline/ ; also with the
previous minor versions) on Ubuntu 18.04 - on two laptops (SATA HDD &
SATA SSD).

I noticed that the only thing I've changed from the default Ubuntu
configuration was enabling *write cache* for the disk with the ext4
root partition. Perhaps this information will help.

Best regards,
Yan Huang (Johnny)


end of thread, other threads:[~2018-12-13 11:09 UTC | newest]

Thread overview: 27+ messages
     [not found] <065643a0-f9aa-a361-715a-03ca978d9228@roeck-us.net>
2018-11-27 14:32 ` ext4 file system corruption with v4.19.3 / v4.19.4 Guenter Roeck
2018-11-27 14:48   ` Marek Habersack
2018-11-27 17:31     ` Guenter Roeck
2018-11-27 18:55     ` Rainer Fiebig
2018-11-27 21:22       ` Guenter Roeck
2018-11-28  1:57         ` Vito Caputo
2018-11-28  9:56         ` Rainer Fiebig
2018-11-27 15:50   ` Rainer Fiebig
2018-11-28  0:16   ` Andrey Jr. Melnikov
2018-11-28  4:15     ` Theodore Y. Ts'o
2018-11-28  8:02       ` Marek Habersack
2018-11-28 10:02       ` Andrey Jr. Melnikov
2018-11-28 15:56         ` Rainer Fiebig
2018-11-28 16:10           ` Theodore Y. Ts'o
2018-11-28 16:18             ` Marek Habersack
2018-11-28 17:01             ` Rainer Fiebig
2018-11-28 21:13           ` Andrey Melnikov
2018-11-28 22:09             ` Rainer Fiebig
2018-12-02 20:19               ` Andrey Melnikov
2018-12-02 22:13                 ` Rainer Fiebig
2018-12-05 12:58                   ` Andrey Melnikov
2018-12-11  0:11                     ` Pavel Machek
2018-12-13 10:38                       ` Andrey Jr. Melnikov
2018-11-28 13:28       ` Andrey Melnikov
2018-12-01 15:47 Huang Yan
2018-12-03 21:47 Michael Hennig
2018-12-04  7:33 Gunter Königsmann
