linux-ext4.vger.kernel.org archive mirror
* Ext4 corruption with VM images as 3 > drop_caches
@ 2020-03-18  3:47 Aneesh Kumar K.V
  2020-03-19 13:24 ` Ritesh Harjani
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-18  3:47 UTC (permalink / raw)
  To: linux-ext4, Theodore Y. Ts'o; +Cc: Ritesh Harjani

Hi,

With new vm install I am finding corruption with the vm image if I
follow up the install with echo 3 > /proc/sys/vm/drop_caches 

The file system reports below error.

Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ...
[    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
done.
[    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
[    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
/sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
[    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00

And debugfs reports

debugfs:  stat <917954>
Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
Size of extra inode fields: 0
Inode checksum: 0x00000000
BLOCKS:
debugfs:  

Bisecting this finds 
Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make dioread_nolock the default")
as bad. If I revert the same on top of linus upstream(fb33c6510d5595144d585aa194d377cf74d31911)
I don't hit the corruption anymore.

-aneesh


* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
@ 2020-03-19 13:24 ` Ritesh Harjani
  2020-03-19 16:36 ` Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Ritesh Harjani @ 2020-03-19 13:24 UTC (permalink / raw)
  To: linux-ext4, Theodore Y. Ts'o; +Cc: Aneesh Kumar K.V, Jan Kara



On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
> Hi,
> 
> With new vm install I am finding corruption with the vm image if I
> follow up the install with echo 3 > /proc/sys/vm/drop_caches
> 
> The file system reports below error.
> 
> Begin: Running /scripts/local-bottom ... done.
> Begin: Running /scripts/init-bottom ...
> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
> done.
> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> 
> And debugfs reports
> 
> debugfs:  stat <917954>
> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> Generation: 0    Version: 0x00000000
> User:     0   Group:     0   Size: 0
> File ACL: 0
> Links: 0   Blockcount: 0
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> Size of extra inode fields: 0
> Inode checksum: 0x00000000
> BLOCKS:
> debugfs:
> 
> Bisecting this finds
> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make dioread_nolock the default")
> as bad. If I revert the same on top of linus upstream(fb33c6510d5595144d585aa194d377cf74d31911)
> I don't hit the corruption anymore.

Tried replicating this and could easily replicate it on Power box.
I tried to reproduce this on x86 too, but could not reproduce on x86.
Now one difference on Power could be that pagesize is 64K and fs
blocksize is 4K.
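
[Editor's sketch, not part of the original mail.] To check whether a given
host falls into the blocksize < pagesize case mentioned above, something
like the following could be used. It is a minimal sketch that assumes
statvfs() reports the filesystem block size in f_bsize (which is what ext4
does on Linux, as far as I know); the function names are made up for
illustration.

```c
#include <stdio.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Pure comparison: 1 if one page spans more than one fs block, else 0. */
int block_smaller_than_page(unsigned long blocksize, unsigned long pagesize)
{
	return blocksize < pagesize;
}

/*
 * Hypothetical helper: returns 1 if the filesystem holding 'path' has
 * blocksize < pagesize, 0 if not, and -1 on error.
 */
int fs_block_smaller_than_page(const char *path)
{
	struct statvfs st;
	long pagesize = sysconf(_SC_PAGESIZE);

	if (pagesize <= 0 || statvfs(path, &st) != 0)
		return -1;

	printf("%s: blocksize=%lu pagesize=%ld\n",
	       path, (unsigned long)st.f_bsize, pagesize);
	return block_smaller_than_page((unsigned long)st.f_bsize,
				       (unsigned long)pagesize);
}
```

Calling fs_block_smaller_than_page() on the directory that holds the qemu
image would be expected to return 1 on a 64K-page Power host with a 4K-block
ext4, and 0 on a typical x86 host with 4K pages and 4K blocks.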

The issue looks like the guest qemu image file is not properly written
back, after host does echo 3 > drop_caches. (correct me if this is not
the case).

I tried replicating it via the test below, but could not reproduce it.

Any idea what kind of unit test could be written for this?
I am not sure how exactly qemu is writing to its image file.


1. Create 2 files, "mmap-file" and "mmap-data".
2. "mmap-file" is a 2GB sparse file. At some random offsets (tried
with both 64KB-aligned and 4KB-aligned offsets), write a
pagesize/blocksize amount of a known data pattern.
3. These offsets (which are pagesize/blocksize aligned) are recorded in
the "mmap-data" file via normal read/write calls.
4. After writing to both files, munmap the "mmap-file" and close both
files.
5. Then do echo 3 > drop_caches.
6. In the verify phase, using the offsets recorded in the "mmap-data"
file, read the "mmap-file" to verify whether its contents are correct.
With that I could not reproduce this issue.
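
[Editor's sketch, not part of the original mail.] The steps above could be
written roughly as follows. This is an illustration under assumptions, not
the test that was actually run: the function names, the 0xA5 pattern and the
use of pread() in the verify phase are all made up here, and step 5 (echo 3 >
drop_caches, which needs root) is left to the caller between the two phases.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PATTERN 0xA5	/* known data pattern (arbitrary choice) */

/*
 * Steps 1-4: create a sparse file of 'size' bytes, write 'chunk' bytes of
 * PATTERN through a shared mapping at 'count' random chunk-aligned offsets,
 * and record every offset in 'log' with plain write(). Returns 0 or -1.
 */
int write_phase(const char *file, const char *log, off_t size,
		size_t chunk, int count)
{
	int fd = open(file, O_RDWR | O_CREAT | O_TRUNC, 0644);
	int lfd = open(log, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	unsigned char *map = MAP_FAILED;
	int ret = -1;

	if (fd < 0 || lfd < 0 || ftruncate(fd, size) != 0)
		goto out;
	map = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;
	for (int i = 0; i < count; i++) {
		uint64_t off = ((uint64_t)rand() % ((uint64_t)size / chunk)) * chunk;

		memset(map + off, PATTERN, chunk);
		if (write(lfd, &off, sizeof(off)) != (ssize_t)sizeof(off))
			goto out;
	}
	ret = 0;
out:
	if (map != MAP_FAILED)
		munmap(map, (size_t)size);
	if (fd >= 0)
		close(fd);
	if (lfd >= 0)
		close(lfd);
	return ret;
}

/*
 * Step 6: re-read every logged offset and compare it with PATTERN.
 * Returns the number of corrupted chunks, or -1 on I/O error.
 */
int verify_phase(const char *file, const char *log, size_t chunk)
{
	int fd = open(file, O_RDONLY);
	int lfd = open(log, O_RDONLY);
	unsigned char *buf = malloc(chunk);
	uint64_t off;
	int bad = (fd < 0 || lfd < 0 || !buf) ? -1 : 0;

	while (bad >= 0 && read(lfd, &off, sizeof(off)) == (ssize_t)sizeof(off)) {
		if (pread(fd, buf, chunk, (off_t)off) != (ssize_t)chunk) {
			bad = -1;
			break;
		}
		for (size_t i = 0; i < chunk; i++) {
			if (buf[i] != PATTERN) {
				bad++;
				break;
			}
		}
	}
	free(buf);
	if (fd >= 0)
		close(fd);
	if (lfd >= 0)
		close(lfd);
	return bad;
}
```

A run matching the description would call write_phase() with a 2GB size and
a 4096 or 65536 byte chunk, drop the caches as root, and then expect
verify_phase() to return 0. As reported above, a test of this shape did not
trigger the corruption, so the qemu write pattern presumably differs from it
in some way that matters.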


-ritesh




* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
  2020-03-19 13:24 ` Ritesh Harjani
@ 2020-03-19 16:36 ` Jan Kara
  2020-03-20  4:07   ` Aneesh Kumar K.V
  2020-03-20  5:34 ` Ritesh Harjani
  2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
  3 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2020-03-19 16:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-ext4, Theodore Y. Ts'o, Ritesh Harjani

Hi!

On Wed 18-03-20 09:17:51, Aneesh Kumar K.V wrote:
> With new vm install I am finding corruption with the vm image if I
> follow up the install with echo 3 > /proc/sys/vm/drop_caches 
> 
> The file system reports below error.
> 
> Begin: Running /scripts/local-bottom ... done.
> Begin: Running /scripts/init-bottom ...
> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
> done.
> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> 
> And debugfs reports
> 
> debugfs:  stat <917954>
> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> Generation: 0    Version: 0x00000000
> User:     0   Group:     0   Size: 0
> File ACL: 0
> Links: 0   Blockcount: 0
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> Size of extra inode fields: 0
> Inode checksum: 0x00000000
> BLOCKS:
> debugfs:  
> 
> Bisecting this finds 
> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
> dioread_nolock the default") as bad. If I revert the same on top of linus
> upstream(fb33c6510d5595144d585aa194d377cf74d31911) I don't hit the
> corruption anymore.

Thanks for the report and the bisection! Is it the guest or the host kernel
that you were bisecting? I presume host, but I want to make sure.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-19 16:36 ` Jan Kara
@ 2020-03-20  4:07   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-20  4:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Y. Ts'o, Ritesh Harjani

On 3/19/20 10:06 PM, Jan Kara wrote:
> Hi!
> 
> On Wed 18-03-20 09:17:51, Aneesh Kumar K.V wrote:
>> With new vm install I am finding corruption with the vm image if I
>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>
>> The file system reports below error.
>>
>> Begin: Running /scripts/local-bottom ... done.
>> Begin: Running /scripts/init-bottom ...
>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
>> done.
>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
>> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
>> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
>>
>> And debugfs reports
>>
>> debugfs:  stat <917954>
>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>> Generation: 0    Version: 0x00000000
>> User:     0   Group:     0   Size: 0
>> File ACL: 0
>> Links: 0   Blockcount: 0
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> Size of extra inode fields: 0
>> Inode checksum: 0x00000000
>> BLOCKS:
>> debugfs:
>>
>> Bisecting this finds
>> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
>> dioread_nolock the default") as bad. If I revert the same on top of linus
>> upstream(fb33c6510d5595144d585aa194d377cf74d31911) I don't hit the
>> corruption anymore.
> 
> Thanks for the report and the bisection! Is it the guest or the host kernel
> that you were bisecting? I presume host, but I want to make sure.
> 

Host kernel. The issue does not depend on the guest kernel version: I was
able to recreate it with different guest kernels (Ubuntu 5.3.0-42-generic
and also upstream).

-aneesh



* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
  2020-03-19 13:24 ` Ritesh Harjani
  2020-03-19 16:36 ` Jan Kara
@ 2020-03-20  5:34 ` Ritesh Harjani
  2020-03-20 11:49   ` Jan Kara
  2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
  3 siblings, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2020-03-20  5:34 UTC (permalink / raw)
  To: linux-ext4, Theodore Y. Ts'o; +Cc: Aneesh Kumar K.V, Jan Kara



On 3/19/20 6:54 PM, Ritesh Harjani wrote:
> 
> 
> On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
>> Hi,
>>
>> With new vm install I am finding corruption with the vm image if I
>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>
>> The file system reports below error.
>>
>> Begin: Running /scripts/local-bottom ... done.
>> Begin: Running /scripts/init-bottom ...
>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #787185: comm sh: iget: checksum invalid
>> done.
>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #917954: comm init: iget: checksum invalid
>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #917954: comm init: iget: checksum invalid
>> /sbin/init: error while loading shared libraries: libc.so.6: cannot 
>> open shared object file: Error 74
>> [    5.271207] Kernel panic - not syncing: Attempted to kill init! 
>> exitcode=0x00007f00
>>
>> And debugfs reports
>>
>> debugfs:  stat <917954>
>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>> Generation: 0    Version: 0x00000000
>> User:     0   Group:     0   Size: 0
>> File ACL: 0
>> Links: 0   Blockcount: 0
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> Size of extra inode fields: 0
>> Inode checksum: 0x00000000
>> BLOCKS:
>> debugfs:
>>
>> Bisecting this finds
>> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make 
>> dioread_nolock the default")
>> as bad. If I revert the same on top of linus 
>> upstream(fb33c6510d5595144d585aa194d377cf74d31911)
>> I don't hit the corruption anymore.
> 
> Tried replicating this and could easily replicate it on Power box.
> I tried to reproduce this on x86 too, but could not reproduce on x86.
> Now one difference on Power could be that pagesize is 64K and fs
> blocksize is 4K.
> 
> The issue looks like the guest qemu image file is not properly written
> back, after host does echo 3 > drop_caches. (correct me if this is not
> the case).

OK. So I tried this with the "cache=directsync" parameter passed for the
drive file. This parameter tells qemu to bypass the host-side page
cache. With this parameter, I don't see the issue on the Power box.

-ritesh


> 
> I tried replicating via below test, but it could not reproduce.
> 
> Any idea what kind of unit test could be written for this?
> I am not sure how exactly qemu is writing to its image file.
> 
> 
> 1. Create 2 files. "mmap-file", "mmap-data".
> 2. "mmap-file" is a 2GB sparse file. Then at some random offsets (tried 
> with both 64KB align and 4KB align offsets), try to write
> pagesize/blocksize amount of known data pattern.
> 3. These offsets (which are pagesize/blocksize align) are recorded into
> "mmap-data" file via normal read/write calls.
> 4. Then after we wrote to both files, we munmap the "mmap-file" and
> close both of these files.
> 5. Then we do echo 3 > drop_caches.
> 6. Then in the verify phase, using the offsets written in "mmap-data"
> file, I read the "mmap-file" to verify if its contents are proper or
> not.
> With that could not reproduce this issue.
> 
> 
> -ritesh
> 
> 



* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-20  5:34 ` Ritesh Harjani
@ 2020-03-20 11:49   ` Jan Kara
  2020-03-21  3:22     ` Ritesh Harjani
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2020-03-20 11:49 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, Theodore Y. Ts'o, Aneesh Kumar K.V, Jan Kara

On Fri 20-03-20 11:04:50, Ritesh Harjani wrote:
> On 3/19/20 6:54 PM, Ritesh Harjani wrote:
> > On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
> > > Hi,
> > > 
> > > With new vm install I am finding corruption with the vm image if I
> > > follow up the install with echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > The file system reports below error.
> > > 
> > > Begin: Running /scripts/local-bottom ... done.
> > > Begin: Running /scripts/init-bottom ...
> > > [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #787185: comm sh: iget: checksum invalid
> > > done.
> > > [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #917954: comm init: iget: checksum invalid
> > > [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #917954: comm init: iget: checksum invalid
> > > /sbin/init: error while loading shared libraries: libc.so.6: cannot
> > > open shared object file: Error 74
> > > [    5.271207] Kernel panic - not syncing: Attempted to kill init!
> > > exitcode=0x00007f00
> > > 
> > > And debugfs reports
> > > 
> > > debugfs:  stat <917954>
> > > Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> > > Generation: 0    Version: 0x00000000
> > > User:     0   Group:     0   Size: 0
> > > File ACL: 0
> > > Links: 0   Blockcount: 0
> > > Fragment:  Address: 0    Number: 0    Size: 0
> > > ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > Size of extra inode fields: 0
> > > Inode checksum: 0x00000000
> > > BLOCKS:
> > > debugfs:
> > > 
> > > Bisecting this finds
> > > Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
> > > dioread_nolock the default")
> > > as bad. If I revert the same on top of linus
> > > upstream(fb33c6510d5595144d585aa194d377cf74d31911)
> > > I don't hit the corruption anymore.
> > 
> > Tried replicating this and could easily replicate it on Power box.
> > I tried to reproduce this on x86 too, but could not reproduce on x86.
> > Now one difference on Power could be that pagesize is 64K and fs
> > blocksize is 4K.
> > 
> > The issue looks like the guest qemu image file is not properly written
> > back, after host does echo 3 > drop_caches. (correct me if this is not
> > the case).
> 
> Ok. So tried this issue with passing "cache=directsync" parameter to
> drive file. This parameter says it should bypass the host side page
> cache. With this parameter, I don't see this issue on Power box.

OK, so this likely means that there is something hosed in the writeback
path using unwritten extents when blocksize < pagesize. Maybe we miss some
conversion of unwritten extent to a written one and thus after dropping
caches we effectively lose data?
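
[Editor's sketch, not part of the original mail.] One way to look for
extents that stay unwritten in the image file is the FIEMAP ioctl, which
flags such extents with FIEMAP_EXTENT_UNWRITTEN. The helper below is only
an illustration of that idea: count_unwritten_extents() and EXTENT_BATCH are
made-up names, and whether leftover unwritten extents are really what goes
wrong here is the open question of this thread, not something the sketch
proves.

```c
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define EXTENT_BATCH 32	/* extents fetched per ioctl call (arbitrary) */

/*
 * Hypothetical helper: walk all extents of 'path' with FS_IOC_FIEMAP and
 * count those still marked unwritten. Returns the count, or -1 if the file
 * cannot be opened or the filesystem does not support FIEMAP.
 */
long count_unwritten_extents(const char *path)
{
	size_t sz = sizeof(struct fiemap) +
		    EXTENT_BATCH * sizeof(struct fiemap_extent);
	struct fiemap *fm;
	long unwritten = 0;
	__u64 start = 0;
	int done = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	fm = malloc(sz);
	if (!fm) {
		close(fd);
		return -1;
	}
	while (!done) {
		memset(fm, 0, sz);
		fm->fm_start = start;
		fm->fm_length = FIEMAP_MAX_OFFSET - start;
		fm->fm_flags = 0;	/* no FIEMAP_FLAG_SYNC: do not force writeback */
		fm->fm_extent_count = EXTENT_BATCH;
		if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
			unwritten = -1;
			break;
		}
		if (fm->fm_mapped_extents == 0)
			break;	/* nothing mapped past 'start' */
		for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
			struct fiemap_extent *e = &fm->fm_extents[i];

			if (e->fe_flags & FIEMAP_EXTENT_UNWRITTEN)
				unwritten++;
			if (e->fe_flags & FIEMAP_EXTENT_LAST)
				done = 1;
			start = e->fe_logical + e->fe_length;
		}
	}
	free(fm);
	close(fd);
	return unwritten;
}
```

Comparing the count on the image file before and after the host drops its
caches, once writeback has had time to finish, might show whether some
extents were never converted to written.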

								Honza

> > I tried replicating via below test, but it could not reproduce.
> > 
> > Any idea what kind of unit test could be written for this?
> > I am not sure how exactly qemu is writing to its image file.
> > 
> > 
> > 1. Create 2 files. "mmap-file", "mmap-data".
> > 2. "mmap-file" is a 2GB sparse file. Then at some random offsets (tried
> > with both 64KB align and 4KB align offsets), try to write
> > pagesize/blocksize amount of known data pattern.
> > 3. These offsets (which are pagesize/blocksize align) are recorded into
> > "mmap-data" file via normal read/write calls.
> > 4. Then after we wrote to both files, we munmap the "mmap-file" and
> > close both of these files.
> > 5. Then we do echo 3 > drop_caches.
> > 6. Then in the verify phase, using the offsets written in "mmap-data"
> > file, I read the "mmap-file" to verify if its contents are proper or
> > not.
> > With that could not reproduce this issue.
> > 
> > 
> > -ritesh
> > 
> > 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-20 11:49   ` Jan Kara
@ 2020-03-21  3:22     ` Ritesh Harjani
  0 siblings, 0 replies; 15+ messages in thread
From: Ritesh Harjani @ 2020-03-21  3:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Y. Ts'o, Aneesh Kumar K.V



On 3/20/20 5:19 PM, Jan Kara wrote:
> On Fri 20-03-20 11:04:50, Ritesh Harjani wrote:
>> On 3/19/20 6:54 PM, Ritesh Harjani wrote:
>>> On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
>>>> Hi,
>>>>
>>>> With new vm install I am finding corruption with the vm image if I
>>>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>>>
>>>> The file system reports below error.
>>>>
>>>> Begin: Running /scripts/local-bottom ... done.
>>>> Begin: Running /scripts/init-bottom ...
>>>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode
>>>> #787185: comm sh: iget: checksum invalid
>>>> done.
>>>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode
>>>> #917954: comm init: iget: checksum invalid
>>>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode
>>>> #917954: comm init: iget: checksum invalid
>>>> /sbin/init: error while loading shared libraries: libc.so.6: cannot
>>>> open shared object file: Error 74
>>>> [    5.271207] Kernel panic - not syncing: Attempted to kill init!
>>>> exitcode=0x00007f00
>>>>
>>>> And debugfs reports
>>>>
>>>> debugfs:  stat <917954>
>>>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>>>> Generation: 0    Version: 0x00000000
>>>> User:     0   Group:     0   Size: 0
>>>> File ACL: 0
>>>> Links: 0   Blockcount: 0
>>>> Fragment:  Address: 0    Number: 0    Size: 0
>>>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>>>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>>>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>>>> Size of extra inode fields: 0
>>>> Inode checksum: 0x00000000
>>>> BLOCKS:
>>>> debugfs:
>>>>
>>>> Bisecting this finds
>>>> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
>>>> dioread_nolock the default")
>>>> as bad. If I revert the same on top of linus
>>>> upstream(fb33c6510d5595144d585aa194d377cf74d31911)
>>>> I don't hit the corruption anymore.
>>>
>>> Tried replicating this and could easily replicate it on Power box.
>>> I tried to reproduce this on x86 too, but could not reproduce on x86.
>>> Now one difference on Power could be that pagesize is 64K and fs
>>> blocksize is 4K.
>>>
>>> The issue looks like the guest qemu image file is not properly written
>>> back, after host does echo 3 > drop_caches. (correct me if this is not
>>> the case).
>>
>> Ok. So tried this issue with passing "cache=directsync" parameter to
>> drive file. This parameter says it should bypass the host side page
>> cache. With this parameter, I don't see this issue on Power box.
> 
> OK, so this likely means that there is something hosed in the writeback
> path using unwritten extents when blocksize < pagesize. Maybe we miss some
> conversion of unwritten extent to a written one and thus after dropping
> caches we effectively lose data?
> 

Yes, that seems to be it. I will try to create a small test case
with this in mind. I will also go over the unwritten-to-written
conversion path and check what I missed there.

Thanks
ritesh





> 
>>> I tried replicating via below test, but it could not reproduce.
>>>
>>> Any idea what kind of unit test could be written for this?
>>> I am not sure how exactly qemu is writing to its image file.
>>>
>>>
>>> 1. Create 2 files. "mmap-file", "mmap-data".
>>> 2. "mmap-file" is a 2GB sparse file. Then at some random offsets (tried
>>> with both 64KB align and 4KB align offsets), try to write
>>> pagesize/blocksize amount of known data pattern.
>>> 3. These offsets (which are pagesize/blocksize align) are recorded into
>>> "mmap-data" file via normal read/write calls.
>>> 4. Then after we wrote to both files, we munmap the "mmap-file" and
>>> close both of these files.
>>> 5. Then we do echo 3 > drop_caches.
>>> 6. Then in the verify phase, using the offsets written in "mmap-data"
>>> file, I read the "mmap-file" to verify if its contents are proper or
>>> not.
>>> With that could not reproduce this issue.
>>>
>>>
>>> -ritesh
>>>
>>>
>>



* [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2020-03-20  5:34 ` Ritesh Harjani
@ 2020-03-27 20:07 ` Ritesh Harjani
  2020-03-29  2:17   ` Theodore Y. Ts'o
  3 siblings, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2020-03-27 20:07 UTC (permalink / raw)
  To: linux-ext4, Theodore Y . Ts'o
  Cc: Jan Kara, Ritesh Harjani, Aneesh Kumar K . V

Currently on calling echo 3 > drop_caches on host machine, we see
FS corruption in the guest. This happens on Power machine where
blocksize < pagesize.

So as a temporary workaround don't enable dioread_nolock by default
for blocksize < pagesize until we identify the root cause.

Also emit a warning msg in case if this mount option is manually
enabled for blocksize < pagesize.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/super.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 27ab130a40d1..6873d9ffa352 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2186,6 +2186,14 @@ static int parse_options(char *options, struct super_block *sb,
 		}
 	}
 #endif
+	if (test_opt(sb, DIOREAD_NOLOCK)) {
+		int blocksize =
+			BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
+		if (blocksize < PAGE_SIZE)
+			ext4_msg(sb, KERN_WARNING, "Warning: mounting with an "
+				 "experimental mount option 'dioread_nolock' "
+				 "for blocksize < PAGE_SIZE");
+	}
 	return 1;
 }
 
@@ -3792,7 +3800,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		set_opt(sb, NO_UID32);
 	/* xattr user namespace & acls are now defaulted on */
 	set_opt(sb, XATTR_USER);
-	set_opt(sb, DIOREAD_NOLOCK);
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
@@ -3842,6 +3849,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_li_wait_mult = EXT4_DEF_LI_WAIT_MULT;
 
 	blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size);
+
+	if (blocksize == PAGE_SIZE)
+		set_opt(sb, DIOREAD_NOLOCK);
+
 	if (blocksize < EXT4_MIN_BLOCK_SIZE ||
 	    blocksize > EXT4_MAX_BLOCK_SIZE) {
 		ext4_msg(sb, KERN_ERR,
-- 
2.20.1
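
[Editor's sketch, not part of the original mail.] The effect of the patch
above can be modelled in userspace as two small predicates. This is only a
model for reading the diff: the function names are hypothetical and the
kernel code operates on the superblock and mount options rather than on
plain integers.

```c
#include <stdbool.h>

/*
 * Model of the ext4_fill_super() hunk: dioread_nolock is now set by
 * default only when the fs block size equals the page size.
 */
bool dioread_nolock_default(unsigned long blocksize, unsigned long pagesize)
{
	return blocksize == pagesize;
}

/*
 * Model of the parse_options() hunk: a warning is emitted when the
 * option is enabled while blocksize < pagesize.
 */
bool dioread_nolock_warns(bool opt_enabled, unsigned long blocksize,
			  unsigned long pagesize)
{
	return opt_enabled && blocksize < pagesize;
}
```

With 4K blocks, dioread_nolock_default() is true on a 4K-page host and false
on a 64K-page host; on the latter, mounting with an explicit dioread_nolock
option makes dioread_nolock_warns() true.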



* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
@ 2020-03-29  2:17   ` Theodore Y. Ts'o
  2020-05-11  8:07     ` Ritesh Harjani
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-29  2:17 UTC (permalink / raw)
  To: Ritesh Harjani; +Cc: linux-ext4, Jan Kara, Aneesh Kumar K . V

On Sat, Mar 28, 2020 at 01:37:44AM +0530, Ritesh Harjani wrote:
> Currently on calling echo 3 > drop_caches on host machine, we see
> FS corruption in the guest. This happens on Power machine where
> blocksize < pagesize.
> 
> So as a temporary workaround don't enable dioread_nolock by default
> for blocksize < pagesize until we identify the root cause.
> 
> Also emit a warning msg in case if this mount option is manually
> enabled for blocksize < pagesize.
> 
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Thanks, applied.

					- Ted


* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-03-29  2:17   ` Theodore Y. Ts'o
@ 2020-05-11  8:07     ` Ritesh Harjani
  2020-05-12 11:45       ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2020-05-11  8:07 UTC (permalink / raw)
  To: stable
  Cc: Theodore Y. Ts'o, linux-ext4, Jan Kara, Aneesh Kumar K . V,
	Sasha Levin

Hello stable-list,

I think this subjected patch [1] missed the below fixes tag.
I guess the subjected patch is only picked for 5.7. And
AFAIU, this patch will be needed for 5.6 as well.

Could you please do the needful.

Fixes: 244adf6426ee31a (ext4: make dioread_nolock the default)

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=626b035b816b61a7a7b4d2205a6807e2f11a18c1


-ritesh

On 3/29/20 7:47 AM, Theodore Y. Ts'o wrote:
> On Sat, Mar 28, 2020 at 01:37:44AM +0530, Ritesh Harjani wrote:
>> Currently on calling echo 3 > drop_caches on host machine, we see
>> FS corruption in the guest. This happens on Power machine where
>> blocksize < pagesize.
>>
>> So as a temporary workaround don't enable dioread_nolock by default
>> for blocksize < pagesize until we identify the root cause.
>>
>> Also emit a warning msg in case if this mount option is manually
>> enabled for blocksize < pagesize.
>>
>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> 
> Thanks, applied.
> 
> 					- Ted
> 


* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-05-11  8:07     ` Ritesh Harjani
@ 2020-05-12 11:45       ` Greg KH
  2020-05-12 12:50         ` Ritesh Harjani
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2020-05-12 11:45 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: stable, Theodore Y. Ts'o, linux-ext4, Jan Kara,
	Aneesh Kumar K . V, Sasha Levin

On Mon, May 11, 2020 at 01:37:59PM +0530, Ritesh Harjani wrote:
> Hello stable-list,
> 
> I think this subjected patch [1] missed the below fixes tag.
> I guess the subjected patch is only picked for 5.7. And
> AFAIU, this patch will be needed for 5.6 as well.
> 
> Could you please do the needful.
> 
> Fixes: 244adf6426ee31a (ext4: make dioread_nolock the default)
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=626b035b816b61a7a7b4d2205a6807e2f11a18c1

This patch does not apply to the 5.6 kernel tree at all.  Please provide
a working backport if you wish to see it present there.

thanks,

greg k-h


* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-05-12 11:45       ` Greg KH
@ 2020-05-12 12:50         ` Ritesh Harjani
  2020-05-12 12:59           ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2020-05-12 12:50 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, Theodore Y. Ts'o, linux-ext4, Jan Kara,
	Aneesh Kumar K . V, Sasha Levin

Hello Greg,

On 5/12/20 5:15 PM, Greg KH wrote:
> On Mon, May 11, 2020 at 01:37:59PM +0530, Ritesh Harjani wrote:
>> Hello stable-list,
>>
>> I think this subjected patch [1] missed the below fixes tag.
>> I guess the subjected patch is only picked for 5.7. And
>> AFAIU, this patch will be needed for 5.6 as well.
>>
>> Could you please do the needful.
>>
>> Fixes: 244adf6426ee31a (ext4: make dioread_nolock the default)
>>
>> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=626b035b816b61a7a7b4d2205a6807e2f11a18c1
> 
> This patch does not apply to the 5.6 kernel tree at all.  Please provide
> a working backport if you wish to see it present there.

Sorry if that's the case.
I tried both "git cherry-pick" and "git am" with patch mentioned @ [1]
to apply on branch "remotes/linux-stable/linux-5.6.y" of tree
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
and it applied cleanly.

Also, just noticed this patch in the queue. Is it that maybe you are
trying to apply it twice?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.6/ext4-don-t-set-dioread_nolock-by-default-for-blocksi.patch

Do let me know if I am missing anything here.

-ritesh


* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-05-12 12:50         ` Ritesh Harjani
@ 2020-05-12 12:59           ` Greg KH
  2020-05-12 14:13             ` Sasha Levin
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2020-05-12 12:59 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: stable, Theodore Y. Ts'o, linux-ext4, Jan Kara,
	Aneesh Kumar K . V, Sasha Levin

On Tue, May 12, 2020 at 06:20:05PM +0530, Ritesh Harjani wrote:
> Hello Greg,
> 
> On 5/12/20 5:15 PM, Greg KH wrote:
> > On Mon, May 11, 2020 at 01:37:59PM +0530, Ritesh Harjani wrote:
> > > Hello stable-list,
> > > 
> > > I think this subjected patch [1] missed the below fixes tag.
> > > I guess the subjected patch is only picked for 5.7. And
> > > AFAIU, this patch will be needed for 5.6 as well.
> > > 
> > > Could you please do the needful.
> > > 
> > > Fixes: 244adf6426ee31a (ext4: make dioread_nolock the default)
> > > 
> > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=626b035b816b61a7a7b4d2205a6807e2f11a18c1
> > 
> > This patch does not apply to the 5.6 kernel tree at all.  Please provide
> > a working backport if you wish to see it present there.
> 
> Sorry if that's the case.
> I tried both "git cherry-pick" and "git am" with patch mentioned @ [1]
> to apply on branch "remotes/linux-stable/linux-5.6.y" of tree
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> and it applied cleanly.
> 
> Also, just noticed this patch in the queue. Is it that maybe you are
> trying to apply it twice?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.6/ext4-don-t-set-dioread_nolock-by-default-for-blocksi.patch

Odd, it didn't have the "upstream" commit id, which is why I didn't see
that it was applied already.

Sasha, something went wrong with your scripts, you didn't sign-off on it
either :(

greg k-h

* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-05-12 12:59           ` Greg KH
@ 2020-05-12 14:13             ` Sasha Levin
  2020-05-12 16:12               ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2020-05-12 14:13 UTC (permalink / raw)
  To: Greg KH
  Cc: Ritesh Harjani, stable, Theodore Y. Ts'o, linux-ext4,
	Jan Kara, Aneesh Kumar K . V

On Tue, May 12, 2020 at 02:59:31PM +0200, Greg KH wrote:
>On Tue, May 12, 2020 at 06:20:05PM +0530, Ritesh Harjani wrote:
>> Hello Greg,
>>
>> On 5/12/20 5:15 PM, Greg KH wrote:
>> > On Mon, May 11, 2020 at 01:37:59PM +0530, Ritesh Harjani wrote:
>> > > Hello stable-list,
>> > >
>> > > I think this subjected patch [1] missed the below fixes tag.
>> > > I guess the subjected patch is only picked for 5.7. And
>> > > AFAIU, this patch will be needed for 5.6 as well.
>> > >
>> > > Could you please do the needful.
>> > >
>> > > Fixes: 244adf6426ee31a (ext4: make dioread_nolock the default)
>> > >
>> > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=626b035b816b61a7a7b4d2205a6807e2f11a18c1
>> >
>> > This patch does not apply to the 5.6 kernel tree at all.  Please provide
>> > a working backport if you wish to see it present there.
>>
>> Sorry if that's the case.
>> I tried both "git cherry-pick" and "git am" with patch mentioned @ [1]
>> to apply on branch "remotes/linux-stable/linux-5.6.y" of tree
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>> and it applied cleanly.
>>
>> Also, just noticed this patch in the queue. Is it that maybe you are
>> trying to apply it twice?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.6/ext4-don-t-set-dioread_nolock-by-default-for-blocksi.patch
>
>Odd, it didn't have the "upstream" commit id, which is why I didn't see
>that it was applied already.
>
>Sasha, something went wrong with your scripts, you didn't sign-off on it
>either :(

Crap, sorry. I'll fix it up.

I'm testing out a new script that integrates the dependency mappings I
have with the rest of the script, and it looks like there are some
quirks I need to deal with.

-- 
Thanks,
Sasha

* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-05-12 14:13             ` Sasha Levin
@ 2020-05-12 16:12               ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2020-05-12 16:12 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ritesh Harjani, stable, Theodore Y. Ts'o, linux-ext4,
	Jan Kara, Aneesh Kumar K . V

On Tue, May 12, 2020 at 10:13:12AM -0400, Sasha Levin wrote:
> On Tue, May 12, 2020 at 02:59:31PM +0200, Greg KH wrote:
> > On Tue, May 12, 2020 at 06:20:05PM +0530, Ritesh Harjani wrote:
> > > Hello Greg,
> > > 
> > > On 5/12/20 5:15 PM, Greg KH wrote:
> > > > On Mon, May 11, 2020 at 01:37:59PM +0530, Ritesh Harjani wrote:
> > > > > Hello stable-list,
> > > > >
> > > > > I think this subjected patch [1] missed the below fixes tag.
> > > > > I guess the subjected patch is only picked for 5.7. And
> > > > > AFAIU, this patch will be needed for 5.6 as well.
> > > > >
> > > > > Could you please do the needful.
> > > > >
> > > > > Fixes: 244adf6426ee31a (ext4: make dioread_nolock the default)
> > > > >
> > > > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=626b035b816b61a7a7b4d2205a6807e2f11a18c1
> > > >
> > > > This patch does not apply to the 5.6 kernel tree at all.  Please provide
> > > > a working backport if you wish to see it present there.
> > > 
> > > Sorry if that's the case.
> > > I tried both "git cherry-pick" and "git am" with patch mentioned @ [1]
> > > to apply on branch "remotes/linux-stable/linux-5.6.y" of tree
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> > > and it applied cleanly.
> > > 
> > > Also, just noticed this patch in the queue. Is it that maybe you are
> > > trying to apply it twice?
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.6/ext4-don-t-set-dioread_nolock-by-default-for-blocksi.patch
> > 
> > Odd, it didn't have the "upstream" commit id, which is why I didn't see
> > that it was applied already.
> > 
> > Sasha, something went wrong with your scripts, you didn't sign-off on it
> > either :(
> 
> Crap, sorry. I'll fix it up.

I already did :)


Thread overview: 15+ messages
2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
2020-03-19 13:24 ` Ritesh Harjani
2020-03-19 16:36 ` Jan Kara
2020-03-20  4:07   ` Aneesh Kumar K.V
2020-03-20  5:34 ` Ritesh Harjani
2020-03-20 11:49   ` Jan Kara
2020-03-21  3:22     ` Ritesh Harjani
2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
2020-03-29  2:17   ` Theodore Y. Ts'o
2020-05-11  8:07     ` Ritesh Harjani
2020-05-12 11:45       ` Greg KH
2020-05-12 12:50         ` Ritesh Harjani
2020-05-12 12:59           ` Greg KH
2020-05-12 14:13             ` Sasha Levin
2020-05-12 16:12               ` Greg KH
