Linux-ext4 Archive on lore.kernel.org
* Ext4 corruption with VM images as 3 > drop_caches
@ 2020-03-18  3:47 Aneesh Kumar K.V
  2020-03-19 13:24 ` Ritesh Harjani
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-18  3:47 UTC (permalink / raw)
  To: linux-ext4, Theodore Y. Ts'o; +Cc: Ritesh Harjani

Hi,

With a new VM install, I am finding corruption in the VM image if I
follow up the install with echo 3 > /proc/sys/vm/drop_caches

The file system reports the errors below.

Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ...
[    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
done.
[    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
[    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
/sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
[    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00

And debugfs reports

debugfs:  stat <917954>
Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
Size of extra inode fields: 0
Inode checksum: 0x00000000
BLOCKS:
debugfs:  

Bisecting this identifies commit
244adf6426ee31a83f397b700d964cff12a247d3 ("ext4: make dioread_nolock the default")
as bad. If I revert it on top of Linus' upstream tree
(fb33c6510d5595144d585aa194d377cf74d31911), I no longer hit the corruption.

-aneesh


* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
@ 2020-03-19 13:24 ` Ritesh Harjani
  2020-03-19 16:36 ` Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Ritesh Harjani @ 2020-03-19 13:24 UTC (permalink / raw)
  To: linux-ext4, Theodore Y. Ts'o; +Cc: Aneesh Kumar K.V, Jan Kara



On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
> Hi,
> 
> With new vm install I am finding corruption with the vm image if I
> follow up the install with echo 3 > /proc/sys/vm/drop_caches
> 
> The file system reports below error.
> 
> Begin: Running /scripts/local-bottom ... done.
> Begin: Running /scripts/init-bottom ...
> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
> done.
> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> 
> And debugfs reports
> 
> debugfs:  stat <917954>
> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> Generation: 0    Version: 0x00000000
> User:     0   Group:     0   Size: 0
> File ACL: 0
> Links: 0   Blockcount: 0
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> Size of extra inode fields: 0
> Inode checksum: 0x00000000
> BLOCKS:
> debugfs:
> 
> Bisecting this finds
> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make dioread_nolock the default")
> as bad. If I revert the same on top of linus upstream(fb33c6510d5595144d585aa194d377cf74d31911)
> I don't hit the corruption anymore.

I tried replicating this and could easily reproduce it on a Power box.
I also tried on x86, but could not reproduce it there.
One difference on Power could be that the page size is 64K while the fs
block size is 4K.
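
The page size versus block size relationship described above can be
checked from user space. A minimal sketch in C, assuming statvfs()
reports the filesystem block size in f_bsize (as it does for ext4); the
function name is hypothetical:

```c
#include <stdio.h>
#include <sys/statvfs.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the filesystem holding `path` reports
 * a block size smaller than the system page size, 0 if not, -1 on error.
 */
int blocksize_below_pagesize(const char *path)
{
	struct statvfs sv;
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0 || statvfs(path, &sv) != 0)
		return -1;

	printf("%s: page size %ld, fs block size %lu\n",
	       path, page, (unsigned long)sv.f_bsize);
	return sv.f_bsize < (unsigned long)page;
}
```

Calling blocksize_below_pagesize() on the mount point that holds the
image file should return 1 on a 64K-page Power host with a 4K-block
ext4, and 0 on a typical x86 host with 4K pages and 4K blocks.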

It looks like the guest's qemu image file is not properly written
back after the host does echo 3 > drop_caches (correct me if this is not
the case).

I tried to reproduce this with the test below, but it did not trigger
the issue.

Any idea what kind of unit test could be written for this?
I am not sure how exactly qemu writes to its image file.


1. Create 2 files, "mmap-file" and "mmap-data".
2. "mmap-file" is a 2GB sparse file. At some random offsets (tried
with both 64KB-aligned and 4KB-aligned offsets), write a
pagesize/blocksize amount of a known data pattern.
3. These offsets (which are pagesize/blocksize aligned) are recorded in
the "mmap-data" file via normal read/write calls.
4. After writing to both files, munmap the "mmap-file" and
close both files.
5. Do echo 3 > drop_caches.
6. In the verify phase, using the offsets recorded in the "mmap-data"
file, read the "mmap-file" to verify whether its contents are correct.
With that I could not reproduce this issue.
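
The steps above can be sketched in C roughly as follows. This is an
illustrative reconstruction, not the actual test program: the function
names, the 0xa5 pattern and the format of the offset log are
assumptions. Step 5 (echo 3 > drop_caches as root) is run by hand
between the two phases; note that drop_caches only discards clean
pages.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PATTERN 0xa5	/* assumed data pattern */

/*
 * Write phase (steps 1-4): store `chunk` bytes of PATTERN at `n` random
 * chunk-aligned offsets through a shared mapping of a sparse file, and log
 * each offset to `data_path` with plain write(). Returns 0 on success.
 */
int write_phase(const char *map_path, const char *data_path,
		off_t file_size, size_t chunk, int n)
{
	int mfd = open(map_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	int dfd = open(data_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	unsigned char *map;
	size_t nchunks;
	int ret = -1;

	if (mfd < 0 || dfd < 0 || chunk == 0 || file_size < (off_t)chunk)
		goto out;
	if (ftruncate(mfd, file_size) != 0)	/* sparse: no blocks yet */
		goto out;
	map = mmap(NULL, (size_t)file_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, mfd, 0);
	if (map == MAP_FAILED)
		goto out;

	nchunks = (size_t)file_size / chunk;
	for (int i = 0; i < n; i++) {
		off_t off = (off_t)(((size_t)rand() % nchunks) * chunk);

		memset(map + off, PATTERN, chunk);
		if (write(dfd, &off, sizeof(off)) != (ssize_t)sizeof(off))
			goto unmap;
	}
	ret = 0;
unmap:
	munmap(map, (size_t)file_size);
out:
	if (mfd >= 0)
		close(mfd);
	if (dfd >= 0)
		close(dfd);
	return ret;
}

/*
 * Verify phase (step 6): re-read every logged offset with pread() and
 * compare against PATTERN. Returns the number of bad chunks, -1 on error.
 */
int verify_phase(const char *map_path, const char *data_path, size_t chunk)
{
	int mfd = open(map_path, O_RDONLY);
	int dfd = open(data_path, O_RDONLY);
	unsigned char *buf = malloc(chunk);
	off_t off;
	int bad = -1;

	if (mfd < 0 || dfd < 0 || !buf)
		goto out;
	bad = 0;
	while (read(dfd, &off, sizeof(off)) == (ssize_t)sizeof(off)) {
		if (pread(mfd, buf, chunk, off) != (ssize_t)chunk) {
			bad++;
			continue;
		}
		for (size_t i = 0; i < chunk; i++) {
			if (buf[i] != PATTERN) {
				bad++;
				break;
			}
		}
	}
out:
	free(buf);
	if (mfd >= 0)
		close(mfd);
	if (dfd >= 0)
		close(dfd);
	return bad;
}
```

A run matching the description would call write_phase() with a 2GB
file_size and a chunk of either 4096 or 65536, drop caches on the host,
and then expect verify_phase() to return 0.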


-ritesh




* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
  2020-03-19 13:24 ` Ritesh Harjani
@ 2020-03-19 16:36 ` Jan Kara
  2020-03-20  4:07   ` Aneesh Kumar K.V
  2020-03-20  5:34 ` Ritesh Harjani
  2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
  3 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2020-03-19 16:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-ext4, Theodore Y. Ts'o, Ritesh Harjani

Hi!

On Wed 18-03-20 09:17:51, Aneesh Kumar K.V wrote:
> With new vm install I am finding corruption with the vm image if I
> follow up the install with echo 3 > /proc/sys/vm/drop_caches 
> 
> The file system reports below error.
> 
> Begin: Running /scripts/local-bottom ... done.
> Begin: Running /scripts/init-bottom ...
> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
> done.
> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> 
> And debugfs reports
> 
> debugfs:  stat <917954>
> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> Generation: 0    Version: 0x00000000
> User:     0   Group:     0   Size: 0
> File ACL: 0
> Links: 0   Blockcount: 0
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> Size of extra inode fields: 0
> Inode checksum: 0x00000000
> BLOCKS:
> debugfs:  
> 
> Bisecting this finds 
> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
> dioread_nolock the default") as bad. If I revert the same on top of linus
> upstream(fb33c6510d5595144d585aa194d377cf74d31911) I don't hit the
> corruption anymore.

Thanks for the report and the bisection! Was it the guest or the host
kernel that you were bisecting? I presume host, but I want to make sure.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-19 16:36 ` Jan Kara
@ 2020-03-20  4:07   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-20  4:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Y. Ts'o, Ritesh Harjani

On 3/19/20 10:06 PM, Jan Kara wrote:
> Hi!
> 
> On Wed 18-03-20 09:17:51, Aneesh Kumar K.V wrote:
>> With new vm install I am finding corruption with the vm image if I
>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>
>> The file system reports below error.
>>
>> Begin: Running /scripts/local-bottom ... done.
>> Begin: Running /scripts/init-bottom ...
>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
>> done.
>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
>> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
>> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
>>
>> And debugfs reports
>>
>> debugfs:  stat <917954>
>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>> Generation: 0    Version: 0x00000000
>> User:     0   Group:     0   Size: 0
>> File ACL: 0
>> Links: 0   Blockcount: 0
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> Size of extra inode fields: 0
>> Inode checksum: 0x00000000
>> BLOCKS:
>> debugfs:
>>
>> Bisecting this finds
>> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
>> dioread_nolock the default") as bad. If I revert the same on top of linus
>> upstream(fb33c6510d5595144d585aa194d377cf74d31911) I don't hit the
>> corruption anymore.
> 
> Thanks for report and the bisection! Is this guest or host kernel that you
> were bisecting? I presume host but I want to make sure.
> 

Host kernel. The issue does not depend on the guest kernel version:
I was able to recreate it with different guest kernels
(the Ubuntu 5.3.0-42-generic kernel and also upstream).

-aneesh



* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
  2020-03-19 13:24 ` Ritesh Harjani
  2020-03-19 16:36 ` Jan Kara
@ 2020-03-20  5:34 ` Ritesh Harjani
  2020-03-20 11:49   ` Jan Kara
  2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
  3 siblings, 1 reply; 9+ messages in thread
From: Ritesh Harjani @ 2020-03-20  5:34 UTC (permalink / raw)
  To: linux-ext4, Theodore Y. Ts'o; +Cc: Aneesh Kumar K.V, Jan Kara



On 3/19/20 6:54 PM, Ritesh Harjani wrote:
> 
> 
> On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
>> Hi,
>>
>> With new vm install I am finding corruption with the vm image if I
>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>
>> The file system reports below error.
>>
>> Begin: Running /scripts/local-bottom ... done.
>> Begin: Running /scripts/init-bottom ...
>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #787185: comm sh: iget: checksum invalid
>> done.
>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #917954: comm init: iget: checksum invalid
>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #917954: comm init: iget: checksum invalid
>> /sbin/init: error while loading shared libraries: libc.so.6: cannot 
>> open shared object file: Error 74
>> [    5.271207] Kernel panic - not syncing: Attempted to kill init! 
>> exitcode=0x00007f00
>>
>> And debugfs reports
>>
>> debugfs:  stat <917954>
>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>> Generation: 0    Version: 0x00000000
>> User:     0   Group:     0   Size: 0
>> File ACL: 0
>> Links: 0   Blockcount: 0
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> Size of extra inode fields: 0
>> Inode checksum: 0x00000000
>> BLOCKS:
>> debugfs:
>>
>> Bisecting this finds
>> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make 
>> dioread_nolock the default")
>> as bad. If I revert the same on top of linus 
>> upstream(fb33c6510d5595144d585aa194d377cf74d31911)
>> I don't hit the corruption anymore.
> 
> Tried replicating this and could easily replicate it on Power box.
> I tried to reproduce this on x86 too, but could not reproduce on x86.
> Now one difference on Power could be that pagesize is 64K and fs
> blocksize is 4K.
> 
> The issue looks like the guest qemu image file is not properly written
> back, after host does echo 3 > drop_caches. (correct me if this is not
> the case).

OK. So I tried this with the "cache=directsync" parameter passed to the
drive file. This parameter makes qemu bypass the host-side page
cache. With it, I don't see the issue on the Power box.
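
For reference, QEMU documents cache=directsync as accessing the image
with both O_DIRECT and O_DSYNC semantics. A minimal C sketch of such a
write on the host side, as an illustration only: the helper name, the
alignment parameter and the fallback to a buffered O_DSYNC open are
assumptions, not QEMU code.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0	/* header did not expose it; degrade to O_DSYNC only */
#endif

/*
 * Hypothetical helper: write `len` bytes of `byte` at offset `off`,
 * bypassing the page cache where the filesystem allows it. `off` and `len`
 * should be multiples of `align` (the logical block size; 4096 is assumed
 * to be safe on most setups). Returns 0 on success, -1 on error.
 */
int direct_write(const char *path, off_t off, size_t len, size_t align,
		 unsigned char byte)
{
	void *buf = NULL;
	int fd, ret = -1;

	fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_DSYNC, 0644);
	if (fd < 0 && errno == EINVAL)	/* filesystem without O_DIRECT */
		fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
	if (fd < 0)
		return -1;

	/* O_DIRECT requires a suitably aligned user buffer. */
	if (posix_memalign(&buf, align, len) != 0)
		goto out;
	memset(buf, byte, len);
	if (pwrite(fd, buf, len, off) == (ssize_t)len)
		ret = 0;
	free(buf);
out:
	close(fd);
	return ret;
}
```

Because such a write does not leave dirty data in the host page cache,
it would not go through the buffered writeback path at all, which is
consistent with the issue disappearing under this setting.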

-ritesh





* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-20  5:34 ` Ritesh Harjani
@ 2020-03-20 11:49   ` Jan Kara
  2020-03-21  3:22     ` Ritesh Harjani
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2020-03-20 11:49 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, Theodore Y. Ts'o, Aneesh Kumar K.V, Jan Kara

On Fri 20-03-20 11:04:50, Ritesh Harjani wrote:
> On 3/19/20 6:54 PM, Ritesh Harjani wrote:
> > On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
> > > Hi,
> > > 
> > > With new vm install I am finding corruption with the vm image if I
> > > follow up the install with echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > The file system reports below error.
> > > 
> > > Begin: Running /scripts/local-bottom ... done.
> > > Begin: Running /scripts/init-bottom ...
> > > [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #787185: comm sh: iget: checksum invalid
> > > done.
> > > [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #917954: comm init: iget: checksum invalid
> > > [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #917954: comm init: iget: checksum invalid
> > > /sbin/init: error while loading shared libraries: libc.so.6: cannot
> > > open shared object file: Error 74
> > > [    5.271207] Kernel panic - not syncing: Attempted to kill init!
> > > exitcode=0x00007f00
> > > 
> > > And debugfs reports
> > > 
> > > debugfs:  stat <917954>
> > > Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> > > Generation: 0    Version: 0x00000000
> > > User:     0   Group:     0   Size: 0
> > > File ACL: 0
> > > Links: 0   Blockcount: 0
> > > Fragment:  Address: 0    Number: 0    Size: 0
> > > ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > Size of extra inode fields: 0
> > > Inode checksum: 0x00000000
> > > BLOCKS:
> > > debugfs:
> > > 
> > > Bisecting this finds
> > > Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
> > > dioread_nolock the default")
> > > as bad. If I revert the same on top of linus
> > > upstream(fb33c6510d5595144d585aa194d377cf74d31911)
> > > I don't hit the corruption anymore.
> > 
> > Tried replicating this and could easily replicate it on Power box.
> > I tried to reproduce this on x86 too, but could not reproduce on x86.
> > Now one difference on Power could be that pagesize is 64K and fs
> > blocksize is 4K.
> > 
> > The issue looks like the guest qemu image file is not properly written
> > back, after host does echo 3 > drop_caches. (correct me if this is not
> > the case).
> 
> Ok. So tried this issue with passing "cache=directsync" parameter to
> drive file. This parameter says it should bypass the host side page
> cache. With this parameter, I don't see this issue on Power box.

OK, so this likely means that there is something hosed in the writeback
path using unwritten extents when blocksize < pagesize. Maybe we miss some
conversion of unwritten extent to a written one and thus after dropping
caches we effectively lose data?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Ext4 corruption with VM images as 3 > drop_caches
  2020-03-20 11:49   ` Jan Kara
@ 2020-03-21  3:22     ` Ritesh Harjani
  0 siblings, 0 replies; 9+ messages in thread
From: Ritesh Harjani @ 2020-03-21  3:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Y. Ts'o, Aneesh Kumar K.V



On 3/20/20 5:19 PM, Jan Kara wrote:
> On Fri 20-03-20 11:04:50, Ritesh Harjani wrote:
>> On 3/19/20 6:54 PM, Ritesh Harjani wrote:
>>> On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
>>>> Hi,
>>>>
>>>> With new vm install I am finding corruption with the vm image if I
>>>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>>>
>>>> The file system reports below error.
>>>>
>>>> Begin: Running /scripts/local-bottom ... done.
>>>> Begin: Running /scripts/init-bottom ...
>>>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode
>>>> #787185: comm sh: iget: checksum invalid
>>>> done.
>>>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode
>>>> #917954: comm init: iget: checksum invalid
>>>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode
>>>> #917954: comm init: iget: checksum invalid
>>>> /sbin/init: error while loading shared libraries: libc.so.6: cannot
>>>> open shared object file: Error 74
>>>> [    5.271207] Kernel panic - not syncing: Attempted to kill init!
>>>> exitcode=0x00007f00
>>>>
>>>> And debugfs reports
>>>>
>>>> debugfs:  stat <917954>
>>>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>>>> Generation: 0    Version: 0x00000000
>>>> User:     0   Group:     0   Size: 0
>>>> File ACL: 0
>>>> Links: 0   Blockcount: 0
>>>> Fragment:  Address: 0    Number: 0    Size: 0
>>>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>>>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>>>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>>>> Size of extra inode fields: 0
>>>> Inode checksum: 0x00000000
>>>> BLOCKS:
>>>> debugfs:
>>>>
>>>> Bisecting this finds
>>>> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
>>>> dioread_nolock the default")
>>>> as bad. If I revert the same on top of linus
>>>> upstream(fb33c6510d5595144d585aa194d377cf74d31911)
>>>> I don't hit the corruption anymore.
>>>
>>> Tried replicating this and could easily replicate it on Power box.
>>> I tried to reproduce this on x86 too, but could not reproduce on x86.
>>> Now one difference on Power could be that pagesize is 64K and fs
>>> blocksize is 4K.
>>>
>>> The issue looks like the guest qemu image file is not properly written
>>> back, after host does echo 3 > drop_caches. (correct me if this is not
>>> the case).
>>
>> Ok. So tried this issue with passing "cache=directsync" parameter to
>> drive file. This parameter says it should bypass the host side page
>> cache. With this parameter, I don't see this issue on Power box.
> 
> OK, so this likely means that there is something hosed in the writeback
> path using unwritten extents when blocksize < pagesize. Maybe we miss some
> conversion of unwritten extent to a written one and thus after dropping
> caches we effectively lose data?
> 

Yes, that seems likely. I will try to create a small test case
with this in mind. I will also go over the unwritten-to-written
conversion path and check what I missed there.

Thanks
ritesh








* [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-03-18  3:47 Ext4 corruption with VM images as 3 > drop_caches Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2020-03-20  5:34 ` Ritesh Harjani
@ 2020-03-27 20:07 ` Ritesh Harjani
  2020-03-29  2:17   ` Theodore Y. Ts'o
  3 siblings, 1 reply; 9+ messages in thread
From: Ritesh Harjani @ 2020-03-27 20:07 UTC (permalink / raw)
  To: linux-ext4, Theodore Y . Ts'o
  Cc: Jan Kara, Ritesh Harjani, Aneesh Kumar K . V

Currently, on calling echo 3 > drop_caches on the host machine, we see
FS corruption in the guest. This happens on a Power machine where
blocksize < pagesize.

So as a temporary workaround, don't enable dioread_nolock by default
for blocksize < pagesize until we identify the root cause.

Also emit a warning message if this mount option is manually
enabled for blocksize < pagesize.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/super.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 27ab130a40d1..6873d9ffa352 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2186,6 +2186,14 @@ static int parse_options(char *options, struct super_block *sb,
 		}
 	}
 #endif
+	if (test_opt(sb, DIOREAD_NOLOCK)) {
+		int blocksize =
+			BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
+		if (blocksize < PAGE_SIZE)
+			ext4_msg(sb, KERN_WARNING, "Warning: mounting with an "
+				 "experimental mount option 'dioread_nolock' "
+				 "for blocksize < PAGE_SIZE");
+	}
 	return 1;
 }
 
@@ -3792,7 +3800,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		set_opt(sb, NO_UID32);
 	/* xattr user namespace & acls are now defaulted on */
 	set_opt(sb, XATTR_USER);
-	set_opt(sb, DIOREAD_NOLOCK);
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
@@ -3842,6 +3849,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_li_wait_mult = EXT4_DEF_LI_WAIT_MULT;
 
 	blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size);
+
+	if (blocksize == PAGE_SIZE)
+		set_opt(sb, DIOREAD_NOLOCK);
+
 	if (blocksize < EXT4_MIN_BLOCK_SIZE ||
 	    blocksize > EXT4_MAX_BLOCK_SIZE) {
 		ext4_msg(sb, KERN_ERR,
-- 
2.20.1
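
The decision logic of the patch above can be restated as a small
user-space sketch in C. The function names and the BASE_BLOCK_SIZE
macro are hypothetical stand-ins (the kernel's BLOCK_SIZE is 1024);
this only mirrors the two conditions in the diff and is not kernel
code.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASE_BLOCK_SIZE 1024UL	/* stand-in for the kernel's BLOCK_SIZE */

/*
 * Mirrors the new default in ext4_fill_super(): dioread_nolock is set by
 * default only when the filesystem block size equals the page size.
 */
bool dioread_nolock_default(uint32_t s_log_block_size, unsigned long page_size)
{
	unsigned long blocksize = BASE_BLOCK_SIZE << s_log_block_size;

	return blocksize == page_size;
}

/*
 * Mirrors the new check in parse_options(): a warning is emitted when the
 * option is set while the block size is smaller than the page size.
 */
bool dioread_nolock_warns(bool option_set, uint32_t s_log_block_size,
			  unsigned long page_size)
{
	unsigned long blocksize = BASE_BLOCK_SIZE << s_log_block_size;

	return option_set && blocksize < page_size;
}
```

With 4K blocks (s_log_block_size of 2), dioread_nolock_default() is
true for 4K pages and false for 64K pages, which matches the x86 and
Power cases discussed in this thread.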



* Re: [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize
  2020-03-27 20:07 ` [PATCH] ext4: Don't set dioread_nolock by default for blocksize < pagesize Ritesh Harjani
@ 2020-03-29  2:17   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 9+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-29  2:17 UTC (permalink / raw)
  To: Ritesh Harjani; +Cc: linux-ext4, Jan Kara, Aneesh Kumar K . V

On Sat, Mar 28, 2020 at 01:37:44AM +0530, Ritesh Harjani wrote:
> Currently on calling echo 3 > drop_caches on host machine, we see
> FS corruption in the guest. This happens on Power machine where
> blocksize < pagesize.
> 
> So as a temporary workaround don't enable dioread_nolock by default
> for blocksize < pagesize until we identify the root cause.
> 
> Also emit a warning msg in case if this mount option is manually
> enabled for blocksize < pagesize.
> 
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Thanks, applied.

					- Ted
