From: Ritesh Harjani <riteshh@linux.ibm.com>
To: linux-ext4@vger.kernel.org, "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>, Jan Kara <jack@suse.cz>
Subject: Re: Ext4 corruption with VM images as 3 > drop_caches
Date: Thu, 19 Mar 2020 18:54:32 +0530
Message-ID: <20200319132433.A2A88A404D@d06av23.portsmouth.uk.ibm.com>
In-Reply-To: <87pndagw7s.fsf@linux.ibm.com>



On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
> Hi,
> 
> With new vm install I am finding corruption with the vm image if I
> follow up the install with echo 3 > /proc/sys/vm/drop_caches
> 
> The file system reports below error.
> 
> Begin: Running /scripts/local-bottom ... done.
> Begin: Running /scripts/init-bottom ...
> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode #787185: comm sh: iget: checksum invalid
> done.
> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode #917954: comm init: iget: checksum invalid
> /sbin/init: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 74
> [    5.271207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> 
> And debugfs reports
> 
> debugfs:  stat <917954>
> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> Generation: 0    Version: 0x00000000
> User:     0   Group:     0   Size: 0
> File ACL: 0
> Links: 0   Blockcount: 0
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> Size of extra inode fields: 0
> Inode checksum: 0x00000000
> BLOCKS:
> debugfs:
> 
> Bisecting this finds
> Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make dioread_nolock the default")
> as bad. If I revert the same on top of linus upstream(fb33c6510d5595144d585aa194d377cf74d31911)
> I don't hit the corruption anymore.

I tried this and could easily reproduce it on a Power box.
I also tried on x86, but could not reproduce it there.
One difference on Power could be that the pagesize is 64K while the fs
blocksize is 4K.
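
To check whether a given host falls into the blocksize < pagesize case,
something like the following could be used. This is only a rough sketch:
the helper names are made up for illustration, and it assumes that
statvfs() reports the fs blocksize in f_bsize (which is the case for
ext4, as far as I know).

```python
import os


def page_and_block_size(path="."):
    """Return (pagesize, fs blocksize) for the filesystem holding `path`."""
    page = os.sysconf("SC_PAGE_SIZE")   # e.g. 65536 on a 64K-page Power kernel
    blk = os.statvfs(path).f_bsize      # assumption: f_bsize is the fs blocksize
    return page, blk


def blocksize_lt_pagesize(path="."):
    """True when several fs blocks share one page (e.g. 4K blocks, 64K pages)."""
    page, blk = page_and_block_size(path)
    return blk < page


if __name__ == "__main__":
    page, blk = page_and_block_size(".")
    print("pagesize=%d blocksize=%d blocksize<pagesize=%s"
          % (page, blk, blk < page))
```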

It looks like the guest's qemu image file is not properly written back
after the host does echo 3 > drop_caches (correct me if this is not
the case).

I tried to reproduce this with the test below, but it did not trigger
the issue.

Any idea what kind of unit test could be written for this?
I am not sure how exactly qemu writes to its image file.


1. Create 2 files, "mmap-file" and "mmap-data".
2. "mmap-file" is a 2GB sparse file. At some random offsets (tried
with both 64KB-aligned and 4KB-aligned offsets), write a
pagesize/blocksize amount of a known data pattern.
3. These offsets (which are pagesize/blocksize aligned) are recorded in
the "mmap-data" file via normal read/write calls.
4. After writing to both files, munmap the "mmap-file" and close both
files.
5. Then do echo 3 > drop_caches.
6. In the verify phase, using the offsets recorded in the "mmap-data"
file, read the "mmap-file" and verify that its contents are as
expected.

With this test I could not reproduce the issue.
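
For reference, the steps above can be sketched roughly as below. This is
not the actual test program, only an illustration of the same idea in
Python. The function names, the pattern (offset XOR a magic constant),
the number of offsets and the fixed seed are all assumptions made for
the sketch.

```python
import mmap
import os
import random
import struct

FILE_SIZE = 2 * 1024**3       # 2GB sparse file (step 2)
ALIGN = 4096                  # 4KB blocksize; use 65536 for 64KB-aligned offsets
MAGIC = 0x5A5A5A5A5A5A5A5A    # hypothetical constant so no pattern is all zeroes


def pattern(offset, length):
    """Known pattern: 8-byte word (offset ^ MAGIC) repeated to fill `length`."""
    word = struct.pack("<Q", offset ^ MAGIC)
    return word * (length // len(word))


def write_phase(map_path, data_path, count=64, size=FILE_SIZE, align=ALIGN, seed=0):
    """Steps 1-4: write patterns through a shared mapping, record the offsets."""
    rng = random.Random(seed)
    blocks = sorted(rng.sample(range(size // align), count))
    offsets = [blk * align for blk in blocks]
    with open(map_path, "w+b") as f:
        f.truncate(size)                         # sparse, nothing allocated yet
        with mmap.mmap(f.fileno(), size) as m:   # shared mapping by default on Unix
            for off in offsets:
                m[off:off + align] = pattern(off, align)
        # leaving the inner "with" unmaps the file (munmap)
    with open(data_path, "wb") as d:             # normal write() calls
        for off in offsets:
            d.write(struct.pack("<Q", off))
    return offsets


def drop_caches():
    """Step 5: echo 3 > /proc/sys/vm/drop_caches. Returns False if not permitted."""
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
        return True
    except OSError:
        return False


def verify_phase(map_path, data_path, align=ALIGN):
    """Step 6: re-read every recorded offset, return the offsets that mismatch."""
    with open(data_path, "rb") as d:
        raw = d.read()
    offsets = [struct.unpack_from("<Q", raw, i)[0] for i in range(0, len(raw), 8)]
    bad = []
    with open(map_path, "rb") as f:
        for off in offsets:
            f.seek(off)
            if f.read(align) != pattern(off, align):
                bad.append(off)
    return bad


if __name__ == "__main__":
    write_phase("mmap-file", "mmap-data")
    if not drop_caches():
        print("could not write drop_caches (needs root)")
    print("mismatching offsets:", verify_phase("mmap-file", "mmap-data"))
```

Note that, per the kernel's sysctl documentation, drop_caches frees only
clean page cache, so the mmap-written pages of "mmap-file" are dropped
only once they have been written back.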


-ritesh


