qemu-devel.nongnu.org archive mirror
From: Max Reitz <1847793@bugs.launchpad.net>
To: qemu-devel@nongnu.org
Subject: [Bug 1847793] Re: qemu 4.1.0 - Corrupt guest filesystem after new vm install
Date: Thu, 24 Oct 2019 14:20:00 -0000
Message-ID: <157192680088.29240.2272202812607798113.malone@soybean.canonical.com>
In-Reply-To: 157080798335.681.12255731732435282400.malonedeb@chaenomeles.canonical.com

I suppose that the problem described in bug 1846427 can also affect
guest data, so I think it makes sense to split the reports by whether
only guest data is corrupted or both guest data and qcow2 metadata are.

So far, I don’t know of any report of pure guest data corruption (with
no qcow2 metadata affected) that didn’t happen on XFS.  So I assume
there is one issue that affects both data and metadata on all
filesystems (described by bug 1846427; Kevin has sent a patch series
upstream to address it), and another one that only affects guest data
and only occurs on XFS (this one).

Actually, there are two problems we know of on XFS:

The first one was a bug in qemu that has been fixed upstream by commit
b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2.  People who use the 4.1
release rather than master are likely to hit that problem rather than
the other one.
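For anyone building qemu from a git checkout, one way to tell whether the tree already contains that fix is to ask git whether the commit is an ancestor of HEAD. This is a minimal sketch: the helper name tree_has_commit and the variable QEMU_XFS_FIX are made up for illustration, and a distribution package may carry the fix as a backport under a different hash, which this check cannot detect.

```shell
#!/bin/sh
# Hypothetical helper (not part of qemu): succeeds if commit $2 is
# reachable from HEAD in the git checkout at $1.
# "git merge-base --is-ancestor" exits 0 if it is, 1 if it is not, and
# another non-zero status on errors such as an unknown commit hash.
QEMU_XFS_FIX=b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2

tree_has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}
```

For example, `tree_has_commit "$HOME/src/qemu" "$QEMU_XFS_FIX"` (the path is only an example) exits with status 0 once the fix is part of the checked-out history.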

The second one seems to be a kernel bug.  When fallocating (writing zeroes, in our case) and writing to the same file in parallel, the write is discarded if all of the following are true:
- The fallocated area begins at or after the EOF,
- The written area begins after the fallocated area,
- The write is submitted through the AIO interface (io_submit()),
- The write and the fallocate operation are submitted before either one finishes (i.e. concurrently),
- The fallocate operation finishes after the write.
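The five conditions above can be restated as a single predicate, which makes it easier to check a given trace of requests against them. This is a hedged sketch in POSIX sh: the function write_may_be_lost and its argument order are hypothetical, "begins after the fallocated area" is taken to mean the write's first byte lies past the fallocated range's last byte, and the function only models the conditions as listed; it neither reproduces nor detects the kernel behavior.

```shell
#!/bin/sh
# Hypothetical predicate encoding the five conditions listed above.
# It does not touch any file; it only says whether a combination of
# requests matches the described pattern.
# Arguments (byte offsets and lengths are integers):
#   $1 eof           current end of file
#   $2 falloc_start  start of the fallocate (zero-range) request
#   $3 falloc_len    length of the fallocate request
#   $4 write_start   start of the write request
#   $5 via_aio       1 if the write was submitted via io_submit(), else 0
#   $6 concurrent    1 if both were submitted before either finished, else 0
#   $7 falloc_last   1 if the fallocate finished after the write, else 0
write_may_be_lost() {
    eof=$1 fstart=$2 flen=$3 wstart=$4 aio=$5 conc=$6 flast=$7
    [ "$fstart" -ge "$eof" ] &&            # fallocate starts at/after EOF
    [ "$wstart" -ge $((fstart + flen)) ] && # write starts past that range
    [ "$aio" -eq 1 ] &&                     # write went through AIO
    [ "$conc" -eq 1 ] &&                    # both in flight at once
    [ "$flast" -eq 1 ]                      # fallocate completes last
}
```

For example, `write_may_be_lost 4096 4096 65536 69632 1 1 1` succeeds: a 64 KiB zero range starting exactly at a 4 KiB EOF, followed by an AIO write at the first byte past that range, with both requests in flight at once and the fallocate completing last.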

In qemu, this can happen only with aio=native, and then most of the
time when an FALLOC_FL_ZERO_RANGE request past the EOF is issued while
a write beyond that range is still in flight.
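On the qemu side, exposure to this second problem therefore depends on two things: whether the image lives on XFS and whether its -drive uses aio=native. A minimal pre-flight check might look like this. The function names are hypothetical, the filesystem lookup assumes GNU coreutils stat(1), and the option parsing is deliberately naive (it ignores qemu's ",," escaping of commas inside file names).

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of qemu.

# Succeeds if the comma-separated -drive option string contains aio=native.
drive_uses_native_aio() {
    case ",$1," in
        *,aio=native,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the type of the filesystem holding $1 (e.g. "xfs").
# Uses GNU stat's filesystem mode (-f) with the %T format.
fs_type() {
    stat -f -c %T "$1"
}

# $1: image path, $2: -drive option string.
# Returns 1 and warns only for the aio=native + XFS combination.
check_drive() {
    if drive_uses_native_aio "$2" && [ "$(fs_type "$1")" = "xfs" ]; then
        echo "warning: $1 is on XFS and uses aio=native" >&2
        return 1
    fi
    return 0
}
```

With the reporter's drive options (`if=virtio,media=disk,cache=writeback`, no aio= setting), check_drive returns 0 without looking at the filesystem at all.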


Claus, the reporter, didn’t use aio=native (his command line sets no aio= option, so qemu’s default, aio=threads, applies).  So even if he is indeed on XFS, he can’t have hit this second bug; he will most likely have hit the first one, which is already fixed in master.


Still, we need to address the second bug.  As for how: it looks to me like a kernel bug, so there is nothing we can do in qemu to fix it, but we should probably work around it.  Kevin has proposed basically making zero-writes on XFS serializing, with their range extended to infinity (i.e. UINT64_MAX in practice).  That gives us some layering problems (either the file-posix block driver needs access to the TrackedRequest to extend its length, or the generic block layer needs to know whether a file-posix node is on XFS), and it raises the question of how to detect whether the bug has been fixed in the kernel.
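To see why serializing "until infinity" would cover the problematic pattern, it helps to model requests as half-open byte ranges. The sketch below is not qemu's TrackedRequest code; both functions are hypothetical. With an ordinary overlap check, a zero-write over [zoff, zoff+zlen) does not conflict with a write that starts past that range, which is exactly the case from the list of conditions. If the zero-write's range is instead treated as extending to the end of the address space, every request touching any byte at or after zoff has to wait for it.

```shell
#!/bin/sh
# Hypothetical model, not qemu code. Requests are half-open byte
# ranges [off, off + len) with len > 0.

# Ordinary overlap test between [$1, $1+$2) and [$3, $3+$4).
ranges_overlap() {
    [ "$1" -lt $(($3 + $4)) ] && [ "$3" -lt $(($1 + $2)) ]
}

# A zero-write at offset $1 that serializes "until infinity": its range
# is [$1, end of address space), so it conflicts with any request
# [$2, $2+$3) that has at least one byte at or after $1.
conflicts_with_unbounded_zero_write() {
    [ $(($2 + $3)) -gt "$1" ]
}
```

For example, a zero-write over [4096, 69632) and a 4 KiB write at 69632 do not overlap under ranges_overlap, but `conflicts_with_unbounded_zero_write 4096 69632 4096` succeeds, so the write would be held back until the zero-write completes.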

Max

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1847793

Title:
  qemu 4.1.0 - Corrupt guest filesystem after new vm install

Status in QEMU:
  New

Bug description:
  When I install a new VM with qemu 4.1.0, all the guest filesystems are
  corrupt. The first boot from the install DVD ISO is OK and the
  installer works fine. But the guest system hangs after the installer
  finishes and I reboot the guest. I can see the GRUB boot menu, but the
  system cannot load the initramfs.

  Tested with:
  - Red Hat Enterprise Linux 7.5, 7.6 and 7.7 (Red Hat uses XFS for the /boot and / partitions)
    Guided install with the graphical installer, no LVM selected.
  - Debian Stable/Buster (Debian uses ext4 for the / and /home partitions)
    Guided install with the graphical installer and default options.

  Command line used to create the VM disk image:
  qemu-img create -f qcow2 /volumes/disk2-part2/vmdisks/vmtest10-1.qcow2 20G

  QEMU command line used for the VM installation:
  #!/bin/sh
  # vmtest10 Installation
  #
  /usr/bin/qemu-system-x86_64  -cpu SandyBridge-IBRS \
      -soundhw hda \
      -M q35 \
      -k de \
      -vga qxl \
      -machine accel=kvm \
      -m 4096 \
      -display gtk \
      -drive file=/volumes/disk2-part2/images/debian-10.0.0-amd64-DVD-1.iso,if=ide,media=cdrom \
      -drive file=/volumes/disk2-part2/images/vmtest10-1.qcow2,if=virtio,media=disk,cache=writeback \
      -boot once=d,menu=off \
      -device virtio-net-pci,mac=52:54:00:2c:02:6c,netdev=vlan0 \
      -netdev bridge,br=br0,id=vlan0 \
      -rtc base=localtime \
      -name "vmtest10" \
      -usb -device usb-tablet \
      -spice disable-ticketing \
      -device virtio-serial-pci \
      -device virtserialport,chardev=spicechannel0,name=com.redhat.spice.0 \
      -chardev spicevmc,id=spicechannel0,name=vdagent $*

  Host OS:
  Archlinux (last updated at 10.10.2019)
  Linux testing 5.3.5-arch1-1-ARCH #1 SMP PREEMPT Mon Oct 7 19:03:08 UTC 2019 x86_64 GNU/Linux
  No libvirt in use.

  
  With qemu 4.0.0 it works fine without any errors.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1847793/+subscriptions


Thread overview: 18+ messages
2019-10-11 15:33 [Bug 1847793] [NEW] qemu 4.1.0 - Corrupt guest filesystem after new vm install Claus Paetow
2019-10-14 14:51 ` [Bug 1847793] " Dr. David Alan Gilbert
2019-10-16 12:46 ` Claus Paetow
2019-10-16 13:17 ` Dr. David Alan Gilbert
2019-10-16 13:28 ` Max Reitz
2019-10-16 15:41 ` psyhomb
2019-10-17  9:15 ` Laszlo Ersek (Red Hat)
2019-10-21  8:46 ` Max Reitz
2019-10-21 12:23 ` Simon John
2019-10-24 14:20 ` Max Reitz [this message]
2019-10-30 11:56 ` Matti Hameister
2019-10-30 16:59 ` Max Reitz
2019-10-31 13:55 ` Claus Paetow
2019-11-04 11:50 ` Wayne
2019-11-05 11:41 ` Max Reitz
2020-08-12 11:47 ` Laszlo Ersek (Red Hat)
2021-04-22  7:42 ` Thomas Huth
2021-06-22  4:18 ` Launchpad Bug Tracker
