* kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
@ 2015-03-23 20:01 Chris Murphy
  2015-03-23 21:13 ` Chris Mason
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Murphy @ 2015-03-23 20:01 UTC (permalink / raw)
  To: Btrfs BTRFS

I can't tell if this is a kvm virtio blk device regression, with
cache=none and cache=directsync, or if it's a Btrfs regression.

The summary is that on a host using (Fedora) kernel 3.18.9, 3.19.2, or
any 4.0.0 kernel, with qcow2 on Btrfs, and either cache=none or
directsync, the guest Linux OS experiences many I/O blk device errors.

https://bugzilla.redhat.com/show_bug.cgi?id=1204569

If I put the qcow2 on XFS, the problem doesn't happen.

If I keep the qcow2 on Btrfs, and change the cache= to writeback,
writethrough, or unsafe, the problem doesn't happen.

It happens with either qcow2 compat 0.10 or 1.1. Raw files were not
tested, nor were block devices other than virtio.

In the guest, all file systems experience this and complain, some more
than others. Btrfs is the most tolerant, mainly reporting write errors
but still completing an OS installation; ext4 complains a lot but also
completes an OS installation; XFS complains and eventually gives up with
a hardware I/O error, and the OS installation fails.

I did this test maybe two years ago and this combination was safe at that time.


-- 
Chris Murphy


* Re: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
  2015-03-23 20:01 kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs Chris Murphy
@ 2015-03-23 21:13 ` Chris Mason
  2015-03-24 16:10   ` Chris Murphy
  2015-03-25  5:25   ` Chris Murphy
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Mason @ 2015-03-23 21:13 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Mon, Mar 23, 2015 at 02:01:41PM -0600, Chris Murphy wrote:
> I can't tell if this is a kvm virtio blk device regression, with
> cache=none and cache=directsync, or if it's a Btrfs regression.
> 
> The summary is that on a host using (Fedora) kernel 3.18.9, 3.19.2, or
> any 4.0.0 kernel, with qcow2 on Btrfs, and either cache=none or
> directsync, the guest Linux OS experiences many I/O blk device errors.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1204569
> 
> If I put the qcow2 on XFS, the problem doesn't happen.
> 
> If I keep the qcow2 on Btrfs, and change the cache= to writeback,
> writethrough, or unsafe, the problem doesn't happen.
> 
> It happens with either qcow2 compat 0.10 or 1.1. Raw files were not
> tested. And block devices other than virtio were not tested.
> 
> In the guest, all file systems experience this and complain, some more
> than others. Btrfs is most tolerant mainly reporting write errors but
> completes an OS installation; ext4 complains a lot but also completes
> an OS installation; XFS complains and eventually gives up with a
> hardware I/O error and the OS installation fails.
> 
> I did this test maybe two years ago and this combination was safe at that time.

The last time we tracked down a similar problem, Josef found it was only
on windows guests.  Basically he tracked it down to buffers changing
while in flight.  I'll take a look.
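
To illustrate the "buffers changing while in flight" failure mode, here
is a minimal Python sketch. It is a simulation only, not Btrfs code: it
assumes the filesystem checksums the caller's buffer first and the device
reads the same buffer later, which is how an O_DIRECT write can end up
with a checksum that no longer matches the data. The helper names
(checksum, direct_write) are hypothetical, and zlib.crc32 is a stand-in
for the crc32c that Btrfs uses by default.

```python
import zlib
from typing import Tuple


def checksum(data: bytes) -> int:
    # Stand-in checksum. Btrfs uses crc32c by default, which is a
    # different polynomial from zlib.crc32; the race is the same.
    return zlib.crc32(data)


def direct_write(buf: bytearray, mutate_in_flight: bool) -> Tuple[int, bytes]:
    """Model an O_DIRECT write: checksum now, copy to 'disk' later."""
    csum = checksum(bytes(buf))   # checksum taken from the caller's buffer
    if mutate_in_flight:
        buf[0] ^= 0xFF            # guest rewrites the page before I/O completes
    on_disk = bytes(buf)          # device reads the (possibly changed) buffer
    return csum, on_disk


stable_csum, stable_data = direct_write(bytearray(b"guest data block"), False)
racy_csum, racy_data = direct_write(bytearray(b"guest data block"), True)

stable_ok = checksum(stable_data) == stable_csum
racy_ok = checksum(racy_data) == racy_csum
print(stable_ok, racy_ok)  # True False
```

With buffered I/O the page cache holds a stable copy of the data, so the
checksum and the written bytes come from the same snapshot; that is one
plausible reason the non-O_DIRECT cache modes would not show the problem.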

-chris



* Re: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
  2015-03-23 21:13 ` Chris Mason
@ 2015-03-24 16:10   ` Chris Murphy
  2015-03-26 10:00     ` Paul Jones
  2015-03-25  5:25   ` Chris Murphy
  1 sibling, 1 reply; 6+ messages in thread
From: Chris Murphy @ 2015-03-24 16:10 UTC (permalink / raw)
  To: Chris Mason, Chris Murphy, Btrfs BTRFS

On Mon, Mar 23, 2015 at 3:13 PM, Chris Mason <clm@fb.com> wrote:
> The last time we tracked down a similar problem, Josef found it was only
> on windows guests.  Basically he tracked it down to buffers changing
> while in flight.  I'll take a look.

Looks like cache=none and cache=directsync both use O_DIRECT.
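
For reference, the five cache= modes named in this thread can be written
out as the three underlying QEMU cache settings (cache.writeback,
cache.direct, cache.no-flush). The Python sketch below is only a lookup
table based on the QEMU documentation, plus a helper that builds a -drive
option string; the helper name drive_option and the image path are
hypothetical, and the exact command line used in the bug report is not
reproduced here.

```python
from typing import NamedTuple


class CacheMode(NamedTuple):
    writeback: bool  # cache.writeback
    direct: bool     # cache.direct (image opened with O_DIRECT on the host)
    no_flush: bool   # cache.no-flush


CACHE_MODES = {
    "writeback":    CacheMode(True,  False, False),
    "none":         CacheMode(True,  True,  False),
    "writethrough": CacheMode(False, False, False),
    "directsync":   CacheMode(False, True,  False),
    "unsafe":       CacheMode(True,  False, True),
}


def drive_option(image: str, cache: str) -> str:
    """Build a qemu -drive value for a virtio qcow2 disk (illustrative)."""
    if cache not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {cache}")
    return f"file={image},format=qcow2,if=virtio,cache={cache}"


# The modes that bypass the host page cache are the two that fail on Btrfs.
direct_modes = sorted(name for name, m in CACHE_MODES.items() if m.direct)
print(direct_modes)  # ['directsync', 'none']

# Hypothetical image path, for illustration only.
print(drive_option("/var/lib/libvirt/images/test.qcow2", "none"))
```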

This patch suggests neither of those cache options should be used (for
different reasons).
https://github.com/libguestfs/libguestfs/commit/749e947bb0103f19feda0f29b6cbbf3cbfa350da

I stumbled on this testing GNOME Boxes which is using cache=none,
which it probably shouldn't, but nevertheless none and directsync also
shouldn't cause problems on Btrfs.


-- 
Chris Murphy


* Re: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
  2015-03-23 21:13 ` Chris Mason
  2015-03-24 16:10   ` Chris Murphy
@ 2015-03-25  5:25   ` Chris Murphy
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2015-03-25  5:25 UTC (permalink / raw)
  To: Chris Mason, Btrfs BTRFS

It seems this may not be a kernel bug. With kernel 3.18.9 I can both
reproduce it and fail to reproduce it, depending on some other factor
(Fedora 21 vs. Fedora 20 live installs).

The short version:

With these, the problem does not reproduce:
libvirt-* 1.1.3.9
qemu-* 1.6.1

With these, the problem does reproduce:
libvirt-* 1.2.9.1
qemu-* 2.1.2

I've updated the bug with the details of versions and the exact qemu
command for the reproducing and non-reproducing cases.
https://bugzilla.redhat.com/show_bug.cgi?id=1204569



Chris Murphy


* RE: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
  2015-03-24 16:10   ` Chris Murphy
@ 2015-03-26 10:00     ` Paul Jones
  2015-03-30 22:47       ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Jones @ 2015-03-26 10:00 UTC (permalink / raw)
  To: Chris Murphy, Chris Mason, Btrfs BTRFS


> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Chris Murphy
> Sent: Wednesday, 25 March 2015 3:10 AM
> To: Chris Mason; Chris Murphy; Btrfs BTRFS
> Subject: Re: kvm bug, guest I/O blk device errors when qcow2 backing file is
> on Btrfs
> 
> On Mon, Mar 23, 2015 at 3:13 PM, Chris Mason <clm@fb.com> wrote:


> > The last time we tracked down a similar problem, Josef found it was
> > only on windows guests.  Basically he tracked it down to buffers
> > changing while in flight.  I'll take a look.
> 
> Looks like cache=none and directsync share O_DIRECT in common.
> 
> This patch suggests neither of those cache options should be used (for
> different reasons).
> https://github.com/libguestfs/libguestfs/commit/749e947bb0103f19feda0f29b6cbbf3cbfa350da
> 
> I stumbled on this testing GNOME Boxes which is using cache=none, which it
> probably shouldn't, but nevertheless none and directsync also shouldn't
> cause problems on Btrfs. 

I've got a Windows 2012 guest VM running on a Linux host and I also have
trouble with cache=none. There is one particular inode on the Btrfs
filesystem that gets csum errors about every 6-18 hours. I swapped just
about everything (hardware) trying to find the problem, but then I
remembered I was experimenting with cache options. I changed it back to
the default and the problem went away.
Interestingly, there are no reported errors on the VM.

Paul.


* Re: kvm bug, guest I/O blk device errors when qcow2 backing file is on Btrfs
  2015-03-26 10:00     ` Paul Jones
@ 2015-03-30 22:47       ` Chris Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2015-03-30 22:47 UTC (permalink / raw)
  To: Paul Jones; +Cc: Chris Mason, Btrfs BTRFS

Small update. I've done a strace of the qemu process, filtering for
pwrite and pwritev while the guest experiences these I/O errors, and
posted it to the bug. Kevin Wolf looked at it and says:
https://bugzilla.redhat.com/show_bug.cgi?id=1204569#c20

> There is one single failing pwritev() call in it:
> 3091  <... pwritev resumed> )           = -1 EEXIST (File exists)
> Now EEXIST is not an error code that makes any sense for a pwritev() call, but a
> quick search on the internet suggests that btrfs does indeed use this error code
> if something goes wrong internally. You should probably mention this error code
> in your linux-btrfs thread, perhaps it gives the btrfs developers a hint.
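
For anyone wanting to scan a similar trace, a minimal Python sketch of
picking failed write calls out of strace output follows. The regular
expression and the function name failed_writes are assumptions made for
illustration; the only sample line is the one quoted above, and a real
trace may need a looser pattern.

```python
import errno
import re

# Matches a PID-prefixed strace line for pwrite/pwrite64/pwritev that
# returned -1, including the "<... pwritev resumed>" continuation form.
FAILED_WRITE = re.compile(
    r"^(?P<pid>\d+)\s+(?:<\.\.\. )?(?P<call>pwrite(?:v|64)?)\b"
    r".*=\s*-1\s+(?P<err>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)"
)


def failed_writes(lines):
    """Yield (pid, syscall, errno name) for each failed write call."""
    for line in lines:
        m = FAILED_WRITE.match(line)
        if m:
            yield m.group("pid"), m.group("call"), m.group("err")


# The single failing line quoted from the bug report.
sample = "3091  <... pwritev resumed> )           = -1 EEXIST (File exists)"

results = list(failed_writes(sample.splitlines()))
print(results)  # [('3091', 'pwritev', 'EEXIST')]

# Look up the numeric value of the reported error name on this platform.
for pid, call, err in results:
    print(pid, call, err, getattr(errno, err))
```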



Chris Murphy

