* Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
@ 2016-01-19 22:02 Roman Mamedov
  2016-01-19 22:18 ` Paolo Bonzini
  2016-01-20  0:19 ` Liu Bo
  0 siblings, 2 replies; 12+ messages in thread
From: Roman Mamedov @ 2016-01-19 22:02 UTC (permalink / raw)
  To: linux-btrfs, kvm

Hello,

I'm facing a strange issue:

Starting with kernel 4.4.0, a KVM guest whose disk image is stored on a Btrfs
filesystem and which uses the "virtio-scsi" disk backend hard-locks instantly,
as soon as the Btrfs subvolume containing its backing file is snapshotted.

There's nothing in dmesg on either the guest or the host;
the KVM process can be killed from the host just fine and then restarted,
so it doesn't seem to be a kernel-side deadlock of any sort.

KVM disk controller which exhibits the problem:

 -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd \

The alternative which works fine:

 -device ide-hd,drive=hd,bus=ide.0 \

The disk device line is common to both cases:

 -drive if=none,id=hd,cache=writeback,aio=threads,format=raw,file=$NAME.img,discard=unmap,detect-zeroes=unmap \

Also tried aio=native with the problematic variant, no change.

Both variants work fine and are unaffected by snapshotting on kernel 3.18.25.

Any ideas?

-- 
With respect,
Roman


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-19 22:02 Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting Roman Mamedov
@ 2016-01-19 22:18 ` Paolo Bonzini
  2016-01-19 23:03   ` Roman Mamedov
  2016-01-20  0:19 ` Liu Bo
  1 sibling, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-01-19 22:18 UTC (permalink / raw)
  To: Roman Mamedov, linux-btrfs, kvm



On 19/01/2016 23:02, Roman Mamedov wrote:
> Hello,
> 
> I'm facing a strange issue:
> 
> Starting with the kernel 4.4.0, a KVM guest stored on a Btrfs
> filesystem, if it's using the "virtio-scsi" disk backend, will hard
> lock-up instantly, as soon as the Btrfs subvolume which contains
> its backing file is snapshotted.

So this is kernel 4.4 on the host, and virtio-scsi in the guest.  What
about virtio-blk in the guest?

The only difference I can see between ide and virtio-* is that IDE
only has a queue depth of 1.


> There's nothing in dmesg neither on the guest, nor on the host; the
> KVM process can be killed from the host just fine and then
> restarted, so it doesn't seem to be a kernel-side deadlock of any
> sort.
> 
> KVM disk controller which exhibits the problem:
> 
> -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd \
> 
> The alternative which works fine:
> 
> -device ide-hd,drive=hd,bus=ide.0 \
> 
> The disk device line is common to both cases:
> 
> -drive
> if=none,id=hd,cache=writeback,aio=threads,format=raw,file=$NAME.img,discard=unmap,detect-zeroes=unmap
> \

So you're snapshotting the subvolume that includes $NAME.img?

Paolo


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-19 22:18 ` Paolo Bonzini
@ 2016-01-19 23:03   ` Roman Mamedov
  2016-01-20  8:28     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Mamedov @ 2016-01-19 23:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-btrfs, kvm

Hello,

On Tue, 19 Jan 2016 23:18:59 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> So this is kernel 4.4 on the host, and virtio-scsi in the guest.  What
> about virtio-blk in the guest?

Tried with:

  -device virtio-blk-pci,drive=hd,scsi=off

it locks up as well.

In fact, it turns out the lock-up is not 100% reproducible; it's more like a 50-80% chance.

But it certainly happens if I trigger heavy disk activity in the guest:

  while true; do dd if=/dev/zero of=zerofile count=1024 bs=1M; sync; done

...at the same time repeatedly snapshotting the subvolume on the host.

Could not trigger it even once with IDE so far.
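
A host-side counterpart to the guest loop above might look like the following
sketch. The subvolume path, the snapshot directory and the cycle count are
placeholders, not values from this report, and the helper names are made up for
illustration. It relies only on the standard "btrfs subvolume snapshot" and
"btrfs subvolume delete" commands. By default it runs in dry-run mode and only
prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: repeatedly snapshot a subvolume while the guest writes to its image.
# SUBVOL and SNAPDIR are hypothetical paths; adjust them to the real layout.
SUBVOL=${SUBVOL:-/mnt/example/vm}
SNAPDIR=${SNAPDIR:-/mnt/example/snapshots}
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_loop() {
    # $1 = number of snapshot/delete cycles
    i=1
    while [ "$i" -le "$1" ]; do
        run btrfs subvolume snapshot "$SUBVOL" "$SNAPDIR/stress-$i"
        run btrfs subvolume delete "$SNAPDIR/stress-$i"
        i=$((i + 1))
    done
}

snapshot_loop "${CYCLES:-2}"
```

Setting DRY_RUN=0 (as root, with real paths) would actually create and delete
the snapshots; each snapshot is deleted right away so the test leaves nothing
behind.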

> > -drive
> > if=none,id=hd,cache=writeback,aio=threads,format=raw,file=$NAME.img,discard=unmap,detect-zeroes=unmap
> > \
> 
> So you're snapshotting the subvolume that includes $NAME.img?

Yes


-- 
With respect,
Roman


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-19 22:02 Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting Roman Mamedov
  2016-01-19 22:18 ` Paolo Bonzini
@ 2016-01-20  0:19 ` Liu Bo
  2016-01-20  5:08   ` Roman Mamedov
  1 sibling, 1 reply; 12+ messages in thread
From: Liu Bo @ 2016-01-20  0:19 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs, kvm

On Wed, Jan 20, 2016 at 03:02:23AM +0500, Roman Mamedov wrote:
> Hello,
> 
> I'm facing a strange issue:
> 
> Starting with the kernel 4.4.0, a KVM guest stored on a Btrfs filesystem,
> if it's using the "virtio-scsi" disk backend, will hard lock-up instantly,
> as soon as the Btrfs subvolume which contains its backing file is snapshotted.
> 
> There's nothing in dmesg neither on the guest, nor on the host;
> the KVM process can be killed from the host just fine and then restarted,
> so it doesn't seem to be a kernel-side deadlock of any sort.
> 
> KVM disk controller which exhibits the problem:
> 
>  -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd \
> 
> The alternative which works fine:
> 
>  -device ide-hd,drive=hd,bus=ide.0 \
> 
> The disk device line is common to both cases:
> 
>  -drive if=none,id=hd,cache=writeback,aio=threads,format=raw,file=$NAME.img,discard=unmap,detect-zeroes=unmap \
> 
> Also tried aio=native with the problematic variant, no change.
> 
> Both variants work fine and unaffected by snapshotting on kernel 3.18.25.
> 
> Any ideas?

What does 'btrfs fi df' print for the directory where $NAME.img is
located?

In my experience, a KVM guest can become unresponsive when the backing
filesystem runs out of space, but you said that the alternative
works, so it doesn't look like a no-space issue.

Thanks,

-liubo




* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20  0:19 ` Liu Bo
@ 2016-01-20  5:08   ` Roman Mamedov
  2016-01-20 11:34     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Mamedov @ 2016-01-20  5:08 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, kvm

On Tue, 19 Jan 2016 16:19:44 -0800
Liu Bo <bo.li.liu@oracle.com> wrote:

> What about 'btrfs fi df' 's output on the directory where $NAME.img is
> located?

$ sudo btrfs fi df .
Data, single: total=8.59TiB, used=8.47TiB
System, DUP: total=32.00MiB, used=624.00KiB
Metadata, DUP: total=13.00GiB, used=9.87GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
-               9.1T  8.5T  571G  94% /mnt/r5/vm

The guest VM disk itself is 8 GB in size.
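
To read the 'btrfs fi df' figures above as percentages, a small awk helper can
do the arithmetic. This is only a sketch: the function name chunk_usage is made
up for illustration, and it assumes the "total=..., used=..." line format shown
above with binary unit suffixes (B, KiB, MiB, GiB, TiB).

```shell
#!/bin/sh
# Sketch: print how full each Btrfs chunk type is, from `btrfs fi df` text on stdin.
chunk_usage() {
    awk '
    # Convert a size such as "8.59TiB" or "0.00B" to bytes.
    function to_bytes(s,    n, u) {
        n = s + 0                 # leading numeric part
        u = s
        gsub(/[0-9.]/, "", u)     # remaining unit suffix
        if (u == "KiB") return n * 1024
        if (u == "MiB") return n * 1024 ^ 2
        if (u == "GiB") return n * 1024 ^ 3
        if (u == "TiB") return n * 1024 ^ 4
        return n                  # plain bytes ("B")
    }
    /total=/ {
        name = $1; sub(/,$/, "", name)
        total = ""; used = ""
        for (i = 1; i <= NF; i++) {
            f = $i; sub(/,$/, "", f)
            if (f ~ /^total=/) total = substr(f, 7)
            if (f ~ /^used=/)  used  = substr(f, 6)
        }
        t = to_bytes(total)
        if (t > 0) printf "%s %.1f%%\n", name, 100 * to_bytes(used) / t
    }'
}
```

Piping the output above through it (for example "sudo btrfs fi df . | chunk_usage")
gives roughly 98.6% for Data and 75.9% for Metadata. These are percentages of
the space already allocated to chunks of each type, which is a different figure
from the 94% that df reports for the filesystem as a whole.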

> According to my experience, kvm guest can be 'no response' when no space
> on the backend's filesystem, but you said that the alternative way
> work, so it doesn't look like a no space issue.

I don't get any ENOSPC errors on the host. It also doesn't seem to be a
slowdown caused by disk I/O searching for free space, but a complete lock-up:
I monitor the guest with 'ping', and when it happens, the ping responses stop
immediately and do not return. There is also no reaction to any keypress in the
KVM VNC window.

-- 
With respect,
Roman


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-19 23:03   ` Roman Mamedov
@ 2016-01-20  8:28     ` Paolo Bonzini
  2016-01-20 13:58       ` Roman Mamedov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-01-20  8:28 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs, kvm

> Hello,
> 
> On Tue, 19 Jan 2016 23:18:59 +0100
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > So this is kernel 4.4 on the host, and virtio-scsi in the guest.  What
> > about virtio-blk in the guest?
> 
> Tried with:
> 
>   -device virtio-blk-pci,drive=hd,scsi=off
> 
> it locks up as well.
> 
> In fact turns out it's not 100% lock-up, more like 50-80% chance.
> 
> But it certainly happens if I trigger heavy disk activity in the guest:
> 
>   while true; do dd if=/dev/zero of=zerofile count=1024 bs=1M; sync; done
> 
> ...at the same time repeatedly snapshotting the subvolume on the host.
> 
> Could not trigger it even once with IDE so far.

With IDE it's pretty much impossible to trigger heavy disk activity
(which comes from the sync more than the dd).

Can you reproduce it on the host without QEMU in the middle?  Also,
can you find the value of the "FUA" file in sysfs (e.g. with
"cat /sys/block/sda/device/scsi_disk/*/FUA")?

Paolo



* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20  5:08   ` Roman Mamedov
@ 2016-01-20 11:34     ` Paolo Bonzini
  2016-01-20 15:08       ` Roman Mamedov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-01-20 11:34 UTC (permalink / raw)
  To: Roman Mamedov, bo.li.liu; +Cc: linux-btrfs, kvm



On 20/01/2016 06:08, Roman Mamedov wrote:
> I don't get any ENOSPC errors on the host, also it doesn't seem to
> be a slowdown caused by disk I/O searching for free space, just a
> complete lock-up: I monitor the guest with 'ping', and when it
> happens, the ping responses stop immediately and do not return,
> also there is no reaction to any keypress in the KVM VNC window.

Can you go to the QEMU monitor (Ctrl-Alt-F2 typically, or just add
"-monitor stdio" to the QEMU command line) and type "info status"?

Paolo


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20  8:28     ` Paolo Bonzini
@ 2016-01-20 13:58       ` Roman Mamedov
  2016-01-20 17:28         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Mamedov @ 2016-01-20 13:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-btrfs, kvm

On Wed, 20 Jan 2016 03:28:04 -0500 (EST)
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Can you reproduce it on the host without QEMU in the middle?

I have not tried to stress-test this on purpose yet, but this is an actively
used system which snapshots its /home, VMs and root subvolumes every hour. So
far no application other than KVM has locked up, and with KVM it's trivially
easy to hit: the first two lock-ups occurred in the first two hours of using
kernel 4.4.0, at the hourly snapshots...

> Also, can you find the value of the "FUA" file in sysfs (e.g. with
> "cat /sys/block/sda/device/scsi_disk/*/FUA")?

It is 0 both in IDE and SCSI modes.

Since you mentioned queue depth, I tried setting this on SCSI:

echo 1 > /sys/block/sda/device/queue_depth

(it was 128 by default). This does not solve the issue and doesn't seem to make
it any harder to hit.
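
For checking these two settings on every disk at once, a loop over sysfs along
the following lines may help. It is a sketch only: the function name is made
up, and the optional sysfs-root argument exists purely so the function can be
tried against a scratch directory. It reads the same queue_depth and
scsi_disk/*/FUA files mentioned above.

```shell
#!/bin/sh
# Sketch: list FUA and queue_depth for every SCSI disk found under a sysfs root.
# The optional first argument (default /sys) exists only to make this testable.
show_disk_settings() {
    root=${1:-/sys}
    for dev_dir in "$root"/block/*/device; do
        [ -d "$dev_dir" ] || continue
        disk=${dev_dir%/device}
        disk=${disk##*/}
        depth="n/a"
        fua="n/a"
        [ -r "$dev_dir/queue_depth" ] && depth=$(cat "$dev_dir/queue_depth")
        for f in "$dev_dir"/scsi_disk/*/FUA; do
            [ -r "$f" ] && fua=$(cat "$f")
        done
        echo "$disk queue_depth=$depth FUA=$fua"
    done
}
```

Called without arguments it walks /sys; block devices that have no "device"
directory are skipped, and a missing file is reported as "n/a".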

Couple more details about the locked-up state:

- 0.0% "wa" state in 'top', so there doesn't seem to be any disk activity

- I can disconnect and then successfully reconnect to the VNC port of the
  locked-up QEMU/KVM process (!), so it looks like it's not entirely dead, but
  just the virtualization part(?)

-- 
With respect,
Roman


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20 11:34     ` Paolo Bonzini
@ 2016-01-20 15:08       ` Roman Mamedov
  0 siblings, 0 replies; 12+ messages in thread
From: Roman Mamedov @ 2016-01-20 15:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: bo.li.liu, linux-btrfs, kvm

On Wed, 20 Jan 2016 12:34:02 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 20/01/2016 06:08, Roman Mamedov wrote:
> > I don't get any ENOSPC errors on the host, also it doesn't seem to
> > be a slowdown caused by disk I/O searching for free space, just a
> > complete lock-up: I monitor the guest with 'ping', and when it
> > happens, the ping responses stop immediately and do not return,
> > also there is no reaction to any keypress in the KVM VNC window.
> 
> Can you go to the QEMU monitor (Ctrl-Alt-F2 typically, or just add
> "-monitor stdio" to the QEMU command line) and type "info status"?

Aha, now this is getting somewhere.
------------------
(qemu) info status
VM status: paused (io-error)

(qemu) info block
hd: vm.img (raw)
    I/O status:       nospace
    Detect zeroes:    unmap

ide1-cd0: [not inserted]
    Removable device: not locked, tray closed

floppy0: [not inserted]
    Removable device: not locked, tray closed

sd0: [not inserted]
    Removable device: not locked, tray closed

(qemu) info block-jobs
No active jobs
------------------

So this seems to be a transient ENOSPC error returned to the data-writing
process by the FS during snapshotting. I can now reproduce this without KVM,
just with dd. It seems to have little to do with KVM then; I will post more
details to the btrfs list. Thanks!
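
Since the guest looks "locked up" from the outside while QEMU has in fact
paused it, a scripted check of the monitor output can tell the two apart. The
sketch below only classifies text in the "VM status: ..." format shown above;
the function name and the labels it prints are made up for illustration, and
how the monitor text gets captured is left out.

```shell
#!/bin/sh
# Sketch: classify QEMU monitor "info status" text read from stdin.
classify_vm_status() {
    # Keep only the text after the "VM status: " prefix of the first match.
    status=$(sed -n 's/^VM status: //p' | head -n 1)
    case "$status" in
        "paused (io-error)") echo "paused-io-error" ;;
        paused*)             echo "paused" ;;
        running)             echo "running" ;;
        *)                   echo "unknown" ;;
    esac
}
```

A guest paused this way can normally be resumed from the monitor with "cont"
once the underlying write error has gone away, which would fit a transient
ENOSPC.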

-- 
With respect,
Roman


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20 13:58       ` Roman Mamedov
@ 2016-01-20 17:28         ` Paolo Bonzini
  2016-01-20 18:46           ` Roman Mamedov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-01-20 17:28 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs, kvm



On 20/01/2016 14:58, Roman Mamedov wrote:
> On Wed, 20 Jan 2016 03:28:04 -0500 (EST) Paolo Bonzini
> <pbonzini@redhat.com> wrote:
> 
>> Can you reproduce it on the host without QEMU in the middle?
> 
> I did not try to stress-test this on purpose yet, but this is an
> actively used system which is snapshotting its /home, VMs and root
> subvolumes every hour, so far no lock-ups of any other application
> than KVM, and with KVM it's trivially easy to hit, in fact first
> two lockups occurred in the first two hours of using the kernel
> 4.4.0 at the hourly snapshots...
> 
>> Also, can you find the value of the "FUA" file in sysfs (e.g.
>> with "cat /sys/block/sda/device/scsi_disk/*/FUA")?
> 
> It is 0 both in IDE and SCSI modes.

And in the host?

Paolo


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20 17:28         ` Paolo Bonzini
@ 2016-01-20 18:46           ` Roman Mamedov
  2016-01-20 20:38             ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Mamedov @ 2016-01-20 18:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-btrfs, kvm

On Wed, 20 Jan 2016 18:28:33 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> >> Also, can you find the value of the "FUA" file in sysfs (e.g.
> >> with "cat /sys/block/sda/device/scsi_disk/*/FUA")?
> > 
> > It is 0 both in IDE and SCSI modes.
> 
> And in the host?

On the host, the backing block device for the host filesystem is an MD RAID6,
which does not have the FUA setting. For the actual sdX disks, it's all 0 as
well.

As I said, this might not be KVM-specific, but rather a purely Btrfs-related
issue. That KVM with IDE doesn't show the problem might be a coincidence,
since, as you mentioned, it doesn't load the disks as heavily.

If you're interested in my reproducer outside of KVM, I started a new thread:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/52369

-- 
With respect,
Roman


* Re: Kernel 4.4.0 KVM guest on Btrfs locks up on snapshotting
  2016-01-20 18:46           ` Roman Mamedov
@ 2016-01-20 20:38             ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-01-20 20:38 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs, kvm



On 20/01/2016 19:46, Roman Mamedov wrote:
> On Wed, 20 Jan 2016 18:28:33 +0100 Paolo Bonzini
> <pbonzini@redhat.com> wrote:
> 
>>>> Also, can you find the value of the "FUA" file in sysfs
>>>> (e.g. with "cat /sys/block/sda/device/scsi_disk/*/FUA")?
>>> 
>>> It is 0 both in IDE and SCSI modes.
>> 
>> And in the host?
> 
> On the host the backing block device for the host filesystem is an
> MD RAID6, it does not have the FUA setting. For the actual sdX
> disks, it's all 0 as well.
> 
> As I said this might be not KVM-specific, but rather a Btrfs-only
> related issue. That KVM with IDE doesn't show the problem, might be
> a coincidence since as you mentioned it doesn't load the disks as
> heavily.
> 
> If you're interested in my reproducer outside of KVM, I started a
> new thread: 
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/52369

Oh that's great.  Indeed the "info status" output would have pointed
at an ENOSPC.  It's definitely not KVM specific.

Paolo
