All of lore.kernel.org
* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
@ 2021-08-20 17:25 ` bugzilla-daemon
  2021-08-21  8:53 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-08-20 17:25 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

Roland Kletzing (devzero@web.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |devzero@web.de

--- Comment #1 from Roland Kletzing (devzero@web.de) ---
i can confirm there is a severe issue here, which renders kvm/proxmox virtually
unusable on hosts under significant io load, i.e. if there is lots of write io
on the host or guest.

whenever the disk the vm resides on gets saturated with io, the VMs start going
nuts and getting hiccups, i.e. severe latency is added to the guests.

they behave "jumpy", you can't use the console or they are totally sluggish,
ping goes up above 10 secs, kernel throws "BUG: soft lockup - CPU#X stuck for
XXs!" and such...

i have found that with cache=writeback for the virtual machine disks which
reside on the hdd with heavy io, things go much much more smoothly.

without cache=writeback, live migration/move could make a guest go crazy.

now with cache=writeback i could do 3 live migrations in parallel, even with
lots of io inside the virtual machines, and even with an additional
writer/reader in the host os (dd from/to the disk) - ping to the guests mostly
is <5ms.

so, to me this problem appears to be related to disk io saturation and probably
to sync writes. what else can explain that cache=writeback helps so much ?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
  2021-08-20 17:25 ` [Bug 199727] CPU freezes in KVM guests during high IO load on host bugzilla-daemon
@ 2021-08-21  8:53 ` bugzilla-daemon
  2021-08-22 12:11 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-08-21  8:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #3 from Roland Kletzing (devzero@web.de) ---
i had a look at kvm process with strace. 

with virtual disk caching option set to "Default (no cache)", kvm is doing IO
submission via io_submit() instead of pwritev(), and apparently that can be a
long blocking call.

whenever the VM gets hiccups and ping time goes through the roof, i see a long
blocking io_submit() operation in progress.

looks like a "design issue" to me, which makes "Default (no cache)" a bad
default setting.

see:
https://lwn.net/Articles/508064/

and 
https://stackoverflow.com/questions/34572559/asynchronous-io-io-submit-latency-in-ubuntu-linux
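
For anyone who wants to check the same thing on their own host, here is a rough sketch of how slow io_submit() calls could be picked out of strace output. It assumes the trace was taken with `strace -f -T -e trace=io_submit -p <kvm pid>`, so every line ends with the time spent in the call in angle brackets. The helper name `slow_calls`, the 0.5s threshold and the sample lines are made up for illustration and are not taken from the trace described above.

```python
import re

# Tail of an `strace -T` line, e.g.
#   io_submit(0x7f12a4000000, 1, [...]) = 1 <2.304112>
# With -f the line may be prefixed by "[pid 1234] " or "1234 ".
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?P<name>\w+)\(.*\)\s+=\s+(?P<ret>-?\d+).*<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_calls(lines, syscall="io_submit", threshold=0.5):
    """Return (seconds, line) for every `syscall` that took >= threshold."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if not match or match.group("name") != syscall:
            continue
        secs = float(match.group("secs"))
        if secs >= threshold:
            hits.append((secs, line))
    return hits


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "io_submit(0x7f12a4000000, 1, [{aio_data=0}]) = 1 <2.304112>",
    "[pid  4242] io_submit(0x7f12a4000000, 1, [{aio_data=0}]) = 1 <0.000031>",
]
for secs, line in slow_calls(sample):
    print(f"{secs:.3f}s  {line}")
```

Lines that strace splits into "unfinished"/"resumed" halves are not matched by this pattern and are simply skipped, so the sketch can under-report on a busy multi-threaded process.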


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
  2021-08-20 17:25 ` [Bug 199727] CPU freezes in KVM guests during high IO load on host bugzilla-daemon
  2021-08-21  8:53 ` bugzilla-daemon
@ 2021-08-22 12:11 ` bugzilla-daemon
  2021-08-29 14:58 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-08-22 12:11 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #4 from Roland Kletzing (devzero@web.de) ---
http://blog.vmsplice.net/2015/08/asynchronous-file-io-on-linux-plus-ca.html

"However, the io_submit(2) system call remains a treacherous ally in the quest
for asynchronous file I/O. I don't think much has changed since 2009 in making
Linux AIO the best asynchronous file I/O mechanism.

The main problem is that io_submit(2) waits for I/O in some cases. It can
block! This defeats the purpose of asynchronous file I/O because the caller is
stuck until the system call completes. If called from a program's event loop,
the program becomes unresponsive until the system call returns. But even if
io_submit(2) is invoked from a dedicated thread where blocking doesn't matter,
latency is introduced to any further I/O requests submitted in the same
io_submit(2) call.

Sources of blocking in io_submit(2) depend on the file system and block devices
being used. There are many different cases but in general they occur because
file I/O code paths contain synchronous I/O (for metadata I/O or page cache
write-out) as well as locks/waiting (for serializing operations). This is why
the io_submit(2) system call can be held up while submitting a request.

This means io_submit(2) works best on fully-allocated files, volumes, or block
devices. Anything else is likely to result in blocking behavior and cause poor
performance."
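
The point about an event loop becoming unresponsive can be illustrated with a small toy program. This is only an analogy, not QEMU code: `time.sleep()` stands in for a submission syscall that blocks, a heartbeat coroutine stands in for everything else the event loop has to service, and all names (`blocking_submit`, `max_heartbeat_gap`) are hypothetical.

```python
import asyncio
import time


def blocking_submit(seconds):
    """Stand-in for a submission syscall that blocks (here just a sleep)."""
    time.sleep(seconds)


async def max_heartbeat_gap(submit_in_thread, block_for=0.3, tick=0.01):
    """Return the largest gap between heartbeats while one submit runs."""
    loop = asyncio.get_running_loop()
    worst = 0.0
    done = False

    async def heartbeat():
        nonlocal worst
        last = time.monotonic()
        while not done:
            await asyncio.sleep(tick)
            now = time.monotonic()
            worst = max(worst, now - last)
            last = now

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)            # let the heartbeat start ticking
    if submit_in_thread:
        # The blocking call runs in a worker thread; the loop keeps ticking.
        await loop.run_in_executor(None, blocking_submit, block_for)
    else:
        # The blocking call runs on the loop itself; nothing else can run.
        blocking_submit(block_for)
    await asyncio.sleep(0.05)
    done = True
    await task
    return worst


if __name__ == "__main__":
    print("submit on event loop :", asyncio.run(max_heartbeat_gap(False)))
    print("submit in worker     :", asyncio.run(max_heartbeat_gap(True)))
```

With the submit on the loop, the heartbeat stalls for roughly the whole blocking time. With the submit pushed to a worker thread, the heartbeat keeps its normal rhythm, which is the same idea as moving blocking I/O off the main loop.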


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (2 preceding siblings ...)
  2021-08-22 12:11 ` bugzilla-daemon
@ 2021-08-29 14:58 ` bugzilla-daemon
  2022-01-13 12:09 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-08-29 14:58 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #5 from Roland Kletzing (devzero@web.de) ---
apparently, things got even worse with proxmox 7, as it seems it's using async
io (via io_uring) by default for all virtual disk IO, i.e. the workaround
"cache=writeback" does not work for me anymore.

if i set aio=threads by directly editing VM configuration, things run smoothly
again.

so, still being curious:

why do VMs get severe hiccups with async io (via io_submit or io_uring) when
storage is getting some load, and why does that NOT happen with the described
workaround ?


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (3 preceding siblings ...)
  2021-08-29 14:58 ` bugzilla-daemon
@ 2022-01-13 12:09 ` bugzilla-daemon
  2022-02-10 13:22 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-01-13 12:09 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #6 from Roland Kletzing (devzero@web.de) ---
what i have also seen is VM freezes when a backup runs in our gitlab vm server,
which is apparently related to fsync/fdatasync sync writes.

at least for zfs there is a write starvation issue: large sync writes may
starve small ones, as there apparently is no fair scheduling for them,
see

https://github.com/openzfs/zfs/issues/10110


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (4 preceding siblings ...)
  2022-01-13 12:09 ` bugzilla-daemon
@ 2022-02-10 13:22 ` bugzilla-daemon
  2022-02-12  0:13 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-02-10 13:22 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #7 from Yann Papouin (yann.papouin@gmail.com) ---
(In reply to Roland Kletzing from comment #5)
> apparently, things got even worse with proxmox 7, as it seems it's using
> async io (via io_uring) by default for all virtual disk IO, i.e. the
> workaround "cache=writeback" does not work for me anymore.
> 
> if i set aio=threads by directly editing VM configuration, things run
> smoothly again.

Are you using "cache=writeback" with "aio=threads" ?
For me, using "aio=threads" reduces the VM freezes (high CPU load), but they
still happen under high disk IO (backup/disk move).


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (5 preceding siblings ...)
  2022-02-10 13:22 ` bugzilla-daemon
@ 2022-02-12  0:13 ` bugzilla-daemon
  2022-02-12 10:26 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-02-12  0:13 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #8 from Roland Kletzing (devzero@web.de) ---
yes.

i found the following interesting information. i think this explains a LOT.

https://docs.openeuler.org/en/docs/20.03_LTS/docs/Virtualization/best-practices.html#i-o-thread-configuration

I/O Thread Configuration
Overview

By default, QEMU main threads handle backend VM read and write operations on
the KVM. This causes the following issues:

  - VM I/O requests are processed by a QEMU main thread. Therefore, the
    single-thread CPU usage becomes the bottleneck of VM I/O performance.
  - The QEMU global lock (qemu_global_mutex) is used when VM I/O requests are
    processed by the QEMU main thread. If the I/O processing takes a long
    time, the QEMU main thread will occupy the global lock for a long time.
    As a result, the VM vCPU cannot be scheduled properly, affecting the
    overall VM performance and user experience.

You can configure the I/O thread attribute for the virtio-blk disk or
virtio-scsi controller. At the QEMU backend, an I/O thread is used to process
read and write requests of a virtual disk. The mapping relationship between the
I/O thread and the virtio-blk disk or virtio-scsi controller can be a
one-to-one relationship to minimize the impact on the QEMU main thread, enhance
the overall I/O performance of the VM, and improve user experience.
Configu


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (6 preceding siblings ...)
  2022-02-12  0:13 ` bugzilla-daemon
@ 2022-02-12 10:26 ` bugzilla-daemon
  2022-02-24 18:58 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-02-12 10:26 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #9 from Roland Kletzing (devzero@web.de) ---
https://qemu-devel.nongnu.narkive.com/I59Sm5TH/lock-contention-in-qemu
<snip>
I find the timeslice of vCPU thread in QEMU/KVM is unstable when there
are lots of read requests (for example, read 4KB each time (8GB in
total) from one file) from Guest OS. I also find that this phenomenon
may be caused by lock contention in QEMU layer. I find this problem
under following workload.
<snip>
Yes, there is a way to reduce jitter caused by the QEMU global mutex:

qemu -object iothread,id=iothread0 \
-drive if=none,id=drive0,file=test.img,format=raw,cache=none \
-device virtio-blk-pci,iothread=iothread0,drive=drive0

Now the ioeventfd and thread pool completions will be processed in
iothread0 instead of the QEMU main loop thread. This thread does not
take the QEMU global mutex so vcpu execution is not hindered.

This feature is called virtio-blk dataplane.
<snip>


i tried "virtio scsi single" with "aio=threads" and "iothread=1" in proxmox.
after that, even with really heavy read/write io inside 2 VMs (located on the
same spinning hdd on top of a zfs lz4 + zstd dataset and qcow2) and severe
write starvation (some ioping >>30s), and even while live migrating both vm
disks in parallel to another zfs dataset on the same hdd, i get absolutely NO
jitter in ping anymore. ping to both VMs is constantly at <0.2ms.

from the kvm process command line:
-object iothread,id=iothread-virtioscsi0
-device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0
-drive file=/hddpool/vms-files-lz4/images/116/vm-116-disk-3.qcow2,if=none,id=drive-scsi0,cache=writeback,aio=threads,format=qcow2,detect-zeroes=on
-device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100
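
For scripting this outside of proxmox, the virtio-blk iothread options quoted above can be assembled roughly as in the sketch below. The function name `iothread_drive_args` and its parameters are hypothetical; the option strings follow the quoted qemu example, with aio= added as it appears in the kvm command line. It only builds an argument list and does not start qemu.

```python
def iothread_drive_args(image, fmt="raw", cache="none", aio="threads", name="0"):
    """Build qemu arguments for one virtio-blk disk served by its own iothread.

    Mirrors the quoted example:
      -object iothread,id=iothread0
      -drive if=none,id=drive0,file=test.img,format=raw,cache=none
      -device virtio-blk-pci,iothread=iothread0,drive=drive0
    """
    iothread_id = f"iothread{name}"
    drive_id = f"drive{name}"
    # qemu option strings use "," as a separator, so a literal comma in a
    # file name has to be doubled.
    escaped = image.replace(",", ",,")
    return [
        "-object", f"iothread,id={iothread_id}",
        "-drive",
        f"if=none,id={drive_id},file={escaped},format={fmt},"
        f"cache={cache},aio={aio}",
        "-device", f"virtio-blk-pci,iothread={iothread_id},drive={drive_id}",
    ]


if __name__ == "__main__":
    print(" ".join(iothread_drive_args("test.img")))
```

One iothread per disk keeps the mapping one-to-one, as the openEuler text in comment #8 suggests; sharing an iothread between disks would only need the same id reused.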


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (7 preceding siblings ...)
  2022-02-12 10:26 ` bugzilla-daemon
@ 2022-02-24 18:58 ` bugzilla-daemon
  2022-02-25  9:49 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-02-24 18:58 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #10 from Roland Kletzing (devzero@web.de) ---
nobody listening? 

what should we do with this bugreport now?


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (8 preceding siblings ...)
  2022-02-24 18:58 ` bugzilla-daemon
@ 2022-02-25  9:49 ` bugzilla-daemon
  2022-03-02 13:33 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-02-25  9:49 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #11 from Kai Zimmer (zimmer@bbaw.de) ---
Thanks for your research, Roland Kletzing. I'm also a user of Proxmox.

We spent a lot of time troubleshooting this problem and after years finally
invested in a decent all-flash storage system - now the problem has
disappeared here. But this cannot be considered a solution for all affected
users of Proxmox.

I fear that the error description is discouraging for any kvm developer:
"Proxmox is a Debian based virtualization distribution with an Ubuntu LTS based
kernel."

Maybe test a recent vanilla kernel version and add it to the bug metadata to
get more attention?


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (9 preceding siblings ...)
  2022-02-25  9:49 ` bugzilla-daemon
@ 2022-03-02 13:33 ` bugzilla-daemon
  2022-03-07 19:01 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-03-02 13:33 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

Stefan Hajnoczi (stefanha@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefanha@gmail.com

--- Comment #12 from Stefan Hajnoczi (stefanha@gmail.com) ---
Hi,
I contribute to QEMU and have encountered similar issues in the past. QEMU has
configuration options that should allow you to avoid this issue, and it sounds
like you have found options that work for you.

If io_submit(2) is blocking with aio=native, try aio=io_uring. If that is not
available (older kernels), use aio=threads to work around this particular
problem.

I recommend cache=none. Although cache=writeback can shift the problem around
it doesn't solve it and leaves the VMs open to unpredictable performance
(including I/O stalls like this) due to host memory pressure and host page
cache I/O.

Regarding the original bug report, it's a limitation of that particular QEMU
configuration. I don't think anything will be done about it in the Linux
kernel. Maybe Proxmox can adjust the QEMU configuration to avoid it.


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (10 preceding siblings ...)
  2022-03-02 13:33 ` bugzilla-daemon
@ 2022-03-07 19:01 ` bugzilla-daemon
  2022-03-08  6:20 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-03-07 19:01 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #13 from Roland Kletzing (devzero@web.de) ---
hello, thanks - aio=io_uring is no better, the only real way to get to a stable
system is virtio-scsi-single/iothread=1/aio=threads

the question is why aio=native and io_uring have issues and threads does not...


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (11 preceding siblings ...)
  2022-03-07 19:01 ` bugzilla-daemon
@ 2022-03-08  6:20 ` bugzilla-daemon
  2022-03-08  8:01 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-03-08  6:20 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #14 from Stefan Hajnoczi (stefanha@gmail.com) ---
(In reply to Roland Kletzing from comment #13)
> hello, thanks - aio=io_uring is no better, the only real way to get to a
> stable system is virtio-scsi-single/iothreads=1/aio=threads
> 
> the question is why aio=native and io_uring has issues and threads has not...

Are you using cache=none with io_uring and the io_uring_enter(2) syscall is
blocking for a long period of time?

aio=threads avoids softlockups because the preadv(2)/pwritev(2)/fdatasync(2)
syscalls run in worker threads that don't take the QEMU global mutex. Therefore
vcpu threads can execute even when I/O is stuck in the kernel due to a lock.

io_uring should avoid that problem too because it is supposed to submit I/O
truly asynchronously.
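
The locking argument can be shown with a toy model. This is an analogy only and not QEMU's actual code: `big_lock` stands in for the QEMU global mutex, `time.sleep()` for a syscall stuck in the kernel, and the calling thread for a vcpu thread that needs the lock briefly. All names are hypothetical.

```python
import threading
import time

big_lock = threading.Lock()       # stand-in for a global mutex


def slow_io(seconds):
    """Stand-in for a syscall that is stuck in the kernel for a while."""
    time.sleep(seconds)


def vcpu_wait_time(io_holds_lock, io_seconds=0.3):
    """Return how long the 'vcpu' waits for big_lock while I/O is running."""
    started = threading.Event()

    def io_path():
        if io_holds_lock:
            # I/O is submitted while holding the lock (main-loop style).
            with big_lock:
                started.set()
                slow_io(io_seconds)
        else:
            # I/O runs in a worker that never takes the lock.
            started.set()
            slow_io(io_seconds)

    worker = threading.Thread(target=io_path)
    worker.start()
    started.wait()
    t0 = time.monotonic()
    with big_lock:                # the 'vcpu' needs the lock for a moment
        waited = time.monotonic() - t0
    worker.join()
    return waited


if __name__ == "__main__":
    print("I/O under the lock   :", vcpu_wait_time(True))
    print("I/O in a lock-free worker:", vcpu_wait_time(False))
```

When the slow I/O happens under the lock, the "vcpu" is held up for about as long as the I/O takes. When the I/O runs in a worker that does not take the lock, the "vcpu" gets the lock almost immediately even though the I/O itself is just as slow.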


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (12 preceding siblings ...)
  2022-03-08  6:20 ` bugzilla-daemon
@ 2022-03-08  8:01 ` bugzilla-daemon
  2022-03-08  8:26 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-03-08  8:01 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #15 from Roland Kletzing (devzero@web.de) ---
yes, i was using cache=none and io_uring also caused issues. 

>aio=threads avoids softlockups because the preadv(2)/pwritev(2)/fdatasync(2)
> syscalls run in worker threads that don't take the QEMU global mutex. 
>Therefore vcpu threads can execute even when I/O is stuck in the kernel due to
>a lock.

yes, that was a long search/journey to get to this information/params....

regarding io_uring - after proxmox enabled it as default, it was taken back
again after some issues had been reported.

have a look at:
https://github.com/proxmox/qemu-server/blob/master/debian/changelog

maybe it's not ready for primetime yet !?

-- Proxmox Support Team <support@proxmox.com>  Fri, 30 Jul 2021 16:53:44 +0200
qemu-server (7.0-11) bullseye; urgency=medium
<snip>
  * lvm: avoid the use of io_uring for now
<snip>
-- Proxmox Support Team <support@proxmox.com>  Fri, 23 Jul 2021 11:08:48 +0200
qemu-server (7.0-10) bullseye; urgency=medium
<snip>
  * avoid using io_uring for drives backed by LVM and configured for write-back
    or write-through cache
<snip>
 -- Proxmox Support Team <support@proxmox.com>  Mon, 05 Jul 2021 20:49:50 +0200
qemu-server (7.0-6) bullseye; urgency=medium
<snip>
  * For now do not use io_uring for drives backed by Ceph RBD, with KRBD and
    write-back or write-through cache enabled, as in that case some polling/IO
    may hang in QEMU 6.0.
<snip>


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (13 preceding siblings ...)
  2022-03-08  8:01 ` bugzilla-daemon
@ 2022-03-08  8:26 ` bugzilla-daemon
  2022-03-26 15:17 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-03-08  8:26 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #16 from Stefan Hajnoczi (stefanha@gmail.com) ---
On Tue, 8 Mar 2022 at 08:01, <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=199727
>
> --- Comment #15 from Roland Kletzing (devzero@web.de) ---
> yes, i was using cache=none and io_uring also caused issues.
>
> >aio=threads avoids softlockups because the preadv(2)/pwritev(2)/fdatasync(2)
> > syscalls run in worker threads that don't take the QEMU global mutex.
> >Therefore vcpu threads can execute even when I/O is stuck in the kernel due
> to
> >a lock.
>
> yes, that was a long search/journey to get to this information/params....
>
> regarding io_uring - after proxmox enabled it as default, it was taken back
> again after some issues had been reported.
>
> have look at:
> https://github.com/proxmox/qemu-server/blob/master/debian/changelog
>
> maybe it's not ready for primetime yet !?
>
> -- Proxmox Support Team <support@proxmox.com>  Fri, 30 Jul 2021 16:53:44
> +0200
> qemu-server (7.0-11) bullseye; urgency=medium
> <snip>
>   * lvm: avoid the use of io_uring for now
> <snip>
> -- Proxmox Support Team <support@proxmox.com>  Fri, 23 Jul 2021 11:08:48
> +0200
> qemu-server (7.0-10) bullseye; urgency=medium
> <snip>
>   * avoid using io_uring for drives backed by LVM and configured for
>   write-back
>     or write-through cache
> <snip>
>  -- Proxmox Support Team <support@proxmox.com>  Mon, 05 Jul 2021 20:49:50
>  +0200
> qemu-server (7.0-6) bullseye; urgency=medium
> <snip>
>   * For now do not use io_uring for drives backed by Ceph RBD, with KRBD and
>     write-back or write-through cache enabled, as in that case some
>     polling/IO
>     may hang in QEMU 6.0.
> <snip>

Changelog messages mention cache=writethrough and cache=writeback,
which are both problematic because host memory pressure will interfere
with guest performance. That is probably not an issue with io_uring
per se, just another symptom of using cache=writeback/writethrough in
cases where it's inappropriate.

If you have trace data showing io_uring_enter(2) hanging with
cache=none then Jens Axboe and other io_uring developers may be able
to help resolve that.

Stefan


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (14 preceding siblings ...)
  2022-03-08  8:26 ` bugzilla-daemon
@ 2022-03-26 15:17 ` bugzilla-daemon
  2022-04-06 23:25 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-03-26 15:17 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

Chris M (cmultari@mpihq.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cmultari@mpihq.com

--- Comment #17 from Chris M (cmultari@mpihq.com) ---
Experiencing the same issue with Proxmox 7 under high IO load. To achieve the
highest stability, are you setting all VMs to async IO Threads/IO Thread/Virtio
SCSI Single, or just the machines with the highest load?  I moved our higher
load machines to those settings but still experience the issue at times.


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (15 preceding siblings ...)
  2022-03-26 15:17 ` bugzilla-daemon
@ 2022-04-06 23:25 ` bugzilla-daemon
  2022-04-06 23:52 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-04-06 23:25 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

Gergely Kovacs (gkovacs@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|3.x, 4.2, 4.4, 4.10         |3.x, 4.x, 5.x


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (16 preceding siblings ...)
  2022-04-06 23:25 ` bugzilla-daemon
@ 2022-04-06 23:52 ` bugzilla-daemon
  2022-11-29 10:03 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-04-06 23:52 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #18 from Gergely Kovacs (gkovacs@gmail.com) ---
Thank you Roland Kletzing for your exhaustive investigation and Stefan Hajnoczi
for your insightful comments. This is a problem that has been affecting us (and
many many users of Proxmox and likely vanilla KVM) for more than a decade, yet
the Proxmox developers were unable to solve it or even reproduce it (despite
the large number of forum threads and bugs filed), hence the reason for me
creating this bugreport 4 years ago.

It looks like we are closing in: the QEMU global mutex could be the real
culprit, as in our case the problems were only mostly gone by moving all our VM
storage to NVMe (increasing IO bandwidth by a LOT), but fully gone after
setting VirtIO SCSI Single / iothread=1 / aio=threads on all our KVM guests.
For many years VM migrations or restores could render other VMs on the same
host practically unusable for the duration of the heavy IO, now these
operations can be safely done.

I will experiment with io_uring in the near future and report back my findings,
will leave the status NEW since I reckon attention should be given to the ring
io code to achieve the same stability as threaded io.


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (17 preceding siblings ...)
  2022-04-06 23:52 ` bugzilla-daemon
@ 2022-11-29 10:03 ` bugzilla-daemon
  2024-02-01 13:15 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-11-29 10:03 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #19 from Roland Kletzing (devzero@web.de) ---
@chris

>To achieve the highest stability, are you setting all VMs to async IO
>Threads/IO Thread/Virtio SCSI Single, or just the machines with the 
>highest load?

i have set ALL our virtual machines to this

@gergely, any news on this? should we close this ticket ?


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (18 preceding siblings ...)
  2022-11-29 10:03 ` bugzilla-daemon
@ 2024-02-01 13:15 ` bugzilla-daemon
  2024-02-01 13:25 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2024-02-01 13:15 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #20 from Marco Gabriel (mgabriel@inett.de) ---
Thanks to all, especially Roland, Stefan and Gergely for this exhaustive bug
research.

Please keep this ticket open as this problem still persists in Proxmox 8.x with
kernel 6.5 and it seems to get worse.

Thanks to all,
Marco


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (19 preceding siblings ...)
  2024-02-01 13:15 ` bugzilla-daemon
@ 2024-02-01 13:25 ` bugzilla-daemon
  2024-02-01 13:46 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2024-02-01 13:25 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #21 from Roland Kletzing (devzero@web.de) ---
do you have more details (e.g. a proxmox forum thread) for this?


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (20 preceding siblings ...)
  2024-02-01 13:25 ` bugzilla-daemon
@ 2024-02-01 13:46 ` bugzilla-daemon
  2024-02-01 13:51 ` bugzilla-daemon
  2024-02-01 19:56 ` bugzilla-daemon
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2024-02-01 13:46 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #22 from Marco Gabriel (mgabriel@inett.de) ---
I have several sources: at least multiple clients are suffering from possibly
the same problem.

As we're in touch with the Proxmox Support, I can't directly point to a forum
message, but I can point to probably related or identical issues in other
trackers:

- https://github.com/virtio-win/kvm-guest-drivers-windows/issues/756
- https://forum.proxmox.com/threads/redhat-virtio-developers-would-like-to-coordinate-with-proxmox-devs-re-vioscsi-reset-to-device-system-unresponsive.139160/
  (I guess you know this thread already)


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (21 preceding siblings ...)
  2024-02-01 13:46 ` bugzilla-daemon
@ 2024-02-01 13:51 ` bugzilla-daemon
  2024-02-01 19:56 ` bugzilla-daemon
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2024-02-01 13:51 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #23 from Marco Gabriel (mgabriel@inett.de) ---
(In reply to Roland Kletzing from comment #13)
> hello, thanks - aio=io_uring is no better, the only real way to get to a
> stable system is virtio-scsi-single/iothreads=1/aio=threads
> 
> the question is why aio=native and io_uring has issues and threads has not...

Just for reference: using aio=threads doesn't help in our lab and customer
setups (Proxmox/Ceph HCI). We still see VM freezes after several minutes when
I/O load is high.


* [Bug 199727] CPU freezes in KVM guests during high IO load on host
       [not found] <bug-199727-28872@https.bugzilla.kernel.org/>
                   ` (22 preceding siblings ...)
  2024-02-01 13:51 ` bugzilla-daemon
@ 2024-02-01 19:56 ` bugzilla-daemon
  23 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2024-02-01 19:56 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=199727

--- Comment #24 from Gergely Kovacs (gkovacs@gmail.com) ---
To be clear, in our experience aio=threads and iothread=1 solved all VM freezes
on local storage, regardless of whether the VM runs from SATA/SAS HDD or SSD,
or NVMe SSD, so it's a great mitigating step. Ceph is not solved for us either,
so using Ceph is still not recommended for KVM storage until this bug is
actually fixed (instead of mitigated).
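
For anyone wanting to try the mitigation, here is a minimal sketch of how the
settings named in this thread (virtio-scsi-single, iothread=1, aio=threads)
might be passed to Proxmox's `qm set`. The VM ID, the volume name, the disk
slot and the helper function are hypothetical placeholders, and the option
spelling is an assumption based on the Proxmox CLI, so check it against your
own VM config before running anything. As noted above, this did not help on
Ceph-backed storage.

```python
import shlex


def build_qm_set_command(vmid, volume, disk_slot="scsi0"):
    """Return the argv for a `qm set` call that applies the mitigation.

    vmid, volume and disk_slot are hypothetical placeholders; replace them
    with the values from your own VM configuration.
    """
    # Per-disk options: a dedicated I/O thread and the thread-pool AIO backend.
    disk_options = f"{volume},iothread=1,aio=threads"
    return [
        "qm", "set", str(vmid),
        # Controller type that gives each disk its own virtio-scsi controller.
        "--scsihw", "virtio-scsi-single",
        f"--{disk_slot}", disk_options,
    ]


if __name__ == "__main__":
    # Only prints the command so it can be reviewed before being run by hand.
    print(shlex.join(build_qm_set_command(100, "local-lvm:vm-100-disk-0")))
```

The sketch only prints the command; it does not touch any VM.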


