* [Bug 215459] New: VM freezes starting with kernel 5.15
@ 2022-01-06 11:03 bugzilla-daemon
  2022-01-06 11:18 ` Maxim Levitsky
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 11:03 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

            Bug ID: 215459
           Summary: VM freezes starting with kernel 5.15
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.15.*
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: th3voic3@mailbox.org
        Regression: No

Created attachment 300234
  --> https://bugzilla.kernel.org/attachment.cgi?id=300234&action=edit
qemu.hook and libvirt xml

Hi,

starting with kernel 5.15 I'm experiencing freezes in my VFIO Windows 10 VM.
Downgrading to 5.14.16 fixes the issue.

I can't find any error messages in dmesg when this happens and comparing the
dmesg output between 5.14.16 and 5.15.7 didn't show any differences.


Additional info:
* 5.15.x
* I'm attaching my libvirt config and my /etc/libvirt/hooks/qemu
* My specs are:
** i7-10700k
** ASUS z490-A PRIME Motherboard
** 64 GB RAM
** Passthrough Card: NVIDIA 2070 Super
** Host is using the integrated Graphics chip

Steps to reproduce:
Boot any 5.15 kernel and start the VM and after some time (no specific trigger
as far as I can see) the VM freezes.

After some testing the solution seems to be:

I read about this:
https://patchwork.kernel.org/project/kvm/patch/20210713142023.106183-9-mlevitsk@redhat.com/#24319635

And so I checked
cat /sys/module/kvm_intel/parameters/enable_apicv

which returns Y to me by default.

So I added
options kvm_intel enable_apicv=0
to /etc/modprobe.d/kvm.conf


cat /sys/module/kvm_intel/parameters/enable_apicv
now returns N

So far I haven't encountered any freezes.
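
A minimal sketch of the check-and-disable steps above. The sysfs path and the modprobe option are the ones from this report; the helper name apicv_state and its optional path argument are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report whether kvm_intel currently has APICv enabled.
# The default path is the one quoted in the report; the function name and
# the optional override argument are illustrative, not an existing tool.
apicv_state() {
    param_file="${1:-/sys/module/kvm_intel/parameters/enable_apicv}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        N|0) echo "disabled" ;;
        *)   echo "unknown" ;;
    esac
}

# To disable APICv persistently (as root, with all VMs shut down):
#   echo "options kvm_intel enable_apicv=0" >> /etc/modprobe.d/kvm.conf
#   modprobe -r kvm_intel && modprobe kvm_intel
#   apicv_state    # should now print "disabled"
```

The option only takes effect when the module is loaded, so either reload kvm_intel or reboot after editing the file.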

The confusing part is that APICv shouldn't be available with my CPU

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* Re: [Bug 215459] New: VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
@ 2022-01-06 11:18 ` Maxim Levitsky
  2022-01-06 11:18 ` [Bug 215459] " bugzilla-daemon
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2022-01-06 11:18 UTC (permalink / raw)
  To: bugzilla-daemon, kvm; +Cc: Sean Christopherson

On Thu, 2022-01-06 at 11:03 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
>             Bug ID: 215459
>            Summary: VM freezes starting with kernel 5.15
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 5.15.*
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: th3voic3@mailbox.org
>         Regression: No
> 
> Created attachment 300234
>   --> https://bugzilla.kernel.org/attachment.cgi?id=300234&action=edit
> qemu.hook and libvirt xml
> 
> Hi,
> 
> starting with kernel 5.15 I'm experiencing freezes in my VFIO Windows 10 VM.
> Downgrading to 5.14.16 fixes the issue.
> 
> I can't find any error messages in dmesg when this happens and comparing the
> dmesg output between 5.14.16 and 5.15.7 didn't show any differences.
> 
> 
> Additional info:
> * 5.15.x
> * I'm attaching my libvirt config and my /etc/libvirt/hooks/qemu
> * My specs are:
> ** i7-10700k
> ** ASUS z490-A PRIME Motherboard
> ** 64 GB RAM
> ** Passthrough Card: NVIDIA 2070 Super
> ** Host is using the integrated Graphics chip
> 
> Steps to reproduce:
> Boot any 5.15 kernel and start the VM and after some time (no specific trigger
> as far as I can see) the VM freezes.
> 
> After some testing the solution seems to be:
> 
> I read about this:
> https://patchwork.kernel.org/project/kvm/patch/20210713142023.106183-9-mlevitsk@redhat.com/#24319635
> 
> And so I checked
> cat /sys/module/kvm_intel/parameters/enable_apicv
> 
> which returns Y to me by default.
> 
> So I added
> options kvm_intel enable_apicv=0
> to /etc/modprobe.d/kvm.conf
> 
> 
> cat /sys/module/kvm_intel/parameters/enable_apicv
> now returns N
> 
> So far I haven't encountered any freezes.
> 
> The confusing part is that APICv shouldn't be available with my CPU

I guess you are lucky and your CPU has it?
Does /sys/module/kvm_intel/parameters/enable_apicv show Y on 5.14.16 as well?

I know that there were a few fixes for posted interrupts on Intel,
which might explain the problem.

You might want to try the 5.16 kernel when it is released.

Best regards,
	Maxim Levitsky

> 




* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
  2022-01-06 11:18 ` Maxim Levitsky
@ 2022-01-06 11:18 ` bugzilla-daemon
  2022-01-06 13:12 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 11:18 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #1 from mlevitsk@redhat.com ---
On Thu, 2022-01-06 at 11:03 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
>             Bug ID: 215459
>            Summary: VM freezes starting with kernel 5.15
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 5.15.*
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: th3voic3@mailbox.org
>         Regression: No
> 
> Created attachment 300234
>   --> https://bugzilla.kernel.org/attachment.cgi?id=300234&action=edit
> qemu.hook and libvirt xml
> 
> Hi,
> 
> starting with kernel 5.15 I'm experiencing freezes in my VFIO Windows 10 VM.
> Downgrading to 5.14.16 fixes the issue.
> 
> I can't find any error messages in dmesg when this happens and comparing the
> dmesg output between 5.14.16 and 5.15.7 didn't show any differences.
> 
> 
> Additional info:
> * 5.15.x
> * I'm attaching my libvirt config and my /etc/libvirt/hooks/qemu
> * My specs are:
> ** i7-10700k
> ** ASUS z490-A PRIME Motherboard
> ** 64 GB RAM
> ** Passthrough Card: NVIDIA 2070 Super
> ** Host is using the integrated Graphics chip
> 
> Steps to reproduce:
> Boot any 5.15 kernel and start the VM and after some time (no specific
> trigger
> as far as I can see) the VM freezes.
> 
> After some testing the solution seems to be:
> 
> I read about this:
> https://patchwork.kernel.org/project/kvm/patch/20210713142023.106183-9-mlevitsk@redhat.com/#24319635
> 
> And so I checked
> cat /sys/module/kvm_intel/parameters/enable_apicv
> 
> which returns Y to me by default.
> 
> So I added
> options kvm_intel enable_apicv=0
> to /etc/modprobe.d/kvm.conf
> 
> 
> cat /sys/module/kvm_intel/parameters/enable_apicv
> now returns N
> 
> So far I haven't encountered any freezes.
> 
> The confusing part is that APICv shouldn't be available with my CPU

I guess you are lucky and your CPU has it?
Does /sys/module/kvm_intel/parameters/enable_apicv show Y on 5.14.16 as well?

I know that there were a few fixes for posted interrupts on Intel,
which might explain the problem.

You might want to try the 5.16 kernel when it is released.

Best regards,
        Maxim Levitsky

>


* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
  2022-01-06 11:18 ` Maxim Levitsky
  2022-01-06 11:18 ` [Bug 215459] " bugzilla-daemon
@ 2022-01-06 13:12 ` bugzilla-daemon
  2022-01-06 13:43   ` Maxim Levitsky
  2022-01-06 13:43 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 13:12 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #2 from th3voic3@mailbox.org ---
(In reply to mlevitsk from comment #1)
> On Thu, 2022-01-06 at 11:03 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215459
> > 
> >             Bug ID: 215459
> >            Summary: VM freezes starting with kernel 5.15
> >            Product: Virtualization
> >            Version: unspecified
> >     Kernel Version: 5.15.*
> >           Hardware: Intel
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: kvm
> >           Assignee: virtualization_kvm@kernel-bugs.osdl.org
> >           Reporter: th3voic3@mailbox.org
> >         Regression: No
> > 
> > Created attachment 300234 [details]
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=300234&action=edit
> > qemu.hook and libvirt xml
> > 
> > Hi,
> > 
> > starting with kernel 5.15 I'm experiencing freezes in my VFIO Windows 10
> VM.
> > Downgrading to 5.14.16 fixes the issue.
> > 
> > I can't find any error messages in dmesg when this happens and comparing
> the
> > dmesg output between 5.14.16 and 5.15.7 didn't show any differences.
> > 
> > 
> > Additional info:
> > * 5.15.x
> > * I'm attaching my libvirt config and my /etc/libvirt/hooks/qemu
> > * My specs are:
> > ** i7-10700k
> > ** ASUS z490-A PRIME Motherboard
> > ** 64 GB RAM
> > ** Passthrough Card: NVIDIA 2070 Super
> > ** Host is using the integrated Graphics chip
> > 
> > Steps to reproduce:
> > Boot any 5.15 kernel and start the VM and after some time (no specific
> > trigger
> > as far as I can see) the VM freezes.
> > 
> > After some testing the solution seems to be:
> > 
> > I read about this:
> > https://patchwork.kernel.org/project/kvm/patch/20210713142023.106183-9-mlevitsk@redhat.com/#24319635
> > 
> > And so I checked
> > cat /sys/module/kvm_intel/parameters/enable_apicv
> > 
> > which returns Y to me by default.
> > 
> > So I added
> > options kvm_intel enable_apicv=0
> > to /etc/modprobe.d/kvm.conf
> > 
> > 
> > cat /sys/module/kvm_intel/parameters/enable_apicv
> > now returns N
> > 
> > So far I haven't encountered any freezes.
> > 
> > The confusing part is that APICv shouldn't be available with my CPU
> 
> I guess you are lucky and your cpu has it? 
> Does /sys/module/kvm_intel/parameters/enable_apicv show Y on 5.14.16 as well?
Yep just checked again.

> 
> I know that there were few fixes in regard to posted interrupts on intel,
> which might explain the problem.
I tried checking with
for i in $(find /sys/class/iommu/dmar* -type l); do echo -n "$i: "; echo $(( (
0x$(cat $i/intel-iommu/cap) >> 59 ) & 1 )); done
cat: /intel-iommu/cap: No such file or directory
/sys/class/iommu/dmar0: 0
/sys/class/iommu/dmar1: 0


So posted interrupts don't work on my system anyways?
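
A sketch of the same check with the bit test split out, assuming (as the command above does) that bit 59 of the Intel IOMMU capability register is the posted-interrupt capability bit. The helper name iommu_pi_bit is made up for illustration.

```shell
#!/bin/sh
# Sketch only: extract bit 59 of an Intel IOMMU capability register value.
# Takes the hex string as printed by sysfs (no 0x prefix); prints 1 or 0.
iommu_pi_bit() {
    echo $(( (0x$1 >> 59) & 1 ))
}

# Glob the per-unit cap files directly instead of using find, so a pattern
# that matches nothing is simply skipped.
for cap_file in /sys/class/iommu/dmar*/intel-iommu/cap; do
    [ -r "$cap_file" ] || continue
    echo "${cap_file%/intel-iommu/cap}: $(iommu_pi_bit "$(cat "$cap_file")")"
done
```

A result of 0 for every unit only says the IOMMU cannot post interrupts from devices; it says nothing about the CPU-side APICv features.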


> 
> You might want to try 5.16 kernel when it released.
I will definitely check again thanks.

Assuming I really do have APICv: is there anything I need to change in my XML
to really make use of this feature or does it work "out of the box"?


* Re: [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 13:12 ` bugzilla-daemon
@ 2022-01-06 13:43   ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2022-01-06 13:43 UTC (permalink / raw)
  To: bugzilla-daemon, kvm; +Cc: Sean Christopherson

On Thu, 2022-01-06 at 13:12 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> --- Comment #2 from th3voic3@mailbox.org ---
> (In reply to mlevitsk from comment #1)
> > On Thu, 2022-01-06 at 11:03 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=215459
> > > 
> > >             Bug ID: 215459
> > >            Summary: VM freezes starting with kernel 5.15
> > >            Product: Virtualization
> > >            Version: unspecified
> > >     Kernel Version: 5.15.*
> > >           Hardware: Intel
> > >                 OS: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: normal
> > >           Priority: P1
> > >          Component: kvm
> > >           Assignee: virtualization_kvm@kernel-bugs.osdl.org
> > >           Reporter: th3voic3@mailbox.org
> > >         Regression: No
> > > 
> > > Created attachment 300234 [details]
> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=300234&action=edit
> > > qemu.hook and libvirt xml
> > > 
> > > Hi,
> > > 
> > > starting with kernel 5.15 I'm experiencing freezes in my VFIO Windows 10
> > VM.
> > > Downgrading to 5.14.16 fixes the issue.
> > > 
> > > I can't find any error messages in dmesg when this happens and comparing
> > the
> > > dmesg output between 5.14.16 and 5.15.7 didn't show any differences.
> > > 
> > > 
> > > Additional info:
> > > * 5.15.x
> > > * I'm attaching my libvirt config and my /etc/libvirt/hooks/qemu
> > > * My specs are:
> > > ** i7-10700k
> > > ** ASUS z490-A PRIME Motherboard
> > > ** 64 GB RAM
> > > ** Passthrough Card: NVIDIA 2070 Super
> > > ** Host is using the integrated Graphics chip
> > > 
> > > Steps to reproduce:
> > > Boot any 5.15 kernel and start the VM and after some time (no specific
> > > trigger
> > > as far as I can see) the VM freezes.
> > > 
> > > After some testing the solution seems to be:
> > > 
> > > I read about this:
> > > https://patchwork.kernel.org/project/kvm/patch/20210713142023.106183-9-mlevitsk@redhat.com/#24319635
> > > And so I checked
> > > cat /sys/module/kvm_intel/parameters/enable_apicv
> > > 
> > > which returns Y to me by default.
> > > 
> > > So I added
> > > options kvm_intel enable_apicv=0
> > > to /etc/modprobe.d/kvm.conf
> > > 
> > > 
> > > cat /sys/module/kvm_intel/parameters/enable_apicv
> > > now returns N
> > > 
> > > So far I haven't encountered any freezes.
> > > 
> > > The confusing part is that APICv shouldn't be available with my CPU
> > 
> > I guess you are lucky and your cpu has it? 
> > Does /sys/module/kvm_intel/parameters/enable_apicv show Y on 5.14.16 as well?
> Yep just checked again.
> 
> > I know that there were few fixes in regard to posted interrupts on intel,
> > which might explain the problem.
> I tried checking with
> for i in $(find /sys/class/iommu/dmar* -type l); do echo -n "$i: "; echo $(( (
> 0x$(cat $i/intel-iommu/cap) >> 59 ) & 1 )); done
> cat: /intel-iommu/cap: No such file or directory
> /sys/class/iommu/dmar0: 0
> /sys/class/iommu/dmar1: 0
> 
> 
> So posted interrupts don't work on my system anyways?

Yes and no.

APICv consists of 4 parts:

1. Virtualization of host->guest interrupts.
This allows KVM to deliver interrupts to a vCPU without a VM exit. These are
interrupts sent from, say, the main qemu thread, an iothread, or in-kernel
timers. As long as the sender runs on a different core, you get a VM-exit-less
interrupt. If enable_apicv is true, your CPU ought to have this.

2. Virtualization of APIC registers.
This avoids VM exits on some guest APIC accesses, such as writing the EOI
register. Very primitive support for this exists even without APICv, called
FlexPriority/TPR virtualization.

3. Virtualization of IPIs (inter-processor interrupts).
Intel currently only supports self-IPI, where a vCPU sends an interrupt to
itself, and it seems that they are finally on track to support full IPI
virtualization.

4. Delivery of interrupts from passed-through devices to the VM through the
virtual APIC.
Apparently you don't have that feature enabled. It's optional but still very
nice to have.

So you are lucky to have a largely working APICv, and once it works for you
it should help with latency overall.
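
One way to see which of these pieces the CPU advertises is to look at the VMX feature flags the kernel reports. This is a rough sketch under assumptions: it assumes the running kernel prints a "vmx flags" line in /proc/cpuinfo and uses the flag names listed below, and the helper name has_flag is made up for illustration.

```shell
#!/bin/sh
# Sketch only: test whether a flag word appears in a space-separated list.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumption: the kernel exposes a "vmx flags" line in /proc/cpuinfo and
# uses these flag names; adjust the list if your kernel differs.
vmx_flags=$(grep -m1 '^vmx flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
for flag in flexpriority apicv vapic_reg vid posted_intr; do
    if has_flag "$vmx_flags" "$flag"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```

This only covers the CPU side (parts 1 to 3). Part 4 depends on the IOMMU, which is what the cap register check earlier in the thread looks at.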

I think that this might be related to
"KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU"


There was a thread on the mailing list about the exact same issue you are
facing: APICv enabled, but with a pass-through device which doesn't use posted
interrupts. I can't seem to find the thread now.

This patch actually fixed the issue, but I haven't fully followed which
commit introduced it.

I also wonder if the same issue can happen on AVIC.

Best regards,
	Maxim Levitsky


> 
> 
> > You might want to try 5.16 kernel when it released.
> I will definitely check again thanks.
> 
> Assuming I really do have APICv: is there anything I need to change in my XML
> to really make use of this feature or does it work "out of the box"?
> 




* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (2 preceding siblings ...)
  2022-01-06 13:12 ` bugzilla-daemon
@ 2022-01-06 13:43 ` bugzilla-daemon
  2022-01-06 18:52 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 13:43 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #3 from mlevitsk@redhat.com ---
On Thu, 2022-01-06 at 13:12 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> --- Comment #2 from th3voic3@mailbox.org ---
> (In reply to mlevitsk from comment #1)
> > On Thu, 2022-01-06 at 11:03 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=215459
> > > 
> > >             Bug ID: 215459
> > >            Summary: VM freezes starting with kernel 5.15
> > >            Product: Virtualization
> > >            Version: unspecified
> > >     Kernel Version: 5.15.*
> > >           Hardware: Intel
> > >                 OS: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: normal
> > >           Priority: P1
> > >          Component: kvm
> > >           Assignee: virtualization_kvm@kernel-bugs.osdl.org
> > >           Reporter: th3voic3@mailbox.org
> > >         Regression: No
> > > 
> > > Created attachment 300234 [details]
> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=300234&action=edit
> > > qemu.hook and libvirt xml
> > > 
> > > Hi,
> > > 
> > > starting with kernel 5.15 I'm experiencing freezes in my VFIO Windows 10
> > VM.
> > > Downgrading to 5.14.16 fixes the issue.
> > > 
> > > I can't find any error messages in dmesg when this happens and comparing
> > the
> > > dmesg output between 5.14.16 and 5.15.7 didn't show any differences.
> > > 
> > > 
> > > Additional info:
> > > * 5.15.x
> > > * I'm attaching my libvirt config and my /etc/libvirt/hooks/qemu
> > > * My specs are:
> > > ** i7-10700k
> > > ** ASUS z490-A PRIME Motherboard
> > > ** 64 GB RAM
> > > ** Passthrough Card: NVIDIA 2070 Super
> > > ** Host is using the integrated Graphics chip
> > > 
> > > Steps to reproduce:
> > > Boot any 5.15 kernel and start the VM and after some time (no specific
> > > trigger
> > > as far as I can see) the VM freezes.
> > > 
> > > After some testing the solution seems to be:
> > > 
> > > I read about this:
> > > https://patchwork.kernel.org/project/kvm/patch/20210713142023.106183-9-mlevitsk@redhat.com/#24319635
> > > And so I checked
> > > cat /sys/module/kvm_intel/parameters/enable_apicv
> > > 
> > > which returns Y to me by default.
> > > 
> > > So I added
> > > options kvm_intel enable_apicv=0
> > > to /etc/modprobe.d/kvm.conf
> > > 
> > > 
> > > cat /sys/module/kvm_intel/parameters/enable_apicv
> > > now returns N
> > > 
> > > So far I haven't encountered any freezes.
> > > 
> > > The confusing part is that APICv shouldn't be available with my CPU
> > 
> > I guess you are lucky and your cpu has it? 
> > Does /sys/module/kvm_intel/parameters/enable_apicv show Y on 5.14.16 as
> well?
> Yep just checked again.
> 
> > I know that there were few fixes in regard to posted interrupts on intel,
> > which might explain the problem.
> I tried checking with
> for i in $(find /sys/class/iommu/dmar* -type l); do echo -n "$i: "; echo $((
> (
> 0x$(cat $i/intel-iommu/cap) >> 59 ) & 1 )); done
> cat: /intel-iommu/cap: No such file or directory
> /sys/class/iommu/dmar0: 0
> /sys/class/iommu/dmar1: 0
> 
> 
> So posted interrupts don't work on my system anyways?

Yes and no.

APICv consists of 4 parts:

1. Virtualization of host->guest interrupts.
This allows KVM to deliver interrupts to a vCPU without a VM exit. These are
interrupts sent from, say, the main qemu thread, an iothread, or in-kernel
timers. As long as the sender runs on a different core, you get a VM-exit-less
interrupt. If enable_apicv is true, your CPU ought to have this.

2. Virtualization of APIC registers.
This avoids VM exits on some guest APIC accesses, such as writing the EOI
register. Very primitive support for this exists even without APICv, called
FlexPriority/TPR virtualization.

3. Virtualization of IPIs (inter-processor interrupts).
Intel currently only supports self-IPI, where a vCPU sends an interrupt to
itself, and it seems that they are finally on track to support full IPI
virtualization.

4. Delivery of interrupts from passed-through devices to the VM through the
virtual APIC.
Apparently you don't have that feature enabled. It's optional but still very
nice to have.

So you are lucky to have a largely working APICv, and once it works for you
it should help with latency overall.

I think that this might be related to
"KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU"


There was a thread on the mailing list about the exact same issue you are
facing: APICv enabled, but with a pass-through device which doesn't use posted
interrupts. I can't seem to find the thread now.

This patch actually fixed the issue, but I haven't fully followed which
commit introduced it.

I also wonder if the same issue can happen on AVIC.

Best regards,
        Maxim Levitsky


> 
> 
> > You might want to try 5.16 kernel when it released.
> I will definitely check again thanks.
> 
> Assuming I really do have APICv: is there anything I need to change in my XML
> to really make use of this feature or does it work "out of the box"?
>


* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-01-06 13:43 ` bugzilla-daemon
@ 2022-01-06 18:52 ` bugzilla-daemon
  2022-01-06 20:42   ` Maxim Levitsky
  2022-01-06 20:42 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 18:52 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

Sean Christopherson (seanjc@google.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seanjc@google.com

--- Comment #4 from Sean Christopherson (seanjc@google.com) ---
The fix Maxim is referring to is commit fdba608f15e2 ("KVM: VMX: Wake vCPU when
delivering posted IRQ even if vCPU == this vCPU").  But the buggy commit was
introduced back in v5.8, so it's unlikely that's the issue, or at least that
it's the only issue.  And assuming the VM in question has multiple vCPUs (which
I'm pretty sure is true based on the config), that bug is unlikely to cause the
entire VM to freeze; the expected symptom is that a vCPU isn't awakened when it
should be, and while it's possible multiple vCPUs could get unlucky, taking
down the entire VM is highly improbable.  That said, it's worth trying that
fix, I'm just not very optimistic :-)

Assuming this is something different, the biggest relevant changes in v5.15 are
that the TDP MMU is enabled by default, and that the APIC access page memslot
is not deleted when APICv is inhibited.

Can you try disabling the TDP MMU with APICv still enabled?  KVM allows that to
be toggled without unloading, e.g. "echo N | sudo tee
/sys/module/kvm/parameters/tdp_mmu", the VM just needs to be started after the
param is toggled.
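
A small sketch of that toggle with a read-back check. The tdp_mmu path is the one quoted above; the helper name set_param is made up for illustration.

```shell
#!/bin/sh
# Sketch only: write a value to a module parameter file and read it back.
# Usage: set_param FILE VALUE   (needs write permission on FILE)
# Pass Y or N, the form the kernel prints back for boolean parameters.
set_param() {
    printf '%s\n' "$2" > "$1" || return 1
    [ "$(cat "$1")" = "$2" ]
}

# As root, with the VM shut down:
#   set_param /sys/module/kvm/parameters/tdp_mmu N && echo "TDP MMU off"
#   cat /sys/module/kvm_intel/parameters/enable_apicv   # should still be Y
# Then start the VM and watch for freezes.
```

Keeping enable_apicv at Y while tdp_mmu is N is the point of the experiment: it separates the two v5.15 changes mentioned above.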

Running v5.16 (or v5.16-rc8, as there are no KVM changes expected between rc8
and the final release) would also be very helpful.  If we get lucky and the
issue is resolved in v5.16, then it would be nice to "reverse" bisect to
understand exactly what fixed the problem.
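
A rough sketch of what such a "reverse" bisect could look like, using git bisect's custom terms so that the older revision is the one that still freezes. The helper name reverse_bisect_plan and the terms "broken" and "fixed" are chosen for illustration, and the revisions are just the release tags mentioned in this thread. It only prints the commands.

```shell
#!/bin/sh
# Sketch only: print the git commands for a "reverse" bisect between a
# release that still freezes and a later one that does not.
# Usage: reverse_bisect_plan BROKEN_REV FIXED_REV
reverse_bisect_plan() {
    echo "git bisect start --term-old=broken --term-new=fixed"
    echo "git bisect broken $1"
    echo "git bisect fixed $2"
    echo "# build and boot each step, then mark it: git bisect broken (or fixed)"
}

reverse_bisect_plan v5.15 v5.16
```

With these terms git reports the first "fixed" commit, i.e. the change after which the freeze no longer reproduces.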

> Assuming I really do have APICv: is there anything I need to change in my XML
> to really make use of this feature or does it work "out of the box"?

APICv works out of the box, though lack of IOMMU support does mean that your
system can't post interrupts from devices, which is usually the biggest
performance benefit to APICv on Intel.


* Re: [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 18:52 ` bugzilla-daemon
@ 2022-01-06 20:42   ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2022-01-06 20:42 UTC (permalink / raw)
  To: bugzilla-daemon, kvm

On Thu, 2022-01-06 at 18:52 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> Sean Christopherson (seanjc@google.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |seanjc@google.com
> 
> --- Comment #4 from Sean Christopherson (seanjc@google.com) ---
> The fix Maxim is referring to is commit fdba608f15e2 ("KVM: VMX: Wake vCPU when
> delivering posted IRQ even if vCPU == this vCPU").  But the buggy commit was
> introduced back in v5.8, so it's unlikely that's the issue, or at least that
> it's the only issue.  And assuming the VM in question has multiple vCPUs (which
> I'm pretty sure is true based on the config), that bug is unlikely to cause the
> entire VM to freeze; the expected symptom is that a vCPU isn't awakened when it
> should be, and while it's possible multiple vCPUs could get unlucky, taking
> down the entire VM is highly improbable.  That said, it's worth trying that
> fix, I'm just not very optimistic :-)

Actually in my experience in both Linux and Windows, a stuck vCPU derails the whole VM.
That is how I found out about the AVIC errata: only one vCPU got stuck and the whole VM froze,
and it was a Windows VM.

On Linux also these days things like RCU and such make everything freeze very fast.

> 
> Assuming this is something different, the biggest relevant changes in v5.15 are
> that the TDP MMU is enabled by default, and that the APIC access page memslot
> is not deleted when APICv is inhibited.

> 
> Can you try disabling the TDP MMU with APICv still enabled?  KVM allows that to
> be toggled without unloading, e.g. "echo N | sudo tee
> /sys/module/kvm/parameters/tdp_mmu", the VM just needs to be started after the
> param is toggled.

This is a very good idea. I keep on forgetting that the TDP MMU is now the default.

> 
> Running v5.16 (or v5.16-rc8, as there are no KVM changes expected between rc8
> ad the final release) would also be very helpful.  If we get lucky and the
> issue is resolved in v5.16, then it would be nice to "reverse" bisect to
> understand exactly what fixed the problem.

Or just bisect it if not fixed. It would be very helpful!

> 
> > Assuming I really do have APICv: is there anything I need to change in my XML
> > to really make use of this feature or does it work "out of the box"?
> 
> APICv works out of the box, though lack of IOMMU support does mean that your
> system can't post interrupts from devices, which is usually the biggest
> performance benefit to APICv on Intel.

I haven't measured it formally, but with posted timer interrupts on AMD,
this noticeably reduces the number of VM exits, even without any pass-through
devices.

For passthrough devices, also note that even without IOMMU support,
while the device does send a regular interrupt to the host, the host
handler then uses APICv to deliver it to the guest, so assuming that the interrupt
is not pinned on one of the vCPUs, the VM still doesn't get a VM exit.

I once benchmarked a pass-through NVMe device on an old Xeon which didn't have
IOMMU posted interrupts, and APICv still made quite a difference.

I so wish Intel would not disable this feature on consumer systems.
But then AVIC has bugs. Oh well.

Best regards,
	Maxim Levitsky



* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-01-06 18:52 ` bugzilla-daemon
@ 2022-01-06 20:42 ` bugzilla-daemon
  2022-01-06 21:12 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 20:42 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #5 from mlevitsk@redhat.com ---
On Thu, 2022-01-06 at 18:52 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> Sean Christopherson (seanjc@google.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |seanjc@google.com
> 
> --- Comment #4 from Sean Christopherson (seanjc@google.com) ---
> The fix Maxim is referring to is commit fdba608f15e2 ("KVM: VMX: Wake vCPU
> when
> delivering posted IRQ even if vCPU == this vCPU").  But the buggy commit was
> introduced back in v5.8, so it's unlikely that's the issue, or at least that
> it's the only issue.  And assuming the VM in question has multiple vCPUs
> (which
> I'm pretty sure is true based on the config), that bug is unlikely to cause
> the
> entire VM to freeze; the expected symptom is that a vCPU isn't awakened when
> it
> should be, and while it's possible multiple vCPUs could get unlucky, taking
> down the entire VM is highly improbable.  That said, it's worth trying that
> fix, I'm just not very optimistic :-)

Actually in my experience in both Linux and Windows, a stuck vCPU derails the
whole VM.
That is how I found out about the AVIC errata: only one vCPU got stuck and the
whole VM froze, and it was a Windows VM.

On Linux also these days things like RCU and such make everything freeze very
fast.

> 
> Assuming this is something different, the biggest relevant changes in v5.15
> are that the TDP MMU is enabled by default, and that the APIC access page
> memslot is not deleted when APICv is inhibited.

> 
> Can you try disabling the TDP MMU with APICv still enabled?  KVM allows that
> to be toggled without unloading, e.g. "echo N | sudo tee
> /sys/module/kvm/parameters/tdp_mmu", the VM just needs to be started after
> the param is toggled.

This is a very good idea. I keep forgetting that the TDP MMU is now the default.
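
For anyone repeating this test, here is a minimal sketch for checking both
knobs before starting the VM. The two parameter paths are the ones quoted in
this thread; the kvm_param helper and the SYSFS_MODULE_ROOT override are
made up for illustration and are not part of KVM or libvirt.

```shell
# Hypothetical helper: print the value of a module parameter, or "unknown"
# if the module is not loaded or the parameter does not exist.
# SYSFS_MODULE_ROOT is an illustrative override so the function can be
# tried against a scratch directory instead of the real /sys/module.
kvm_param() {
    param_file="${SYSFS_MODULE_ROOT:-/sys/module}/$1/parameters/$2"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo unknown
    fi
}

echo "enable_apicv: $(kvm_param kvm_intel enable_apicv)"
echo "tdp_mmu:      $(kvm_param kvm tdp_mmu)"
```

Recording both values next to each test run makes it easier to tell later
which combination was actually active when a freeze happened.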

> 
> Running v5.16 (or v5.16-rc8, as there are no KVM changes expected between rc8
> and the final release) would also be very helpful.  If we get lucky and the
> issue is resolved in v5.16, then it would be nice to "reverse" bisect to
> understand exactly what fixed the problem.

Or just bisect it if not fixed. It would be very helpful!
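
A rough sketch of how such a "reverse" bisect could be started, assuming a
git checkout of the kernel tree. The start_reverse_bisect helper and the
term names "broken" and "fixed" are chosen here for illustration; the
--term-old and --term-new options are standard git bisect options.

```shell
# Hypothetical helper: start a "reverse" bisect in the kernel tree at $1,
# searching for the first commit where the freeze is gone.
#   $2 = a revision that still freezes (the old state)
#   $3 = a revision that no longer freezes (the new state)
start_reverse_bisect() {
    tree=$1
    broken_rev=$2
    fixed_rev=$3
    git -C "$tree" bisect start --term-old=broken --term-new=fixed &&
    git -C "$tree" bisect broken "$broken_rev" &&
    git -C "$tree" bisect fixed "$fixed_rev"
}

# Example, not run here (the tree path is a placeholder):
#   start_reverse_bisect /path/to/linux v5.15 v5.16
# After each build-and-test cycle, mark the result from inside the tree:
#   git bisect broken    # the VM still freezes
#   git bisect fixed     # the VM is stable
```

Using custom terms avoids the mental inversion of calling the fixed kernel
"bad". For an ordinary bisect of a regression, the default good/bad terms
work the same way.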

> 
> > Assuming I really do have APICv: is there anything I need to change in my
> > XML to really make use of this feature or does it work "out of the box"?
> 
> APICv works out of the box, though lack of IOMMU support does mean that your
> system can't post interrupts from devices, which is usually the biggest
> performance benefit to APICv on Intel.

I haven't measured it formally, but with posted timer interrupts on AMD,
this does reduce the number of VM exits quite a bit, even without any
pass-through devices.

For pass-through devices, also note that even without IOMMU support, the
device still sends a regular interrupt to the host, and the host handler
then uses APICv to deliver it to the guest. So, assuming that the interrupt
is not pinned on one of the vCPUs, the VM still doesn't get a VM exit.

I once benchmarked a pass-through NVMe device on an old Xeon which didn't have
IOMMU posted interrupts, and APICv still made quite a difference.

I so wish Intel would not disable this feature on consumer systems.
But then AVIC has bugs.. Oh well.

Best regards,
        Maxim Levitsky

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (5 preceding siblings ...)
  2022-01-06 20:42 ` bugzilla-daemon
@ 2022-01-06 21:12 ` bugzilla-daemon
  2022-01-07  8:52 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-06 21:12 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #6 from th3voic3@mailbox.org ---
(In reply to Sean Christopherson from comment #4)
> The fix Maxim is referring to is commit fdba608f15e2 ("KVM: VMX: Wake vCPU
> when delivering posted IRQ even if vCPU == this vCPU").  But the buggy
> commit was introduced back in v5.8, so it's unlikely that's the issue, or at
> least that it's the only issue.  And assuming the VM in question has
> multiple vCPUs (which I'm pretty sure is true based on the config), that bug
> is unlikely to cause the entire VM to freeze; the expected symptom is that a
> vCPU isn't awakened when it should be, and while it's possible multiple
> vCPUs could get unlucky, taking down the entire VM is highly improbable. 
> That said, it's worth trying that fix, I'm just not very optimistic :-)
> 
> Assuming this is something different, the biggest relevant changes in v5.15
> are that the TDP MMU is enabled by default, and that the APIC access page
> memslot is not deleted when APICv is inhibited.
> 
> Can you try disabling the TDP MMU with APICv still enabled?  KVM allows that
> to be toggled without unloading, e.g. "echo N | sudo tee
> /sys/module/kvm/parameters/tdp_mmu", the VM just needs to be started after
> the param is toggled.

I enabled APICv again and toggled the setting and did a quick test. I tested a
couple of things that often caused freezes. So far so good. Now I've added the
toggle to my qemu hooks prepare section and will do further testing. 
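
A sketch of what such a hook fragment might look like (this is not the
reporter's actual hook, which is in the attachment). It assumes the usual
libvirt calling convention for /etc/libvirt/hooks/qemu and that tdp_mmu is
writable at runtime, as described for v5.15 above. The function name and
the TDP_MMU_PARAM override are hypothetical.

```shell
#!/bin/sh
# Sketch of a fragment for /etc/libvirt/hooks/qemu. libvirt invokes it as:
#   qemu <guest-name> <operation> <sub-operation> <extra>
# TDP_MMU_PARAM can be overridden only so the function can be tried on a
# scratch file; the default is the path quoted in this thread.
TDP_MMU_PARAM="${TDP_MMU_PARAM:-/sys/module/kvm/parameters/tdp_mmu}"

disable_tdp_mmu_on_prepare() {
    # $1 = guest name (unused), $2 = operation, $3 = sub-operation
    if [ "$2" = "prepare" ] && [ "$3" = "begin" ]; then
        # The param must be toggled before the VM is started.
        echo N > "$TDP_MMU_PARAM"
    fi
}

# In the real hook this would be called with the script's own arguments:
#   disable_tdp_mmu_on_prepare "$@"
```

No sudo is used because hooks are normally run by the system libvirt
daemon, which is assumed here to be running as root.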

Thanks for the input so far

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (6 preceding siblings ...)
  2022-01-06 21:12 ` bugzilla-daemon
@ 2022-01-07  8:52 ` bugzilla-daemon
  2022-01-07 10:08 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-07  8:52 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #7 from th3voic3@mailbox.org ---
Tested again today and now when I disable tdp_mmu the VM takes a very long time
to start and it seems the startup never really finishes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (7 preceding siblings ...)
  2022-01-07  8:52 ` bugzilla-daemon
@ 2022-01-07 10:08 ` bugzilla-daemon
  2022-01-10  9:30 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-07 10:08 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #8 from th3voic3@mailbox.org ---
Tested again today and now when I disable tdp_mmu the VM takes a very long time
to start and it seems the startup never really finishes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (8 preceding siblings ...)
  2022-01-07 10:08 ` bugzilla-daemon
@ 2022-01-10  9:30 ` bugzilla-daemon
  2022-01-10 22:29   ` Maxim Levitsky
  2022-01-10 22:29 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-10  9:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #9 from th3voic3@mailbox.org ---
I've compiled the 5.16 kernel now and so far it's looking very good. APICv and
tdp_mmu are both enabled. Also, thanks to dynamic PREEMPT, I no longer need to
recompile to enable voluntary preemption to cut down my VM's boot time.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-10  9:30 ` bugzilla-daemon
@ 2022-01-10 22:29   ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2022-01-10 22:29 UTC (permalink / raw)
  To: bugzilla-daemon, kvm

On Mon, 2022-01-10 at 09:30 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> --- Comment #9 from th3voic3@mailbox.org ---
> I've compiled the 5.16 kernel now and so far it's looking very good. APICv and
> tdp_mmu are both enabled. Also thanks to dynamic PREEMPT I no longer need to
> recompile to enable voluntary preemption to cut down my VM's boot time.
> 
Great to hear that.
 
I am just curious: with which PREEMPT setting is the boot slow?
With full preemption? I also noticed long ago, before I joined Red Hat,
back when I was just a VFIO fan, that booting with a large amount of RAM
(32 back then, I think), forced preemption, and a passed-through GPU makes
the VM hang for about half a minute before it shows the BIOS splash screen.
 
Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (9 preceding siblings ...)
  2022-01-10  9:30 ` bugzilla-daemon
@ 2022-01-10 22:29 ` bugzilla-daemon
  2022-01-11  8:29 ` bugzilla-daemon
  2023-01-27 13:11 ` bugzilla-daemon
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-10 22:29 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #10 from mlevitsk@redhat.com ---
On Mon, 2022-01-10 at 09:30 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> --- Comment #9 from th3voic3@mailbox.org ---
> I've compiled the 5.16 kernel now and so far it's looking very good. APICv and
> tdp_mmu are both enabled. Also thanks to dynamic PREEMPT I no longer need to
> recompile to enable voluntary preemption to cut down my VM's boot time.
> 
Great to hear that.

I am just curious: with which PREEMPT setting is the boot slow?
With full preemption? I also noticed long ago, before I joined Red Hat,
back when I was just a VFIO fan, that booting with a large amount of RAM
(32 back then, I think), forced preemption, and a passed-through GPU makes
the VM hang for about half a minute before it shows the BIOS splash
screen.

Best regards,
        Maxim Levitsky

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (10 preceding siblings ...)
  2022-01-10 22:29 ` bugzilla-daemon
@ 2022-01-11  8:29 ` bugzilla-daemon
  2023-01-27 13:11 ` bugzilla-daemon
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2022-01-11  8:29 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

--- Comment #11 from th3voic3@mailbox.org ---
(In reply to mlevitsk from comment #10)
> On Mon, 2022-01-10 at 09:30 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215459
> > 
> > --- Comment #9 from th3voic3@mailbox.org ---
> > I've compiled the 5.16 kernel now and so far it's looking very good. APICv
> > and tdp_mmu are both enabled. Also thanks to dynamic PREEMPT I no longer
> > need to recompile to enable voluntary preemption to cut down my VM's boot
> > time.
> > 
> Great to hear that.
>  
> I am just curious: with which PREEMPT setting is the boot slow?
> With full preemption? I also noticed long ago, before I joined Red Hat,
> back when I was just a VFIO fan, that booting with a large amount of RAM
> (32 back then, I think), forced preemption, and a passed-through GPU makes
> the VM hang for about half a minute before it shows the BIOS splash
> screen.
>  
> Best regards,
>       Maxim Levitsky

Yeah, it used to be like that.
I just compared the two though (full vs. voluntary PREEMPT) and it doesn't seem
to make a difference anymore. Full preemption is now just as fast as voluntary
preemption, so I'm sticking with full preemption since it's my distro's
default.
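
For readers who want to compare the two models themselves, a rough sketch
of reading the active model on a kernel built with dynamic preemption. It
assumes debugfs is mounted and that the file lists all models with the
active one in parentheses (for example "none voluntary (full)"); both the
path and the format are assumptions to verify on your own kernel, and the
current_preempt_mode helper is hypothetical.

```shell
# Hypothetical helper: print the active preemption model, or "unknown" if
# the debugfs file is not readable (no dynamic preemption, debugfs not
# mounted, or not running as root).
# Assumed file format: all models on one line, the active one in
# parentheses, e.g. "none voluntary (full)".
current_preempt_mode() {
    preempt_file="${1:-/sys/kernel/debug/sched/preempt}"
    if [ ! -r "$preempt_file" ]; then
        echo unknown
        return 0
    fi
    # Keep only the word between the parentheses.
    sed -n 's/.*(\([a-z]*\)).*/\1/p' "$preempt_file"
}

echo "preempt model: $(current_preempt_mode)"
```

The model is usually selected at boot with the preempt= kernel parameter
(for example preempt=voluntary), which avoids a recompile.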

Best regards

Nico

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug 215459] VM freezes starting with kernel 5.15
  2022-01-06 11:03 [Bug 215459] New: VM freezes starting with kernel 5.15 bugzilla-daemon
                   ` (11 preceding siblings ...)
  2022-01-11  8:29 ` bugzilla-daemon
@ 2023-01-27 13:11 ` bugzilla-daemon
  12 siblings, 0 replies; 17+ messages in thread
From: bugzilla-daemon @ 2023-01-27 13:11 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=215459

Roland Kletzing (devzero@web.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |devzero@web.de

--- Comment #12 from Roland Kletzing (devzero@web.de) ---
Also see https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/

^ permalink raw reply	[flat|nested] 17+ messages in thread
