[v4,13/17] kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.
diff mbox series

Message ID 1572648072-84536-14-git-send-email-suravee.suthikulpanit@amd.com
State Superseded
Headers show
Series
  • kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode
Related show

Commit Message

Suthikulpanit, Suravee Nov. 1, 2019, 10:41 p.m. UTC
AMD SVM AVIC accelerates EOI write and does not trap. This causes
in-kernel PIT re-injection mode to fail since it relies on irq-ack
notifier mechanism. So, APICv is activated only when in-kernel PIT
is in discard mode e.g. w/ qemu option:

  -global kvm-pit.lost_tick_policy=discard

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/i8254.c            | 10 ++++++++++
 arch/x86/kvm/svm.c              |  8 +++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Nov. 2, 2019, 9:57 a.m. UTC | #1
On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> +	/*
> +	 * AMD SVM AVIC accelerates EOI write and does not trap.
> +	 * This cause in-kernel PIT re-inject mode to fail
> +	 * since it checks ps->irq_ack before kvm_set_irq()
> +	 * and relies on the ack notifier to timely queue
> +	 * the pt->worker work iterm and reinject the missed tick.
> +	 * So, deactivate APICv when PIT is in reinject mode.
> +	 */
>  	if (reinject) {
> +		kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
>  		/* The initial state is preserved while ps->reinject == 0. */
>  		kvm_pit_reset_reinject(pit);
>  		kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
>  		kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>  	} else {
> +		kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
>  		kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
>  		kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);

This is not too nice for Intel which does support (through the EOI exit
mask) APICv even if PIT reinjection active.

We can work around it by adding a global mask of inhibit reasons that
apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.

Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
care about.

Paolo
Suthikulpanit, Suravee Nov. 4, 2019, 6:54 p.m. UTC | #2
Paolo,

Thanks for quick response.

On 11/2/19 4:57 AM, Paolo Bonzini wrote:
> On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
>> +	/*
>> +	 * AMD SVM AVIC accelerates EOI write and does not trap.
>> +	 * This cause in-kernel PIT re-inject mode to fail
>> +	 * since it checks ps->irq_ack before kvm_set_irq()
>> +	 * and relies on the ack notifier to timely queue
>> +	 * the pt->worker work iterm and reinject the missed tick.
>> +	 * So, deactivate APICv when PIT is in reinject mode.
>> +	 */
>>   	if (reinject) {
>> +		kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
>>   		/* The initial state is preserved while ps->reinject == 0. */
>>   		kvm_pit_reset_reinject(pit);
>>   		kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
>>   		kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>>   	} else {
>> +		kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
>>   		kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
>>   		kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> 
> This is not too nice for Intel which does support (through the EOI exit
> mask) APICv even if PIT reinjection active.

I see you point.

> We can work around it by adding a global mask of inhibit reasons that
> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
> 
> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
> care about.
> 
> Paolo
> 

What about we enhance the pre_update_apivc_exec_ctrl() to also return 
success/fail. In here, the vendor specific code can decide to update 
APICv state or not.

Thanks,
Suravee
Paolo Bonzini Nov. 4, 2019, 9:49 p.m. UTC | #3
On 04/11/19 19:54, Suthikulpanit, Suravee wrote:
> I see you point.
> 
>> We can work around it by adding a global mask of inhibit reasons that
>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
>>
>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
>> care about.
>
> What about we enhance the pre_update_apivc_exec_ctrl() to also return 
> success/fail. In here, the vendor specific code can decide to update 
> APICv state or not.

That works for me, too.  Something like return false for deactivate and
true for activate.

Paolo
Roman Kagan Nov. 4, 2019, 11:17 p.m. UTC | #4
On Sat, Nov 02, 2019 at 10:57:47AM +0100, Paolo Bonzini wrote:
> On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> > +	/*
> > +	 * AMD SVM AVIC accelerates EOI write and does not trap.
> > +	 * This cause in-kernel PIT re-inject mode to fail
> > +	 * since it checks ps->irq_ack before kvm_set_irq()
> > +	 * and relies on the ack notifier to timely queue
> > +	 * the pt->worker work iterm and reinject the missed tick.
> > +	 * So, deactivate APICv when PIT is in reinject mode.
> > +	 */
> >  	if (reinject) {
> > +		kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
> >  		/* The initial state is preserved while ps->reinject == 0. */
> >  		kvm_pit_reset_reinject(pit);
> >  		kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> >  		kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> >  	} else {
> > +		kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
> >  		kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> >  		kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> 
> This is not too nice for Intel which does support (through the EOI exit
> mask) APICv even if PIT reinjection active.

Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
given a non-empty eoi_exit_bitmap, and enable it back on a clear
eoi_exit_bitmap.  This may remove the need to add special treatment to
PIT etc.

Thanks,
Roman.
Alexander Graf Nov. 5, 2019, 7:05 a.m. UTC | #5
> Am 04.11.2019 um 22:50 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> 
> On 04/11/19 19:54, Suthikulpanit, Suravee wrote:
>> I see you point.
>> 
>>> We can work around it by adding a global mask of inhibit reasons that
>>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
>>> 
>>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
>>> care about.
>> 
>> What about we enhance the pre_update_apivc_exec_ctrl() to also return 
>> success/fail. In here, the vendor specific code can decide to update 
>> APICv state or not.
> 
> That works for me, too.  Something like return false for deactivate and
> true for activate.

I'm trying to wrap my head around how that will work with live migration. Do we also need to save/restore the deactivate reasons?

Alex

> 
> Paolo



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Roman Kagan Nov. 5, 2019, 8:40 a.m. UTC | #6
On Tue, Nov 05, 2019 at 07:05:57AM +0000, Graf (AWS), Alexander wrote:
> 
> 
> > Am 04.11.2019 um 22:50 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> > 
> > On 04/11/19 19:54, Suthikulpanit, Suravee wrote:
> >> I see you point.
> >> 
> >>> We can work around it by adding a global mask of inhibit reasons that
> >>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
> >>> 
> >>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
> >>> care about.
> >> 
> >> What about we enhance the pre_update_apivc_exec_ctrl() to also return 
> >> success/fail. In here, the vendor specific code can decide to update 
> >> APICv state or not.
> > 
> > That works for me, too.  Something like return false for deactivate and
> > true for activate.
> 
> I'm trying to wrap my head around how that will work with live
> migration. Do we also need to save/restore the deactivate reasons?

Nope, this is all invisible to userland.  The target will deduce the
deactivation reasons on its own from the user-visible setup like PIT
configuration, Hyper-V SynIC, etc.

Roman.
Paolo Bonzini Nov. 5, 2019, 10:47 p.m. UTC | #7
On 05/11/19 00:17, Roman Kagan wrote:
>> This is not too nice for Intel which does support (through the EOI exit
>> mask) APICv even if PIT reinjection active.
> Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
> given a non-empty eoi_exit_bitmap, and enable it back on a clear
> eoi_exit_bitmap.  This may remove the need to add special treatment to
> PIT etc.

That is a very nice idea---we can make that a single disable reason,
like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it.

Paolo
Suthikulpanit, Suravee Nov. 11, 2019, 5:37 p.m. UTC | #8
Roman/Paolo

On 11/5/2019 4:47 PM, Paolo Bonzini wrote:
> On 05/11/19 00:17, Roman Kagan wrote:
>>> This is not too nice for Intel which does support (through the EOI exit
>>> mask) APICv even if PIT reinjection active.
>> Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
>> given a non-empty eoi_exit_bitmap, and enable it back on a clear
>> eoi_exit_bitmap.  This may remove the need to add special treatment to
>> PIT etc.
> 
> That is a very nice idea---we can make that a single disable reason,
> like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it.

I took at look at the svm_load_eoi_exitmap() and it is called via:
     kvm_make_scan_ioapic_request() ->
         KVM_REQ_SCAN_IOAPIC -> vcpu_scan_ioapic() ->
             KVM_REQ_LOAD_EOI_EXITMAP -> vcpu_load_eoi_exitmap()

The kvm_make_scan_ioapic_request() is called from multiple places:

arch/x86/kvm/irq_comm.c:
     * kvm_arch_post_irq_routing_update() : Called from kvm_set_irq_routing()

arch/x86/kvm/ioapic.c:
     * kvm_arch_post_irq_ack_notifier_list_update() : (Un)registering irq ack notifier
     * kvm_set_ioapic() : Setting ioapic irqchip
     * ioapic_mmio_write() -> ioapic_write_indirect()

arch/x86/kvm/lapic.c:
     * recalculate_apic_map()

Most calls would be from ioapic_mmio_write()->ioapic_write_indirect().

In case of AMD AVIC, the svm_load_e::vsoi_exitmap() is called several times, and requesting
APICV (de)activate from here when the eoi_exit_bitmap is set/clear would introduce
large overhead especially with SMP machine. So, for now, let's just disable APICv
when in-kernel PIT is in reinject (delay) mode.

I'll also add the logic to avoid unnecessary overhead for Intel.

Thanks,
Suravee
Roman Kagan Nov. 12, 2019, 12:22 p.m. UTC | #9
On Mon, Nov 11, 2019 at 11:37:53AM -0600, Suravee Suthikulpanit wrote:
> On 11/5/2019 4:47 PM, Paolo Bonzini wrote:
> > On 05/11/19 00:17, Roman Kagan wrote:
> > > > This is not too nice for Intel which does support (through the EOI exit
> > > > mask) APICv even if PIT reinjection active.
> > > Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
> > > given a non-empty eoi_exit_bitmap, and enable it back on a clear
> > > eoi_exit_bitmap.  This may remove the need to add special treatment to
> > > PIT etc.
> > 
> > That is a very nice idea---we can make that a single disable reason,
> > like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it.
> 
> I took at look at the svm_load_eoi_exitmap() and it is called via:
>     kvm_make_scan_ioapic_request() ->
>         KVM_REQ_SCAN_IOAPIC -> vcpu_scan_ioapic() ->
>             KVM_REQ_LOAD_EOI_EXITMAP -> vcpu_load_eoi_exitmap()
> 
> The kvm_make_scan_ioapic_request() is called from multiple places:
> 
> arch/x86/kvm/irq_comm.c:
>     * kvm_arch_post_irq_routing_update() : Called from kvm_set_irq_routing()
> 
> arch/x86/kvm/ioapic.c:
>     * kvm_arch_post_irq_ack_notifier_list_update() : (Un)registering irq ack notifier
>     * kvm_set_ioapic() : Setting ioapic irqchip
>     * ioapic_mmio_write() -> ioapic_write_indirect()
> 
> arch/x86/kvm/lapic.c:
>     * recalculate_apic_map()
> 
> Most calls would be from ioapic_mmio_write()->ioapic_write_indirect().
> 
> In case of AMD AVIC, the svm_load_e::vsoi_exitmap() is called several times, and requesting
> APICV (de)activate from here when the eoi_exit_bitmap is set/clear would introduce
> large overhead especially with SMP machine.

This doesn't look like a hot path, so I'm not sure it needs to be
optimized for performance.  Especially so since
kvm_make_scan_ioapic_request does kvm_make_all_cpus_request which isn't
particularly fast by definition, and I guess the extra overhead there
won't be noticable.

OTOH introducing extra code paths has its maintenance costs, so sticking
the simple logic in svm_load_eoi_exitmap looks attractive.

Just my 2c,
Roman.

Patch
diff mbox series

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fe61269..460f7a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,6 +858,7 @@  enum kvm_irqchip_mode {
 #define APICV_DEACT_BIT_HYPERV     1
 #define APICV_DEACT_BIT_NESTED     2
 #define APICV_DEACT_BIT_IRQWIN     3
+#define APICV_DEACT_BIT_PIT_REINJ  4
 
 struct kvm_arch {
 	unsigned long n_used_mmu_pages;
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 4a6dc54..3f77fda 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -295,12 +295,22 @@  void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
 	if (atomic_read(&ps->reinject) == reinject)
 		return;
 
+	/*
+	 * AMD SVM AVIC accelerates EOI write and does not trap.
+	 * This cause in-kernel PIT re-inject mode to fail
+	 * since it checks ps->irq_ack before kvm_set_irq()
+	 * and relies on the ack notifier to timely queue
+	 * the pt->worker work iterm and reinject the missed tick.
+	 * So, deactivate APICv when PIT is in reinject mode.
+	 */
 	if (reinject) {
+		kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
 		/* The initial state is preserved while ps->reinject == 0. */
 		kvm_pit_reset_reinject(pit);
 		kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
 		kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 	} else {
+		kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
 		kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
 		kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 	}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e7ff04..9812feb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1679,7 +1679,13 @@  static int avic_update_access_page(struct kvm *kvm, bool activate)
 	int ret = 0;
 
 	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.apic_access_page_done == activate)
+	/*
+	 * During kvm_destroy_vm(), kvm_pit_set_reinject() could trigger
+	 * APICv mode change, which update APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
+	 * memory region. So, we need to ensure that kvm->mm == current->mm.
+	 */
+	if ((kvm->arch.apic_access_page_done == activate) ||
+	    (kvm->mm != current->mm))
 		goto out;
 
 	ret = __x86_set_memory_region(kvm,