linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
@ 2023-08-07  6:26 Ake Koomsin
  2023-08-07 14:00 ` Maxim Levitsky
  0 siblings, 1 reply; 7+ messages in thread
From: Ake Koomsin @ 2023-08-07  6:26 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H . Peter Anvin, Ake Koomsin

KVM currently does not expect an L1 hypervisor to allow its L2 guest to
access the APIC page directly when APICv is enabled. When this happens,
KVM emulates the access itself, resulting in lost interrupts.

As this kind of hypervisor is rare, it is simpler to inhibit APICv upon
detecting direct APIC access from L2 to avoid unexpectedly losing interrupts.

Signed-off-by: Ake Koomsin <ake@igel.co.jp>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/mmu/mmu.c          | 33 ++++++++++++++++++++++++++-------
 arch/x86/kvm/svm/svm.h          |  3 ++-
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3bc146dfd38d..8764b11922a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1188,6 +1188,12 @@ enum kvm_apicv_inhibit {
 	APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
 	APICV_INHIBIT_REASON_APIC_BASE_MODIFIED,
 
+	/*
+	 * APICv is disabled because L1 hypervisor allows L2 guest to access
+	 * APIC directly.
+	 */
+	APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS,
+
 	/******************************************************/
 	/* INHIBITs that are relevant only to the AMD's AVIC. */
 	/******************************************************/
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ec169f5c7dce..c1150ef9fce1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4293,6 +4293,30 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
+static int __kvm_faultin_pfn_guest_mode(struct kvm_vcpu *vcpu,
+					struct kvm_page_fault *fault)
+{
+	struct kvm_memory_slot *slot = fault->slot;
+
+	/* Don't expose private memslots to L2. */
+	fault->slot = NULL;
+	fault->pfn = KVM_PFN_NOSLOT;
+	fault->map_writable = false;
+
+	/*
+	 * APICv does not work when L1 hypervisor allows L2 guest to access
+	 * APIC directly. As this kind of L1 hypervisor is rare, it is simpler
+	 * to inhibit APICv when we detect direct APIC access from L2, and
+	 * fallback to emulation path to avoid interrupt lost.
+	 */
+	if (unlikely(slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
+		     kvm_apicv_activated(vcpu->kvm)))
+		kvm_set_apicv_inhibit(vcpu->kvm,
+				      APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS);
+
+	return RET_PF_CONTINUE;
+}
+
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4307,13 +4331,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_RETRY;
 
 	if (!kvm_is_visible_memslot(slot)) {
-		/* Don't expose private memslots to L2. */
-		if (is_guest_mode(vcpu)) {
-			fault->slot = NULL;
-			fault->pfn = KVM_PFN_NOSLOT;
-			fault->map_writable = false;
-			return RET_PF_CONTINUE;
-		}
+		if (is_guest_mode(vcpu))
+			return __kvm_faultin_pfn_guest_mode(vcpu, fault);
 		/*
 		 * If the APIC access page exists but is disabled, go directly
 		 * to emulation without caching the MMIO access or creating a
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 18af7e712a5a..8d77932ee0fb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -683,7 +683,8 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
 	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
 	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
 	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |	\
-	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED)	\
+	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED) |	\
+	BIT(APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS)	\
 )
 
 bool avic_hardware_setup(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index df461f387e20..f652397c9765 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8189,7 +8189,8 @@ static void vmx_hardware_unsetup(void)
 	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |		\
 	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
 	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
-	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED)	\
+	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |	\
+	BIT(APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS)	\
 )
 
 static void vmx_vm_destroy(struct kvm *kvm)
-- 
2.41.0



* Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
  2023-08-07  6:26 [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2 Ake Koomsin
@ 2023-08-07 14:00 ` Maxim Levitsky
  2023-08-07 18:04   ` Sean Christopherson
  2023-08-08  7:45   ` Ake Koomsin
  0 siblings, 2 replies; 7+ messages in thread
From: Maxim Levitsky @ 2023-08-07 14:00 UTC (permalink / raw)
  To: Ake Koomsin, kvm, linux-kernel
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H . Peter Anvin

On Mon, 2023-08-07 at 15:26 +0900, Ake Koomsin wrote:
> KVM currently does not expect an L1 hypervisor to allow its L2 guest to
> access the APIC page directly when APICv is enabled. When this happens,
> KVM emulates the access itself, resulting in lost interrupts.
> 
> As this kind of hypervisor is rare, it is simpler to inhibit APICv upon
> detecting direct APIC access from L2 to avoid unexpectedly losing interrupts.
> 
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/mmu/mmu.c          | 33 ++++++++++++++++++++++++++-------
>  arch/x86/kvm/svm/svm.h          |  3 ++-
>  arch/x86/kvm/vmx/vmx.c          |  3 ++-
>  4 files changed, 36 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3bc146dfd38d..8764b11922a0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1188,6 +1188,12 @@ enum kvm_apicv_inhibit {
>  	APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
>  	APICV_INHIBIT_REASON_APIC_BASE_MODIFIED,
>  
> +	/*
> +	 * APICv is disabled because L1 hypervisor allows L2 guest to access
> +	 * APIC directly.
> +	 */
> +	APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS,
> +
>  	/******************************************************/
>  	/* INHIBITs that are relevant only to the AMD's AVIC. */
>  	/******************************************************/
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index ec169f5c7dce..c1150ef9fce1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4293,6 +4293,30 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
>  }
>  
> +static int __kvm_faultin_pfn_guest_mode(struct kvm_vcpu *vcpu,
> +					struct kvm_page_fault *fault)
> +{
> +	struct kvm_memory_slot *slot = fault->slot;
> +
> +	/* Don't expose private memslots to L2. */
> +	fault->slot = NULL;
> +	fault->pfn = KVM_PFN_NOSLOT;
> +	fault->map_writable = false;
> +
> +	/*
> +	 * APICv does not work when L1 hypervisor allows L2 guest to access
> +	 * APIC directly. As this kind of L1 hypervisor is rare, it is simpler
> +	 * to inhibit APICv when we detect direct APIC access from L2, and
> +	 * fallback to emulation path to avoid interrupt lost.
> +	 */
> +	if (unlikely(slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
> +		     kvm_apicv_activated(vcpu->kvm)))
> +		kvm_set_apicv_inhibit(vcpu->kvm,
> +				      APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS);

Is there a good reason why KVM doesn't expose the APIC memslot to a nested guest?
While a nested guest runs, L1's APICv is effectively "inhibited" anyway, so writes to this memslot
should update the APIC registers and be picked up by the APICv hardware when L1 resumes execution.

Since APICv allows itself to be inhibited for other reasons, it should, just like AVIC, be able
to pick up arbitrary changes to the APIC registers that happened while it was inhibited.

I'll take a look at the code to see if APICv does this (I know AVIC's code much better than APICv's).

Is there a reproducer for this bug?

Best regards,
	Maxim Levitsky

> +
> +	return RET_PF_CONTINUE;
> +}
> +
>  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	struct kvm_memory_slot *slot = fault->slot;
> @@ -4307,13 +4331,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		return RET_PF_RETRY;
>  
>  	if (!kvm_is_visible_memslot(slot)) {
> -		/* Don't expose private memslots to L2. */
> -		if (is_guest_mode(vcpu)) {
> -			fault->slot = NULL;
> -			fault->pfn = KVM_PFN_NOSLOT;
> -			fault->map_writable = false;
> -			return RET_PF_CONTINUE;
> -		}
> +		if (is_guest_mode(vcpu))
> +			return __kvm_faultin_pfn_guest_mode(vcpu, fault);
>  		/*
>  		 * If the APIC access page exists but is disabled, go directly
>  		 * to emulation without caching the MMIO access or creating a
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 18af7e712a5a..8d77932ee0fb 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -683,7 +683,8 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
>  	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
>  	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
>  	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |	\
> -	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED)	\
> +	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED) |	\
> +	BIT(APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS)	\
>  )
>  
>  bool avic_hardware_setup(void);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index df461f387e20..f652397c9765 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8189,7 +8189,8 @@ static void vmx_hardware_unsetup(void)
>  	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |		\
>  	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |	\
>  	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |	\
> -	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED)	\
> +	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |	\
> +	BIT(APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS)	\
>  )
>  
>  static void vmx_vm_destroy(struct kvm *kvm)





* Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
  2023-08-07 14:00 ` Maxim Levitsky
@ 2023-08-07 18:04   ` Sean Christopherson
  2023-08-08  7:45   ` Ake Koomsin
  1 sibling, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2023-08-07 18:04 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Ake Koomsin, kvm, linux-kernel, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin

On Mon, Aug 07, 2023, Maxim Levitsky wrote:
> On Mon, 2023-08-07 at 15:26 +0900, Ake Koomsin wrote:
> > KVM currently does not expect an L1 hypervisor to allow its L2 guest to
> > access the APIC page directly when APICv is enabled. When this happens,
> > KVM emulates the access itself, resulting in lost interrupts.

Kinda stating the obvious, but as Maxim alluded to, emulating an APIC access while
APICv is active should not result in lost interrupts.  I.e. suppressing APICv is
likely masking a bug that isn't unique to this specific scenario.

> Is there a good reason why KVM doesn't expose the APIC memslot to a nested guest?

AFAIK, simply because no one has ever requested that KVM support such a use case.

> While a nested guest runs, L1's APICv is effectively "inhibited" anyway, so
> writes to this memslot should update the APIC registers and be picked up by
> the APICv hardware when L1 resumes execution.
> 
> Since APICv allows itself to be inhibited for other reasons, it should, just
> like AVIC, be able to pick up arbitrary changes to the APIC registers that
> happened while it was inhibited.
> 
> I'll take a look at the code to see if APICv does this (I know AVIC's code
> much better than APICv's).
> 
> Is there a reproducer for this bug?

+1, this needs a reproducer, or at the very least a very detailed explanation
and analysis.


* Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
  2023-08-07 14:00 ` Maxim Levitsky
  2023-08-07 18:04   ` Sean Christopherson
@ 2023-08-08  7:45   ` Ake Koomsin
  2023-08-08 23:48     ` Sean Christopherson
  1 sibling, 1 reply; 7+ messages in thread
From: Ake Koomsin @ 2023-08-08  7:45 UTC (permalink / raw)
  To: Maxim Levitsky, kvm, linux-kernel, Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H . Peter Anvin

On Mon, 07 Aug 2023 17:00:58 +0300
Maxim Levitsky <mlevitsk@redhat.com> wrote:
 
> Is there a good reason why KVM doesn't expose the APIC memslot to a
> nested guest? While a nested guest runs, L1's APICv is effectively
> "inhibited" anyway, so writes to this memslot should update the APIC
> registers and be picked up by the APICv hardware when L1 resumes
> execution.
> 
> Since APICv allows itself to be inhibited for other reasons, it
> should, just like AVIC, be able to pick up arbitrary changes to the
> APIC registers that happened while it was inhibited.
> 
> I'll take a look at the code to see if APICv does this (I know AVIC's
> code much better than APICv's).
> 
> Is there a reproducer for this bug?
> 
> Best regards,
> 	Maxim Levitsky

From reading old commits (3a2936dedd20 and 1313cc2bd8f6), my understanding is
that the current KVM implementation does not expect direct APIC access from L2
guests. I assume that there might be some challenging implementation issues.

To reproduce the problem, we need to run a micro hypervisor named BitVisor on
KVM. When running on a real machine, this hypervisor lets its guest access the
physical APIC directly. As BitVisor is intended to run on real machines, when
running under KVM it conceals all KVM-related features reported through CPUID.
The L2 guest will initialize and run as if it were on a physical machine. We
also need an Intel machine that supports APICv. (I tested on a 13th Gen Intel
machine. The problem should also be reproducible on a 12th Gen Intel machine.)
BitVisor's current SVM implementation always monitors MMIO accesses, so the
problem cannot be reproduced with it.

By default, the BitVisor VMX implementation in a UEFI environment hooks APIC
accesses during initialization. The purpose of this hook is to bootstrap the
application processors (APs) during UEFI ExitBootServices. When booting a guest
OS, the firmware sends an INIT signal during ExitBootServices. BitVisor then
bootstraps the APs, puts them into guest mode, and unhooks APIC access. After
this, the guest can access APIC memory directly.

As far as I understand the KVM implementation, while BitVisor still hooks APIC
access, an EPT_VIOLATION occurs when the L2 guest accesses the APIC page. The
EPT_VIOLATION is then forwarded to BitVisor, which eventually accesses the APIC
on behalf of the L2 guest. In this case, APICv works properly because the access
comes from L1. After BitVisor unhooks the APIC page, the first access to the
APIC from the L2 guest goes to the EPT_VIOLATION handling path. This path marks
the APIC page with a reserved flag and causes the access to be retried.
Subsequent accesses are then handled in the EPT_MISCONFIG path, which emulates
the MMIO access. Interrupts seem to disappear after this.
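
As a rough illustration of the flow described above, the decision KVM makes
when it faults in a page can be modeled in a few lines of standalone C. This is
only a sketch for discussion, not kernel code: the names pf_action, fault_ctx
and classify_fault are made up for this example, and the three outcomes are a
simplification of what the __kvm_faultin_pfn() hunk in the patch and the
description above suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified outcomes of faulting in a guest page. */
enum pf_action {
	PF_MAP_PAGE,          /* back the access with a real page */
	PF_INSTALL_MMIO_SPTE, /* treat as "no slot"; later accesses are emulated */
	PF_EMULATE_NOW        /* emulate directly without caching anything */
};

/* Hypothetical inputs; the real code reads these from the vCPU and memslot. */
struct fault_ctx {
	bool slot_visible;        /* false for KVM-private memslots */
	bool is_apic_access_slot; /* slot is the APIC access page memslot */
	bool guest_mode;          /* true while the vCPU is running L2 */
	bool apicv_active;        /* APICv is currently not inhibited */
};

static enum pf_action classify_fault(const struct fault_ctx *f)
{
	assert(f != NULL);

	if (!f->slot_visible) {
		/* Private memslots are never exposed to L2. */
		if (f->guest_mode)
			return PF_INSTALL_MMIO_SPTE;
		/* APIC access page exists but APICv is disabled. */
		if (f->is_apic_access_slot && !f->apicv_active)
			return PF_EMULATE_NOW;
	}
	return PF_MAP_PAGE;
}
```

In this model, an L2 access to the APIC access page always lands in
PF_INSTALL_MMIO_SPTE regardless of apicv_active, which corresponds to the
EPT_MISCONFIG emulation path mentioned above.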

Here are the steps to reproduce the problem.

1) hg clone http://hg.code.sf.net/p/bitvisor/code bitvisor-code

2) Enter the cloned directory and type 'make'. (There is no need to worry about
   warnings at the moment. The default configuration is good enough to reproduce
   the problem.) We now have bitvisor.elf after the compilation.

3) Enter boot/uefi-loader, and type 'make' to compile the UEFI bootloader. We
   need mingw for this. We now have loadvmm.efi after the compilation.

4) Put bitvisor.elf and loadvmm.efi together in one folder. The folder
   is going to look like the following:
   ~/x86_test
   ├── bitvisor.elf
   └── loadvmm.efi

5) Run the following qemu command. Replace UEFI firmware path and other
   parameters as you prefer. Make sure -smp 2 is there. Otherwise, there will be
   no INIT signal during UEFI ExitBootServices. (I use QEMU 8.0.3)

qemu-system-x86_64 -cpu host -enable-kvm -bios /usr/share/edk2-ovmf/OVMF_CODE.fd \
-drive file=fat:rw:~/x86_test/,format=raw \
-cdrom ~/Downloads/Fedora-Workstation-Live-x86_64-38-1.6.iso \
-M q35 -m 8192 -smp 2 -serial stdio

6) During the launch, enter the BIOS configuration by hitting the Esc key
   repeatedly. Then, select 'Boot Manager' and choose 'EFI Internal Shell' to
   enter the UEFI shell.

7) The directory we specify in the command should be at fs0. Type 'fs0:' in
   the shell.

8) Type 'loadvmm.efi' to load BitVisor. We should see the following messages:

Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
ooooooooooooooooooooooooooooooooooooooooooooooooooo
...
MCFG [0] 0000:00-FF (B0000000,10000000)
Starting a virtual machine.

9) We should now return to the shell. Right now we are running as an L2 guest.

10) Next, boot Linux from the live CD or by your preferred method. We can see
    a panic with the message "panic - not syncing: IO-APIC + timer doesn't work!".
    The panic can be reproduced quite easily. Even if the boot happens to pass
    the timer check, or you specify the 'no_timer_check' boot parameter, it will
    stall during SMP bringup.

The idea of steps 6 to 10 is to start BitVisor first, and then start Linux on
top of it. You can adjust the steps as you like. Feel free to ask me anything
about reproducing the problem with BitVisor if the given steps are not
sufficient.

The problem does not happen when enable_apicv=N. Note that SMP bringup with
enable_apicv=N can fail. This is another problem. We don't have to worry about
this for now. Linux seems to have no delay between INIT DEASSERT and SIPI during
its SMP bringup. This can easily make INIT and SIPI pending at the same time,
resulting in a lost signal.

I admit that my knowledge of KVM and APICv is very limited. I may misunderstand
the problem. If you don't mind, would it be possible for you to point me to the
code paths I should pay attention to? I would love to find out the actual cause
of the problem.


Best Regards
Ake Koomsin


* Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
  2023-08-08  7:45   ` Ake Koomsin
@ 2023-08-08 23:48     ` Sean Christopherson
  2023-08-09  8:42       ` Ake Koomsin
  2023-08-25  3:58       ` Ake Koomsin
  0 siblings, 2 replies; 7+ messages in thread
From: Sean Christopherson @ 2023-08-08 23:48 UTC (permalink / raw)
  To: Ake Koomsin
  Cc: Maxim Levitsky, kvm, linux-kernel, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin

On Tue, Aug 08, 2023, Ake Koomsin wrote:
> On Mon, 07 Aug 2023 17:00:58 +0300
> Maxim Levitsky <mlevitsk@redhat.com> wrote:
>  
> > Is there a good reason why KVM doesn't expose the APIC memslot to a
> > nested guest? While a nested guest runs, L1's APICv is effectively
> > "inhibited" anyway, so writes to this memslot should update the APIC
> > registers and be picked up by the APICv hardware when L1 resumes
> > execution.
> > 
> > Since APICv allows itself to be inhibited for other reasons, it
> > should, just like AVIC, be able to pick up arbitrary changes to the
> > APIC registers that happened while it was inhibited.
> > 
> > I'll take a look at the code to see if APICv does this (I know AVIC's
> > code much better than APICv's).
> > 
> > Is there a reproducer for this bug?
>
> The idea of steps 6 to 10 is to start BitVisor first, and then start Linux on
> top of it. You can adjust the steps as you like. Feel free to ask me anything
> about reproducing the problem with BitVisor if the given steps are not
> sufficient.

Thank you for the detailed repro steps!  However, it's likely going to be O(weeks)
before anyone is able to look at this in detail given the extensive repro steps.
If you have bandwidth, it's probably worth trying to reproduce the problem in a
KVM selftest (or a KVM-Unit-Test), e.g. create a nested VM, send an IPI from L2,
and see if it gets routed correctly.  This is purely a suggestion to try and get a
faster fix, it's by no means necessary.

Actually, typing that out raises a question (or two).  What APICv VMCS control
settings does BitVisor use?  E.g. is BitVisor enabling APICv for its VM (L2)?
If so, what values for the APIC access page and vAPIC page are shoved into
BitVisor's VMCS?

> The problem does not happen when enable_apicv=N. Note that SMP bringup with
> enable_apicv=N can fail. This is another problem. We don't have to worry about
> this for now. Linux seems to have no delay between INIT DEASSERT and SIPI during
> its SMP bringup. This can easily make INIT and SIPI pending at the same time,
> resulting in a lost signal.
> 
> I admit that my knowledge of KVM and APICv is very limited. I may misunderstand
> the problem. If you don't mind, would it be possible for you to point me to the
> code paths I should pay attention to? I would love to find out the actual cause
> of the problem.

KVM *should* emulate the APIC MMIO access from L2.  The call stack should reach
apic_mmio_write(), and assuming it's an ICR write, KVM should send an IPI.
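
To make the expected path a bit more concrete, here is a small standalone
sketch of how an MMIO write can be classified as an xAPIC ICR write. The
offsets (0x300 for the low half of the ICR, 0x310 for the high half) and the
default base 0xFEE00000 are architectural. The helper names are invented for
this example and are not KVM functions, and the sketch assumes the APIC base
has not been relocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XAPIC_DEFAULT_BASE 0xFEE00000ull
#define XAPIC_PAGE_SIZE    0x1000ull
#define XAPIC_REG_ICR_LOW  0x300u /* writing this register sends the IPI */
#define XAPIC_REG_ICR_HIGH 0x310u /* holds the destination in bits 31:24 */

/* True if the guest physical address falls inside the xAPIC MMIO page. */
static bool gpa_in_apic_page(uint64_t gpa)
{
	return gpa >= XAPIC_DEFAULT_BASE &&
	       gpa < XAPIC_DEFAULT_BASE + XAPIC_PAGE_SIZE;
}

/* xAPIC registers are 16-byte aligned, so mask off the low four bits. */
static uint32_t apic_reg_offset(uint64_t gpa)
{
	assert(gpa_in_apic_page(gpa));
	return (uint32_t)(gpa - XAPIC_DEFAULT_BASE) & 0xFF0u;
}

/* Only a write to the low half of the ICR actually triggers an IPI. */
static bool write_triggers_ipi(uint64_t gpa)
{
	return gpa_in_apic_page(gpa) &&
	       apic_reg_offset(gpa) == XAPIC_REG_ICR_LOW;
}

/* Extract the xAPIC destination field from the high half of the ICR. */
static uint8_t xapic_icr_dest(uint32_t icr_high)
{
	return (uint8_t)(icr_high >> 24);
}
```

A guest that wants to send an IPI in xAPIC mode typically writes the
destination to offset 0x310 first and then writes the command to offset 0x300,
so only the second write should result in an interrupt being delivered.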


* Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
  2023-08-08 23:48     ` Sean Christopherson
@ 2023-08-09  8:42       ` Ake Koomsin
  2023-08-25  3:58       ` Ake Koomsin
  1 sibling, 0 replies; 7+ messages in thread
From: Ake Koomsin @ 2023-08-09  8:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, linux-kernel, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin

On Tue, 8 Aug 2023 16:48:19 -0700
Sean Christopherson <seanjc@google.com> wrote:

> > The idea of steps 6 to 10 is to start BitVisor first, and then
> > start Linux on top of it. You can adjust the steps as you like. Feel
> > free to ask me anything about reproducing the problem with
> > BitVisor if the given steps are not sufficient.
> 
> Thank you for the detailed repro steps!  However, it's likely going
> to be O(weeks) before anyone is able to look at this in detail given
> the extensive repro steps. If you have bandwidth, it's probably worth
> trying to reproduce the problem in a KVM selftest (or a
> KVM-Unit-Test), e.g. create a nested VM, send an IPI from L2, and see
> if it gets routed correctly.  This is purely a suggestion to try and get
> a faster fix, it's by no means necessary.
> 
> Actually, typing that out raises a question (or two).  What APICv
> VMCS control settings does BitVisor use?  E.g. is BitVisor enabling
> APICv for its VM (L2)? If so, what values for the APIC access page
> and vAPIC page are shoved into BitVisor's VMCS?

BitVisor does not set up APICv at all. It does not set up an APIC
access page either, and it does not try to emulate the APIC. It only
monitors for the APIC INIT event through the EPT_VIOLATION mechanism
for its AP bringup, and stops monitoring after that. As I mentioned in
the previous mail, when BitVisor runs on real hardware, it lets the
guest control the real APIC directly.

As it is a micro hypervisor, it runs only one guest OS. Its main focus
is on device access monitoring/manipulation depending on the
configuration. It tries to avoid anything to do with interrupts as
much as possible.

In the meantime, I will try to dig deeper into KVM internals. Thank you
very much for suggesting KVM-Unit-Test.

> > The problem does not happen when enable_apicv=N. Note that SMP
> > bringup with enable_apicv=N can fail. This is another problem. We
> > don't have to worry about this for now. Linux seems to have no
> > delay between INIT DEASSERT and SIPI during its SMP bringup. This
> > can easily make INIT and SIPI pending at the same time, resulting
> > in a lost signal.
> > 
> > I admit that my knowledge of KVM and APICv is very limited. I may
> > misunderstand the problem. If you don't mind, would it be possible
> > for you to point me to the code paths I should pay attention to? I
> > would love to find out the actual cause of the problem.
> 
> KVM *should* emulate the APIC MMIO access from L2.  The call stack
> should reach apic_mmio_write(), and assuming it's an ICR write, KVM
> should send an IPI.

When enable_apicv=N, interrupts work properly. This is why I wrote this
RFC patch.

Regarding the SMP bringup failure: when the L2 Linux guest runs on top
of L1 BitVisor, it does not rely on KVM-specific features at all. In
this case, it seems to me that the vCPUs possibly cannot change their
state to wait-for-SIPI in time once INIT is issued (might be due to
scheduling?). This does not happen when BitVisor runs on real hardware.
Once you have time to try BitVisor, please let me know if you can
reproduce the problem with the default configuration. Trying with
-smp 8+ on a machine with many cores might make the problem easier to
reproduce. I tested on an i5-13600K.
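
The suspected ordering problem can be shown with a toy state machine in
standalone C. This only illustrates the hypothesis above, under the assumption
that a SIPI is dropped when the target is not yet in the wait-for-SIPI state.
It is not a description of how KVM tracks these events internally, and all
names (ap_model, model_send_init and so on) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ap_state { AP_RUNNING, AP_WAIT_FOR_SIPI, AP_STARTED };

/* Hypothetical model of one application processor (AP). */
struct ap_model {
	enum ap_state state;
	bool init_pending; /* INIT was sent but not yet acted upon */
};

/* The sender only records INIT; the target acts on it later. */
static void model_send_init(struct ap_model *ap)
{
	assert(ap != NULL);
	ap->init_pending = true;
}

/* Runs when the target gets a chance to process its pending events. */
static void model_process_events(struct ap_model *ap)
{
	assert(ap != NULL);
	if (ap->init_pending) {
		ap->init_pending = false;
		ap->state = AP_WAIT_FOR_SIPI;
	}
}

/* Returns false when the SIPI is lost because the AP is not waiting for it. */
static bool model_send_sipi(struct ap_model *ap)
{
	assert(ap != NULL);
	if (ap->state != AP_WAIT_FOR_SIPI)
		return false;
	ap->state = AP_STARTED;
	return true;
}
```

In this model, calling model_send_sipi() right after model_send_init() loses
the SIPI, while inserting model_process_events() between the two calls lets
the AP start.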


* Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
  2023-08-08 23:48     ` Sean Christopherson
  2023-08-09  8:42       ` Ake Koomsin
@ 2023-08-25  3:58       ` Ake Koomsin
  1 sibling, 0 replies; 7+ messages in thread
From: Ake Koomsin @ 2023-08-25  3:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, linux-kernel, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin

On Wed Aug 9, 2023 at 8:48 AM JST, Sean Christopherson wrote:
> Thank you for the detailed repro steps!  However, it's likely going to be O(weeks)
> before anyone is able to look at this in detail given the extensive repro steps.
> If you have bandwidth, it's probably worth trying to reproduce the problem in a
> KVM selftest (or a KVM-Unit-Test), e.g. create a nested VM, send an IPI from L2,
> and see if it gets routed correctly.  This is purely a suggestion to try and get a
> faster fix, it's by no means necessary.

Hi

I have tried KVM Unit Test and want to report back the result.

Note 1: BitVisor does not let the L2 guest see any KVM features in CPUID. It is
intended to run on real hardware. The L2 guest will not be aware that it runs
on a hypervisor.

Note 2: BitVisor stops monitoring APIC access once it detects INIT in an APIC
ICR write. bringup_aps() in lib/x86/smp.c unconditionally does INIT and SIPI
even when SMP=1. I think this is actually good for testing direct physical APIC
access from the L2 guest. BitVisor stops monitoring APIC access after
bringup_aps() is called. No APIC access goes to L1 BitVisor after this.
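
The behavior in Note 2 can be sketched in standalone C as follows. The ICR bit
layout (delivery mode in bits 10:8, INIT = 0x500, STARTUP = 0x600, level assert
= 0x4000, "all excluding self" shorthand = 0xC0000) is architectural and
matches the APIC_* constants used by bringup_aps(). The apic_hook structure and
hook_on_icr_write() are hypothetical and only model the described behavior.
They are not taken from BitVisor's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ICR_DELIVERY_MODE_MASK 0x00700u
#define ICR_DM_INIT            0x00500u
#define ICR_DM_STARTUP         0x00600u
#define ICR_LEVEL_ASSERT       0x04000u
#define ICR_DEST_ALLBUT        0xC0000u

/* Hypothetical hook state: true while L1 still intercepts APIC accesses. */
struct apic_hook {
	bool monitoring;
};

/* True if the low half of the ICR requests an INIT IPI. */
static bool icr_is_init(uint32_t icr_low)
{
	return (icr_low & ICR_DELIVERY_MODE_MASK) == ICR_DM_INIT;
}

/*
 * Model of the hook: returns true if this write was intercepted by L1.
 * Once an INIT is seen, monitoring stops and later writes go through
 * without L1 being involved.
 */
static bool hook_on_icr_write(struct apic_hook *hook, uint32_t icr_low)
{
	assert(hook != NULL);
	if (!hook->monitoring)
		return false;
	if (icr_is_init(icr_low))
		hook->monitoring = false;
	return true;
}
```

With this model, the INIT write issued by bringup_aps() is the last one L1
sees. The following SIPI write is already a direct APIC access from L2.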

=== Procedure for chain-loading BitVisor and apic.efi from the unit test ===

1) 'hg clone http://hg.code.sf.net/p/bitvisor/code bitvisor'

2) Compile BitVisor by running 'make' command. The default config is ok.
   'bitvisor.elf' is created at the project root directory after compilation
   is done.

3) Apply the following patch to BitVisor. This makes loadvmm.efi load
   BitVisor followed by apic.efi.

-------------------------------------------------------------------------------
diff --git a/boot/uefi-loader/loadvmm.c b/boot/uefi-loader/loadvmm.c
--- a/boot/uefi-loader/loadvmm.c
+++ b/boot/uefi-loader/loadvmm.c
@@ -212,5 +212,47 @@ efi_main (EFI_HANDLE image, EFI_SYSTEM_T
 	file->Close (file);
 	if (!boot_error)
 		return EFI_LOAD_ERROR;
+
+	static CHAR16 apic_path[4096];
+	EFI_HANDLE apic_image;
+	UINT32 npages;
+	create_file_path (loaded_image->FilePath, L"apic.efi", apic_path,
+			  sizeof apic_path / sizeof apic_path[0]);
+	status = fileio->OpenVolume (fileio, &file);
+	if (EFI_ERROR (status)) {
+		print (systab, L"OpenVolume ", status);
+		return status;
+	}
+	status = file->Open (file, &file2, apic_path, EFI_FILE_MODE_READ, 0);
+	if (EFI_ERROR (status)) {
+		print (systab, L"Open ", status);
+		return status;
+	}
+	/* apic.efi is about 1.2MB at the time of test, ~300 pages */
+	npages = 300;
+	status = systab->BootServices->AllocatePages (AllocateMaxAddress,
+						      EfiLoaderData, npages,
+						      &paddr);
+	if (EFI_ERROR (status)) {
+		print (systab, L"AllocatePages ", status);
+		return status;
+	}
+	readsize = npages * 4096;
+	status = file2->Read (file2, &readsize, (void *)paddr);
+	if (EFI_ERROR (status)) {
+		print (systab, L"Read ", status);
+		return status;
+	}
+	status = systab->BootServices->LoadImage (TRUE, image, NULL, (void *)paddr, readsize, &apic_image);
+	if (EFI_ERROR (status)) {
+		print (systab, L"LoadImage ", status);
+		return status;
+	}
+	status = systab->BootServices->StartImage (apic_image, NULL, NULL);
+	if (EFI_ERROR (status)) {
+		print (systab, L"StartImage ", status);
+		return status;
+	}
+
 	return EFI_SUCCESS;
 }
-------------------------------------------------------------------------------

4) Change directory to /path/to/bitvisor/boot/uefi-loader. Compile 'loadvmm.efi'
   by running the 'make' command. Mingw64 is required to compile the loader.
   Modify loadvmm.efi's Makefile to set your 'EXE_CC' if necessary.

5) Apply the following patch to KVM Unit Test code to copy the above
   loadvmm.efi as BOOTX64.EFI and make sure that bitvisor.elf and apic.efi are
   in the same folder as loadvmm.efi. The patch is dirty but it gets the job
   done. Replace '/path/to/loadvmm.efi' and '/path/to/bitvisor.elf' to match
   your testing environment.

-------------------------------------------------------------------------------
diff --git a/x86/efi/run b/x86/efi/run
index 85aeb94..fefb3cc 100755
--- a/x86/efi/run
+++ b/x86/efi/run
@@ -42,6 +42,10 @@ fi
 
 mkdir -p "$EFI_CASE_DIR"
 cp "$EFI_SRC/$EFI_CASE.efi" "$EFI_CASE_BINARY"
+cp "/path/to/loadvmm.efi" "$EFI_CASE_BINARY"
+cp "/path/to/bitvisor.elf" "$EFI_CASE_DIR/"
+cp "$EFI_SRC/$EFI_CASE.efi" "$EFI_CASE_DIR/$EFI_CASE.efi"
 
 # Run test case with 256MiB QEMU memory. QEMU default memory size is 128MiB.
 # After UEFI boot up and we call `LibMemoryMap()`, the largest consecutive
-------------------------------------------------------------------------------

6) The following bad hack is probably needed to avoid a stall when testing with
   EFI_SMP > 1

-------------------------------------------------------------------------------
diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index b9b91c7..ba74321 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -279,6 +279,9 @@ void bringup_aps(void)
 	/* INIT */
 	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT, 0);
 
+	for(int i = 0; i < 30000000; i++)
+		cpu_relax();
+
 	/* SIPI */
 	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP, 0);
 
-------------------------------------------------------------------------------

7) Compile KVM Unit Test with EFI enabled and run KVM Unit Test with the
   following command:

   ./x86/efi/run apic.efi -cpu host -m 2048M

The following section is the report from testing on my machine

CPU: 13th Gen Intel i5-13600K (20) @ 5.100GHz
Kernel: Latest kvm.git, default config
QEMU Version: 8.0.4

=== enable_apicv=N and EFI_SMP=1 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494489584 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project.  All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 0 APs
Address of image is: 0x7e6b7000
paging enabled
cr0 = 80010021
cr3 = 153f000
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494527968 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project.  All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 0 APs
Address of image is: 0x7e6b7000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
apic version: 14
PASS: apic existence
PASS: apic_disable: Local apic disabled
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is clear
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
PASS: apic_disable: Local apic enabled in xAPIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: 50014
PASS: apic_disable: *0xfee00080: 0
PASS: apic_disable: *0xfee00080: f0
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
x2apic enabled
PASS: x2apic enabled to invalid state
PASS: x2apic enabled to apic enabled
PASS: x2apic enabled to disabled state
PASS: disabled to invalid state
PASS: disabled to x2apic enabled
PASS: apic disabled to apic enabled
PASS: apic enabled to invalid state
PASS: self_ipi_xapic: Local apic enabled in xAPIC mode
PASS: self_ipi_xapic: self ipi
PASS: self_ipi_x2apic: Local apic enabled in x2APIC mode
PASS: self_ipi_x2apic: self ipi
starting broadcast (x2apic)
PASS: APIC physical broadcast address
PASS: APIC physical broadcast shorthand
PASS: PV IPIs testing
PASS: pending nmi
PASS: APIC LVT timer one shot
starting apic change mode
PASS: TMICT value reset
PASS: TMCCT should have a non-zero value
PASS: TMCCT should have reached 0
PASS: TMCCT should have a non-zero value
PASS: TMCCT should not be reset to TMICT value
PASS: TMCCT should be reset to the initial-count
PASS: TMCCT should not be reset to init
PASS: TMCCT should have reach zero
PASS: TMCCT should stay at zero
PASS: tsc deadline timer
PASS: tsc deadline timer clearing
PASS: apicbase: relocate apic
PASS: apicbase: reserved physaddr bits
PASS: apicbase: reserved low bits
SUMMARY: 48 tests

=== enable_apicv=N and EFI_SMP=2 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494511056 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project.  All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 1 APs
... Likely need bad hack in step 6 to continue ...
enabling apic
setup: CPU 1 online
Address of image is: 0x7e6b9000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
apic version: 14
PASS: apic existence
PASS: apic_disable: Local apic disabled
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is clear
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
PASS: apic_disable: Local apic enabled in xAPIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: 50014
PASS: apic_disable: *0xfee00080: 0
PASS: apic_disable: *0xfee00080: f0
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
x2apic enabled
PASS: x2apic enabled to invalid state
PASS: x2apic enabled to apic enabled
PASS: x2apic enabled to disabled state
PASS: disabled to invalid state
PASS: disabled to x2apic enabled
PASS: apic disabled to apic enabled
PASS: apic enabled to invalid state
PASS: self_ipi_xapic: Local apic enabled in xAPIC mode
PASS: self_ipi_xapic: self ipi
PASS: self_ipi_x2apic: Local apic enabled in x2APIC mode
PASS: self_ipi_x2apic: self ipi
starting broadcast (x2apic)
PASS: APIC physical broadcast address
PASS: APIC physical broadcast shorthand
PASS: PV IPIs testing
PASS: nmi-after-sti
FAIL: multiple nmi
PASS: pending nmi
PASS: APIC LVT timer one shot
starting apic change mode
PASS: TMICT value reset
PASS: TMCCT should have a non-zero value
PASS: TMCCT should have reached 0
PASS: TMCCT should have a non-zero value
PASS: TMCCT should not be reset to TMICT value
PASS: TMCCT should be reset to the initial-count
PASS: TMCCT should not be reset to init
PASS: TMCCT should have reach zero
PASS: TMCCT should stay at zero
PASS: tsc deadline timer
PASS: tsc deadline timer clearing
PASS: xapic id matches cpuid
PASS: writeable xapic id
PASS: non-writeable x2apic id
PASS: sane x2apic id
PASS: x2apic id matches cpuid
PASS: correct xapic id after reset
PASS: apicbase: relocate apic
PASS: apicbase: reserved physaddr bits
PASS: apicbase: reserved low bits
SUMMARY: 56 tests, 1 unexpected failures

=== enable_apicv=Y and EFI_SMP=1 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494569088 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project.  All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 0 APs
Address of image is: 0x7e6c0000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
apic version: 14
PASS: apic existence
PASS: apic_disable: Local apic disabled
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is clear
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
PASS: apic_disable: Local apic enabled in xAPIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: 50014
PASS: apic_disable: *0xfee00080: 0
PASS: apic_disable: *0xfee00080: f0
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
x2apic enabled
PASS: x2apic enabled to invalid state
PASS: x2apic enabled to apic enabled
PASS: x2apic enabled to disabled state
PASS: disabled to invalid state
PASS: disabled to x2apic enabled
PASS: apic disabled to apic enabled
PASS: apic enabled to invalid state
PASS: self_ipi_xapic: Local apic enabled in xAPIC mode
PASS: self_ipi_xapic: self ipi
PASS: self_ipi_x2apic: Local apic enabled in x2APIC mode
FAIL: self_ipi_x2apic: self ipi
starting broadcast (x2apic)
FAIL: APIC physical broadcast address
FAIL: APIC physical broadcast shorthand
PASS: PV IPIs testing
PASS: pending nmi
...Stall...

=== enable_apicv=Y and EFI_SMP=2 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494574368 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project.  All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PRO/1000 found.
MAC Address: 52:54:00:12:34:56
core_io_unregister_handler: port: c180-c19f
Uninstall 2 protocol(s) successfully
[0000:00:02.0] Disconnected PCI device drivers
Wait for PHY reset and link setup completion.
PCI: 7 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 1 APs
... Likely need bad hack in step 6 to continue ...
enabling apic
setup: CPU 1 online
Address of image is: 0x7e461000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
...Stall...

With enable_apicv=N, the test runs to completion; with enable_apicv=Y, it
stalls before reaching the summary.
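
A saved log can be checked for this mechanically. The sketch below is only an
illustration: it assumes one run's serial output is fed on stdin, and it
treats a run as complete only if the "SUMMARY:" line was printed. The function
name is hypothetical.

```shell
#!/bin/sh
# Sketch only: summarize one saved test log read from stdin.
# Counts lines starting with "PASS: " and "FAIL: " and reports whether the
# run reached its "SUMMARY: " line. A run that never prints it is reported
# as stalled.
summarize_log() {
	awk '
		/^PASS: /    { pass++ }
		/^FAIL: /    { fail++ }
		/^SUMMARY: / { finished = 1 }
		END {
			printf "pass=%d fail=%d %s\n", pass, fail,
			       (finished ? "completed" : "stalled")
		}
	'
}
```

For example, `summarize_log < run.log`, where run.log is a placeholder for a
file holding the output of a single run.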

I am not sure whether this experiment violates any assumptions that KVM Unit
Test makes. However, I think it is worth reporting.

Feel free to ask me if you run into problems reproducing the experiment or
need more information.


Best Regards
Ake Koomsin

end of thread, other threads:[~2023-08-25  3:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2023-08-07  6:26 [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2 Ake Koomsin
2023-08-07 14:00 ` Maxim Levitsky
2023-08-07 18:04   ` Sean Christopherson
2023-08-08  7:45   ` Ake Koomsin
2023-08-08 23:48     ` Sean Christopherson
2023-08-09  8:42       ` Ake Koomsin
2023-08-25  3:58       ` Ake Koomsin
