kvm.vger.kernel.org archive mirror
* [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
@ 2017-08-01 21:00 David Matlack
  2017-08-01 21:00 ` [PATCH 2/2] KVM: nVMX: mark vmcs12 pages dirty on L2 exit David Matlack
  2017-08-02  8:18 ` [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown Paolo Bonzini
  0 siblings, 2 replies; 5+ messages in thread
From: David Matlack @ 2017-08-01 21:00 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, David Matlack

According to the Intel SDM, software cannot rely on the current VMCS being
coherent after VMXOFF or shutdown, so it is valid to skip flushing the cached
VMCS12 to guest memory in those cases.

24.11.1 Software Use of Virtual-Machine Control Structures
...
  If a logical processor leaves VMX operation, any VMCSs active on
  that logical processor may be corrupted (see below). To prevent
  such corruption of a VMCS that may be used either after a return
  to VMX operation or on another logical processor, software should
  execute VMCLEAR for that VMCS before executing the VMXOFF instruction
  or removing power from the processor (e.g., as part of a transition
  to the S3 and S4 power states).
...

This fixes a "suspicious rcu_dereference_check() usage!" warning during
kvm_vm_release() because nested_release_vmcs12() calls
kvm_vcpu_write_guest_page() without holding kvm->srcu.

Signed-off-by: David Matlack <dmatlack@google.com>
---
This patch applies on top of Paolo's "[PATCH] KVM: nVMX: do not pin the VMCS12".
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1455166.html)

 arch/x86/kvm/vmx.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
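
For readers following the thread, here is a minimal user-space sketch of the
caching rule this patch settles on. It is not kernel code. The toy_* names and
the single-field VMCS are hypothetical, and VMPTRLD is simplified so that it
does not flush a previously loaded VMCS12. The cache is written back to guest
memory on VMCLEAR, while VMXOFF or teardown only invalidates the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_VMPTR UINT64_MAX	/* stands in for -1ull in the patch */

/* Hypothetical, heavily reduced stand-in for struct vmcs12. */
struct toy_vmcs12 {
	uint32_t guest_field;
};

/* Hypothetical, heavily reduced stand-in for struct nested_vmx. */
struct toy_nested {
	uint64_t current_vmptr;			/* TOY_NO_VMPTR when nothing is loaded */
	struct toy_vmcs12 cached_vmcs12;	/* cache kept outside guest memory */
};

/* VMPTRLD: fill the cache from guest memory and remember the pointer. */
static void toy_vmptrld(struct toy_nested *n, uint64_t vmptr,
			const struct toy_vmcs12 *guest_mem)
{
	assert(guest_mem != NULL);
	n->current_vmptr = vmptr;
	n->cached_vmcs12 = *guest_mem;
}

/* VMCLEAR: flush the cache back to guest memory, then drop it. */
static void toy_vmclear(struct toy_nested *n, struct toy_vmcs12 *guest_mem)
{
	assert(guest_mem != NULL);
	if (n->current_vmptr == TOY_NO_VMPTR)
		return;
	*guest_mem = n->cached_vmcs12;
	n->current_vmptr = TOY_NO_VMPTR;
}

/* VMXOFF or VCPU teardown: invalidate the pointer, never touch guest memory. */
static void toy_free_nested(struct toy_nested *n)
{
	n->current_vmptr = TOY_NO_VMPTR;
}
```

In this model, a guest that wants its VMCS12 contents to survive VMXOFF has to
issue VMCLEAR first, which matches the SDM text quoted above.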

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5c03340f7827..07d2198db225 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -419,7 +419,7 @@ struct nested_vmx {
 	/*
 	 * Cache of the guest's VMCS, existing outside of guest memory.
 	 * Loaded from guest memory during VMPTRLD. Flushed to guest
-	 * memory during VMXOFF, VMCLEAR, VMPTRLD.
+	 * memory during VMCLEAR and VMPTRLD.
 	 */
 	struct vmcs12 *cached_vmcs12;
 	/*
@@ -7131,6 +7131,12 @@ static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
+{
+	vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL, SECONDARY_EXEC_SHADOW_VMCS);
+	vmcs_write64(VMCS_LINK_POINTER, -1ull);
+}
+
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
 	if (vmx->nested.current_vmptr == -1ull)
@@ -7141,9 +7147,7 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 		   they were modified */
 		copy_shadow_to_vmcs12(vmx);
 		vmx->nested.sync_shadow_vmcs = false;
-		vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
-				SECONDARY_EXEC_SHADOW_VMCS);
-		vmcs_write64(VMCS_LINK_POINTER, -1ull);
+		vmx_disable_shadow_vmcs(vmx);
 	}
 	vmx->nested.posted_intr_nv = -1;
 
@@ -7166,12 +7170,14 @@ static void free_nested(struct vcpu_vmx *vmx)
 
 	vmx->nested.vmxon = false;
 	free_vpid(vmx->nested.vpid02);
-	nested_release_vmcs12(vmx);
+	vmx->nested.posted_intr_nv = -1;
+	vmx->nested.current_vmptr = -1ull;
 	if (vmx->nested.msr_bitmap) {
 		free_page((unsigned long)vmx->nested.msr_bitmap);
 		vmx->nested.msr_bitmap = NULL;
 	}
 	if (enable_shadow_vmcs) {
+		vmx_disable_shadow_vmcs(vmx);
 		vmcs_clear(vmx->vmcs01.shadow_vmcs);
 		free_vmcs(vmx->vmcs01.shadow_vmcs);
 		vmx->vmcs01.shadow_vmcs = NULL;
-- 
2.14.0.rc1.383.gd1ce394fe2-goog


* [PATCH 2/2] KVM: nVMX: mark vmcs12 pages dirty on L2 exit
  2017-08-01 21:00 [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown David Matlack
@ 2017-08-01 21:00 ` David Matlack
  2017-08-02  8:17   ` Paolo Bonzini
  2017-08-02  8:18 ` [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown Paolo Bonzini
  1 sibling, 1 reply; 5+ messages in thread
From: David Matlack @ 2017-08-01 21:00 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, David Matlack

The host physical addresses of L1's Virtual APIC Page and Posted
Interrupt descriptor are loaded into the VMCS02. The CPU may write
to these pages via their host physical address while L2 is running,
bypassing address-translation-based dirty tracking (e.g. EPT write
protection). Mark them dirty on every exit from L2 to prevent them
from getting out of sync with dirty tracking.

Also mark the virtual APIC page and the posted interrupt descriptor
dirty when KVM is virtualizing posted interrupt processing.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/vmx.c | 53 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 10 deletions(-)
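
As an illustration of the bookkeeping described in the commit message, here is
a minimal user-space sketch. It is not kernel code. The toy_* names, the fixed
1024-page guest and the flat bitmap are assumptions made for the example. A
guest physical address is turned into a guest frame number by shifting out the
page offset, and the matching bit is set in a dirty bitmap for each page the
CPU may have written behind the hypervisor's back.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12	/* 4 KiB pages, as on x86 */
#define TOY_NR_PAGES		1024	/* assumed size of the toy guest */
#define TOY_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical dirty log: one bit per guest page. */
struct toy_dirty_log {
	unsigned long bitmap[TOY_NR_PAGES / TOY_BITS_PER_WORD];
};

/* Guest physical address to guest frame number. */
static uint64_t toy_gpa_to_gfn(uint64_t gpa)
{
	return gpa >> TOY_PAGE_SHIFT;
}

static void toy_mark_page_dirty(struct toy_dirty_log *log, uint64_t gfn)
{
	assert(gfn < TOY_NR_PAGES);
	log->bitmap[gfn / TOY_BITS_PER_WORD] |= 1UL << (gfn % TOY_BITS_PER_WORD);
}

static bool toy_page_is_dirty(const struct toy_dirty_log *log, uint64_t gfn)
{
	assert(gfn < TOY_NR_PAGES);
	return (log->bitmap[gfn / TOY_BITS_PER_WORD] >> (gfn % TOY_BITS_PER_WORD)) & 1UL;
}

/*
 * Models what happens on an exit from L2: every page whose address was
 * handed to the CPU is conservatively marked dirty, but only if the
 * corresponding feature was enabled by L1.
 */
static void toy_mark_vmcs12_pages_dirty(struct toy_dirty_log *log,
					bool tpr_shadow,
					uint64_t virtual_apic_page_addr,
					bool posted_intr,
					uint64_t posted_intr_desc_addr)
{
	if (tpr_shadow)
		toy_mark_page_dirty(log, toy_gpa_to_gfn(virtual_apic_page_addr));
	if (posted_intr)
		toy_mark_page_dirty(log, toy_gpa_to_gfn(posted_intr_desc_addr));
}
```

The marking is deliberately conservative. The sketch cannot tell whether the
CPU actually wrote to a page, so it marks the page on every exit, and the cost
is at most a redundant copy of that page during dirty-log-driven migration.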

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 07d2198db225..b277a0409563 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4952,6 +4952,28 @@ static bool vmx_get_enable_apicv(void)
 	return enable_apicv;
 }
 
+static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	gfn_t gfn;
+
+	/*
+	 * Don't need to mark the APIC access page dirty; it is never
+	 * written to by the CPU during APIC virtualization.
+	 */
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
+		gfn = vmcs12->virtual_apic_page_addr >> PAGE_SHIFT;
+		kvm_vcpu_mark_page_dirty(vcpu, gfn);
+	}
+
+	if (nested_cpu_has_posted_intr(vmcs12)) {
+		gfn = vmcs12->posted_intr_desc_addr >> PAGE_SHIFT;
+		kvm_vcpu_mark_page_dirty(vcpu, gfn);
+	}
+}
+
+
 static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4959,18 +4981,15 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	void *vapic_page;
 	u16 status;
 
-	if (vmx->nested.pi_desc &&
-	    vmx->nested.pi_pending) {
-		vmx->nested.pi_pending = false;
-		if (!pi_test_and_clear_on(vmx->nested.pi_desc))
-			return;
-
-		max_irr = find_last_bit(
-			(unsigned long *)vmx->nested.pi_desc->pir, 256);
+	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
+		return;
 
-		if (max_irr == 256)
-			return;
+	vmx->nested.pi_pending = false;
+	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
+		return;
 
+	max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
+	if (max_irr != 256) {
 		vapic_page = kmap(vmx->nested.virtual_apic_page);
 		__kvm_apic_update_irr(vmx->nested.pi_desc->pir, vapic_page);
 		kunmap(vmx->nested.virtual_apic_page);
@@ -4982,6 +5001,8 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 			vmcs_write16(GUEST_INTR_STATUS, status);
 		}
 	}
+
+	nested_mark_vmcs12_pages_dirty(vcpu);
 }
 
 static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
@@ -8029,6 +8050,18 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 				vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
 				KVM_ISA_VMX);
 
+	/*
+	 * The host physical addresses of some pages of guest memory
+	 * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU
+	 * may write to these pages via their host physical address while
+	 * L2 is running, bypassing any address-translation-based dirty
+	 * tracking (e.g. EPT write protection).
+	 *
+	 * Mark them dirty on every exit from L2 to prevent them from
+	 * getting out of sync with dirty tracking.
+	 */
+	nested_mark_vmcs12_pages_dirty(vcpu);
+
 	if (vmx->nested.nested_run_pending)
 		return false;
 
-- 
2.14.0.rc1.383.gd1ce394fe2-goog


* Re: [PATCH 2/2] KVM: nVMX: mark vmcs12 pages dirty on L2 exit
  2017-08-01 21:00 ` [PATCH 2/2] KVM: nVMX: mark vmcs12 pages dirty on L2 exit David Matlack
@ 2017-08-02  8:17   ` Paolo Bonzini
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2017-08-02  8:17 UTC (permalink / raw)
  To: David Matlack, kvm

On 01/08/2017 23:00, David Matlack wrote:
> The host physical addresses of L1's Virtual APIC Page and Posted
> Interrupt descriptor are loaded into the VMCS02. The CPU may write
> to these pages via their host physical address while L2 is running,
> bypassing address-translation-based dirty tracking (e.g. EPT write
> protection). Mark them dirty on every exit from L2 to prevent them
> from getting out of sync with dirty tracking.
> 
> Also mark the virtual APIC page and the posted interrupt descriptor
> dirty when KVM is virtualizing posted interrupt processing.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


* Re: [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
  2017-08-01 21:00 [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown David Matlack
  2017-08-01 21:00 ` [PATCH 2/2] KVM: nVMX: mark vmcs12 pages dirty on L2 exit David Matlack
@ 2017-08-02  8:18 ` Paolo Bonzini
  2017-08-02 20:37   ` Radim Krčmář
  1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2017-08-02  8:18 UTC (permalink / raw)
  To: David Matlack, kvm

On 01/08/2017 23:00, David Matlack wrote:
> According to the Intel SDM, software cannot rely on the current VMCS being
> coherent after VMXOFF or shutdown, so it is valid to skip flushing the cached
> VMCS12 to guest memory in those cases.
> 
> 24.11.1 Software Use of Virtual-Machine Control Structures
> ...
>   If a logical processor leaves VMX operation, any VMCSs active on
>   that logical processor may be corrupted (see below). To prevent
>   such corruption of a VMCS that may be used either after a return
>   to VMX operation or on another logical processor, software should
>   execute VMCLEAR for that VMCS before executing the VMXOFF instruction
>   or removing power from the processor (e.g., as part of a transition
>   to the S3 and S4 power states).
> ...
> 
> This fixes a "suspicious rcu_dereference_check() usage!" warning during
> kvm_vm_release() because nested_release_vmcs12() calls
> kvm_vcpu_write_guest_page() without holding kvm->srcu.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
> This patch applies on top of Paolo's "[PATCH] KVM: nVMX: do not pin the VMCS12".
> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1455166.html)

Thanks, I think Radim should first apply the RCU-on-teardown patch
(which I'll resend formally today), then "do not pin the VMCS12", then
these two.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


* Re: [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
  2017-08-02  8:18 ` [PATCH 1/2] kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown Paolo Bonzini
@ 2017-08-02 20:37   ` Radim Krčmář
  0 siblings, 0 replies; 5+ messages in thread
From: Radim Krčmář @ 2017-08-02 20:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: David Matlack, kvm

2017-08-02 10:18+0200, Paolo Bonzini:
> On 01/08/2017 23:00, David Matlack wrote:
> > According to the Intel SDM, software cannot rely on the current VMCS being
> > coherent after VMXOFF or shutdown, so it is valid to skip flushing the cached
> > VMCS12 to guest memory in those cases.
> > 
> > 24.11.1 Software Use of Virtual-Machine Control Structures
> > ...
> >   If a logical processor leaves VMX operation, any VMCSs active on
> >   that logical processor may be corrupted (see below). To prevent
> >   such corruption of a VMCS that may be used either after a return
> >   to VMX operation or on another logical processor, software should
> >   execute VMCLEAR for that VMCS before executing the VMXOFF instruction
> >   or removing power from the processor (e.g., as part of a transition
> >   to the S3 and S4 power states).
> > ...
> > 
> > This fixes a "suspicious rcu_dereference_check() usage!" warning during
> > kvm_vm_release() because nested_release_vmcs12() calls
> > kvm_vcpu_write_guest_page() without holding kvm->srcu.
> > 
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> > This patch applies on top of Paolo's "[PATCH] KVM: nVMX: do not pin the VMCS12".
> > (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1455166.html)
> 
> Thanks, I think Radim should first apply the RCU-on-teardown patch
> (which I'll resend formally today), then "do not pin the VMCS12", then
> these two.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Applied in that order, thanks.

