* [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization
@ 2017-05-05 19:25 Bandan Das
  2017-05-05 19:25 ` [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation Bandan Das
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Bandan Das @ 2017-05-05 19:25 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, linux-kernel

v2:
2/3: Clear out all bits except bit 12
3/3: Slightly modify an existing comment, honor L0's
PML setting when clearing it for L1

v1:
http://www.spinics.net/lists/kvm/msg149247.html

These patches implement PML on top of EPT A/D emulation
(ae1e2d1082ae).

When the dirty bit is being set, we write the gpa to the
buffer provided by L1. If the index overflows, we just
change the exit reason before running L1.
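
As a rough illustration of the flow described above, here is a small
self-contained userspace model of the index handling. This is not kernel
code; the names just mirror the vmcs12 fields and the PML_ENTITY_NUM
constant used in patch 2/3, and the real logic lives in
vmx_write_pml_buffer():

#include <stdint.h>
#include <stdio.h>

#define PML_ENTITY_NUM	512	/* 512 u64 entries == one 4K page */

static uint64_t pml_buffer[PML_ENTITY_NUM];	/* models the page at vmcs12->pml_address */
static uint16_t guest_pml_index = PML_ENTITY_NUM - 1;	/* L1 starts at the last entry */

/* Returns 0 if the gpa was logged, 1 if the buffer is full (PML full). */
static int write_pml_entry(uint64_t gpa)
{
	if (guest_pml_index > PML_ENTITY_NUM)	/* u16 wrapped past 0 to 0xFFFF */
		return 1;			/* emulation flags EXIT_REASON_PML_FULL */

	pml_buffer[guest_pml_index--] = gpa & ~0xFFFull;
	return 0;
}

int main(void)
{
	uint64_t gpa = 0x1000;
	unsigned int logged = 0;

	while (!write_pml_entry(gpa)) {
		logged++;
		gpa += 0x1000;
	}
	printf("logged %u entries, index is now %#x -> PML full\n",
	       logged, guest_pml_index);
	return 0;
}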

Bandan Das (3):
  kvm: x86: Add a hook for arch specific dirty logging emulation
  nVMX: Implement emulated Page Modification Logging
  nVMX: Advertise PML to L1 hypervisor

 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/mmu.c              | 15 +++++++
 arch/x86/kvm/mmu.h              |  1 +
 arch/x86/kvm/paging_tmpl.h      |  4 ++
 arch/x86/kvm/vmx.c              | 97 ++++++++++++++++++++++++++++++++++++++---
 5 files changed, 112 insertions(+), 7 deletions(-)

-- 
2.9.3


* [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation
  2017-05-05 19:25 [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Bandan Das
@ 2017-05-05 19:25 ` Bandan Das
  2017-05-10 10:49   ` Huang, Kai
  2017-05-05 19:25 ` [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging Bandan Das
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Bandan Das @ 2017-05-05 19:25 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, linux-kernel

When KVM updates accessed/dirty bits, this hook can be used
to invoke an arch specific function that implements/emulates
dirty logging such as PML.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.c              | 15 +++++++++++++++
 arch/x86/kvm/mmu.h              |  1 +
 arch/x86/kvm/paging_tmpl.h      |  4 ++++
 4 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f5bddf92..9c761fe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1020,6 +1020,8 @@ struct kvm_x86_ops {
 	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
 					   struct kvm_memory_slot *slot,
 					   gfn_t offset, unsigned long mask);
+	int (*write_log_dirty)(struct kvm_vcpu *vcpu);
+
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5586765..5d3376f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1498,6 +1498,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
+/**
+ * kvm_arch_write_log_dirty - emulate dirty page logging
+ * @vcpu: Guest mode vcpu
+ *
+ * Emulate arch specific page modification logging for the
+ * nested hypervisor
+ */
+int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->write_log_dirty)
+		return kvm_x86_ops->write_log_dirty(vcpu);
+
+	return 0;
+}
+
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn)
 {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d8ccb32..2797580 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -202,4 +202,5 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn);
+int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 314d207..5624174 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -226,6 +226,10 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 		if (level == walker->level && write_fault &&
 				!(pte & PT_GUEST_DIRTY_MASK)) {
 			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
+#if PTTYPE == PTTYPE_EPT
+			if (kvm_arch_write_log_dirty(vcpu))
+				return -EINVAL;
+#endif
 			pte |= PT_GUEST_DIRTY_MASK;
 		}
 		if (pte == orig_pte)
-- 
2.9.3


* [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging
  2017-05-05 19:25 [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Bandan Das
  2017-05-05 19:25 ` [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation Bandan Das
@ 2017-05-05 19:25 ` Bandan Das
  2017-05-10 10:48   ` Huang, Kai
  2017-05-05 19:25 ` [PATCH v2 3/3] nVMX: Advertise PML to L1 hypervisor Bandan Das
  2017-05-09 15:22 ` [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Paolo Bonzini
  3 siblings, 1 reply; 15+ messages in thread
From: Bandan Das @ 2017-05-05 19:25 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, linux-kernel

With EPT A/D enabled, a processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the PML buffer provided by L1 if the access is a
write and the dirty bit is being set.

This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/vmx.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2211697..8b9e942 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -248,6 +248,7 @@ struct __packed vmcs12 {
 	u64 xss_exit_bitmap;
 	u64 guest_physical_address;
 	u64 vmcs_link_pointer;
+	u64 pml_address;
 	u64 guest_ia32_debugctl;
 	u64 guest_ia32_pat;
 	u64 guest_ia32_efer;
@@ -369,6 +370,7 @@ struct __packed vmcs12 {
 	u16 guest_ldtr_selector;
 	u16 guest_tr_selector;
 	u16 guest_intr_status;
+	u16 guest_pml_index;
 	u16 host_es_selector;
 	u16 host_cs_selector;
 	u16 host_ss_selector;
@@ -407,6 +409,7 @@ struct nested_vmx {
 	/* Has the level1 guest done vmxon? */
 	bool vmxon;
 	gpa_t vmxon_ptr;
+	bool pml_full;
 
 	/* The guest-physical address of the current VMCS L1 keeps for L2 */
 	gpa_t current_vmptr;
@@ -742,6 +745,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD(GUEST_LDTR_SELECTOR, guest_ldtr_selector),
 	FIELD(GUEST_TR_SELECTOR, guest_tr_selector),
 	FIELD(GUEST_INTR_STATUS, guest_intr_status),
+	FIELD(GUEST_PML_INDEX, guest_pml_index),
 	FIELD(HOST_ES_SELECTOR, host_es_selector),
 	FIELD(HOST_CS_SELECTOR, host_cs_selector),
 	FIELD(HOST_SS_SELECTOR, host_ss_selector),
@@ -767,6 +771,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
+	FIELD64(PML_ADDRESS, pml_address),
 	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
 	FIELD64(GUEST_IA32_PAT, guest_ia32_pat),
 	FIELD64(GUEST_IA32_EFER, guest_ia32_efer),
@@ -1349,6 +1354,11 @@ static inline bool nested_cpu_has_xsaves(struct vmcs12 *vmcs12)
 		vmx_xsaves_supported();
 }
 
+static inline bool nested_cpu_has_pml(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_PML);
+}
+
 static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
 {
 	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
@@ -9368,13 +9378,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		struct x86_exception *fault)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exit_reason;
+	unsigned long exit_qualification = vcpu->arch.exit_qualification;
 
-	if (fault->error_code & PFERR_RSVD_MASK)
+	if (vmx->nested.pml_full) {
+		exit_reason = EXIT_REASON_PML_FULL;
+		vmx->nested.pml_full = false;
+		exit_qualification &= INTR_INFO_UNBLOCK_NMI;
+	} else if (fault->error_code & PFERR_RSVD_MASK)
 		exit_reason = EXIT_REASON_EPT_MISCONFIG;
 	else
 		exit_reason = EXIT_REASON_EPT_VIOLATION;
-	nested_vmx_vmexit(vcpu, exit_reason, 0, vcpu->arch.exit_qualification);
+
+	nested_vmx_vmexit(vcpu, exit_reason, 0, exit_qualification);
 	vmcs12->guest_physical_address = fault->address;
 }
 
@@ -9717,6 +9734,22 @@ static int nested_vmx_check_msr_switch_controls(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+static int nested_vmx_check_pml_controls(struct kvm_vcpu *vcpu,
+					 struct vmcs12 *vmcs12)
+{
+	u64 address = vmcs12->pml_address;
+	int maxphyaddr = cpuid_maxphyaddr(vcpu);
+
+	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_PML)) {
+		if (!nested_cpu_has_ept(vmcs12) ||
+		    !IS_ALIGNED(address, 4096)  ||
+		    address >> maxphyaddr)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int nested_vmx_msr_check_common(struct kvm_vcpu *vcpu,
 				       struct vmx_msr_entry *e)
 {
@@ -10252,6 +10285,9 @@ static int check_vmentry_prereqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	if (nested_vmx_check_msr_switch_controls(vcpu, vmcs12))
 		return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
 
+	if (nested_vmx_check_pml_controls(vcpu, vmcs12))
+		return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
+
 	if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
 				vmx->nested.nested_vmx_procbased_ctls_low,
 				vmx->nested.nested_vmx_procbased_ctls_high) ||
@@ -11146,6 +11182,46 @@ static void vmx_flush_log_dirty(struct kvm *kvm)
 	kvm_flush_pml_buffers(kvm);
 }
 
+static int vmx_write_pml_buffer(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gpa_t gpa;
+	struct page *page = NULL;
+	u64 *pml_address;
+
+	if (is_guest_mode(vcpu)) {
+		WARN_ON_ONCE(vmx->nested.pml_full);
+
+		/*
+		 * Check if PML is enabled for the nested guest.
+		 * Whether eptp bit 6 is set is already checked
+		 * as part of A/D emulation.
+		 */
+		vmcs12 = get_vmcs12(vcpu);
+		if (!nested_cpu_has_pml(vmcs12))
+			return 0;
+
+		if (vmcs12->guest_pml_index > PML_ENTITY_NUM) {
+			vmx->nested.pml_full = true;
+			return 1;
+		}
+
+		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull;
+
+		page = nested_get_page(vcpu, vmcs12->pml_address);
+		if (!page)
+			return 0;
+
+		pml_address = kmap(page);
+		pml_address[vmcs12->guest_pml_index--] = gpa;
+		kunmap(page);
+		nested_release_page_clean(page);
+	}
+
+	return 0;
+}
+
 static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 					   struct kvm_memory_slot *memslot,
 					   gfn_t offset, unsigned long mask)
@@ -11505,6 +11581,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
 	.flush_log_dirty = vmx_flush_log_dirty,
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
+	.write_log_dirty = vmx_write_pml_buffer,
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
-- 
2.9.3


* [PATCH v2 3/3] nVMX: Advertise PML to L1 hypervisor
  2017-05-05 19:25 [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Bandan Das
  2017-05-05 19:25 ` [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation Bandan Das
  2017-05-05 19:25 ` [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging Bandan Das
@ 2017-05-05 19:25 ` Bandan Das
  2017-05-09 15:22 ` [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Paolo Bonzini
  3 siblings, 0 replies; 15+ messages in thread
From: Bandan Das @ 2017-05-05 19:25 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, linux-kernel

Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/vmx.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8b9e942..a5f6054 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2763,8 +2763,11 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
 			VMX_EPT_EXTENT_CONTEXT_BIT | VMX_EPT_2MB_PAGE_BIT |
 			VMX_EPT_1GB_PAGE_BIT;
-	       if (enable_ept_ad_bits)
+		if (enable_ept_ad_bits) {
+			vmx->nested.nested_vmx_secondary_ctls_high |=
+				SECONDARY_EXEC_ENABLE_PML;
 		       vmx->nested.nested_vmx_ept_caps |= VMX_EPT_AD_BIT;
+		}
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
@@ -8128,7 +8131,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_PREEMPTION_TIMER:
 		return false;
 	case EXIT_REASON_PML_FULL:
-		/* We don't expose PML support to L1. */
+		/* We emulate PML support to L1. */
 		return false;
 	default:
 		return true;
@@ -9923,7 +9926,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			  bool from_vmentry, u32 *entry_failure_code)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 exec_control;
+	u32 exec_control, vmcs12_exec_ctrl;
 
 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -10054,8 +10057,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 				  SECONDARY_EXEC_APIC_REGISTER_VIRT);
 		if (nested_cpu_has(vmcs12,
-				CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
-			exec_control |= vmcs12->secondary_vm_exec_control;
+				   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)) {
+			vmcs12_exec_ctrl = vmcs12->secondary_vm_exec_control &
+				~SECONDARY_EXEC_ENABLE_PML;
+			exec_control |= vmcs12_exec_ctrl;
+		}
 
 		if (exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) {
 			vmcs_write64(EOI_EXIT_BITMAP0,
-- 
2.9.3


* Re: [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization
  2017-05-05 19:25 [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Bandan Das
                   ` (2 preceding siblings ...)
  2017-05-05 19:25 ` [PATCH v2 3/3] nVMX: Advertise PML to L1 hypervisor Bandan Das
@ 2017-05-09 15:22 ` Paolo Bonzini
  2017-05-09 16:03   ` Bandan Das
  3 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-09 15:22 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: linux-kernel



On 05/05/2017 21:25, Bandan Das wrote:
> v2:
> 2/3: Clear out all bits except bit 12
> 3/3: Slightly modify an existing comment, honor L0's
> PML setting when clearing it for L1
> 
> v1:
> http://www.spinics.net/lists/kvm/msg149247.html
> 
> These patches implement PML on top of EPT A/D emulation
> (ae1e2d1082ae).
> 
> When the dirty bit is being set, we write the gpa to the
> buffer provided by L1. If the index overflows, we just
> change the exit reason before running L1.

I tested this with api/dirty-log-perf, and nested PML is more than 3
times faster than pml=0.  I want to do a few more tests because I don't
see any PML full exits in the L1 trace, but it seems to be a nice
improvement!

Paolo

> Bandan Das (3):
>   kvm: x86: Add a hook for arch specific dirty logging emulation
>   nVMX: Implement emulated Page Modification Logging
>   nVMX: Advertise PML to L1 hypervisor
> 
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/mmu.c              | 15 +++++++
>  arch/x86/kvm/mmu.h              |  1 +
>  arch/x86/kvm/paging_tmpl.h      |  4 ++
>  arch/x86/kvm/vmx.c              | 97 ++++++++++++++++++++++++++++++++++++++---
>  5 files changed, 112 insertions(+), 7 deletions(-)
> 


* Re: [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization
  2017-05-09 15:22 ` [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization Paolo Bonzini
@ 2017-05-09 16:03   ` Bandan Das
  2017-05-09 16:04     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Bandan Das @ 2017-05-09 16:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 05/05/2017 21:25, Bandan Das wrote:
>> v2:
>> 2/3: Clear out all bits except bit 12
>> 3/3: Slightly modify an existing comment, honor L0's
>> PML setting when clearing it for L1
>> 
>> v1:
>> http://www.spinics.net/lists/kvm/msg149247.html
>> 
>> These patches implement PML on top of EPT A/D emulation
>> (ae1e2d1082ae).
>> 
>> When the dirty bit is being set, we write the gpa to the
>> buffer provided by L1. If the index overflows, we just
>> change the exit reason before running L1.
>
> I tested this with api/dirty-log-perf, and nested PML is more than 3
> times faster than pml=0.  I want to do a few more tests because I don't
> see any PML full exits in the L1 trace, but it seems to be a nice
> improvement!

Thanks for testing! Regarding the PML full exits, I did notice their
absence. I induced it artificially in my testing with a lower index
and it seemed to work fine.

> Paolo
>
>> Bandan Das (3):
>>   kvm: x86: Add a hook for arch specific dirty logging emulation
>>   nVMX: Implement emulated Page Modification Logging
>>   nVMX: Advertise PML to L1 hypervisor
>> 
>>  arch/x86/include/asm/kvm_host.h |  2 +
>>  arch/x86/kvm/mmu.c              | 15 +++++++
>>  arch/x86/kvm/mmu.h              |  1 +
>>  arch/x86/kvm/paging_tmpl.h      |  4 ++
>>  arch/x86/kvm/vmx.c              | 97 ++++++++++++++++++++++++++++++++++++++---
>>  5 files changed, 112 insertions(+), 7 deletions(-)
>> 


* Re: [PATCH v2 0/3] nVMX: Emulated Page Modification Logging for Nested Virtualization
  2017-05-09 16:03   ` Bandan Das
@ 2017-05-09 16:04     ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-09 16:04 UTC (permalink / raw)
  To: Bandan Das; +Cc: kvm, linux-kernel



On 09/05/2017 18:03, Bandan Das wrote:
>> I tested this with api/dirty-log-perf, and nested PML is more than 3
>> times faster than pml=0.  I want to do a few more tests because I don't
>> see any PML full exits in the L1 trace, but it seems to be a nice
>> improvement!
>
> Thanks for testing! Regarding the PML full exits, I did notice their
> absence. I induced it artificially in my testing with a lower index
> and it seemed to work fine.

Yes, tracing showed that it's because the guest's EXTERNAL_INTERRUPT
exits are too frequent.

Paolo


* Re: [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging
  2017-05-05 19:25 ` [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging Bandan Das
@ 2017-05-10 10:48   ` Huang, Kai
  2017-05-10 14:46     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Huang, Kai @ 2017-05-10 10:48 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: pbonzini, linux-kernel



On 5/6/2017 7:25 AM, Bandan Das wrote:
> With EPT A/D enabled, a processor access to L2 guest
> paging structures will result in a write violation.
> When this happens, write the GUEST_PHYSICAL_ADDRESS
> to the PML buffer provided by L1 if the access is a
> write and the dirty bit is being set.
>
> This patch also adds necessary checks during VMEntry if L1
> has enabled PML. If the PML index overflows, we change the
> exit reason and run L1 to simulate a PML full event.
>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2211697..8b9e942 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -248,6 +248,7 @@ struct __packed vmcs12 {
>  	u64 xss_exit_bitmap;
>  	u64 guest_physical_address;
>  	u64 vmcs_link_pointer;
> +	u64 pml_address;
>  	u64 guest_ia32_debugctl;
>  	u64 guest_ia32_pat;
>  	u64 guest_ia32_efer;
> @@ -369,6 +370,7 @@ struct __packed vmcs12 {
>  	u16 guest_ldtr_selector;
>  	u16 guest_tr_selector;
>  	u16 guest_intr_status;
> +	u16 guest_pml_index;
>  	u16 host_es_selector;
>  	u16 host_cs_selector;
>  	u16 host_ss_selector;
> @@ -407,6 +409,7 @@ struct nested_vmx {
>  	/* Has the level1 guest done vmxon? */
>  	bool vmxon;
>  	gpa_t vmxon_ptr;
> +	bool pml_full;
>
>  	/* The guest-physical address of the current VMCS L1 keeps for L2 */
>  	gpa_t current_vmptr;
> @@ -742,6 +745,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
>  	FIELD(GUEST_LDTR_SELECTOR, guest_ldtr_selector),
>  	FIELD(GUEST_TR_SELECTOR, guest_tr_selector),
>  	FIELD(GUEST_INTR_STATUS, guest_intr_status),
> +	FIELD(GUEST_PML_INDEX, guest_pml_index),
>  	FIELD(HOST_ES_SELECTOR, host_es_selector),
>  	FIELD(HOST_CS_SELECTOR, host_cs_selector),
>  	FIELD(HOST_SS_SELECTOR, host_ss_selector),
> @@ -767,6 +771,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
>  	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
>  	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
>  	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
> +	FIELD64(PML_ADDRESS, pml_address),
>  	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
>  	FIELD64(GUEST_IA32_PAT, guest_ia32_pat),
>  	FIELD64(GUEST_IA32_EFER, guest_ia32_efer),
> @@ -1349,6 +1354,11 @@ static inline bool nested_cpu_has_xsaves(struct vmcs12 *vmcs12)
>  		vmx_xsaves_supported();
>  }
>
> +static inline bool nested_cpu_has_pml(struct vmcs12 *vmcs12)
> +{
> +	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_PML);
> +}
> +
>  static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
>  {
>  	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
> @@ -9368,13 +9378,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
>  		struct x86_exception *fault)
>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 exit_reason;
> +	unsigned long exit_qualification = vcpu->arch.exit_qualification;
>
> -	if (fault->error_code & PFERR_RSVD_MASK)
> +	if (vmx->nested.pml_full) {
> +		exit_reason = EXIT_REASON_PML_FULL;
> +		vmx->nested.pml_full = false;
> +		exit_qualification &= INTR_INFO_UNBLOCK_NMI;

Sorry, I cannot recall the details. Probably better to add a comment to
indicate why the exit qualification is masked with INTR_INFO_UNBLOCK_NMI?


> +	} else if (fault->error_code & PFERR_RSVD_MASK)
>  		exit_reason = EXIT_REASON_EPT_MISCONFIG;
>  	else
>  		exit_reason = EXIT_REASON_EPT_VIOLATION;
> -	nested_vmx_vmexit(vcpu, exit_reason, 0, vcpu->arch.exit_qualification);
> +
> +	nested_vmx_vmexit(vcpu, exit_reason, 0, exit_qualification);
>  	vmcs12->guest_physical_address = fault->address;
>  }
>
> @@ -9717,6 +9734,22 @@ static int nested_vmx_check_msr_switch_controls(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>
> +static int nested_vmx_check_pml_controls(struct kvm_vcpu *vcpu,
> +					 struct vmcs12 *vmcs12)
> +{
> +	u64 address = vmcs12->pml_address;
> +	int maxphyaddr = cpuid_maxphyaddr(vcpu);
> +
> +	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_PML)) {
> +		if (!nested_cpu_has_ept(vmcs12) ||
> +		    !IS_ALIGNED(address, 4096)  ||
> +		    address >> maxphyaddr)
> +			return -EINVAL;
> +	}

Do we also need to check whether EPT A/D has been enabled for vmcs12 to 
make vmentry work? I cannot recall the details, but it's probably not necessary.

> +
> +	return 0;
> +}
> +
>  static int nested_vmx_msr_check_common(struct kvm_vcpu *vcpu,
>  				       struct vmx_msr_entry *e)
>  {
> @@ -10252,6 +10285,9 @@ static int check_vmentry_prereqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	if (nested_vmx_check_msr_switch_controls(vcpu, vmcs12))
>  		return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
>
> +	if (nested_vmx_check_pml_controls(vcpu, vmcs12))
> +		return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
> +
>  	if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
>  				vmx->nested.nested_vmx_procbased_ctls_low,
>  				vmx->nested.nested_vmx_procbased_ctls_high) ||
> @@ -11146,6 +11182,46 @@ static void vmx_flush_log_dirty(struct kvm *kvm)
>  	kvm_flush_pml_buffers(kvm);
>  }
>
> +static int vmx_write_pml_buffer(struct kvm_vcpu *vcpu)
> +{
> +	struct vmcs12 *vmcs12;
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	gpa_t gpa;
> +	struct page *page = NULL;
> +	u64 *pml_address;
> +
> +	if (is_guest_mode(vcpu)) {
> +		WARN_ON_ONCE(vmx->nested.pml_full);
> +
> +		/*
> +		 * Check if PML is enabled for the nested guest.
> +		 * Whether eptp bit 6 is set is already checked
> +		 * as part of A/D emulation.
> +		 */
> +		vmcs12 = get_vmcs12(vcpu);
> +		if (!nested_cpu_has_pml(vmcs12))
> +			return 0;

Since nested_vmx_check_pml_controls above doesn't check the EPT A/D bit in
L1, it seems we need to add this check here.

> +
> +		if (vmcs12->guest_pml_index > PML_ENTITY_NUM) {
> +			vmx->nested.pml_full = true;
> +			return 1;
> +		}

Is the purpose of returning 1 to make the upper layer code inject a PML
full VMEXIT to L1 in nested_ept_inject_page_fault?

> +
> +		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull;
> +
> +		page = nested_get_page(vcpu, vmcs12->pml_address);
> +		if (!page)
> +			return 0;

If PML is enabled in L1, I think nested_get_page should never return a 
NULL PML page (unless L1 does something wrong)? Probably better to 
return 1 rather than 0, and handle error in nested_ept_inject_page_fault 
according to vmcs12->pml_address?

> +
> +		pml_address = kmap(page);
> +		pml_address[vmcs12->guest_pml_index--] = gpa;

This gpa is L2 guest's GPA. Do we also need to mark L1's GPA (which is 
related to L2 guest's GPA above) in the dirty log? Or has this already
been done?

Thanks,
-Kai

> +		kunmap(page);
> +		nested_release_page_clean(page);
> +	}
> +
> +	return 0;
> +}
> +
>  static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
>  					   struct kvm_memory_slot *memslot,
>  					   gfn_t offset, unsigned long mask)
> @@ -11505,6 +11581,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
>  	.flush_log_dirty = vmx_flush_log_dirty,
>  	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
> +	.write_log_dirty = vmx_write_pml_buffer,
>
>  	.pre_block = vmx_pre_block,
>  	.post_block = vmx_post_block,
>


* Re: [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation
  2017-05-05 19:25 ` [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation Bandan Das
@ 2017-05-10 10:49   ` Huang, Kai
  2017-05-10 15:53     ` Bandan Das
  0 siblings, 1 reply; 15+ messages in thread
From: Huang, Kai @ 2017-05-10 10:49 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: pbonzini, linux-kernel



On 5/6/2017 7:25 AM, Bandan Das wrote:
> When KVM updates accessed/dirty bits, this hook can be used
> to invoke an arch specific function that implements/emulates
> dirty logging such as PML.
>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/mmu.c              | 15 +++++++++++++++
>  arch/x86/kvm/mmu.h              |  1 +
>  arch/x86/kvm/paging_tmpl.h      |  4 ++++
>  4 files changed, 22 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f5bddf92..9c761fe 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1020,6 +1020,8 @@ struct kvm_x86_ops {
>  	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
>  					   struct kvm_memory_slot *slot,
>  					   gfn_t offset, unsigned long mask);
> +	int (*write_log_dirty)(struct kvm_vcpu *vcpu);

Hi,

Thanks for adding PML to nested support!

IMHO this callback is only used to write L2's dirty gpa to L1's PML
buffer, so probably it's better to change the name to something like: 
nested_write_log_dirty.

Thanks,
-Kai

> +
>  	/* pmu operations of sub-arch */
>  	const struct kvm_pmu_ops *pmu_ops;
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 5586765..5d3376f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1498,6 +1498,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>
> +/**
> + * kvm_arch_write_log_dirty - emulate dirty page logging
> + * @vcpu: Guest mode vcpu
> + *
> + * Emulate arch specific page modification logging for the
> + * nested hypervisor
> + */
> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_x86_ops->write_log_dirty)
> +		return kvm_x86_ops->write_log_dirty(vcpu);
> +
> +	return 0;
> +}

kvm_nested_arch_write_log_dirty?

> +
>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>  				    struct kvm_memory_slot *slot, u64 gfn)
>  {
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index d8ccb32..2797580 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -202,4 +202,5 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>  void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>  				    struct kvm_memory_slot *slot, u64 gfn);
> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>  #endif
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 314d207..5624174 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -226,6 +226,10 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>  		if (level == walker->level && write_fault &&
>  				!(pte & PT_GUEST_DIRTY_MASK)) {
>  			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
> +#if PTTYPE == PTTYPE_EPT
> +			if (kvm_arch_write_log_dirty(vcpu))
> +				return -EINVAL;
> +#endif
>  			pte |= PT_GUEST_DIRTY_MASK;
>  		}
>  		if (pte == orig_pte)
>


* Re: [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging
  2017-05-10 10:48   ` Huang, Kai
@ 2017-05-10 14:46     ` Paolo Bonzini
  2017-05-10 16:00       ` Bandan Das
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-10 14:46 UTC (permalink / raw)
  To: Huang, Kai, Bandan Das, kvm; +Cc: linux-kernel



On 10/05/2017 12:48, Huang, Kai wrote:
>>
>> -    if (fault->error_code & PFERR_RSVD_MASK)
>> +    if (vmx->nested.pml_full) {
>> +        exit_reason = EXIT_REASON_PML_FULL;
>> +        vmx->nested.pml_full = false;
>> +        exit_qualification &= INTR_INFO_UNBLOCK_NMI;
> 
> Sorry, I cannot recall the details. Probably better to add a comment to
> indicate why the exit qualification is masked with INTR_INFO_UNBLOCK_NMI?

It only keeps that bit, because it's the only exit qualification bit
defined for PML full vmexits (27.2.2).
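
For reference, the comment being asked for could look something like this
(a sketch based on the explanation above, not part of the posted patch):

	/*
	 * A PML-full VM exit defines only bit 12 of the exit
	 * qualification ("NMI unblocking due to IRET", SDM 27.2.2),
	 * so keep just that bit from the original EPT violation.
	 */
	exit_qualification &= INTR_INFO_UNBLOCK_NMI;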

>> +    if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_PML)) {
>> +        if (!nested_cpu_has_ept(vmcs12) ||
>> +            !IS_ALIGNED(address, 4096)  ||
>> +            address >> maxphyaddr)
>> +            return -EINVAL;
>> +    }
> 
> Do we also need to check whether EPT A/D has been enabled for vmcs12 to
> make vmentry work? I cannot recall the details, but it's probably not necessary.

The SDM doesn't say so.  The "if" above matches the checks in 26.2.1.1
of the manual (Checks on VMX Controls, VM-Execution Control Fields).


>> +        /*
>> +         * Check if PML is enabled for the nested guest.
>> +         * Whether eptp bit 6 is set is already checked
>> +         * as part of A/D emulation.
>> +         */
>> +        vmcs12 = get_vmcs12(vcpu);
>> +        if (!nested_cpu_has_pml(vmcs12))
>> +            return 0;
> 
> Since nested_vmx_check_pml_controls above doesn't check the EPT A/D bit in
> L1, it seems we need to add this check here.

If EPT A/D is disabled we cannot get here (update_accessed_dirty_bits is
never called).

>> +
>> +        if (vmcs12->guest_pml_index > PML_ENTITY_NUM) {
>> +            vmx->nested.pml_full = true;
>> +            return 1;
>> +        }
> 
> Is the purpose of returning 1 to make the upper layer code inject a PML
> full VMEXIT to L1 in nested_ept_inject_page_fault?

Yes, it triggers a fault.
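
For reference, the two hunks that connect here, condensed from patch 1/3
and patch 2/3 above:

	/* From patch 1/3, FNAME(update_accessed_dirty_bits): */
	if (kvm_arch_write_log_dirty(vcpu))	/* vmx_write_pml_buffer() returned 1 */
		return -EINVAL;			/* fail the walk; the fault reaches nested_ept_inject_page_fault() */

	/* From patch 2/3, nested_ept_inject_page_fault(): */
	if (vmx->nested.pml_full) {
		exit_reason = EXIT_REASON_PML_FULL;
		vmx->nested.pml_full = false;
		exit_qualification &= INTR_INFO_UNBLOCK_NMI;
	}
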
>> +
>> +        gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull;
>> +
>> +        page = nested_get_page(vcpu, vmcs12->pml_address);
>> +        if (!page)
>> +            return 0;
> 
> If PML is enabled in L1, I think nested_get_page should never return a
> NULL PML page (unless L1 does something wrong)? Probably better to
> return 1 rather than 0, and handle error in nested_ept_inject_page_fault
> according to vmcs12->pml_address?

This happens if the PML address is invalid (where on real hardware, the
write would just be "eaten") or MMIO (where we expect to diverge from
real hardware behavior).

>> +
>> +        pml_address = kmap(page);
>> +        pml_address[vmcs12->guest_pml_index--] = gpa;
> 
> This gpa is L2 guest's GPA. Do we also need to mark L1's GPA (which is
> related to L2 guest's GPA above) in the dirty log? Or has this already
> been done?

L1's PML contains L1 host physical addresses, i.e. L0 guest physical
addresses.  This GPA comes from vmcs02 and hence it is L0's GPA.

L0's HPA is marked by hardware through PML, as usual.  If L0 has EPT A/D
but not PML, it can still provide emulated PML to L1, but L0's HPA will
be marked as dirty via write protection.

Paolo


* Re: [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation
  2017-05-10 10:49   ` Huang, Kai
@ 2017-05-10 15:53     ` Bandan Das
  2017-05-10 23:23       ` Huang, Kai
  0 siblings, 1 reply; 15+ messages in thread
From: Bandan Das @ 2017-05-10 15:53 UTC (permalink / raw)
  To: Huang, Kai; +Cc: kvm, pbonzini, linux-kernel

Hi Kai,

"Huang, Kai" <kai.huang@linux.intel.com> writes:

> On 5/6/2017 7:25 AM, Bandan Das wrote:
>> When KVM updates accessed/dirty bits, this hook can be used
>> to invoke an arch specific function that implements/emulates
>> dirty logging such as PML.
>>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>  arch/x86/kvm/mmu.c              | 15 +++++++++++++++
>>  arch/x86/kvm/mmu.h              |  1 +
>>  arch/x86/kvm/paging_tmpl.h      |  4 ++++
>>  4 files changed, 22 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index f5bddf92..9c761fe 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1020,6 +1020,8 @@ struct kvm_x86_ops {
>>  	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
>>  					   struct kvm_memory_slot *slot,
>>  					   gfn_t offset, unsigned long mask);
>> +	int (*write_log_dirty)(struct kvm_vcpu *vcpu);
>
> Hi,
>
> Thanks for adding PML to nested support!
>
> IMHO this callback is only used to write L2's dirty gpa to L1's PML
> buffer, so probably it's better to change the name to something like:
> nested_write_log_dirty.

The name was meant more to signify what it does: i.e. write dirty log rather
than where in the hierarchy it's being used :) But I guess a nested_ prefix
doesn't hurt either.

Bandan

> Thanks,
> -Kai
>
>> +
>>  	/* pmu operations of sub-arch */
>>  	const struct kvm_pmu_ops *pmu_ops;
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 5586765..5d3376f 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1498,6 +1498,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>  		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>  }
>>
>> +/**
>> + * kvm_arch_write_log_dirty - emulate dirty page logging
>> + * @vcpu: Guest mode vcpu
>> + *
>> + * Emulate arch specific page modification logging for the
>> + * nested hypervisor
>> + */
>> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
>> +{
>> +	if (kvm_x86_ops->write_log_dirty)
>> +		return kvm_x86_ops->write_log_dirty(vcpu);
>> +
>> +	return 0;
>> +}
>
> kvm_nested_arch_write_log_dirty?
>
>> +
>>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>>  				    struct kvm_memory_slot *slot, u64 gfn)
>>  {
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index d8ccb32..2797580 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -202,4 +202,5 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>>  void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>>  				    struct kvm_memory_slot *slot, u64 gfn);
>> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>>  #endif
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 314d207..5624174 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -226,6 +226,10 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>>  		if (level == walker->level && write_fault &&
>>  				!(pte & PT_GUEST_DIRTY_MASK)) {
>>  			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
>> +#if PTTYPE == PTTYPE_EPT
>> +			if (kvm_arch_write_log_dirty(vcpu))
>> +				return -EINVAL;
>> +#endif
>>  			pte |= PT_GUEST_DIRTY_MASK;
>>  		}
>>  		if (pte == orig_pte)
>>


* Re: [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging
  2017-05-10 14:46     ` Paolo Bonzini
@ 2017-05-10 16:00       ` Bandan Das
  2017-05-10 23:21         ` Huang, Kai
  0 siblings, 1 reply; 15+ messages in thread
From: Bandan Das @ 2017-05-10 16:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Huang, Kai, kvm, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:
...
>> Is the purpose of returning 1 to make the upper layer code inject a PML
>> full VMEXIT to L1 in nested_ept_inject_page_fault?
>
> Yes, it triggers a fault
>>> +
>>> +        gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull;
>>> +
>>> +        page = nested_get_page(vcpu, vmcs12->pml_address);
>>> +        if (!page)
>>> +            return 0;
>> 
>> If PML is enabled in L1, I think nested_get_page should never return a
>> NULL PML page (unless L1 does something wrong)? Probably better to
>> return 1 rather than 0, and handle error in nested_ept_inject_page_fault
>> according to vmcs12->pml_address?
>
> This happens if the PML address is invalid (where on real hardware, the
> write would just be "eaten") or MMIO (where we expect to diverge from

Yes, that was my motivation. On real hardware, the hypervisor would still
run except that the PML buffer is corrupt.

Bandan

> real hardware behavior).
>
>>> +
>>> +        pml_address = kmap(page);
>>> +        pml_address[vmcs12->guest_pml_index--] = gpa;
>> 
>> This gpa is L2 guest's GPA. Do we also need to mark L1's GPA (which is
>> related to L2 guest's GPA above) in the dirty log? Or has this already
>> been done?
>
> L1's PML contains L1 host physical addresses, i.e. L0 guest physical
> addresses.  This GPA comes from vmcs02 and hence it is L0's GPA.
>
> L0's HPA is marked by hardware through PML, as usual.  If L0 has EPT A/D
> but not PML, it can still provide emulated PML to L1, but L0's HPA will
> be marked as dirty via write protection.
>
> Paolo


* Re: [PATCH v2 2/3] nVMX: Implement emulated Page Modification Logging
  2017-05-10 16:00       ` Bandan Das
@ 2017-05-10 23:21         ` Huang, Kai
  0 siblings, 0 replies; 15+ messages in thread
From: Huang, Kai @ 2017-05-10 23:21 UTC (permalink / raw)
  To: Bandan Das, Paolo Bonzini; +Cc: kvm, linux-kernel



On 5/11/2017 4:00 AM, Bandan Das wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> ...
>>> Is the purpose of returning 1 to make the upper layer code inject a PML
>>> full VMEXIT to L1 in nested_ept_inject_page_fault?
>>
>> Yes, it triggers a fault
>>>> +
>>>> +        gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull;
>>>> +
>>>> +        page = nested_get_page(vcpu, vmcs12->pml_address);
>>>> +        if (!page)
>>>> +            return 0;
>>>
>>> If PML is enabled in L1, I think nested_get_page should never return a
>>> NULL PML page (unless L1 does something wrong)? Probably better to
>>> return 1 rather than 0, and handle error in nested_ept_inject_page_fault
>>> according to vmcs12->pml_address?
>>
>> This happens if the PML address is invalid (where on real hardware, the
>> write would just be "eaten") or MMIO (where we expect to diverge from
>
> Yes, that was my motivation. On real hardware, the hypervisor would still
> run except that the PML buffer is corrupt.

Right. Fine to me. :)

>
> Bandan
>
>> real hardware behavior).
>>
>>>> +
>>>> +        pml_address = kmap(page);
>>>> +        pml_address[vmcs12->guest_pml_index--] = gpa;
>>>
>>> This gpa is L2 guest's GPA. Do we also need to mark L1's GPA (which is
>>> related to L2 guest's GPA above) in the dirty log? Or has this already
>>> been done?
>>
>> L1's PML contains L1 host physical addresses, i.e. L0 guest physical
>> addresses.  This GPA comes from vmcs02 and hence it is L0's GPA.

Do you mean pml_address? I was talking about the gpa obtained from
vmcs_read64(GUEST_PHYSICAL_ADDRESS). From the hardware's point of view, PML
always logs a "GPA" into the PML buffer, so I was saying the gpa from
vmcs_read64(GUEST_PHYSICAL_ADDRESS) should be the L2 guest's PA. Anyway, this
is not important now. :)

>>
>> L0's HPA is marked by hardware through PML, as usual.  If L0 has EPT A/D
>> but not PML, it can still provide emulated PML to L1, but L0's HPA will
>> be marked as dirty via write protection.

Yes, this is what I was thinking. For L0, PML takes care of the L1
hypervisor's dirty pages, while write protection takes care of dirty pages
from L2. No problem.

Thanks,
-Kai

>>
>> Paolo
>


* Re: [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation
  2017-05-10 15:53     ` Bandan Das
@ 2017-05-10 23:23       ` Huang, Kai
  2017-05-11 18:36         ` Bandan Das
  0 siblings, 1 reply; 15+ messages in thread
From: Huang, Kai @ 2017-05-10 23:23 UTC (permalink / raw)
  To: Bandan Das; +Cc: kvm, pbonzini, linux-kernel



On 5/11/2017 3:53 AM, Bandan Das wrote:
> Hi Kai,
>
> "Huang, Kai" <kai.huang@linux.intel.com> writes:
>
>> On 5/6/2017 7:25 AM, Bandan Das wrote:
>>> When KVM updates accessed/dirty bits, this hook can be used
>>> to invoke an arch specific function that implements/emulates
>>> dirty logging such as PML.
>>>
>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>> ---
>>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>>  arch/x86/kvm/mmu.c              | 15 +++++++++++++++
>>>  arch/x86/kvm/mmu.h              |  1 +
>>>  arch/x86/kvm/paging_tmpl.h      |  4 ++++
>>>  4 files changed, 22 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index f5bddf92..9c761fe 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1020,6 +1020,8 @@ struct kvm_x86_ops {
>>>  	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
>>>  					   struct kvm_memory_slot *slot,
>>>  					   gfn_t offset, unsigned long mask);
>>> +	int (*write_log_dirty)(struct kvm_vcpu *vcpu);
>>
>> Hi,
>>
>> Thanks for adding PML to nested support!
>>
>> IMHO this callback is only used to write L2's dirty gpa to L1's PML
>> buffer, so probably it's better to change the name to something like:
>> nested_write_log_dirty.
>
> The name was meant more to signify what it does: i.e. write dirty log rather
> than where in the hierarchy it's being used :) But I guess a nested_ prefix
> doesn't hurt either.

Hi Bandan,

I was just suggesting. You and Paolo still make the decision :)

Thanks,
-Kai
>
> Bandan
>
>> Thanks,
>> -Kai
>>
>>> +
>>>  	/* pmu operations of sub-arch */
>>>  	const struct kvm_pmu_ops *pmu_ops;
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 5586765..5d3376f 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -1498,6 +1498,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>>  		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>>  }
>>>
>>> +/**
>>> + * kvm_arch_write_log_dirty - emulate dirty page logging
>>> + * @vcpu: Guest mode vcpu
>>> + *
>>> + * Emulate arch specific page modification logging for the
>>> + * nested hypervisor
>>> + */
>>> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
>>> +{
>>> +	if (kvm_x86_ops->write_log_dirty)
>>> +		return kvm_x86_ops->write_log_dirty(vcpu);
>>> +
>>> +	return 0;
>>> +}
>>
>> kvm_nested_arch_write_log_dirty?
>>
>>> +
>>>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>>>  				    struct kvm_memory_slot *slot, u64 gfn)
>>>  {
>>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>>> index d8ccb32..2797580 100644
>>> --- a/arch/x86/kvm/mmu.h
>>> +++ b/arch/x86/kvm/mmu.h
>>> @@ -202,4 +202,5 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>>>  void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>>>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>>>  				    struct kvm_memory_slot *slot, u64 gfn);
>>> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>>>  #endif
>>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>>> index 314d207..5624174 100644
>>> --- a/arch/x86/kvm/paging_tmpl.h
>>> +++ b/arch/x86/kvm/paging_tmpl.h
>>> @@ -226,6 +226,10 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>>>  		if (level == walker->level && write_fault &&
>>>  				!(pte & PT_GUEST_DIRTY_MASK)) {
>>>  			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
>>> +#if PTTYPE == PTTYPE_EPT
>>> +			if (kvm_arch_write_log_dirty(vcpu))
>>> +				return -EINVAL;
>>> +#endif
>>>  			pte |= PT_GUEST_DIRTY_MASK;
>>>  		}
>>>  		if (pte == orig_pte)
>>>
>


* Re: [PATCH v2 1/3] kvm: x86: Add a hook for arch specific dirty logging emulation
  2017-05-10 23:23       ` Huang, Kai
@ 2017-05-11 18:36         ` Bandan Das
  0 siblings, 0 replies; 15+ messages in thread
From: Bandan Das @ 2017-05-11 18:36 UTC (permalink / raw)
  To: Huang, Kai; +Cc: kvm, pbonzini, linux-kernel

"Huang, Kai" <kai.huang@linux.intel.com> writes:

...
> Hi Bandan,
>
> I was just suggesting. You and Paolo still make the decision :)

Sure Kai, I don't mind the name change at all.
The maintainer has already picked this up and I don't think
the function name change is worth submitting a follow-up.
Thank you very much for the review! :)

Bandan

> Thanks,
> -Kai
>>
>> Bandan
>>
>>> Thanks,
>>> -Kai
>>>
>>>> +
>>>>  	/* pmu operations of sub-arch */
>>>>  	const struct kvm_pmu_ops *pmu_ops;
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 5586765..5d3376f 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -1498,6 +1498,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>>>  		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>>>  }
>>>>
>>>> +/**
>>>> + * kvm_arch_write_log_dirty - emulate dirty page logging
>>>> + * @vcpu: Guest mode vcpu
>>>> + *
>>>> + * Emulate arch specific page modification logging for the
>>>> + * nested hypervisor
>>>> + */
>>>> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (kvm_x86_ops->write_log_dirty)
>>>> +		return kvm_x86_ops->write_log_dirty(vcpu);
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> kvm_nested_arch_write_log_dirty?
>>>
>>>> +
>>>>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>>>>  				    struct kvm_memory_slot *slot, u64 gfn)
>>>>  {
>>>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>>>> index d8ccb32..2797580 100644
>>>> --- a/arch/x86/kvm/mmu.h
>>>> +++ b/arch/x86/kvm/mmu.h
>>>> @@ -202,4 +202,5 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>>>>  void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>>>>  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>>>>  				    struct kvm_memory_slot *slot, u64 gfn);
>>>> +int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>>>>  #endif
>>>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>>>> index 314d207..5624174 100644
>>>> --- a/arch/x86/kvm/paging_tmpl.h
>>>> +++ b/arch/x86/kvm/paging_tmpl.h
>>>> @@ -226,6 +226,10 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>>>>  		if (level == walker->level && write_fault &&
>>>>  				!(pte & PT_GUEST_DIRTY_MASK)) {
>>>>  			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
>>>> +#if PTTYPE == PTTYPE_EPT
>>>> +			if (kvm_arch_write_log_dirty(vcpu))
>>>> +				return -EINVAL;
>>>> +#endif
>>>>  			pte |= PT_GUEST_DIRTY_MASK;
>>>>  		}
>>>>  		if (pte == orig_pte)
>>>>
>>


