* [PATCH 0/2 v2] KVM: nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area
@ 2019-10-15 0:04 Krish Sadhukhan
2019-10-15 0:04 ` [PATCH 1/2 " Krish Sadhukhan
2019-10-15 0:04 ` [PATCH 2/2 v2] KVM: nVMX: Rollback MSR-load if VM-entry fails due to VM-entry MSR-loading Krish Sadhukhan
0 siblings, 2 replies; 4+ messages in thread
From: Krish Sadhukhan @ 2019-10-15 0:04 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, jmattson
v1 -> v2:
1. In patch #1, the invalid VM-entry MSR-load area for vmcs02 is
now a system-wide entity. It is allocated and initialized
during VMX initialization. The exit qualification is now
contained in a 32-bit variable in 'struct nested_vmx'.
2. Patch #2 is new. It rolls back MSR updates when VM-entry
fails due to invalid VM-entry MSR-load area.
Some VM-entry consistency checks can be offloaded from KVM to hardware, but the
current KVM implementation creates a priority problem if we do so: the order in
which the SDM requires VM-entry checks to be performed is not maintained. Today,
VM-entry fails in nested_vmx_enter_non_root_mode() as soon as an error is
encountered while processing the entries in the VM-entry MSR-load area. That
produces a VM-exit for a check which, architecturally, must come only after the
guest-state checks performed by hardware. This series fixes the priority issue
so that the checks which can be offloaded to hardware can in fact be offloaded.
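For readers who skim only the cover letter, here is a condensed sketch of what the
series changes in nested_vmx_enter_non_root_mode(), abridged from the diffs in
patch #1 (error paths and the dirty_vmcs12 handling are omitted here):

    /* Before (current KVM): the failed VM-entry is synthesized in software,
     * ahead of the guest-state checks that hardware has not run yet. */
    exit_qual = nested_vmx_load_msr(vcpu, addr, count);
    if (exit_qual)
        goto vmentry_fail_vmexit_guest_mode;

    /* After (this series): stash the exit qualification and let hardware
     * fail the entry after it has checked controls, host and guest state. */
    exit_qual = nested_vmx_load_msr(vcpu, addr, count);
    if (exit_qual) {
        vmx->nested.invalid_msr_load_exit_qual = exit_qual;
        vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
        vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
                     __pa(vmcs12_invalid_msr_load_area));
    }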
arch/x86/kvm/vmx/nested.c | 84 +++++++++++++++++++++++++++++++++++++++++++----
arch/x86/kvm/vmx/nested.h | 29 ++++++++++++++--
arch/x86/kvm/vmx/vmx.c | 18 ++++++++++
arch/x86/kvm/vmx/vmx.h | 14 ++++++++
4 files changed, 136 insertions(+), 9 deletions(-)
Krish Sadhukhan (2):
nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area
nVMX: Rollback MSR-load if VM-entry fails due to VM-entry MSR-loading
* [PATCH 1/2 v2] KVM: nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area
2019-10-15 0:04 [PATCH 0/2 v2] KVM: nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area Krish Sadhukhan
@ 2019-10-15 0:04 ` Krish Sadhukhan
2019-10-15 0:04 ` [PATCH 2/2 v2] KVM: nVMX: Rollback MSR-load if VM-entry fails due to VM-entry MSR-loading Krish Sadhukhan
1 sibling, 0 replies; 4+ messages in thread
From: Krish Sadhukhan @ 2019-10-15 0:04 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, jmattson
According to the “VM Entries” section of Intel SDM vol 3C, VM-entry checks are
performed in a defined order. Checks on the MSRs loaded on VM-entry from the
VM-entry MSR-load area must be done after the VMCS controls, the host-state
area and the guest-state area have been verified. Because KVM relies on CPU
hardware to perform some of those earlier checks, the VM-exit due to an
invalid VM-entry MSR-load area has to be deferred until the hardware has
completed them and is ready to do VMLAUNCH/VMRESUME.
To defer errors arising from an invalid VM-entry MSR-load area in vmcs12, set
up a single entry, illegal according to the "Loading MSRs" section of Intel
SDM vol 3C, in the VM-entry MSR-load area of vmcs02. This makes the CPU take a
"VM-entry failure due to MSR loading" exit only after it has completed its
checks on the VMCS controls, host-state area and guest-state area. A
synthesized Exit Qualification is then reflected to our guest.
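The single "illegal" entry works because the SDM's "Loading MSRs" rules reject
certain MSR indexes outright; this series uses IA32_FS_BASE for that purpose. A
condensed sketch of the two pieces involved, abridged from the hunks below (the
module-init error handling and the MSR_FS_BASE sanity check on the reflect path
are simplified here):

    /* One system-wide, deliberately invalid MSR-load entry, set up once. */
    vmcs12_invalid_msr_load_area = kzalloc(sizeof(struct vmx_msr_entry),
                                           GFP_KERNEL_ACCOUNT);
    vmcs12_invalid_msr_load_area->index = MSR_FS_BASE;  /* never loadable */

    /* When the resulting failed VM-entry is reflected to L1, substitute the
     * exit qualification computed earlier by nested_vmx_load_msr(). */
    if (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
                        EXIT_REASON_MSR_LOAD_FAIL))
        exit_qual = to_vmx(vcpu)->nested.invalid_msr_load_exit_qual;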
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
---
arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
arch/x86/kvm/vmx/nested.h | 15 +++++++++++++--
arch/x86/kvm/vmx/vmx.c | 18 ++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 6 ++++++
4 files changed, 68 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e76eb4f07f6c..cebdcb105ea8 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3029,6 +3029,8 @@ static u8 vmx_has_apicv_interrupt(struct kvm_vcpu *vcpu)
static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12);
+extern struct vmx_msr_entry *vmcs12_invalid_msr_load_area;
+
/*
* If from_vmentry is false, this is being called from state restore (either RSM
* or KVM_SET_NESTED_STATE). Otherwise it's called from vmlaunch/vmresume.
@@ -3100,12 +3102,38 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
goto vmentry_fail_vmexit_guest_mode;
if (from_vmentry) {
- exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
exit_qual = nested_vmx_load_msr(vcpu,
vmcs12->vm_entry_msr_load_addr,
vmcs12->vm_entry_msr_load_count);
- if (exit_qual)
- goto vmentry_fail_vmexit_guest_mode;
+ if (exit_qual) {
+ /*
+ * According to section “VM Entries” in Intel SDM
+ * vol 3C, VM-entry checks are performed in a certain
+ * order. Checks on MSRs that are loaded on VM-entry
+ * from VM-entry MSR-load area, should be done after
+ * verifying VMCS controls, host-state area and
+ * guest-state area. As KVM relies on CPU hardware to
+ * perform some of these checks, we need to defer
+ * VM-exit due to invalid VM-entry MSR-load area
+ * until after CPU hardware completes the earlier
+ * checks and is ready to do VMLAUNCH/VMRESUME.
+ *
+ * In order to defer errors arising from invalid
+ * VM-entry MSR-load area in vmcs12, we set up a
+ * single invalid entry, which is illegal according
+ * to section "Loading MSRs in Intel SDM vol 3C, in
+ * VM-entry MSR-load area of vmcs02. This will cause
+ * the CPU hardware to VM-exit with "VM-entry
+ * failure due to MSR loading" after it completes
+ * checks on VMCS controls, host-state area and
+ * guest-state area.
+ */
+ vmx->nested.invalid_msr_load_exit_qual = exit_qual;
+ vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
+ vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
+ __pa(vmcs12_invalid_msr_load_area));
+ vmx->nested.dirty_vmcs12 = true;
+ }
} else {
/*
* The MMU is not initialized to point at the right entities yet and
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 187d39bf0bf1..bb51ec8cf7da 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
u32 exit_reason)
{
+ u32 exit_qual;
u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+ struct vmx_msr_entry *addr;
/*
* At this point, the exit interruption info in exit_intr_info
@@ -81,8 +83,17 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
}
- nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
- vmcs_readl(EXIT_QUALIFICATION));
+ exit_qual = vmcs_readl(EXIT_QUALIFICATION);
+
+ addr = __va(vmcs_read64(VM_ENTRY_MSR_LOAD_ADDR));
+ if (addr && addr->index == MSR_FS_BASE &&
+ (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
+ EXIT_REASON_MSR_LOAD_FAIL))) {
+ exit_qual = (to_vmx(vcpu))->nested.invalid_msr_load_exit_qual;
+ }
+
+ nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
+
return 1;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e7970a2e8eae..7ece11322430 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7914,6 +7914,13 @@ static void vmx_cleanup_l1d_flush(void)
l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
}
+/*
+ * This is used to set up an invalid VM-entry MSR-load area for vmcs02
+ * if an error is detected while processing the entries in VM-entry
+ * MSR-load area of vmcs12.
+ */
+struct vmx_msr_entry *vmcs12_invalid_msr_load_area = NULL;
+
static void vmx_exit(void)
{
#ifdef CONFIG_KEXEC_CORE
@@ -7947,6 +7954,9 @@ static void vmx_exit(void)
}
#endif
vmx_cleanup_l1d_flush();
+
+ if (vmcs12_invalid_msr_load_area)
+ kfree(vmcs12_invalid_msr_load_area);
}
module_exit(vmx_exit);
@@ -8012,6 +8022,14 @@ static int __init vmx_init(void)
#endif
vmx_check_vmcs12_offsets();
+ vmcs12_invalid_msr_load_area =
+ kzalloc(sizeof(struct vmx_msr_entry), GFP_KERNEL_ACCOUNT);
+ if (!vmcs12_invalid_msr_load_area) {
+ vmx_exit();
+ return -ENOMEM;
+ }
+ vmcs12_invalid_msr_load_area->index = MSR_FS_BASE;
+
return 0;
}
module_init(vmx_init);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bee16687dc0b..ee7f40abd199 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -183,6 +183,12 @@ struct nested_vmx {
gpa_t hv_evmcs_vmptr;
struct kvm_host_map hv_evmcs_map;
struct hv_enlightened_vmcs *hv_evmcs;
+
+ /*
+ * This field is used for Exit Qualification when VM-entry fails
+ * due to invalid VM-entry MSR-load area in vmcs12.
+ */
+ u32 invalid_msr_load_exit_qual;
};
struct vcpu_vmx {
--
2.20.1
* [PATCH 2/2 v2] KVM: nVMX: Rollback MSR-load if VM-entry fails due to VM-entry MSR-loading
2019-10-15 0:04 [PATCH 0/2 v2] KVM: nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area Krish Sadhukhan
2019-10-15 0:04 ` [PATCH 1/2 " Krish Sadhukhan
@ 2019-10-15 0:04 ` Krish Sadhukhan
2019-10-15 22:28 ` Jim Mattson
1 sibling, 1 reply; 4+ messages in thread
From: Krish Sadhukhan @ 2019-10-15 0:04 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, jmattson
If VM-entry fails due to VM-entry MSR-loading, the nested guest's MSRs need to
be rolled back to their previous values so that the guest is not left in an
inconsistent state.
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
---
arch/x86/kvm/vmx/nested.c | 50 +++++++++++++++++++++++++++++++++++----
arch/x86/kvm/vmx/nested.h | 26 +++++++++++++++-----
arch/x86/kvm/vmx/vmx.h | 8 +++++++
3 files changed, 74 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cebdcb105ea8..bd8e7af5c1e5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -191,7 +191,7 @@ static int nested_vmx_failValid(struct kvm_vcpu *vcpu,
return kvm_skip_emulated_instruction(vcpu);
}
-static void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator)
+void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator)
{
/* TODO: not to reset guest simply here. */
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
@@ -894,11 +894,13 @@ static u32 nested_vmx_max_atomic_switch_msrs(struct kvm_vcpu *vcpu)
* as possible, process all valid entries before failing rather than precheck
* for a capacity violation.
*/
-static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
+static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count,
+ bool save)
{
u32 i;
struct vmx_msr_entry e;
u32 max_msr_list_size = nested_vmx_max_atomic_switch_msrs(vcpu);
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
for (i = 0; i < count; i++) {
if (unlikely(i >= max_msr_list_size))
@@ -917,6 +919,16 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
__func__, i, e.index, e.reserved);
goto fail;
}
+ if (save) {
+ vmx->nested.vm_entry_msr_load_backup[i].index = e.index;
+ if (kvm_get_msr(vcpu, e.index,
+ &(vmx->nested.vm_entry_msr_load_backup[i].data))) {
+ pr_debug_ratelimited(
+ "%s cannot read MSR (%u, 0x%x)\n",
+ __func__, i, e.index);
+ goto fail;
+ }
+ }
if (kvm_set_msr(vcpu, e.index, e.value)) {
pr_debug_ratelimited(
"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
@@ -926,6 +938,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
}
return 0;
fail:
+ kfree(vmx->nested.vm_entry_msr_load_backup);
return i + 1;
}
@@ -973,6 +986,26 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
return 0;
}
+int nested_vmx_rollback_msr(struct kvm_vcpu *vcpu, u32 count)
+{
+ u32 i;
+ struct msr_data msr;
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ for (i = 0; i < count; i++) {
+ msr.host_initiated = false;
+ msr.index = vmx->nested.vm_entry_msr_load_backup[i].index;
+ msr.data = vmx->nested.vm_entry_msr_load_backup[i].data;
+ if (kvm_set_msr(vcpu, msr.index, msr.data)) {
+ pr_debug_ratelimited(
+ "%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
+ __func__, i, msr.index, msr.data);
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
{
unsigned long invalid_mask;
@@ -3102,9 +3135,18 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
goto vmentry_fail_vmexit_guest_mode;
if (from_vmentry) {
+ u32 count = vmcs12->vm_entry_msr_load_count;
+
+ /* Save guest MSRs before we load them */
+ vmx->nested.vm_entry_msr_load_backup =
+ kcalloc(count, sizeof(struct msr_data), GFP_KERNEL_ACCOUNT);
+ if (!vmx->nested.vm_entry_msr_load_backup)
+ goto vmentry_fail_vmexit_guest_mode;
+
exit_qual = nested_vmx_load_msr(vcpu,
vmcs12->vm_entry_msr_load_addr,
- vmcs12->vm_entry_msr_load_count);
+ vmcs12->vm_entry_msr_load_count,
+ true);
if (exit_qual) {
/*
* According to section “VM Entries” in Intel SDM
@@ -3940,7 +3982,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
vmx_update_msr_bitmap(vcpu);
if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
- vmcs12->vm_exit_msr_load_count))
+ vmcs12->vm_exit_msr_load_count, false))
nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_MSR_FAIL);
}
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index bb51ec8cf7da..f951b2b338d2 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,6 +17,8 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry);
bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason);
void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
u32 exit_intr_info, unsigned long exit_qualification);
+void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator);
+int nested_vmx_rollback_msr(struct kvm_vcpu *vcpu, u32 count);
void nested_sync_vmcs12_to_shadow(struct kvm_vcpu *vcpu);
int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata);
@@ -66,7 +68,6 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
{
u32 exit_qual;
u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- struct vmx_msr_entry *addr;
/*
* At this point, the exit interruption info in exit_intr_info
@@ -85,11 +86,24 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
exit_qual = vmcs_readl(EXIT_QUALIFICATION);
- addr = __va(vmcs_read64(VM_ENTRY_MSR_LOAD_ADDR));
- if (addr && addr->index == MSR_FS_BASE &&
- (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
- EXIT_REASON_MSR_LOAD_FAIL))) {
- exit_qual = (to_vmx(vcpu))->nested.invalid_msr_load_exit_qual;
+ if (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
+ EXIT_REASON_MSR_LOAD_FAIL)) {
+
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ struct vmx_msr_entry *addr;
+
+ if (nested_vmx_rollback_msr(vcpu,
+ vmcs12->vm_entry_msr_load_count)) {
+ nested_vmx_abort(vcpu,
+ VMX_ABORT_SAVE_GUEST_MSR_FAIL);
+
+ kfree(to_vmx(vcpu)->nested.vm_entry_msr_load_backup);
+ }
+
+ addr = __va(vmcs_read64(VM_ENTRY_MSR_LOAD_ADDR));
+ if (addr && addr->index == MSR_FS_BASE)
+ exit_qual =
+ (to_vmx(vcpu))->nested.invalid_msr_load_exit_qual;
}
nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index ee7f40abd199..9a7c118036be 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -189,6 +189,14 @@ struct nested_vmx {
* due to invalid VM-entry MSR-load area in vmcs12.
*/
u32 invalid_msr_load_exit_qual;
+
+ /*
+ * This is used for backing up the MSRs of nested guests when
+ * those MSRs are loaded from VM-entry MSR-load area on VM-entry.
+ * If VM-entry fails due to VM-entry MSR-loading, we roll back
+ * the MSRs to the values saved here.
+ */
+ struct msr_data *vm_entry_msr_load_backup;
};
struct vcpu_vmx {
--
2.20.1
* Re: [PATCH 2/2 v2] KVM: nVMX: Rollback MSR-load if VM-entry fails due to VM-entry MSR-loading
2019-10-15 0:04 ` [PATCH 2/2 v2] KVM: nVMX: Rollback MSR-load if VM-entry fails due to VM-entry MSR-loading Krish Sadhukhan
@ 2019-10-15 22:28 ` Jim Mattson
0 siblings, 0 replies; 4+ messages in thread
From: Jim Mattson @ 2019-10-15 22:28 UTC (permalink / raw)
To: Krish Sadhukhan; +Cc: kvm list, Paolo Bonzini, Radim Krčmář
On Mon, Oct 14, 2019 at 5:40 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
> If VM-entry fails due to VM-entry MSR-loading, the nested guest's MSRs need to
> be rolled back to their previous values so that the guest is not left in an
> inconsistent state.
This change seems overly simplistic, and it also breaks the existing ABI.
> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> ---
> arch/x86/kvm/vmx/nested.c | 50 +++++++++++++++++++++++++++++++++++----
> arch/x86/kvm/vmx/nested.h | 26 +++++++++++++++-----
> arch/x86/kvm/vmx/vmx.h | 8 +++++++
> 3 files changed, 74 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index cebdcb105ea8..bd8e7af5c1e5 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -191,7 +191,7 @@ static int nested_vmx_failValid(struct kvm_vcpu *vcpu,
> return kvm_skip_emulated_instruction(vcpu);
> }
>
> -static void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator)
> +void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator)
> {
> /* TODO: not to reset guest simply here. */
> kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> @@ -894,11 +894,13 @@ static u32 nested_vmx_max_atomic_switch_msrs(struct kvm_vcpu *vcpu)
> * as possible, process all valid entries before failing rather than precheck
> * for a capacity violation.
> */
> -static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
> +static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count,
> + bool save)
> {
> u32 i;
> struct vmx_msr_entry e;
> u32 max_msr_list_size = nested_vmx_max_atomic_switch_msrs(vcpu);
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> for (i = 0; i < count; i++) {
> if (unlikely(i >= max_msr_list_size))
> @@ -917,6 +919,16 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
> __func__, i, e.index, e.reserved);
> goto fail;
> }
> + if (save) {
> + vmx->nested.vm_entry_msr_load_backup[i].index = e.index;
> + if (kvm_get_msr(vcpu, e.index,
> + &(vmx->nested.vm_entry_msr_load_backup[i].data))) {
> + pr_debug_ratelimited(
> + "%s cannot read MSR (%u, 0x%x)\n",
> + __func__, i, e.index);
> + goto fail;
This breaks the ABI, by requiring that all MSRs in the MSR-load list
have to be readable. Some, like IA32_PRED_CMD, are not.
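Purely as an illustration of this concern, and not something the thread settles
on, a backup scheme that tolerates write-only MSRs could record only the entries
it can read back and skip the rest on rollback. The 'valid' flag below is
hypothetical and is not a field of struct msr_data:

    struct msr_backup {            /* hypothetical helper type */
        u32 index;
        u64 data;
        bool valid;                /* false for write-only MSRs, e.g. IA32_PRED_CMD */
    };

    backup[i].index = e.index;
    backup[i].valid = !kvm_get_msr(vcpu, e.index, &backup[i].data);
    /* rollback would then do: if (backup[i].valid) kvm_set_msr(...); */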
> + }
> + }
> if (kvm_set_msr(vcpu, e.index, e.value)) {
> pr_debug_ratelimited(
> "%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
> @@ -926,6 +938,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
> }
> return 0;
> fail:
> + kfree(vmx->nested.vm_entry_msr_load_backup);
> return i + 1;
> }
>
> @@ -973,6 +986,26 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
> return 0;
> }
>
> +int nested_vmx_rollback_msr(struct kvm_vcpu *vcpu, u32 count)
> +{
> + u32 i;
> + struct msr_data msr;
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + for (i = 0; i < count; i++) {
I wonder if this loop should go in the other direction, in case there
are dependencies among the MSR settings.
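For concreteness, the reversed loop this comment gestures at would walk the
backup list from the last entry loaded back to the first, e.g.:

    /* Undo loads last-in-first-out, in case later entries depend on
     * values established by earlier ones. */
    for (i = count; i-- > 0; ) {
        if (kvm_set_msr(vcpu, vmx->nested.vm_entry_msr_load_backup[i].index,
                        vmx->nested.vm_entry_msr_load_backup[i].data))
            return -EINVAL;
    }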
> + msr.host_initiated = false;
> + msr.index = vmx->nested.vm_entry_msr_load_backup[i].index;
> + msr.data = vmx->nested.vm_entry_msr_load_backup[i].data;
> + if (kvm_set_msr(vcpu, msr.index, msr.data)) {
> + pr_debug_ratelimited(
> + "%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
> + __func__, i, msr.index, msr.data);
> + return -EINVAL;
This doesn't work with time-related MSRs, like
IA32_TIME_STAMP_COUNTER. Rather than "rolling back" to an earlier
value, you need to be able to negate the effect of the load that
should never have happened. Similarly, I don't think this works with
IA32_TSC_DEADLINE, if the original deadline has passed before you
saved it. I believe that writing a deadline in the past will result in
a spurious interrupt.
> + }
> + }
> + return 0;
> +}
> +
> static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
> {
> unsigned long invalid_mask;
> @@ -3102,9 +3135,18 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
> goto vmentry_fail_vmexit_guest_mode;
>
> if (from_vmentry) {
> + u32 count = vmcs12->vm_entry_msr_load_count;
> +
> + /* Save guest MSRs before we load them */
> + vmx->nested.vm_entry_msr_load_backup =
> + kcalloc(count, sizeof(struct msr_data), GFP_KERNEL_ACCOUNT);
> + if (!vmx->nested.vm_entry_msr_load_backup)
> + goto vmentry_fail_vmexit_guest_mode;
> +
Should the backup memory be allocated in advance, so that we don't
have this unarchitected VM-entry failure? If not, should this be
deferred until after the attempted VM-entry to vmcs02, to avoid
introducing yet another priority inversion?
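A sketch of the first alternative raised here: a one-time allocation sized for
the architectural maximum, using the existing nested_vmx_max_atomic_switch_msrs()
helper. The function name and its call site (for example, when the vCPU enters
VMX operation) are assumptions for illustration only:

    static int nested_alloc_msr_load_backup(struct kvm_vcpu *vcpu)
    {
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        u32 max = nested_vmx_max_atomic_switch_msrs(vcpu);

        /* Sized for the worst case so VM-entry itself never has to allocate. */
        vmx->nested.vm_entry_msr_load_backup =
            kcalloc(max, sizeof(struct msr_data), GFP_KERNEL_ACCOUNT);
        return vmx->nested.vm_entry_msr_load_backup ? 0 : -ENOMEM;
    }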
> exit_qual = nested_vmx_load_msr(vcpu,
> vmcs12->vm_entry_msr_load_addr,
> - vmcs12->vm_entry_msr_load_count);
> + vmcs12->vm_entry_msr_load_count,
> + true);
> if (exit_qual) {
> /*
> * According to section “VM Entries” in Intel SDM
> @@ -3940,7 +3982,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
> vmx_update_msr_bitmap(vcpu);
>
> if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
> - vmcs12->vm_exit_msr_load_count))
> + vmcs12->vm_exit_msr_load_count, false))
> nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_MSR_FAIL);
> }
>
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index bb51ec8cf7da..f951b2b338d2 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -17,6 +17,8 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry);
> bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason);
> void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
> u32 exit_intr_info, unsigned long exit_qualification);
> +void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator);
> +int nested_vmx_rollback_msr(struct kvm_vcpu *vcpu, u32 count);
> void nested_sync_vmcs12_to_shadow(struct kvm_vcpu *vcpu);
> int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
> int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata);
> @@ -66,7 +68,6 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
> {
> u32 exit_qual;
> u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> - struct vmx_msr_entry *addr;
>
> /*
> * At this point, the exit interruption info in exit_intr_info
> @@ -85,11 +86,24 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>
> exit_qual = vmcs_readl(EXIT_QUALIFICATION);
>
> - addr = __va(vmcs_read64(VM_ENTRY_MSR_LOAD_ADDR));
> - if (addr && addr->index == MSR_FS_BASE &&
> - (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
> - EXIT_REASON_MSR_LOAD_FAIL))) {
> - exit_qual = (to_vmx(vcpu))->nested.invalid_msr_load_exit_qual;
> + if (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
> + EXIT_REASON_MSR_LOAD_FAIL)) {
> +
> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> + struct vmx_msr_entry *addr;
> +
> + if (nested_vmx_rollback_msr(vcpu,
> + vmcs12->vm_entry_msr_load_count)) {
> + nested_vmx_abort(vcpu,
> + VMX_ABORT_SAVE_GUEST_MSR_FAIL);
> +
> + kfree(to_vmx(vcpu)->nested.vm_entry_msr_load_backup);
> + }
Are we leaking the backup memory when the rollback succeeds?
> +
> + addr = __va(vmcs_read64(VM_ENTRY_MSR_LOAD_ADDR));
> + if (addr && addr->index == MSR_FS_BASE)
> + exit_qual =
> + (to_vmx(vcpu))->nested.invalid_msr_load_exit_qual;
> }
>
> nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index ee7f40abd199..9a7c118036be 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -189,6 +189,14 @@ struct nested_vmx {
> * due to invalid VM-entry MSR-load area in vmcs12.
> */
> u32 invalid_msr_load_exit_qual;
> +
> + /*
> + * This is used for backing up the MSRs of nested guests when
> + * those MSRs are loaded from VM-entry MSR-load area on VM-entry.
> + * If VM-entry fails due to VM-entry MSR-loading, we roll back
> + * the MSRs to the values saved here.
> + */
> + struct msr_data *vm_entry_msr_load_backup;
> };
>
> struct vcpu_vmx {
> --
> 2.20.1
>