* [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti
@ 2018-01-27  8:50 Paolo Bonzini
  2018-01-27  8:50 ` [PATCH v2 1/3] KVM: nVMX: Eliminate vmcs02 pool Paolo Bonzini
  ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Paolo Bonzini @ 2018-01-27  8:50 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed

David and others,

the following changes since commit ba804bb4b72e57374b5f567b783aa0298fba0ce6:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-01-26 09:03:16 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git msr-bitmaps

for you to fetch changes up to cf8870b9e2a7a35e596a03b903d0f5a06cd2ee3c:

  KVM: VMX: make MSR bitmaps per-VCPU (2018-01-26 22:59:32 +0100)

The patches are on top of Linus's tree, and I checked that they also apply
cleanly on top of the latest 4.14 tree, as required for tip's x86/pti branch.
One extra commit is needed; it is pretty safe and would have been merged
next week anyway.

Radim, please pull this into kvm.git too.  The merge is a bit messy, so I've
placed my resolution on kvm.git, branch refs/heads/msr-bitmaps-merge-resolution.
It's based on kvm/queue, assuming that all of kvm/queue will get into the
pull requests for the 4.16 merge window (and thus soonish in kvm/next).

Hopefully the next few releases will be much more uneventful...

Thanks,

Paolo

Jim Mattson (1):
      KVM: nVMX: Eliminate vmcs02 pool

Paolo Bonzini (2):
      KVM: VMX: introduce alloc_loaded_vmcs
      KVM: VMX: make MSR bitmaps per-VCPU

 arch/x86/kvm/vmx.c | 437 ++++++++++++++++++++++-------------------------------
 1 file changed, 183 insertions(+), 254 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH v2 1/3] KVM: nVMX: Eliminate vmcs02 pool 2018-01-27 8:50 [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti Paolo Bonzini @ 2018-01-27 8:50 ` Paolo Bonzini 2018-01-27 8:50 ` [PATCH v2 2/3] KVM: VMX: introduce alloc_loaded_vmcs Paolo Bonzini ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2018-01-27 8:50 UTC (permalink / raw) To: linux-kernel, kvm Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed From: Jim Mattson <jmattson@google.com> The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Ameya More <ameya.more@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> --- arch/x86/kvm/vmx.c | 146 +++++++++-------------------------------------------- 1 file changed, 23 insertions(+), 123 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c829d89e2e63..ad6a883b7a32 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -185,7 +185,6 @@ extern const ulong vmx_return; #define NR_AUTOLOAD_MSRS 8 -#define VMCS02_POOL_SIZE 1 struct vmcs { u32 revision_id; @@ -226,7 +225,7 @@ struct shared_msr_entry { * stored in guest memory specified by VMPTRLD, but is opaque to the guest, * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. * More than one of these structures may exist, if L1 runs multiple L2 guests. - * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the + * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the * underlying hardware which will be used to run L2. * This structure is packed to ensure that its layout is identical across * machines (necessary for live migration). @@ -409,13 +408,6 @@ struct __packed vmcs12 { */ #define VMCS12_SIZE 0x1000 -/* Used to remember the last vmcs02 used for some recently used vmcs12s */ -struct vmcs02_list { - struct list_head list; - gpa_t vmptr; - struct loaded_vmcs vmcs02; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. @@ -440,15 +432,15 @@ struct nested_vmx { */ bool sync_shadow_vmcs; - /* vmcs02_list cache of VMCSs recently used to run L2 guests */ - struct list_head vmcs02_pool; - int vmcs02_num; bool change_vmcs01_virtual_x2apic_mode; /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending; + + struct loaded_vmcs vmcs02; + /* - * Guest pages referred to in vmcs02 with host-physical pointers, so - * we must keep them pinned while L2 runs. + * Guest pages referred to in the vmcs02 with host-physical + * pointers, so we must keep them pinned while L2 runs. */ struct page *apic_access_page; struct page *virtual_apic_page; @@ -6974,94 +6966,6 @@ static int handle_monitor(struct kvm_vcpu *vcpu) } /* - * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. 
- * We could reuse a single VMCS for all the L2 guests, but we also want the - * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this - * allows keeping them loaded on the processor, and in the future will allow - * optimizations where prepare_vmcs02 doesn't need to set all the fields on - * every entry if they never change. - * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE - * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first. - * - * The following functions allocate and free a vmcs02 in this pool. - */ - -/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */ -static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmx->nested.current_vmptr) { - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) { - /* Recycle the least recently used VMCS. */ - item = list_last_entry(&vmx->nested.vmcs02_pool, - struct vmcs02_list, list); - item->vmptr = vmx->nested.current_vmptr; - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - /* Create a new VMCS */ - item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL); - if (!item) - return NULL; - item->vmcs02.vmcs = alloc_vmcs(); - item->vmcs02.shadow_vmcs = NULL; - if (!item->vmcs02.vmcs) { - kfree(item); - return NULL; - } - loaded_vmcs_init(&item->vmcs02); - item->vmptr = vmx->nested.current_vmptr; - list_add(&(item->list), &(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num++; - return &item->vmcs02; -} - -/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */ -static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmptr) { - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - return; - } -} - -/* - * Free all VMCSs saved for this vcpu, except the one pointed by - * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs - * must be &vmx->vmcs01. - */ -static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item, *n; - - WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01); - list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) { - /* - * Something will leak if the above WARN triggers. Better than - * a use-after-free. - */ - if (vmx->loaded_vmcs == &item->vmcs02) - continue; - - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - } -} - -/* * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(), * set the success or error code of an emulated VMX instruction, as specified * by Vol 2B, VMX Instruction Reference, "Conventions". 
@@ -7242,6 +7146,12 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs *shadow_vmcs; + vmx->nested.vmcs02.vmcs = alloc_vmcs(); + vmx->nested.vmcs02.shadow_vmcs = NULL; + if (!vmx->nested.vmcs02.vmcs) + goto out_vmcs02; + loaded_vmcs_init(&vmx->nested.vmcs02); + if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); @@ -7264,9 +7174,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) vmx->vmcs01.shadow_vmcs = shadow_vmcs; } - INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num = 0; - hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; @@ -7281,6 +7188,9 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) free_page((unsigned long)vmx->nested.msr_bitmap); out_msr_bitmap: + free_loaded_vmcs(&vmx->nested.vmcs02); + +out_vmcs02: return -ENOMEM; } @@ -7434,7 +7344,7 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->vmcs01.shadow_vmcs = NULL; } kfree(vmx->nested.cached_vmcs12); - /* Unpin physical memory we referred to in current vmcs02 */ + /* Unpin physical memory we referred to in the vmcs02 */ if (vmx->nested.apic_access_page) { kvm_release_page_dirty(vmx->nested.apic_access_page); vmx->nested.apic_access_page = NULL; @@ -7450,7 +7360,7 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->nested.pi_desc = NULL; } - nested_free_all_saved_vmcss(vmx); + free_loaded_vmcs(&vmx->nested.vmcs02); } /* Emulate the VMXOFF instruction */ @@ -7493,8 +7403,6 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) vmptr + offsetof(struct vmcs12, launch_state), &zero, sizeof(zero)); - nested_free_vmcs02(vmx, vmptr); - nested_vmx_succeed(vcpu); return kvm_skip_emulated_instruction(vcpu); } @@ -8406,10 +8314,11 @@ static bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason) /* * The host physical addresses of some pages of guest memory - * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU - * may write to these pages via their host physical address while - * L2 is running, bypassing any address-translation-based dirty - * tracking (e.g. EPT write protection). + * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC + * Page). The CPU may write to these pages via their host + * physical address while L2 is running, bypassing any + * address-translation-based dirty tracking (e.g. EPT write + * protection). * * Mark them dirty on every exit from L2 to prevent them from * getting out of sync with dirty tracking. 
@@ -10903,20 +10812,15 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - struct loaded_vmcs *vmcs02; u32 msr_entry_idx; u32 exit_qual; - vmcs02 = nested_get_current_vmcs02(vmx); - if (!vmcs02) - return -ENOMEM; - enter_guest_mode(vcpu); if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); - vmx_switch_vmcs(vcpu, vmcs02); + vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); vmx_segment_cache_clear(vmx); if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) { @@ -11534,10 +11438,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, vm_exit_controls_reset_shadow(vmx); vmx_segment_cache_clear(vmx); - /* if no vmcs02 cache requested, remove the one we used */ - if (VMCS02_POOL_SIZE == 0) - nested_free_vmcs02(vmx, vmx->nested.current_vmptr); - /* Update any VMCS fields that might have changed while L2 ran */ vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
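[The net effect of patch 1 is easier to see with the pool machinery stripped
away.  The following condensed sketch restates the new vmcs02 lifecycle using
the names from the diff above; the sketch_ wrappers themselves are
illustrative only, not code from the patch.]

/*
 * Condensed sketch of the vmcs02 lifecycle after removing the pool.
 */
static int sketch_enter_vmx_operation(struct vcpu_vmx *vmx)
{
	/* Allocated exactly once, when the vCPU executes VMXON ... */
	vmx->nested.vmcs02.vmcs = alloc_vmcs();
	vmx->nested.vmcs02.shadow_vmcs = NULL;
	if (!vmx->nested.vmcs02.vmcs)
		return -ENOMEM;
	loaded_vmcs_init(&vmx->nested.vmcs02);
	return 0;
}

static void sketch_free_nested(struct vcpu_vmx *vmx)
{
	/*
	 * ... and freed exactly once, on VMXOFF or vCPU teardown, so
	 * VMCLEAR emulation no longer needs to hunt through a pool.
	 */
	free_loaded_vmcs(&vmx->nested.vmcs02);
}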
* [PATCH v2 2/3] KVM: VMX: introduce alloc_loaded_vmcs 2018-01-27 8:50 [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti Paolo Bonzini 2018-01-27 8:50 ` [PATCH v2 1/3] KVM: nVMX: Eliminate vmcs02 pool Paolo Bonzini @ 2018-01-27 8:50 ` Paolo Bonzini 2018-01-29 10:31 ` David Hildenbrand 2018-01-27 8:50 ` [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU Paolo Bonzini 2018-01-29 12:53 ` [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti David Woodhouse 3 siblings, 1 reply; 14+ messages in thread From: Paolo Bonzini @ 2018-01-27 8:50 UTC (permalink / raw) To: linux-kernel, kvm Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also allocate an MSR bitmap there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- arch/x86/kvm/vmx.c | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ad6a883b7a32..ab4b9bc99a52 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3829,11 +3829,6 @@ static struct vmcs *alloc_vmcs_cpu(int cpu) return vmcs; } -static struct vmcs *alloc_vmcs(void) -{ - return alloc_vmcs_cpu(raw_smp_processor_id()); -} - static void free_vmcs(struct vmcs *vmcs) { free_pages((unsigned long)vmcs, vmcs_config.order); @@ -3852,6 +3847,22 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } +static struct vmcs *alloc_vmcs(void) +{ + return alloc_vmcs_cpu(raw_smp_processor_id()); +} + +static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) +{ + loaded_vmcs->vmcs = alloc_vmcs(); + if (!loaded_vmcs->vmcs) + return -ENOMEM; + + loaded_vmcs->shadow_vmcs = NULL; + loaded_vmcs_init(loaded_vmcs); + return 0; +} + static void free_kvm_area(void) { int cpu; @@ -7145,12 +7156,11 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs *shadow_vmcs; + int r; - vmx->nested.vmcs02.vmcs = alloc_vmcs(); - vmx->nested.vmcs02.shadow_vmcs = NULL; - if (!vmx->nested.vmcs02.vmcs) + r = alloc_loaded_vmcs(&vmx->nested.vmcs02); + if (r < 0) goto out_vmcs02; - loaded_vmcs_init(&vmx->nested.vmcs02); if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = @@ -9545,13 +9555,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (!vmx->guest_msrs) goto free_pml; - vmx->loaded_vmcs = &vmx->vmcs01; - vmx->loaded_vmcs->vmcs = alloc_vmcs(); - vmx->loaded_vmcs->shadow_vmcs = NULL; - if (!vmx->loaded_vmcs->vmcs) + err = alloc_loaded_vmcs(&vmx->vmcs01); + if (err < 0) goto free_msrs; - loaded_vmcs_init(vmx->loaded_vmcs); + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); vmx->vcpu.cpu = cpu; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
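[A condensed sketch of the call pattern both call sites adopt after patch 2;
the sketch_caller() wrapper is illustrative only, not code from the patch.]

/* Sketch: allocation and init are grouped behind one helper. */
static int sketch_caller(struct vcpu_vmx *vmx)
{
	int r = alloc_loaded_vmcs(&vmx->vmcs01);
	if (r < 0)
		return r;	/* nothing for the caller to unwind */

	vmx->loaded_vmcs = &vmx->vmcs01;
	return 0;
}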
* Re: [PATCH v2 2/3] KVM: VMX: introduce alloc_loaded_vmcs 2018-01-27 8:50 ` [PATCH v2 2/3] KVM: VMX: introduce alloc_loaded_vmcs Paolo Bonzini @ 2018-01-29 10:31 ` David Hildenbrand 0 siblings, 0 replies; 14+ messages in thread From: David Hildenbrand @ 2018-01-29 10:31 UTC (permalink / raw) To: Paolo Bonzini, linux-kernel, kvm Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed On 27.01.2018 09:50, Paolo Bonzini wrote: > Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also > allocate an MSR bitmap there. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/kvm/vmx.c | 36 ++++++++++++++++++++++-------------- > 1 file changed, 22 insertions(+), 14 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index ad6a883b7a32..ab4b9bc99a52 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -3829,11 +3829,6 @@ static struct vmcs *alloc_vmcs_cpu(int cpu) > return vmcs; > } > > -static struct vmcs *alloc_vmcs(void) > -{ > - return alloc_vmcs_cpu(raw_smp_processor_id()); > -} > - > static void free_vmcs(struct vmcs *vmcs) > { > free_pages((unsigned long)vmcs, vmcs_config.order); > @@ -3852,6 +3847,22 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) > WARN_ON(loaded_vmcs->shadow_vmcs != NULL); > } > > +static struct vmcs *alloc_vmcs(void) > +{ > + return alloc_vmcs_cpu(raw_smp_processor_id()); > +} > + > +static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) > +{ > + loaded_vmcs->vmcs = alloc_vmcs(); > + if (!loaded_vmcs->vmcs) > + return -ENOMEM; > + > + loaded_vmcs->shadow_vmcs = NULL; > + loaded_vmcs_init(loaded_vmcs); > + return 0; > +} > + > static void free_kvm_area(void) > { > int cpu; > @@ -7145,12 +7156,11 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > struct vmcs *shadow_vmcs; > + int r; > > - vmx->nested.vmcs02.vmcs = alloc_vmcs(); > - vmx->nested.vmcs02.shadow_vmcs = NULL; > - if (!vmx->nested.vmcs02.vmcs) > + r = alloc_loaded_vmcs(&vmx->nested.vmcs02); > + if (r < 0) > goto out_vmcs02; > - loaded_vmcs_init(&vmx->nested.vmcs02); > > if (cpu_has_vmx_msr_bitmap()) { > vmx->nested.msr_bitmap = > @@ -9545,13 +9555,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) > if (!vmx->guest_msrs) > goto free_pml; > > - vmx->loaded_vmcs = &vmx->vmcs01; > - vmx->loaded_vmcs->vmcs = alloc_vmcs(); > - vmx->loaded_vmcs->shadow_vmcs = NULL; > - if (!vmx->loaded_vmcs->vmcs) > + err = alloc_loaded_vmcs(&vmx->vmcs01); > + if (err < 0) > goto free_msrs; > - loaded_vmcs_init(vmx->loaded_vmcs); > > + vmx->loaded_vmcs = &vmx->vmcs01; > cpu = get_cpu(); > vmx_vcpu_load(&vmx->vcpu, cpu); > vmx->vcpu.cpu = cpu; > Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU 2018-01-27 8:50 [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti Paolo Bonzini 2018-01-27 8:50 ` [PATCH v2 1/3] KVM: nVMX: Eliminate vmcs02 pool Paolo Bonzini 2018-01-27 8:50 ` [PATCH v2 2/3] KVM: VMX: introduce alloc_loaded_vmcs Paolo Bonzini @ 2018-01-27 8:50 ` Paolo Bonzini 2018-01-29 10:35 ` David Hildenbrand ` (2 more replies) 2018-01-29 12:53 ` [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti David Woodhouse 3 siblings, 3 replies; 14+ messages in thread From: Paolo Bonzini @ 2018-01-27 8:50 UTC (permalink / raw) To: linux-kernel, kvm Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed Place the MSR bitmap in struct loaded_vmcs, and update it in place every time the x2apic or APICv state can change. This is rare and the loop can handle 64 MSRs per iteration, in a similar fashion as nested_vmx_prepare_msr_bitmap. This prepares for choosing, on a per-VM basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- arch/x86/kvm/vmx.c | 267 +++++++++++++++++++++++++++++------------------------ 1 file changed, 144 insertions(+), 123 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ab4b9bc99a52..34551f293881 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -111,6 +111,14 @@ static bool __read_mostly enable_pml = 1; module_param_named(pml, enable_pml, bool, S_IRUGO); +#define MSR_TYPE_R 1 +#define MSR_TYPE_W 2 +#define MSR_TYPE_RW 3 + +#define MSR_BITMAP_MODE_X2APIC 1 +#define MSR_BITMAP_MODE_X2APIC_APICV 2 +#define MSR_BITMAP_MODE_LM 4 + #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL /* Guest_tsc -> host_tsc conversion requires 64-bit division. 
*/ @@ -209,6 +217,7 @@ struct loaded_vmcs { int soft_vnmi_blocked; ktime_t entry_time; s64 vnmi_blocked_time; + unsigned long *msr_bitmap; struct list_head loaded_vmcss_on_cpu_link; }; @@ -449,8 +458,6 @@ struct nested_vmx { bool pi_pending; u16 posted_intr_nv; - unsigned long *msr_bitmap; - struct hrtimer preemption_timer; bool preemption_timer_expired; @@ -573,6 +580,7 @@ struct vcpu_vmx { struct kvm_vcpu vcpu; unsigned long host_rsp; u8 fail; + u8 msr_bitmap_mode; u32 exit_intr_info; u32 idt_vectoring_info; ulong rflags; @@ -927,6 +935,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu, static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -946,12 +955,6 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, enum { VMX_IO_BITMAP_A, VMX_IO_BITMAP_B, - VMX_MSR_BITMAP_LEGACY, - VMX_MSR_BITMAP_LONGMODE, - VMX_MSR_BITMAP_LEGACY_X2APIC_APICV, - VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV, - VMX_MSR_BITMAP_LEGACY_X2APIC, - VMX_MSR_BITMAP_LONGMODE_X2APIC, VMX_VMREAD_BITMAP, VMX_VMWRITE_BITMAP, VMX_BITMAP_NR @@ -961,12 +964,6 @@ enum { #define vmx_io_bitmap_a (vmx_bitmap[VMX_IO_BITMAP_A]) #define vmx_io_bitmap_b (vmx_bitmap[VMX_IO_BITMAP_B]) -#define vmx_msr_bitmap_legacy (vmx_bitmap[VMX_MSR_BITMAP_LEGACY]) -#define vmx_msr_bitmap_longmode (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE]) -#define vmx_msr_bitmap_legacy_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV]) -#define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV]) -#define vmx_msr_bitmap_legacy_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC]) -#define vmx_msr_bitmap_longmode_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC]) #define vmx_vmread_bitmap (vmx_bitmap[VMX_VMREAD_BITMAP]) #define vmx_vmwrite_bitmap (vmx_bitmap[VMX_VMWRITE_BITMAP]) @@ -2564,36 +2561,6 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to) vmx->guest_msrs[from] = tmp; } -static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) -{ - unsigned long *msr_bitmap; - - if (is_guest_mode(vcpu)) - msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; - else if (cpu_has_secondary_exec_ctrls() && - (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & - SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { - if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv; - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic; - } - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode; - else - msr_bitmap = vmx_msr_bitmap_legacy; - } - - vmcs_write64(MSR_BITMAP, __pa(msr_bitmap)); -} - /* * Set up the vmcs to automatically save and restore system * msrs. 
Don't touch the 64-bit msrs if the guest is in legacy @@ -2634,7 +2601,7 @@ static void setup_msrs(struct vcpu_vmx *vmx) vmx->save_nmsrs = save_nmsrs; if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(&vmx->vcpu); + vmx_update_msr_bitmap(&vmx->vcpu); } /* @@ -3844,6 +3811,8 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) loaded_vmcs_clear(loaded_vmcs); free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs = NULL; + if (loaded_vmcs->msr_bitmap) + free_page((unsigned long)loaded_vmcs->msr_bitmap); WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } @@ -3860,7 +3829,18 @@ static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) loaded_vmcs->shadow_vmcs = NULL; loaded_vmcs_init(loaded_vmcs); + + if (cpu_has_vmx_msr_bitmap()) { + loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); + if (!loaded_vmcs->msr_bitmap) + goto out_vmcs; + memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); + } return 0; + +out_vmcs: + free_loaded_vmcs(loaded_vmcs); + return -ENOMEM; } static void free_kvm_area(void) @@ -4921,10 +4901,8 @@ static void free_vpid(int vpid) spin_unlock(&vmx_vpid_lock); } -#define MSR_TYPE_R 1 -#define MSR_TYPE_W 2 -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, - u32 msr, int type) +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) { int f = sizeof(unsigned long); @@ -4958,6 +4936,50 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, } } +static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) +{ + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return; + + /* + * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals + * have the write-low and read-high bitmap offsets the wrong way round. + * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. + */ + if (msr <= 0x1fff) { + if (type & MSR_TYPE_R) + /* read-low */ + __set_bit(msr, msr_bitmap + 0x000 / f); + + if (type & MSR_TYPE_W) + /* write-low */ + __set_bit(msr, msr_bitmap + 0x800 / f); + + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + if (type & MSR_TYPE_R) + /* read-high */ + __set_bit(msr, msr_bitmap + 0x400 / f); + + if (type & MSR_TYPE_W) + /* write-high */ + __set_bit(msr, msr_bitmap + 0xc00 / f); + + } +} + +static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type, bool value) +{ + if (value) + vmx_enable_intercept_for_msr(msr_bitmap, msr, type); + else + vmx_disable_intercept_for_msr(msr_bitmap, msr, type); +} + /* * If a msr is allowed by L0, we should check whether it is allowed by L1. * The corresponding bit will be cleared unless both of L0 and L1 allow it. 
@@ -5004,28 +5026,68 @@ static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1, } } -static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) +static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) { - if (!longmode_only) - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, - msr, MSR_TYPE_R | MSR_TYPE_W); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, - msr, MSR_TYPE_R | MSR_TYPE_W); + u8 mode = 0; + + if (cpu_has_secondary_exec_ctrls() && + (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { + mode |= MSR_BITMAP_MODE_X2APIC; + if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) + mode |= MSR_BITMAP_MODE_X2APIC_APICV; + } + + if (is_long_mode(vcpu)) + mode |= MSR_BITMAP_MODE_LM; + + return mode; } -static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active) +#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) + +static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap, + u8 mode) { - if (apicv_active) { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv, - msr, type); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv, - msr, type); - } else { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, - msr, type); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, - msr, type); + int msr; + + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; } + + if (mode & MSR_BITMAP_MODE_X2APIC) { + /* + * TPR reads and writes can be virtualized even if virtual interrupt + * delivery is not in use. + */ + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW); + if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { + vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); + } + } +} + +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; + u8 mode = vmx_msr_bitmap_mode(vcpu); + u8 changed = mode ^ vmx->msr_bitmap_mode; + + if (!changed) + return; + + vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW, + !(mode & MSR_BITMAP_MODE_LM)); + + if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV)) + vmx_update_msr_bitmap_x2apic(msr_bitmap, mode); + + vmx->msr_bitmap_mode = mode; } static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu) @@ -5277,7 +5339,7 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) } if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static u32 vmx_exec_control(struct vcpu_vmx *vmx) @@ -5464,7 +5526,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap)); } if (cpu_has_vmx_msr_bitmap()) - vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy)); + vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ @@ -6747,7 +6809,7 @@ void vmx_enable_tdp(void) static __init int hardware_setup(void) { - int r = -ENOMEM, i, msr; + int r = -ENOMEM, i; rdmsrl_safe(MSR_EFER, &host_efer); @@ -6767,9 +6829,6 @@ static __init int hardware_setup(void) memset(vmx_io_bitmap_b, 0xff, 
PAGE_SIZE); - memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); - if (setup_vmcs_config(&vmcs_config) < 0) { r = -EIO; goto out; @@ -6838,42 +6897,8 @@ static __init int hardware_setup(void) kvm_tsc_scaling_ratio_frac_bits = 48; } - vmx_disable_intercept_for_msr(MSR_FS_BASE, false); - vmx_disable_intercept_for_msr(MSR_GS_BASE, false); - vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); - - memcpy(vmx_msr_bitmap_legacy_x2apic_apicv, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic_apicv, - vmx_msr_bitmap_longmode, PAGE_SIZE); - memcpy(vmx_msr_bitmap_legacy_x2apic, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic, - vmx_msr_bitmap_longmode, PAGE_SIZE); - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ - for (msr = 0x800; msr <= 0x8ff; msr++) { - if (msr == 0x839 /* TMCCT */) - continue; - vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true); - } - - /* - * TPR reads and writes can be virtualized even if virtual interrupt - * delivery is not in use. - */ - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true); - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false); - - /* EOI */ - vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true); - /* SELF-IPI */ - vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true); - if (enable_ept) vmx_enable_tdp(); else @@ -7162,13 +7187,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) if (r < 0) goto out_vmcs02; - if (cpu_has_vmx_msr_bitmap()) { - vmx->nested.msr_bitmap = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx->nested.msr_bitmap) - goto out_msr_bitmap; - } - vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL); if (!vmx->nested.cached_vmcs12) goto out_cached_vmcs12; @@ -7195,9 +7213,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) kfree(vmx->nested.cached_vmcs12); out_cached_vmcs12: - free_page((unsigned long)vmx->nested.msr_bitmap); - -out_msr_bitmap: free_loaded_vmcs(&vmx->nested.vmcs02); out_vmcs02: @@ -7343,10 +7358,6 @@ static void free_nested(struct vcpu_vmx *vmx) free_vpid(vmx->nested.vpid02); vmx->nested.posted_intr_nv = -1; vmx->nested.current_vmptr = -1ull; - if (vmx->nested.msr_bitmap) { - free_page((unsigned long)vmx->nested.msr_bitmap); - vmx->nested.msr_bitmap = NULL; - } if (enable_shadow_vmcs) { vmx_disable_shadow_vmcs(vmx); vmcs_clear(vmx->vmcs01.shadow_vmcs); @@ -8862,7 +8873,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) } vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control); - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) @@ -9523,6 +9534,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) { int err; struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); + unsigned long *msr_bitmap; int cpu; if (!vmx) @@ -9559,6 +9571,15 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (err < 0) goto free_msrs; + msr_bitmap = vmx->vmcs01.msr_bitmap; + vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); + 
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); + vmx->msr_bitmap_mode = 0; + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, int msr; struct page *page; unsigned long *msr_bitmap_l1; - unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; /* This shortcut is ok because we support only x2APIC MSRs so far. */ if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) @@ -11397,7 +11418,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, vmcs_write64(GUEST_IA32_DEBUGCTL, 0); if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, vmcs12->vm_exit_msr_load_count)) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
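[For readers following the bitmap arithmetic in vmx_enable_intercept_for_msr()
above: the MSR bitmap is a single 4 KiB page split into four 1 KiB quarters
(read-low at 0x000, read-high at 0x400, write-low at 0x800, write-high at
0xc00), and a set bit means "intercept".  Below is a standalone userspace
model of that layout, for illustration only; the offsets come from the patch,
everything else is assumed.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy byte-level model of the VMX MSR bitmap page.  A set bit means
 * "intercept the access"; a clear bit lets the guest access pass through.
 */
static void set_intercept(uint8_t *bitmap, uint32_t msr, int write, int on)
{
	uint32_t base;

	if (msr <= 0x1fff) {
		base = write ? 0x800 : 0x000;	/* write-low / read-low */
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		base = write ? 0xc00 : 0x400;	/* write-high / read-high */
		msr &= 0x1fff;
	} else {
		return;				/* MSR not covered by the bitmap */
	}

	if (on)
		bitmap[base + msr / 8] |= 1u << (msr % 8);
	else
		bitmap[base + msr / 8] &= ~(1u << (msr % 8));
}

int main(void)
{
	uint8_t bitmap[4096];

	/* The patch starts from all-ones, i.e. intercept everything ... */
	memset(bitmap, 0xff, sizeof(bitmap));

	/* ... then punches pass-through holes, as vmx_create_vcpu() does
	 * for MSR_FS_BASE (0xc0000100), reads and writes: */
	set_intercept(bitmap, 0xc0000100, 0, 0);
	set_intercept(bitmap, 0xc0000100, 1, 0);

	/* The x2APIC MSRs 0x800-0x8ff cover exactly four 64-bit words per
	 * quarter, which is why vmx_update_msr_bitmap_x2apic() can step
	 * its loop by BITS_PER_LONG, i.e. 64 MSRs per iteration. */
	printf("read bit for MSR 0x800 sits at byte 0x%x of the page\n",
	       0x800 / 8);
	return 0;
}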
* Re: [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU
  2018-01-27  8:50 ` [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU Paolo Bonzini
@ 2018-01-29 10:35   ` David Hildenbrand
  2018-01-30 13:07   ` [v2,3/3] " Mihai Carabas
  2018-01-30 16:23   ` [PATCH v2 3/3] " Radim Krčmář
  2 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2018-01-29 10:35 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed

On 27.01.2018 09:50, Paolo Bonzini wrote:
> Place the MSR bitmap in struct loaded_vmcs, and update it in place
> every time the x2apic or APICv state can change.  This is rare and
> the loop can handle 64 MSRs per iteration, in a similar fashion as
> nested_vmx_prepare_msr_bitmap.
>
> This prepares for choosing, on a per-VM basis, whether to intercept
> the SPEC_CTRL and PRED_CMD MSRs.
>
> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

I really like this change and didn't spot anything obvious.

Acked-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 14+ messages in thread
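[The commit message's stated goal, per-VM control over SPEC_CTRL and PRED_CMD
interception, is exactly what a per-vCPU bitmap makes cheap.  Below is a
hedged sketch of how a follow-up series could use the helpers introduced
above; the guest_may_use_spec_ctrl predicate is invented for illustration,
and the MSR constants are assumed from the contemporaneous
speculation-control work, not from this series.]

/*
 * Hypothetical follow-up use of the per-vCPU bitmap; this series only
 * lays the groundwork and does not contain this code.
 */
static void sketch_set_spec_ctrl_intercept(struct vcpu_vmx *vmx,
					   bool guest_may_use_spec_ctrl)
{
	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;

	/* Intercept unless this particular guest is allowed direct access. */
	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_SPEC_CTRL,
				  MSR_TYPE_RW, !guest_may_use_spec_ctrl);
	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD,
				  MSR_TYPE_W, !guest_may_use_spec_ctrl);
}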
* Re: [v2,3/3] KVM: VMX: make MSR bitmaps per-VCPU 2018-01-27 8:50 ` [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU Paolo Bonzini 2018-01-29 10:35 ` David Hildenbrand @ 2018-01-30 13:07 ` Mihai Carabas 2018-01-30 16:23 ` [PATCH v2 3/3] " Radim Krčmář 2 siblings, 0 replies; 14+ messages in thread From: Mihai Carabas @ 2018-01-30 13:07 UTC (permalink / raw) To: Paolo Bonzini, linux-kernel, kvm Cc: Radim Krčmář, David Woodhouse, KarimAllah Ahmed, Konrad Rzeszutek Wilk Hello Paolo, On 27.01.2018 10:50, Paolo Bonzini wrote: > Place the MSR bitmap in struct loaded_vmcs, and update it in place > every time the x2apic or APICv state can change. This is rare and > the loop can handle 64 MSRs per iteration, in a similar fashion as > nested_vmx_prepare_msr_bitmap. I've back-ported this patch set on 4.1 and made some successful tests. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> > > This prepares for choosing, on a per-VM basis, whether to intercept > the SPEC_CTRL and PRED_CMD MSRs. > > Suggested-by: Jim Mattson <jmattson@google.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > Acked-by: David Hildenbrand <david@redhat.com> > --- > arch/x86/kvm/vmx.c | 267 +++++++++++++++++++++++++++++------------------------ > 1 file changed, 144 insertions(+), 123 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index ab4b9bc99a52..34551f293881 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -111,6 +111,14 @@ > static bool __read_mostly enable_pml = 1; > module_param_named(pml, enable_pml, bool, S_IRUGO); > > +#define MSR_TYPE_R 1 > +#define MSR_TYPE_W 2 > +#define MSR_TYPE_RW 3 > + > +#define MSR_BITMAP_MODE_X2APIC 1 > +#define MSR_BITMAP_MODE_X2APIC_APICV 2 > +#define MSR_BITMAP_MODE_LM 4 > + > #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL > > /* Guest_tsc -> host_tsc conversion requires 64-bit division. 
*/ > @@ -209,6 +217,7 @@ struct loaded_vmcs { > int soft_vnmi_blocked; > ktime_t entry_time; > s64 vnmi_blocked_time; > + unsigned long *msr_bitmap; > struct list_head loaded_vmcss_on_cpu_link; > }; > > @@ -449,8 +458,6 @@ struct nested_vmx { > bool pi_pending; > u16 posted_intr_nv; > > - unsigned long *msr_bitmap; > - > struct hrtimer preemption_timer; > bool preemption_timer_expired; > > @@ -573,6 +580,7 @@ struct vcpu_vmx { > struct kvm_vcpu vcpu; > unsigned long host_rsp; > u8 fail; > + u8 msr_bitmap_mode; > u32 exit_intr_info; > u32 idt_vectoring_info; > ulong rflags; > @@ -927,6 +935,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu, > static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); > static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, > u16 error_code); > +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); > > static DEFINE_PER_CPU(struct vmcs *, vmxarea); > static DEFINE_PER_CPU(struct vmcs *, current_vmcs); > @@ -946,12 +955,6 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, > enum { > VMX_IO_BITMAP_A, > VMX_IO_BITMAP_B, > - VMX_MSR_BITMAP_LEGACY, > - VMX_MSR_BITMAP_LONGMODE, > - VMX_MSR_BITMAP_LEGACY_X2APIC_APICV, > - VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV, > - VMX_MSR_BITMAP_LEGACY_X2APIC, > - VMX_MSR_BITMAP_LONGMODE_X2APIC, > VMX_VMREAD_BITMAP, > VMX_VMWRITE_BITMAP, > VMX_BITMAP_NR > @@ -961,12 +964,6 @@ enum { > > #define vmx_io_bitmap_a (vmx_bitmap[VMX_IO_BITMAP_A]) > #define vmx_io_bitmap_b (vmx_bitmap[VMX_IO_BITMAP_B]) > -#define vmx_msr_bitmap_legacy (vmx_bitmap[VMX_MSR_BITMAP_LEGACY]) > -#define vmx_msr_bitmap_longmode (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE]) > -#define vmx_msr_bitmap_legacy_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV]) > -#define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV]) > -#define vmx_msr_bitmap_legacy_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC]) > -#define vmx_msr_bitmap_longmode_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC]) > #define vmx_vmread_bitmap (vmx_bitmap[VMX_VMREAD_BITMAP]) > #define vmx_vmwrite_bitmap (vmx_bitmap[VMX_VMWRITE_BITMAP]) > > @@ -2564,36 +2561,6 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to) > vmx->guest_msrs[from] = tmp; > } > > -static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) > -{ > - unsigned long *msr_bitmap; > - > - if (is_guest_mode(vcpu)) > - msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; > - else if (cpu_has_secondary_exec_ctrls() && > - (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & > - SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { > - if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) { > - if (is_long_mode(vcpu)) > - msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv; > - else > - msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv; > - } else { > - if (is_long_mode(vcpu)) > - msr_bitmap = vmx_msr_bitmap_longmode_x2apic; > - else > - msr_bitmap = vmx_msr_bitmap_legacy_x2apic; > - } > - } else { > - if (is_long_mode(vcpu)) > - msr_bitmap = vmx_msr_bitmap_longmode; > - else > - msr_bitmap = vmx_msr_bitmap_legacy; > - } > - > - vmcs_write64(MSR_BITMAP, __pa(msr_bitmap)); > -} > - > /* > * Set up the vmcs to automatically save and restore system > * msrs. 
Don't touch the 64-bit msrs if the guest is in legacy > @@ -2634,7 +2601,7 @@ static void setup_msrs(struct vcpu_vmx *vmx) > vmx->save_nmsrs = save_nmsrs; > > if (cpu_has_vmx_msr_bitmap()) > - vmx_set_msr_bitmap(&vmx->vcpu); > + vmx_update_msr_bitmap(&vmx->vcpu); > } > > /* > @@ -3844,6 +3811,8 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) > loaded_vmcs_clear(loaded_vmcs); > free_vmcs(loaded_vmcs->vmcs); > loaded_vmcs->vmcs = NULL; > + if (loaded_vmcs->msr_bitmap) > + free_page((unsigned long)loaded_vmcs->msr_bitmap); > WARN_ON(loaded_vmcs->shadow_vmcs != NULL); > } > > @@ -3860,7 +3829,18 @@ static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) > > loaded_vmcs->shadow_vmcs = NULL; > loaded_vmcs_init(loaded_vmcs); > + > + if (cpu_has_vmx_msr_bitmap()) { > + loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); > + if (!loaded_vmcs->msr_bitmap) > + goto out_vmcs; > + memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); > + } > return 0; > + > +out_vmcs: > + free_loaded_vmcs(loaded_vmcs); > + return -ENOMEM; > } > > static void free_kvm_area(void) > @@ -4921,10 +4901,8 @@ static void free_vpid(int vpid) > spin_unlock(&vmx_vpid_lock); > } > > -#define MSR_TYPE_R 1 > -#define MSR_TYPE_W 2 > -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, > - u32 msr, int type) > +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, > + u32 msr, int type) > { > int f = sizeof(unsigned long); > > @@ -4958,6 +4936,50 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, > } > } > > +static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, > + u32 msr, int type) > +{ > + int f = sizeof(unsigned long); > + > + if (!cpu_has_vmx_msr_bitmap()) > + return; > + > + /* > + * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals > + * have the write-low and read-high bitmap offsets the wrong way round. > + * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. > + */ > + if (msr <= 0x1fff) { > + if (type & MSR_TYPE_R) > + /* read-low */ > + __set_bit(msr, msr_bitmap + 0x000 / f); > + > + if (type & MSR_TYPE_W) > + /* write-low */ > + __set_bit(msr, msr_bitmap + 0x800 / f); > + > + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { > + msr &= 0x1fff; > + if (type & MSR_TYPE_R) > + /* read-high */ > + __set_bit(msr, msr_bitmap + 0x400 / f); > + > + if (type & MSR_TYPE_W) > + /* write-high */ > + __set_bit(msr, msr_bitmap + 0xc00 / f); > + > + } > +} > + > +static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, > + u32 msr, int type, bool value) > +{ > + if (value) > + vmx_enable_intercept_for_msr(msr_bitmap, msr, type); > + else > + vmx_disable_intercept_for_msr(msr_bitmap, msr, type); > +} > + > /* > * If a msr is allowed by L0, we should check whether it is allowed by L1. > * The corresponding bit will be cleared unless both of L0 and L1 allow it. 
> @@ -5004,28 +5026,68 @@ static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1, > } > } > > -static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) > +static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) > { > - if (!longmode_only) > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, > - msr, MSR_TYPE_R | MSR_TYPE_W); > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, > - msr, MSR_TYPE_R | MSR_TYPE_W); > + u8 mode = 0; > + > + if (cpu_has_secondary_exec_ctrls() && > + (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & > + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { > + mode |= MSR_BITMAP_MODE_X2APIC; > + if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) > + mode |= MSR_BITMAP_MODE_X2APIC_APICV; > + } > + > + if (is_long_mode(vcpu)) > + mode |= MSR_BITMAP_MODE_LM; > + > + return mode; > } > > -static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active) > +#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) > + > +static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap, > + u8 mode) > { > - if (apicv_active) { > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv, > - msr, type); > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv, > - msr, type); > - } else { > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, > - msr, type); > - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, > - msr, type); > + int msr; > + > + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { > + unsigned word = msr / BITS_PER_LONG; > + msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; > + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; > } > + > + if (mode & MSR_BITMAP_MODE_X2APIC) { > + /* > + * TPR reads and writes can be virtualized even if virtual interrupt > + * delivery is not in use. 
> + */ > + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW); > + if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { > + vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R); > + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); > + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); > + } > + } > +} > + > +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; > + u8 mode = vmx_msr_bitmap_mode(vcpu); > + u8 changed = mode ^ vmx->msr_bitmap_mode; > + > + if (!changed) > + return; > + > + vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW, > + !(mode & MSR_BITMAP_MODE_LM)); > + > + if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV)) > + vmx_update_msr_bitmap_x2apic(msr_bitmap, mode); > + > + vmx->msr_bitmap_mode = mode; > } > > static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu) > @@ -5277,7 +5339,7 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) > } > > if (cpu_has_vmx_msr_bitmap()) > - vmx_set_msr_bitmap(vcpu); > + vmx_update_msr_bitmap(vcpu); > } > > static u32 vmx_exec_control(struct vcpu_vmx *vmx) > @@ -5464,7 +5526,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) > vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap)); > } > if (cpu_has_vmx_msr_bitmap()) > - vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy)); > + vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); > > vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ > > @@ -6747,7 +6809,7 @@ void vmx_enable_tdp(void) > > static __init int hardware_setup(void) > { > - int r = -ENOMEM, i, msr; > + int r = -ENOMEM, i; > > rdmsrl_safe(MSR_EFER, &host_efer); > > @@ -6767,9 +6829,6 @@ static __init int hardware_setup(void) > > memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE); > > - memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); > - memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); > - > if (setup_vmcs_config(&vmcs_config) < 0) { > r = -EIO; > goto out; > @@ -6838,42 +6897,8 @@ static __init int hardware_setup(void) > kvm_tsc_scaling_ratio_frac_bits = 48; > } > > - vmx_disable_intercept_for_msr(MSR_FS_BASE, false); > - vmx_disable_intercept_for_msr(MSR_GS_BASE, false); > - vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); > - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); > - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); > - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); > - > - memcpy(vmx_msr_bitmap_legacy_x2apic_apicv, > - vmx_msr_bitmap_legacy, PAGE_SIZE); > - memcpy(vmx_msr_bitmap_longmode_x2apic_apicv, > - vmx_msr_bitmap_longmode, PAGE_SIZE); > - memcpy(vmx_msr_bitmap_legacy_x2apic, > - vmx_msr_bitmap_legacy, PAGE_SIZE); > - memcpy(vmx_msr_bitmap_longmode_x2apic, > - vmx_msr_bitmap_longmode, PAGE_SIZE); > - > set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ > > - for (msr = 0x800; msr <= 0x8ff; msr++) { > - if (msr == 0x839 /* TMCCT */) > - continue; > - vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true); > - } > - > - /* > - * TPR reads and writes can be virtualized even if virtual interrupt > - * delivery is not in use. 
> - */ > - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true); > - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false); > - > - /* EOI */ > - vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true); > - /* SELF-IPI */ > - vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true); > - > if (enable_ept) > vmx_enable_tdp(); > else > @@ -7162,13 +7187,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) > if (r < 0) > goto out_vmcs02; > > - if (cpu_has_vmx_msr_bitmap()) { > - vmx->nested.msr_bitmap = > - (unsigned long *)__get_free_page(GFP_KERNEL); > - if (!vmx->nested.msr_bitmap) > - goto out_msr_bitmap; > - } > - > vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL); > if (!vmx->nested.cached_vmcs12) > goto out_cached_vmcs12; > @@ -7195,9 +7213,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) > kfree(vmx->nested.cached_vmcs12); > > out_cached_vmcs12: > - free_page((unsigned long)vmx->nested.msr_bitmap); > - > -out_msr_bitmap: > free_loaded_vmcs(&vmx->nested.vmcs02); > > out_vmcs02: > @@ -7343,10 +7358,6 @@ static void free_nested(struct vcpu_vmx *vmx) > free_vpid(vmx->nested.vpid02); > vmx->nested.posted_intr_nv = -1; > vmx->nested.current_vmptr = -1ull; > - if (vmx->nested.msr_bitmap) { > - free_page((unsigned long)vmx->nested.msr_bitmap); > - vmx->nested.msr_bitmap = NULL; > - } > if (enable_shadow_vmcs) { > vmx_disable_shadow_vmcs(vmx); > vmcs_clear(vmx->vmcs01.shadow_vmcs); > @@ -8862,7 +8873,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) > } > vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control); > > - vmx_set_msr_bitmap(vcpu); > + vmx_update_msr_bitmap(vcpu); > } > > static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) > @@ -9523,6 +9534,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) > { > int err; > struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); > + unsigned long *msr_bitmap; > int cpu; > > if (!vmx) > @@ -9559,6 +9571,15 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) > if (err < 0) > goto free_msrs; > > + msr_bitmap = vmx->vmcs01.msr_bitmap; > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); > + vmx->msr_bitmap_mode = 0; > + > vmx->loaded_vmcs = &vmx->vmcs01; > cpu = get_cpu(); > vmx_vcpu_load(&vmx->vcpu, cpu); > @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, > int msr; > struct page *page; > unsigned long *msr_bitmap_l1; > - unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; > + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; > > /* This shortcut is ok because we support only x2APIC MSRs so far. 
*/ > if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) > @@ -11397,7 +11418,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, > vmcs_write64(GUEST_IA32_DEBUGCTL, 0); > > if (cpu_has_vmx_msr_bitmap()) > - vmx_set_msr_bitmap(vcpu); > + vmx_update_msr_bitmap(vcpu); > > if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, > vmcs12->vm_exit_msr_load_count)) > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU
  2018-01-27  8:50 ` [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU Paolo Bonzini
  2018-01-29 10:35   ` David Hildenbrand
  2018-01-30 13:07   ` [v2,3/3] " Mihai Carabas
@ 2018-01-30 16:23   ` Radim Krčmář
  2018-01-30 16:30     ` David Woodhouse
  2018-01-31 17:37     ` Paolo Bonzini
  2 siblings, 2 replies; 14+ messages in thread
From: Radim Krčmář @ 2018-01-30 16:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, David Woodhouse, KarimAllah Ahmed

2018-01-27 09:50+0100, Paolo Bonzini:
> Place the MSR bitmap in struct loaded_vmcs, and update it in place
> every time the x2apic or APICv state can change.  This is rare and
> the loop can handle 64 MSRs per iteration, in a similar fashion as
> nested_vmx_prepare_msr_bitmap.
>
> This prepares for choosing, on a per-VM basis, whether to intercept
> the SPEC_CTRL and PRED_CMD MSRs.
>
> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>  	int msr;
>  	struct page *page;
>  	unsigned long *msr_bitmap_l1;
> -	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
> +	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;

The physical address of the nested msr_bitmap is never loaded into the vmcs.

The resolution you provided had an extra hunk in prepare_vmcs02_full():

+	vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));

I have queued that as:

+	if (cpu_has_vmx_msr_bitmap())
+		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));

but it should be a part of the patch or a followup fix.

Is the branch already merged into PTI?

Thanks.

>
>  	/* This shortcut is ok because we support only x2APIC MSRs so far. */
>  	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> @@ -11397,7 +11418,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
>
>  	if (cpu_has_vmx_msr_bitmap())
> -		vmx_set_msr_bitmap(vcpu);
> +		vmx_update_msr_bitmap(vcpu);
>
>  	if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
>  				vmcs12->vm_exit_msr_load_count))
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU
  2018-01-30 16:23 ` [PATCH v2 3/3] " Radim Krčmář
@ 2018-01-30 16:30   ` David Woodhouse
  2018-01-31 17:37   ` Paolo Bonzini
  1 sibling, 0 replies; 14+ messages in thread
From: David Woodhouse @ 2018-01-30 16:30 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini
  Cc: linux-kernel, kvm, KarimAllah Ahmed

On Tue, 2018-01-30 at 17:23 +0100, Radim Krčmář wrote:
>
> The physical address of the nested msr_bitmap is never loaded into the vmcs.
>
> The resolution you provided had an extra hunk in prepare_vmcs02_full():
>
> +	vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
>
> I have queued that as:
>
> +	if (cpu_has_vmx_msr_bitmap())
> +		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
>
> but it should be a part of the patch or a followup fix.
>
> Is the branch already merged into PTI?

No, we've never seen a 4.14-based branch that could be merged.  I made
one myself for the moment but assumed there would be one from Paolo
that was then pulled into both tip/x86/pti and the kvm.git tree.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU
  2018-01-30 16:23 ` [PATCH v2 3/3] " Radim Krčmář
  2018-01-30 16:30   ` David Woodhouse
@ 2018-01-31 17:37   ` Paolo Bonzini
  2018-01-31 18:14     ` Radim Krčmář
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2018-01-31 17:37 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, David Woodhouse, KarimAllah Ahmed

On 30/01/2018 11:23, Radim Krčmář wrote:
> 2018-01-27 09:50+0100, Paolo Bonzini:
>> Place the MSR bitmap in struct loaded_vmcs, and update it in place
>> every time the x2apic or APICv state can change.  This is rare and
>> the loop can handle 64 MSRs per iteration, in a similar fashion as
>> nested_vmx_prepare_msr_bitmap.
>>
>> This prepares for choosing, on a per-VM basis, whether to intercept
>> the SPEC_CTRL and PRED_CMD MSRs.
>>
>> Suggested-by: Jim Mattson <jmattson@google.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
>>  	int msr;
>>  	struct page *page;
>>  	unsigned long *msr_bitmap_l1;
>> -	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
>> +	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
>
> The physical address of the nested msr_bitmap is never loaded into the vmcs.
>
> The resolution you provided had an extra hunk in prepare_vmcs02_full():
>
> +	vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
>
> I have queued that as:
>
> +	if (cpu_has_vmx_msr_bitmap())
> +		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));

Hmm, you're right, it should be in prepare_vmcs02() here (4.15-based),
and then moved to prepare_vmcs02_full() as part of the conflict resolution.

I'll send a v3.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU
  2018-01-31 17:37 ` Paolo Bonzini
@ 2018-01-31 18:14   ` Radim Krčmář
  0 siblings, 0 replies; 14+ messages in thread
From: Radim Krčmář @ 2018-01-31 18:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, David Woodhouse, KarimAllah Ahmed

2018-01-31 12:37-0500, Paolo Bonzini:
> On 30/01/2018 11:23, Radim Krčmář wrote:
> > 2018-01-27 09:50+0100, Paolo Bonzini:
> >> Place the MSR bitmap in struct loaded_vmcs, and update it in place
> >> every time the x2apic or APICv state can change.  This is rare and
> >> the loop can handle 64 MSRs per iteration, in a similar fashion as
> >> nested_vmx_prepare_msr_bitmap.
> >>
> >> This prepares for choosing, on a per-VM basis, whether to intercept
> >> the SPEC_CTRL and PRED_CMD MSRs.
> >>
> >> Suggested-by: Jim Mattson <jmattson@google.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> >>  	int msr;
> >>  	struct page *page;
> >>  	unsigned long *msr_bitmap_l1;
> >> -	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
> >> +	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> >
> > The physical address of the nested msr_bitmap is never loaded into the vmcs.
> >
> > The resolution you provided had an extra hunk in prepare_vmcs02_full():
> >
> > +	vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
> >
> > I have queued that as:
> >
> > +	if (cpu_has_vmx_msr_bitmap())
> > +		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
>
> Hmm, you're right, it should be in prepare_vmcs02() here (4.15-based),
> and then moved to prepare_vmcs02_full() as part of the conflict resolution.

It also makes sense to have it in nested_get_vmcs12_pages, where we call
nested_vmx_prepare_msr_bitmap() and disable MSR bitmaps.

> I'll send a v3.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread
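[To summarize the fix under discussion: patch 3 gives the vmcs02 its own
bitmap page, but nothing in v2 writes that page's physical address into the
vmcs02's MSR_BITMAP field, so the merged L0/L1 bitmap is never consulted
while L2 runs.  Below is a sketch of the agreed shape of the fix, based on
the hunk Radim quotes; its final placement, prepare_vmcs02() versus
nested_get_vmcs12_pages(), was still being settled when v3 was requested.]

/* Sketch only: the missing hunk, in prepare_vmcs02()-like context. */
static void sketch_prepare_vmcs02_bitmap(struct vcpu_vmx *vmx)
{
	/*
	 * Point the hardware at the vmcs02's own bitmap.  Without this
	 * write, the vmcs02's MSR_BITMAP field is never set, so the merged
	 * bitmap built by nested_vmx_merge_msr_bitmap() has no effect.
	 */
	if (cpu_has_vmx_msr_bitmap())
		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
}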
* Re: [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti
From: David Woodhouse @ 2018-01-29 12:53 UTC
To: Paolo Bonzini, linux-kernel, kvm
Cc: Radim Krčmář, KarimAllah Ahmed

On Sat, 2018-01-27 at 09:50 +0100, Paolo Bonzini wrote:
> David and others,
>
> the following changes since commit ba804bb4b72e57374b5f567b783aa0298fba0ce6:
>
>   Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-01-26 09:03:16 -0800)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/kvm.git msr-bitmaps

Hm, we are pushing the other bits through tip/x86/pti, which is still
based on 4.14 so that everything can be backported easily.  I was
expecting to be able to pull a clean 4.14-based tree which you had
*also* pulled into the latest kvm.git and resolved any merge issues...
but I just pulled a whole bunch of unrelated post-4.14 changes into my
working tree.

How do you want to handle this?
* Re: [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti
From: Paolo Bonzini @ 2018-01-29 14:28 UTC
To: David Woodhouse, linux-kernel, kvm
Cc: Radim Krčmář, KarimAllah Ahmed

On 29/01/2018 13:53, David Woodhouse wrote:
> Hm, we are pushing the other bits through tip/x86/pti, which is still
> based on 4.14 so that everything can be backported easily.  I was
> expecting to be able to pull a clean 4.14-based tree

Anything 4.14-based would have had conflicts all over due to the
changes that have already gone in for tip/x86/pti.  These three patches
do cherry-pick cleanly on top of 4.14.

If you give me the tree and commit id that you want me to use as a
base, I can rebase and give you a new topic branch.

Thanks,

Paolo

> which you had
> *also* pulled into the latest kvm.git and resolved any merge issues...
> but I just pulled a whole bunch of unrelated post-4.14 changes into my
> working tree.
>
> How do you want to handle this?
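[A sketch of the offered workflow in git terms.  The tip branch shown
is the one named in the thread; the topic-branch name and the commit
ids are placeholders, since the thread elides them:]

    # Start a fresh topic branch from whatever base David picks.
    git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/pti
    git checkout -b msr-bitmaps-for-pti FETCH_HEAD
    # The three patches are said to cherry-pick cleanly on a 4.14 base.
    git cherry-pick <vmcs02-pool> <alloc_loaded_vmcs> <per-vcpu-bitmaps>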
* Re: [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti
From: David Woodhouse @ 2018-01-29 14:57 UTC
To: Paolo Bonzini, linux-kernel, kvm
Cc: Radim Krčmář, KarimAllah Ahmed

On Mon, 2018-01-29 at 15:28 +0100, Paolo Bonzini wrote:
> On 29/01/2018 13:53, David Woodhouse wrote:
> >
> > Hm, we are pushing the other bits through tip/x86/pti, which is still
> > based on 4.14 so that everything can be backported easily.  I was
> > expecting to be able to pull a clean 4.14-based tree
>
> Anything 4.14-based would have had conflicts all over due to the changes
> that have already gone in for tip/x86/pti.  These three patches do
> cherry-pick cleanly on top of 4.14.
>
> If you give me the tree and commit id that you want me to use as a base,
> I can rebase and give you a new topic branch.

I've made a 'msr-bitmap' branch in my linux-retpoline.git tree¹ which
looks more like what I expected.  It's based on tip/x86/pti.

In the ibpb branch (on which I've just done a pass over Karim's patches
and would appreciate more feedback while he fixes up some remaining
details and prepares to send it out again) I have deliberately done a
merge from that branch, with the intention that I can go back and do a
pull from your branch instead.

All the IBRS bits are now in the 'ibrs' branch.  As discussed, I'll
keep rebasing those on top of what we have, for now.

¹ http://git.infradead.org/users/dwmw2/linux-retpoline.git
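[The "merge now, pull later" dance David describes could be redone
roughly as follows; the branch names 'ibpb' and 'msr-bitmap' and the
pull URL are from the thread, while the rewind target is a placeholder:]

    # Today: ibpb carries a merge of the local msr-bitmap branch.
    git checkout ibpb
    git merge msr-bitmap
    # Later: rewind past that placeholder merge and pull the official
    # topic branch from Paolo's tree instead.
    git reset --hard <commit-before-the-merge>
    git pull git://git.kernel.org/pub/scm/virt/kvm/kvm.git msr-bitmaps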
Thread overview: 14+ messages
2018-01-27  8:50 [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti Paolo Bonzini
2018-01-27  8:50 ` [PATCH v2 1/3] KVM: nVMX: Eliminate vmcs02 pool Paolo Bonzini
2018-01-27  8:50 ` [PATCH v2 2/3] KVM: VMX: introduce alloc_loaded_vmcs Paolo Bonzini
2018-01-29 10:31   ` David Hildenbrand
2018-01-27  8:50 ` [PATCH v2 3/3] KVM: VMX: make MSR bitmaps per-VCPU Paolo Bonzini
2018-01-29 10:35   ` David Hildenbrand
2018-01-30 13:07   ` [v2,3/3] " Mihai Carabas
2018-01-30 16:23   ` [PATCH v2 3/3] " Radim Krčmář
2018-01-30 16:30     ` David Woodhouse
2018-01-31 17:37     ` Paolo Bonzini
2018-01-31 18:14       ` Radim Krčmář
2018-01-29 12:53 ` [PATCH v2 0/3] Per-VCPU MSR bitmaps patches - topic branch for x86/pti David Woodhouse
2018-01-29 14:28   ` Paolo Bonzini
2018-01-29 14:57     ` David Woodhouse