All of lore.kernel.org
 help / color / mirror / Atom feed
From: Avi Kivity <avi@redhat.com>
To: "Nadav Har'El" <nyh@il.ibm.com>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH 14/24] Prepare vmcs02 from vmcs01 and vmcs12
Date: Mon, 14 Jun 2010 14:11:41 +0300	[thread overview]
Message-ID: <4C160E6D.3080609@redhat.com> (raw)
In-Reply-To: <201006131229.o5DCTih4013041@rice.haifa.ibm.com>

On 06/13/2010 03:29 PM, Nadav Har'El wrote:
> This patch contains code to prepare the VMCS which can be used to actually
> run the L2 guest, vmcs02. prepare_vmcs02 appropriately merges the information
> in shadow_vmcs that L1 built for L2 (vmcs12), and that in the VMCS that we
> built for L1 (vmcs01).
>
> VMREAD/WRITE can only access one VMCS at a time (the "current" VMCS), which
> makes it difficult for us to read from vmcs01 while writing to vmcs12. This
> is why we first make a copy of vmcs01 in memory (l1_shadow_vmcs) and then
> read that memory copy while writing to vmcs12.
>
> Signed-off-by: Nadav Har'El<nyh@il.ibm.com>
> ---
> --- .before/arch/x86/kvm/vmx.c	2010-06-13 15:01:29.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2010-06-13 15:01:29.000000000 +0300
> @@ -849,6 +849,36 @@ static inline bool report_flexpriority(v
>   	return flexpriority_enabled;
>   }
>
> +static inline bool nested_cpu_has_vmx_tpr_shadow(struct kvm_vcpu *vcpu)
> +{
> +	return cpu_has_vmx_tpr_shadow()&&
> +		get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control&
> +		CPU_BASED_TPR_SHADOW;
>    

Operator precedence is with you, but the line width limit is not.  Use 
parentheses to improve readability.

> @@ -1292,6 +1322,39 @@ static void vmx_load_host_state(struct v
>   	preempt_enable();
>   }
>
> +int load_vmcs_host_state(struct shadow_vmcs *src)
> +{
> +	vmcs_write16(HOST_ES_SELECTOR, src->host_es_selector);
> +	vmcs_write16(HOST_CS_SELECTOR, src->host_cs_selector);
> +	vmcs_write16(HOST_SS_SELECTOR, src->host_ss_selector);
> +	vmcs_write16(HOST_DS_SELECTOR, src->host_ds_selector);
> +	vmcs_write16(HOST_FS_SELECTOR, src->host_fs_selector);
> +	vmcs_write16(HOST_GS_SELECTOR, src->host_gs_selector);
> +	vmcs_write16(HOST_TR_SELECTOR, src->host_tr_selector);
>    

Why do we need to go through a shadow_vmcs for host fields?  Instead of 
cloning a vmcs, you can call a common init routing to initialize the 
host fields.

> +
> +	vmcs_write64(TSC_OFFSET, src->tsc_offset);
>    

Don't you need to adjust for our TSC_OFFSET?

> +/* prepare_vmcs_02 is called in when the L1 guest hypervisor runs its nested
> + * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
> + * with L0's wishes for its guest (vmsc01), so we can run the L2 guest in a
> + * way that will both be appropriate to L1's requests, and our needs.
> + */
> +int prepare_vmcs_02(struct kvm_vcpu *vcpu,
> +	struct shadow_vmcs *vmcs12, struct shadow_vmcs *vmcs01)
> +{
> +	u32 exec_control;
> +
> +	load_vmcs_common(vmcs12);
> +
> +	vmcs_write64(VMCS_LINK_POINTER, vmcs12->vmcs_link_pointer);
>    

Not sure about this.  We don't use the vmcs link pointer, it's better to 
keep it at its default of -1ull.

> +	vmcs_write64(IO_BITMAP_A, vmcs01->io_bitmap_a);
> +	vmcs_write64(IO_BITMAP_B, vmcs01->io_bitmap_b);
>    

I guess merging the io bitmaps doesn't make much sense, at least at this 
stage.

> +	if (cpu_has_vmx_msr_bitmap())
> +		vmcs_write64(MSR_BITMAP, vmcs01->msr_bitmap);
>    

However, merging the msr bitmaps is critical.  Perhaps you do it in a 
later patch.

> +
> +	if (vmcs12->vm_entry_msr_load_count>  0 ||
> +			vmcs12->vm_exit_msr_load_count>  0 ||
> +			vmcs12->vm_exit_msr_store_count>  0) {
> +		printk(KERN_WARNING
> +			"%s: VMCS MSR_{LOAD,STORE} unsupported\n", __func__);
>    

Unfortunate, since kvm has started to use this feature.

For all unsupported mandatory features, we need reporting that is always 
enabled (i.e. no dprintk()), but to avoid flooding dmesg, use 
printk_ratelimit() or report just once.  Also, it's better to kill the 
guest (KVM_REQ_TRIPLE_FAULT, or a VM instruction error) than to let it 
continue incorrectly.

> +	}
> +
> +	if (nested_cpu_has_vmx_tpr_shadow(vcpu)) {
> +		struct page *page =
> +			nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
> +		if (!page)
> +			return 1;
>    

?

> +		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, page_to_phys(page));
> +		kvm_release_page_clean(page);
>    

Hm.  The host can move this page around.  If it happens not to be mapped 
into the guest, we won't get any notification.  So we need to keep a 
reference to this page, or else force an exit from the mmu notifier 
callbacks if it is removed by the host.

> +	}
> +
> +	if (nested_vm_need_virtualize_apic_accesses(vcpu)) {
> +		struct page *page =
> +			nested_get_page(vcpu, vmcs12->apic_access_addr);
> +		if (!page)
> +			return 1;
> +		vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
> +		kvm_release_page_clean(page);
> +	}
>    

Ditto.

> +
> +	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
> +		     (vmcs01->pin_based_vm_exec_control |
> +		      vmcs12->pin_based_vm_exec_control));
>    

We don't really want the guest's pin-based controls to influence our own 
(it doesn't really matter since ours are always set).  Rather, they 
should influence the interface between the local APIC and the guest.

Where do you check if the values are valid?  Otherwise the guest can 
easily crash where it expects a VM entry failure.


> +	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK,
> +		     (vmcs01->page_fault_error_code_mask&
> +		      vmcs12->page_fault_error_code_mask));
> +	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH,
> +		     (vmcs01->page_fault_error_code_match&
> +		      vmcs12->page_fault_error_code_match));
>    

Bit 14 in the exception bitmap also plays a part, I think it 
significantly changes the picture if set in the guest (the host will 
have it clear always IIRC).  Can perhaps avoid by clearing the whole 
thing if the guest is inverting the match.+
> +	if (enable_ept) {
> +		if (!nested_cpu_has_vmx_ept(vcpu)) {
> +			vmcs_write64(EPT_POINTER, vmcs01->ept_pointer);
> +			vmcs_write64(GUEST_PDPTR0, vmcs01->guest_pdptr0);
> +			vmcs_write64(GUEST_PDPTR1, vmcs01->guest_pdptr1);
> +			vmcs_write64(GUEST_PDPTR2, vmcs01->guest_pdptr2);
> +			vmcs_write64(GUEST_PDPTR3, vmcs01->guest_pdptr3);
> +		}
>    

Currently we don't support guest ept, so the second condition can be 
avoided.

> +	}
> +
> +	exec_control = vmcs01->cpu_based_vm_exec_control;
> +	exec_control&= ~CPU_BASED_VIRTUAL_INTR_PENDING;
> +	exec_control&= ~CPU_BASED_VIRTUAL_NMI_PENDING;
> +	exec_control&= ~CPU_BASED_TPR_SHADOW;
> +	exec_control |= vmcs12->cpu_based_vm_exec_control;
> +	if (!vm_need_tpr_shadow(vcpu->kvm) ||
> +	    vmcs12->virtual_apic_page_addr == 0) {
> +		exec_control&= ~CPU_BASED_TPR_SHADOW;
>    

Why?

> +#ifdef CONFIG_X86_64
> +		exec_control |= CPU_BASED_CR8_STORE_EXITING |
> +			CPU_BASED_CR8_LOAD_EXITING;
> +#endif
> +	} else if (exec_control&  CPU_BASED_TPR_SHADOW) {
> +#ifdef CONFIG_X86_64
> +		exec_control&= ~CPU_BASED_CR8_STORE_EXITING;
> +		exec_control&= ~CPU_BASED_CR8_LOAD_EXITING;
> +#endif
>    

This should be part of the general checks on valid control values.

> +	}
> +	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_control);
> +
> +	/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
> +	 * bitwise-or of what L1 wants to trap for L2, and what we want to
> +	 * trap. However, vmx_fpu_activate/deactivate may have happened after
> +	 * we saved vmcs01, so we shouldn't trust its TS and NM_VECTOR bits
> +	 * and need to base them again on fpu_active. Note that CR0.TS also
> +	 * needs updating - we do this after this function returns (in
> +	 * nested_vmx_run).
> +	 */
> +	vmcs_write32(EXCEPTION_BITMAP,
> +		     ((vmcs01->exception_bitmap&~(1u<<NM_VECTOR)) |
> +		      (vcpu->fpu_active ? 0 : (1u<<NM_VECTOR)) |
> +		      vmcs12->exception_bitmap));
>    

Perhaps moving this to update_exception_bitmap will make this clearer.

> +	vmcs_writel(CR0_GUEST_HOST_MASK, vmcs12->cr0_guest_host_mask |
> +			(vcpu->fpu_active ? 0 : X86_CR0_TS));
> +	vcpu->arch.cr0_guest_owned_bits = ~(vmcs12->cr0_guest_host_mask |
> +			(vcpu->fpu_active ? 0 : X86_CR0_TS));
>    

I'm worried that managing this in two separate places will cause problems.

> +
> +	vmcs_write32(VM_EXIT_CONTROLS,
> +		     (vmcs01->vm_exit_controls&
> +			(~(VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT)))
> +		       | vmcs12->vm_exit_controls);
>    

Why do you drop PAT load/save?

> +
> +	vmcs_write32(VM_ENTRY_CONTROLS,
> +		     (vmcs01->vm_entry_controls&
> +			(~(VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE)))
> +		      | vmcs12->vm_entry_controls);
> +
> +	vmcs_writel(CR4_GUEST_HOST_MASK,
> +		    (vmcs01->cr4_guest_host_mask&
> +		     vmcs12->cr4_guest_host_mask));
> +
> +	return 0;
> +}
> +
>   static struct kvm_x86_ops vmx_x86_ops = {
>   	.cpu_has_kvm_support = cpu_has_kvm_support,
>   	.disabled_by_bios = vmx_disabled_by_bios,
>    


-- 
error compiling committee.c: too many arguments to function


  reply	other threads:[~2010-06-14 11:11 UTC|newest]

Thread overview: 147+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-13 12:22 [PATCH 0/24] Nested VMX, v5 Nadav Har'El
2010-06-13 12:23 ` [PATCH 1/24] Move nested option from svm.c to x86.c Nadav Har'El
2010-06-14  8:11   ` Avi Kivity
2010-06-15 14:27     ` Nadav Har'El
2010-06-13 12:23 ` [PATCH 2/24] Add VMX and SVM to list of supported cpuid features Nadav Har'El
2010-06-14  8:13   ` Avi Kivity
2010-06-15 14:31     ` Nadav Har'El
2010-06-13 12:24 ` [PATCH 3/24] Implement VMXON and VMXOFF Nadav Har'El
2010-06-14  8:21   ` Avi Kivity
2010-06-16 11:14     ` Nadav Har'El
2010-06-16 11:26       ` Avi Kivity
2010-06-15 20:18   ` Marcelo Tosatti
2010-06-16  7:50     ` Nadav Har'El
2010-06-13 12:24 ` [PATCH 4/24] Allow setting the VMXE bit in CR4 Nadav Har'El
2010-06-15 11:09   ` Gleb Natapov
2010-06-15 14:44     ` Nadav Har'El
2010-06-13 12:25 ` [PATCH 5/24] Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
2010-06-14  8:33   ` Avi Kivity
2010-06-14  8:49     ` Nadav Har'El
2010-06-14 12:35       ` Avi Kivity
2010-06-16 12:24     ` Nadav Har'El
2010-06-16 13:10       ` Avi Kivity
2010-06-22 14:54     ` Nadav Har'El
2010-06-22 16:53       ` Nadav Har'El
2010-06-23  8:07         ` Avi Kivity
2010-08-08 15:09           ` Nadav Har'El
2010-08-10  3:24             ` Avi Kivity
2010-06-23  7:57       ` Avi Kivity
2010-06-23  9:15         ` Alexander Graf
2010-06-23  9:24           ` Avi Kivity
2010-06-23 12:07         ` Nadav Har'El
2010-06-23 12:13           ` Avi Kivity
2010-06-13 12:25 ` [PATCH 6/24] Implement reading and writing of VMX MSRs Nadav Har'El
2010-06-14  8:42   ` Avi Kivity
2010-06-23  8:13     ` Nadav Har'El
2010-06-23  8:24       ` Avi Kivity
2010-06-13 12:26 ` [PATCH 7/24] Understanding guest pointers to vmcs12 structures Nadav Har'El
2010-06-14  8:48   ` Avi Kivity
2010-08-02 12:25     ` Nadav Har'El
2010-08-02 13:38       ` Avi Kivity
2010-06-15 12:14   ` Gleb Natapov
2010-08-01 15:16     ` Nadav Har'El
2010-08-01 15:25       ` Gleb Natapov
2010-08-02  8:57         ` Nadav Har'El
2010-06-13 12:26 ` [PATCH 8/24] Hold a vmcs02 for each vmcs12 Nadav Har'El
2010-06-14  8:57   ` Avi Kivity
2010-07-06  9:50   ` Dong, Eddie
2010-08-02 13:38     ` Nadav Har'El
2010-06-13 12:27 ` [PATCH 9/24] Implement VMCLEAR Nadav Har'El
2010-06-14  9:03   ` Avi Kivity
2010-06-15 13:47   ` Gleb Natapov
2010-06-15 13:50     ` Avi Kivity
2010-06-15 13:54       ` Gleb Natapov
2010-08-05 11:50         ` Nadav Har'El
2010-08-05 11:53           ` Gleb Natapov
2010-08-05 12:01             ` Nadav Har'El
2010-08-05 12:05               ` Avi Kivity
2010-08-05 12:10                 ` Nadav Har'El
2010-08-05 12:13                   ` Avi Kivity
2010-08-05 12:29                     ` Nadav Har'El
2010-08-05 12:03           ` Avi Kivity
2010-07-06  2:56   ` Dong, Eddie
2010-08-03 12:12     ` Nadav Har'El
2010-06-13 12:27 ` [PATCH 10/24] Implement VMPTRLD Nadav Har'El
2010-06-14  9:07   ` Avi Kivity
2010-08-05 11:13     ` Nadav Har'El
2010-06-16 13:36   ` Gleb Natapov
2010-07-06  3:09   ` Dong, Eddie
2010-08-05 11:35     ` Nadav Har'El
2010-06-13 12:28 ` [PATCH 11/24] Implement VMPTRST Nadav Har'El
2010-06-14  9:15   ` Avi Kivity
2010-06-16 13:53     ` Gleb Natapov
2010-06-16 15:33       ` Nadav Har'El
2010-06-13 12:28 ` [PATCH 12/24] Add VMCS fields to the vmcs12 Nadav Har'El
2010-06-14  9:24   ` Avi Kivity
2010-06-16 14:18   ` Gleb Natapov
2010-06-13 12:29 ` [PATCH 13/24] Implement VMREAD and VMWRITE Nadav Har'El
2010-06-14  9:36   ` Avi Kivity
2010-06-16 14:48     ` Gleb Natapov
2010-08-04 13:42       ` Nadav Har'El
2010-08-04 16:09     ` Nadav Har'El
2010-08-04 16:41       ` Avi Kivity
2010-06-16 15:03   ` Gleb Natapov
2010-08-04 11:46     ` Nadav Har'El
2010-06-13 12:29 ` [PATCH 14/24] Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
2010-06-14 11:11   ` Avi Kivity [this message]
2010-06-17  8:50   ` Gleb Natapov
2010-07-06  6:25   ` Dong, Eddie
2010-06-13 12:30 ` [PATCH 15/24] Move register-syncing to a function Nadav Har'El
2010-06-13 12:30 ` [PATCH 16/24] Implement VMLAUNCH and VMRESUME Nadav Har'El
2010-06-14 11:41   ` Avi Kivity
2010-09-26 11:14     ` Nadav Har'El
2010-09-26 12:56       ` Avi Kivity
2010-09-26 13:06         ` Nadav Har'El
2010-09-26 13:51           ` Avi Kivity
2010-06-17 10:59   ` Gleb Natapov
2010-09-16 16:06     ` Nadav Har'El
2010-06-13 12:31 ` [PATCH 17/24] No need for handle_vmx_insn function any more Nadav Har'El
2010-06-13 12:31 ` [PATCH 18/24] Exiting from L2 to L1 Nadav Har'El
2010-06-14 12:04   ` Avi Kivity
2010-09-12 14:05     ` Nadav Har'El
2010-09-12 14:29       ` Avi Kivity
2010-09-12 17:05         ` Nadav Har'El
2010-09-12 17:21           ` Avi Kivity
2010-09-12 19:51             ` Nadav Har'El
2010-09-13  8:48               ` Avi Kivity
2010-09-13  5:53             ` Sheng Yang
2010-09-13  8:52               ` Avi Kivity
2010-09-13  9:01                 ` Nadav Har'El
2010-09-13  9:34                   ` Avi Kivity
2010-09-14 13:07     ` Nadav Har'El
2010-06-13 12:32 ` [PATCH 19/24] Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
2010-06-14 12:24   ` Avi Kivity
2010-09-16 14:42     ` Nadav Har'El
2010-06-13 12:32 ` [PATCH 20/24] Correct handling of interrupt injection Nadav Har'El
2010-06-14 12:29   ` Avi Kivity
2010-06-14 12:48     ` Avi Kivity
2010-09-16 15:25     ` Nadav Har'El
2010-06-13 12:33 ` [PATCH 21/24] Correct handling of exception injection Nadav Har'El
2010-06-13 12:33 ` [PATCH 22/24] Correct handling of idt vectoring info Nadav Har'El
2010-06-17 11:58   ` Gleb Natapov
2010-09-20  6:37     ` Nadav Har'El
2010-09-20  9:34       ` Gleb Natapov
2010-09-20 10:03         ` Nadav Har'El
2010-09-20 10:11           ` Avi Kivity
2010-09-22 23:15             ` Nadav Har'El
2010-09-26 15:14               ` Avi Kivity
2010-09-26 15:18                 ` Gleb Natapov
2010-09-20 10:20           ` Gleb Natapov
2010-06-13 12:34 ` [PATCH 23/24] Handling of CR0.TS and #NM for Lazy FPU loading Nadav Har'El
2010-06-13 12:34 ` [PATCH 24/24] Miscellenous small corrections Nadav Har'El
2010-06-14 12:34 ` [PATCH 0/24] Nested VMX, v5 Avi Kivity
2010-06-14 13:03   ` Nadav Har'El
2010-06-15 10:00     ` Avi Kivity
2010-10-17 12:03       ` Nadav Har'El
2010-10-17 12:10         ` Avi Kivity
2010-10-17 12:39           ` Nadav Har'El
2010-10-17 13:35             ` Avi Kivity
2010-07-09  8:59 ` Dong, Eddie
2010-07-11  8:27   ` Nadav Har'El
2010-07-11 11:05     ` Alexander Graf
2010-07-11 12:49       ` Nadav Har'El
2010-07-11 13:12         ` Avi Kivity
2010-07-11 15:39           ` Nadav Har'El
2010-07-11 15:45             ` Avi Kivity
2010-07-11 13:20     ` Avi Kivity
2010-07-15  3:27 ` Sheng Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C160E6D.3080609@redhat.com \
    --to=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=nyh@il.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.