All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nadav Har'El" <nyh@math.technion.ac.il>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH 18/24] Exiting from L2 to L1
Date: Sun, 12 Sep 2010 16:05:30 +0200	[thread overview]
Message-ID: <20100912140530.GA26346@fermat.math.technion.ac.il> (raw)
In-Reply-To: <4C161AB8.4060905@redhat.com>

Hi,

Continuing to work on the nested VMX patches,

On Mon, Jun 14, 2010, Avi Kivity wrote about "Re: [PATCH 18/24] Exiting from L2 to L1":
> On 06/13/2010 03:31 PM, Nadav Har'El wrote:
>...
> >+/* prepare_vmcs_12 is called when the nested L2 guest exits and we want to
> >+ * prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12), and this
> >+ * function updates it to reflect the state of the registers during the 
> >exit,
>...
> >+	vmcs12->tsc_offset = vmcs_read64(TSC_OFFSET);
> >   
> 
> TSC_OFFSET cannot have changed.

Right. I cleaned up this function now, to only copy the fields that could
have changed, namely fields listed as guest-state or exit-information fields
in the spec. Control fields like this TSC_OFFSET and more examples you found
below, indeed could not have changed while L2 was running or during the exit.

> >+	vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> >   
> Without msr bitmaps, cannot change.

I added a TODO before this (and a couple of others) for future optimization.
I'm not even convinced how much quicker it is to check the MSR bitmap before
doing vmcs_read64, vs just to going ahead and vmreading it in any case.

> >+    vmcs12->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);
>
> Can this change?

Well, according to the spec, (SDM vol 3B), VMCS link pointer is a guest-state
field, but it is listed as being for "future expansion". I guess that with
current hardware, it cannot change, but for future hardware it might. I'm
not sure if it's wiser to ignore this field for now (and shave a bit off
the l2->l1 switch time), or just copy it anyway, as I do now.
What would you prefer?

> >+	if (vmcs_config.vmentry_ctrl&  VM_ENTRY_LOAD_IA32_PAT)
> >+		vmcs12->guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
> >   
> 
> Should check for VM_EXIT_SAVE_IA32_PAT, no?

You're absolutely right. Fixed.

> >+	vmcs12->vm_entry_intr_info_field =
> >+		vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
> >   
> 
> Autocleared, no need to read.

Well, we need to clear the "valid bit" on exit, so we don't mistakenly inject
the same interrupt twice. There were two ways to do it: 1. clear it ourselves,
or 2. copy the value from vmcs02 where the processor already cleared it.
There are pros and cons for each approach, but I'll change like you suggest,
to clear it ourselves:

	vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;

> >+	vmcs12->vm_instruction_error = vmcs_read32(VM_INSTRUCTION_ERROR);
>
> We don't want to pass this to the guest?

I didn't quite understand your question, but now that I look at it, I see
that VM_INSTRUCTION_ERROR has nothing to do with exits, but is only modified
when running VMX instructions, and our emulation of VMX instructions already
sets it appropriately, so no sense of copying it here.

> 
> >+	vmcs12->vm_exit_reason  = vmcs_read32(VM_EXIT_REASON);
> >+	vmcs12->vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> >+	vmcs12->vm_exit_intr_error_code = 
> >vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
> >+	vmcs12->idt_vectoring_info_field =
> >+		vmcs_read32(IDT_VECTORING_INFO_FIELD);
> >+	vmcs12->idt_vectoring_error_code =
> >+		vmcs_read32(IDT_VECTORING_ERROR_CODE);
> >+	vmcs12->vm_exit_instruction_len = 
> >vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> >+	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> >   
> 
> For the above, if the host handles the exit, we must not clobber guest 
> fields.  A subsequent guest vmread will see the changed values even 
> though from its point of view a vmexit has not occurred.
> 
> But no, that can't happen, since a vmread needs to have a vmexit first 
> to happen.  Still, best to delay this.+

All this code is in prepare_vmcs_12, which only gets called if we exit from
L2 to L1 - it doesn't get called when we exit from L2 to L0 (when the host
handles the exit).

As long as a certain field gets written to on *every* exit, and not just
on some of them, I believe we can safely copy the current values in vmcs02
to vmcs12, knowing these are current values from the current exit, and not
some old values we shouldn't be copying.

You may have a point (if that was your point?) that some fields are not
always written to - e.g., for most exits vm_exit_intr_info doesn't get
written to and just one "valid" bit is cleared. As the code is now, we copy
vmcs02's field, which might have been written earlier (e.g., during
an exit to L0) and not now, and an observant L1 might notice this value
it should not have seen. However, I don't see any problem with that, because
the "valid" bit would be correctly turned off, and the spec says that all
other bits are undefined (for irrelevant exits) and copying-old-values is one
legal setting for undefined bits...

> >+	/* If any of the CRO_GUEST_HOST_MASK bits are off, the L2 guest may
> >+	 * have changed some cr0 bits without us ever saving them in the 
> >shadow
> >+	 * vmcs. So we need to save these changes now.
>...
> >+
> >+	vmcs12->guest_cr4 = vmcs_readl(GUEST_CR4);
> 
> Can't we have the same issue with cr4?

I guess we can. I didn't think this (giving guest full control over cr4 bits)
was happening in KVM, but maybe I was wrong, or maybe this will happen in the
future, so no reason not to do it for cr4 as well. So I'll do it for cr4 as
well.

> Better to have some helpers to do the common magic, and not encode the 
> special knowledge about TS into it (make it generic).

I thought that since in current KVM code the only cr0_guest_owned_bits bit
that could possibly be turned on was TS, then I should only deal with that
bit. But you're right, no reason not to make it more general, to look for
any bits which L0 traps but L1 didn't think it was trapping. As in:

	long bits;
	bits = vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
	vmcs12->guest_cr0 = (vmcs_readl(GUEST_CR0) & bits) |
				(vmcs_readl(CR0_READ_SHADOW) & ~bits);

(the "bits" lists all the bits which are already fine in guest_cr0, i.e.,
either guest_owned_bit (so L2 wrote to it directly) or guest_host_mask
(so L1 didn't expect them to be updated in guest_cr0 anyway). All other
bits need to be copied from the read_shadow).

I don't know how to put this into a helper function, because these two
statements have so many dependencies on the word "cr0" that making one
that would work for either cr0 or cr4 seems too difficult to be worth the
trouble.

This reply is getting long, so I'll leave it about prepare_vmcs_12 and
will reply to your comments about the rest of this patch in a separate mail.

Thanks,
Nadav.

-- 
Nadav Har'El                        |       Sunday, Sep 12 2010, 4 Tishri 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |An Apple a day, keeps Windows away.
http://nadav.harel.org.il           |

  reply	other threads:[~2010-09-12 14:05 UTC|newest]

Thread overview: 147+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-13 12:22 [PATCH 0/24] Nested VMX, v5 Nadav Har'El
2010-06-13 12:23 ` [PATCH 1/24] Move nested option from svm.c to x86.c Nadav Har'El
2010-06-14  8:11   ` Avi Kivity
2010-06-15 14:27     ` Nadav Har'El
2010-06-13 12:23 ` [PATCH 2/24] Add VMX and SVM to list of supported cpuid features Nadav Har'El
2010-06-14  8:13   ` Avi Kivity
2010-06-15 14:31     ` Nadav Har'El
2010-06-13 12:24 ` [PATCH 3/24] Implement VMXON and VMXOFF Nadav Har'El
2010-06-14  8:21   ` Avi Kivity
2010-06-16 11:14     ` Nadav Har'El
2010-06-16 11:26       ` Avi Kivity
2010-06-15 20:18   ` Marcelo Tosatti
2010-06-16  7:50     ` Nadav Har'El
2010-06-13 12:24 ` [PATCH 4/24] Allow setting the VMXE bit in CR4 Nadav Har'El
2010-06-15 11:09   ` Gleb Natapov
2010-06-15 14:44     ` Nadav Har'El
2010-06-13 12:25 ` [PATCH 5/24] Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
2010-06-14  8:33   ` Avi Kivity
2010-06-14  8:49     ` Nadav Har'El
2010-06-14 12:35       ` Avi Kivity
2010-06-16 12:24     ` Nadav Har'El
2010-06-16 13:10       ` Avi Kivity
2010-06-22 14:54     ` Nadav Har'El
2010-06-22 16:53       ` Nadav Har'El
2010-06-23  8:07         ` Avi Kivity
2010-08-08 15:09           ` Nadav Har'El
2010-08-10  3:24             ` Avi Kivity
2010-06-23  7:57       ` Avi Kivity
2010-06-23  9:15         ` Alexander Graf
2010-06-23  9:24           ` Avi Kivity
2010-06-23 12:07         ` Nadav Har'El
2010-06-23 12:13           ` Avi Kivity
2010-06-13 12:25 ` [PATCH 6/24] Implement reading and writing of VMX MSRs Nadav Har'El
2010-06-14  8:42   ` Avi Kivity
2010-06-23  8:13     ` Nadav Har'El
2010-06-23  8:24       ` Avi Kivity
2010-06-13 12:26 ` [PATCH 7/24] Understanding guest pointers to vmcs12 structures Nadav Har'El
2010-06-14  8:48   ` Avi Kivity
2010-08-02 12:25     ` Nadav Har'El
2010-08-02 13:38       ` Avi Kivity
2010-06-15 12:14   ` Gleb Natapov
2010-08-01 15:16     ` Nadav Har'El
2010-08-01 15:25       ` Gleb Natapov
2010-08-02  8:57         ` Nadav Har'El
2010-06-13 12:26 ` [PATCH 8/24] Hold a vmcs02 for each vmcs12 Nadav Har'El
2010-06-14  8:57   ` Avi Kivity
2010-07-06  9:50   ` Dong, Eddie
2010-08-02 13:38     ` Nadav Har'El
2010-06-13 12:27 ` [PATCH 9/24] Implement VMCLEAR Nadav Har'El
2010-06-14  9:03   ` Avi Kivity
2010-06-15 13:47   ` Gleb Natapov
2010-06-15 13:50     ` Avi Kivity
2010-06-15 13:54       ` Gleb Natapov
2010-08-05 11:50         ` Nadav Har'El
2010-08-05 11:53           ` Gleb Natapov
2010-08-05 12:01             ` Nadav Har'El
2010-08-05 12:05               ` Avi Kivity
2010-08-05 12:10                 ` Nadav Har'El
2010-08-05 12:13                   ` Avi Kivity
2010-08-05 12:29                     ` Nadav Har'El
2010-08-05 12:03           ` Avi Kivity
2010-07-06  2:56   ` Dong, Eddie
2010-08-03 12:12     ` Nadav Har'El
2010-06-13 12:27 ` [PATCH 10/24] Implement VMPTRLD Nadav Har'El
2010-06-14  9:07   ` Avi Kivity
2010-08-05 11:13     ` Nadav Har'El
2010-06-16 13:36   ` Gleb Natapov
2010-07-06  3:09   ` Dong, Eddie
2010-08-05 11:35     ` Nadav Har'El
2010-06-13 12:28 ` [PATCH 11/24] Implement VMPTRST Nadav Har'El
2010-06-14  9:15   ` Avi Kivity
2010-06-16 13:53     ` Gleb Natapov
2010-06-16 15:33       ` Nadav Har'El
2010-06-13 12:28 ` [PATCH 12/24] Add VMCS fields to the vmcs12 Nadav Har'El
2010-06-14  9:24   ` Avi Kivity
2010-06-16 14:18   ` Gleb Natapov
2010-06-13 12:29 ` [PATCH 13/24] Implement VMREAD and VMWRITE Nadav Har'El
2010-06-14  9:36   ` Avi Kivity
2010-06-16 14:48     ` Gleb Natapov
2010-08-04 13:42       ` Nadav Har'El
2010-08-04 16:09     ` Nadav Har'El
2010-08-04 16:41       ` Avi Kivity
2010-06-16 15:03   ` Gleb Natapov
2010-08-04 11:46     ` Nadav Har'El
2010-06-13 12:29 ` [PATCH 14/24] Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
2010-06-14 11:11   ` Avi Kivity
2010-06-17  8:50   ` Gleb Natapov
2010-07-06  6:25   ` Dong, Eddie
2010-06-13 12:30 ` [PATCH 15/24] Move register-syncing to a function Nadav Har'El
2010-06-13 12:30 ` [PATCH 16/24] Implement VMLAUNCH and VMRESUME Nadav Har'El
2010-06-14 11:41   ` Avi Kivity
2010-09-26 11:14     ` Nadav Har'El
2010-09-26 12:56       ` Avi Kivity
2010-09-26 13:06         ` Nadav Har'El
2010-09-26 13:51           ` Avi Kivity
2010-06-17 10:59   ` Gleb Natapov
2010-09-16 16:06     ` Nadav Har'El
2010-06-13 12:31 ` [PATCH 17/24] No need for handle_vmx_insn function any more Nadav Har'El
2010-06-13 12:31 ` [PATCH 18/24] Exiting from L2 to L1 Nadav Har'El
2010-06-14 12:04   ` Avi Kivity
2010-09-12 14:05     ` Nadav Har'El [this message]
2010-09-12 14:29       ` Avi Kivity
2010-09-12 17:05         ` Nadav Har'El
2010-09-12 17:21           ` Avi Kivity
2010-09-12 19:51             ` Nadav Har'El
2010-09-13  8:48               ` Avi Kivity
2010-09-13  5:53             ` Sheng Yang
2010-09-13  8:52               ` Avi Kivity
2010-09-13  9:01                 ` Nadav Har'El
2010-09-13  9:34                   ` Avi Kivity
2010-09-14 13:07     ` Nadav Har'El
2010-06-13 12:32 ` [PATCH 19/24] Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
2010-06-14 12:24   ` Avi Kivity
2010-09-16 14:42     ` Nadav Har'El
2010-06-13 12:32 ` [PATCH 20/24] Correct handling of interrupt injection Nadav Har'El
2010-06-14 12:29   ` Avi Kivity
2010-06-14 12:48     ` Avi Kivity
2010-09-16 15:25     ` Nadav Har'El
2010-06-13 12:33 ` [PATCH 21/24] Correct handling of exception injection Nadav Har'El
2010-06-13 12:33 ` [PATCH 22/24] Correct handling of idt vectoring info Nadav Har'El
2010-06-17 11:58   ` Gleb Natapov
2010-09-20  6:37     ` Nadav Har'El
2010-09-20  9:34       ` Gleb Natapov
2010-09-20 10:03         ` Nadav Har'El
2010-09-20 10:11           ` Avi Kivity
2010-09-22 23:15             ` Nadav Har'El
2010-09-26 15:14               ` Avi Kivity
2010-09-26 15:18                 ` Gleb Natapov
2010-09-20 10:20           ` Gleb Natapov
2010-06-13 12:34 ` [PATCH 23/24] Handling of CR0.TS and #NM for Lazy FPU loading Nadav Har'El
2010-06-13 12:34 ` [PATCH 24/24] Miscellenous small corrections Nadav Har'El
2010-06-14 12:34 ` [PATCH 0/24] Nested VMX, v5 Avi Kivity
2010-06-14 13:03   ` Nadav Har'El
2010-06-15 10:00     ` Avi Kivity
2010-10-17 12:03       ` Nadav Har'El
2010-10-17 12:10         ` Avi Kivity
2010-10-17 12:39           ` Nadav Har'El
2010-10-17 13:35             ` Avi Kivity
2010-07-09  8:59 ` Dong, Eddie
2010-07-11  8:27   ` Nadav Har'El
2010-07-11 11:05     ` Alexander Graf
2010-07-11 12:49       ` Nadav Har'El
2010-07-11 13:12         ` Avi Kivity
2010-07-11 15:39           ` Nadav Har'El
2010-07-11 15:45             ` Avi Kivity
2010-07-11 13:20     ` Avi Kivity
2010-07-15  3:27 ` Sheng Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100912140530.GA26346@fermat.math.technion.ac.il \
    --to=nyh@math.technion.ac.il \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.