From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Nadav Har'El" Subject: Re: [PATCH 18/24] Exiting from L2 to L1 Date: Sun, 12 Sep 2010 19:05:03 +0200 Message-ID: <20100912170503.GA7828@fermat.math.technion.ac.il> References: <1276431753-nyh@il.ibm.com> <201006131231.o5DCVlKB013102@rice.haifa.ibm.com> <4C161AB8.4060905@redhat.com> <20100912140530.GA26346@fermat.math.technion.ac.il> <4C8CE3E2.7060708@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: kvm@vger.kernel.org To: Avi Kivity Return-path: Received: from mailgw12.technion.ac.il ([132.68.225.12]:32946 "EHLO mailgw12.technion.ac.il" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753100Ab0ILRFJ (ORCPT ); Sun, 12 Sep 2010 13:05:09 -0400 Content-Disposition: inline In-Reply-To: <4C8CE3E2.7060708@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On Sun, Sep 12, 2010, Avi Kivity wrote about "Re: [PATCH 18/24] Exiting from L2 to L1": > Great. Hopefully you can commit some time to it or you'll spend a lot > of cycles just catching up. Right. I will. > Joerg just merged nested npt; as far as I can tell it is 100% in line > with nested ept, but please take a look as well. Indeed. Making nested EPT work based on the nested NPT work is one of the first things I plan to do after the basic nested VMX patches are accepted. As you know, we already have a version of nested EPT working in our testing code, but I'll need to tweak it a bit to use the common nested NPT code. > >>>+ vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); > >>> > >>Without msr bitmaps, cannot change. > >I added a TODO before this (and a couple of others) for future > >optimization. > >I'm not even convinced how much quicker it is to check the MSR bitmap > >before > >doing vmcs_read64, vs just to going ahead and vmreading it in any case. > > IIRC we don't support msr bitmaps now, so no need to check anything. I do think we support msr bitmaps... E.g., we have nested_vmx_exit_handled_msr() to check whether L1 requires an exit for a certain MSR access. Where don't we support them? (but I'm not denying the possiblity that this support still has holes or bugs). > In general, avoid vmcs reads as much as possible. Just think of your > code running in a guest! Yes. On the other hand, I don't want to be sorry in the future when I want to support some feature, but because I wanted to shave off 1% of the L2->L1 switching time, and 0.01% of the total run time (and I'm just making numbers up...), I now need to find a dozen places where things need to change to support this feature. On the other hand, this will likely happen anyway ;-) > > >>>+ vmcs12->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER); > >>Can this change? >.. > If it changes in the future, it can only be under the influence of some > control or at least guest-discoverable capability. Since we don't > expose such control or capability, there's no need to copy it. You convinced me. Removed it. > >>>+ vmcs12->vm_entry_intr_info_field = > >>>+ vmcs_read32(VM_ENTRY_INTR_INFO_FIELD); > >>> > >>Autocleared, no need to read. > >Well, we need to clear the "valid bit" on exit, so we don't mistakenly > >inject > >the same interrupt twice. > > I don't think so. We write this field as part of guest entry (that is, > after the decision re which vmcs to use, yes?), so guest entry will > always follow writing this field. Since guest entry clears the field, > reading it after an exit will necessarily return 0. Well, you obviously know the KVM code much better than I do, but from what I saw, I thought (maybe I misunderstood) that in normal (non-nested) KVM, this field only gets written on injection, not on every entry, so the code relies on the fact that the processor turns off the "valid" bit during exit, to avoid the same event being injected when the same field value is used for another entry. I can only find code which resets this field in vmx_vcpu_reset(), but that doesn't get called on every entry, right? Or am I missing something? > What can happen is that the contents of the field is transferred to the > IDT_VECTORING_INFO field or VM_EXIT_INTR_INFO field. > > (question: on a failed vmentry, is this field cleared?) I don't know the answer :-) > >There were two ways to do it: 1. clear it ourselves, > >or 2. copy the value from vmcs02 where the processor already cleared it. > >There are pros and cons for each approach, but I'll change like you > >suggest, > >to clear it ourselves: > > > > vmcs12->vm_entry_intr_info_field&= ~INTR_INFO_VALID_MASK; > > That's really a temporary variable, I don't think you need to touch it. But we need to emulate the correct VMX behavior. According to the spec, the "valid" bit on this field needs to be cleared on vmexit, so we need to do it also on emulated exits from L2 to L1. If we're sure that we already cleared it earlier, then fine, but if not (and like I said, I failed to find this code), we need to do it now, on exit - either by explictly clearing the bit or by copying a value where the processor cleared this bit (arguably, the former is more correct emulation). > I didn't mean register independent helper; one function for cr0 and one > function for cr4 so the reader can just see the name and pretend to > understand what it does, instead of seeing a bunch of incomprehensible > bitwise operations. Ok, done: /* * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date * because L2 may have changed some cr0 bits directly (see CRO_GUEST_HOST_MASK) * without L0 trapping the change and updating vmcs12. * This function returns the value we should put in vmcs12.guest_cr0. It's not * enough to just return the current (vmcs02) GUEST_CR0. This may not be the * guest cr0 that L1 thought it was giving its L2 guest - it is possible that * L1 wished to allow its guest to set a cr0 bit directly, but we (L0) asked * to trap this change and instead set just the read shadow. If this is the * case, we need to copy these read-shadow bits back to vmcs12.guest_cr0, where * L1 believes they already are. */ static inline unsigned long vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs_fields *vmcs12){ unsigned long guest_cr0_bits = vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask; return (vmcs_readl(GUEST_CR0) & guest_cr0_bits) | (vmcs_readl(CR0_READ_SHADOW) & ~guest_cr0_bits); } and the call becomes just: vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12); which is easy to glance over (but doesn't say much about what it is doing). -- Nadav Har'El | Sunday, Sep 12 2010, 4 Tishri 5771 nyh@math.technion.ac.il |----------------------------------------- Phone +972-523-790466, ICQ 13349191 |Sign above a shop selling burglar alarms: http://nadav.harel.org.il |"For the man who has everything"