From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Tian, Kevin" Subject: RE: [PATCH 17/31] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Date: Tue, 24 May 2011 18:52:10 +0800 Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BFA34EE@shsmsx502.ccr.corp.intel.com> References: <1305575004-nyh@il.ibm.com> <201105161952.p4GJqbek001851@rice.haifa.ibm.com> <625BA99ED14B2D499DC4E29D8138F1505C9BFA33C3@shsmsx502.ccr.corp.intel.com> <20110524091918.GA136@fermat.math.technion.ac.il> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT Cc: "kvm@vger.kernel.org" , "gleb@redhat.com" , "avi@redhat.com" To: Nadav Har'El Return-path: Received: from mga01.intel.com ([192.55.52.88]:45666 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751833Ab1EXKyd convert rfc822-to-8bit (ORCPT ); Tue, 24 May 2011 06:54:33 -0400 In-Reply-To: <20110524091918.GA136@fermat.math.technion.ac.il> Content-Language: en-US Sender: kvm-owner@vger.kernel.org List-ID: > From: Nadav Har'El > Sent: Tuesday, May 24, 2011 5:19 PM > > On Tue, May 24, 2011, Tian, Kevin wrote about "RE: [PATCH 17/31] nVMX: > Prepare vmcs02 from vmcs01 and vmcs12": > > > +static inline unsigned long guest_readable_cr4(struct vmcs12 *fields) > > > +{ > > > + return (fields->guest_cr4 & ~fields->cr4_guest_host_mask) | > > > + (fields->cr4_read_shadow & > fields->cr4_guest_host_mask); > > > +} > > > + > > > > will guest_ prefix look confusing here? The 'guest' has a broad range which > makes > > above two functions look like they can be used in non-nested case. Should we > stick > > to nested_prefix for nested specific facilities? > > I don't know, I thought it made calls like > > vmcs_writel(CR0_READ_SHADOW, guest_readable_cr0(vmcs12)); > > readable, and the comments (and the parameters) make it obvious it's for > nested only. > > I now renamed these functions nested_read_cr0(), nested_read_cr4() - I hope > you like these names better. yes. > > > > + if (is_guest_mode(&vmx->vcpu)) > > > + vmx->vcpu.arch.cr4_guest_owned_bits &= > > > + > ~get_vmcs12(&vmx->vcpu)->cr4_guest_host_mask; > > > > why not is_nested_mode()? :-P > > I assume you're wondering why the function is called is_guest_mode(), and > not is_nested_mode()? yes > > This name was chosen by Avi Kivity in November last year, for the function > previously introduced by Joerg Roedel. My original code (before Joerg added > this function to x86.c) indeed used the term "nested_mode", not > "guest_mode". > > In January, I pointed to the possibility of confusion between the new > is_guest_mode() and other things called "guest mode", and Avi Kivity said > he will rename it to is_nested_guest() - see > http://lkml.indiana.edu/hypermail/linux/kernel/1101.1/01418.html > But as you can see, he never did this renaming. > > That being said, after half a year, I got used to the name is_guest_mode(), > and am no longer convinced it should be changed. It checks whether the vcpu > (not the underlying CPU) is in Intel-SDM-terminology "guest mode". Just like > is_long_mode() checks if the vcpu is in long mode. So I'm fine with leaving > its current name. well, it's a small issue, and I'm fine with leaving it though I don't like 'guest' here. :-) > > > > +static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 > *vmcs12) > > > +{ > >... > > > + if (!vmx->rdtscp_enabled) > > > + exec_control &= ~SECONDARY_EXEC_RDTSCP; > > > + /* Take the following fields only from vmcs12 */ > > > + exec_control &= > ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; > > > + if (nested_cpu_has(vmcs12, > > > + > CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)) > > > + exec_control |= > vmcs12->secondary_vm_exec_control; > > > > should this 2nd exec_control be merged in clear case-by-case flavor? > > > > what about L0 sets "virtualize x2APIC" bit while L1 doesn't? > > > > Or what about L0 disables EPT while L1 sets it? > > > > I think it's better to scrutinize every 2nd exec_control feature with a > > clear policy: > > - whether we want to use the stricter policy which is only set when both L0 > and > > L1 set it > > - whether we want to use L1 setting absolutely regardless of L0 setting like > > what you did for virtualize APIC access > > Please note that most of the examples you give cannot happen in practice, > because we tell L1 (via MSR) which features it is allowed to use, and we > fail entry if it tries to use disallowed features (before ever reaching > the merge code you're commenting on). So we don't allow L1, for example, > to use the EPT feature (and when nested-EPT support is added, we won't > allow L1 to use EPT if L0 didn't). The general thinking was that for most > fields that we do explicitly allow, "OR" is the right choice. This really bases on the value of the control bit. To achieve the strictest setting between L0/L1, sometimes you want to use AND and sometimes you want to use OR. >>From a design p.o.v, it's better not to have such implicit assumption on other places. Just make it clean and correct. Also in your example it doesn't cover the case where L0 sets some bits which are not exposed to L1 via MSR. For example as I said earlier, what about L0 sets virtualize X2APIC mode while it's not enabled by or not exposed to L1. With OR, you then also enable this mode for L2 absolutely, while L1 has no logic to handle it. I'd like to see a clean policy for the known control bits here, even with a strict policy to incur most VM-exits which can be optimized in the future. > > I'll add this to my bugzilla, and think about it again later. > Thanks Kevin