From: "Tian, Kevin"
To: Nadav Har'El
Cc: Avi Kivity, Marcelo Tosatti, "kvm@vger.kernel.org", "gleb@redhat.com",
	"Roedel, Joerg"
Subject: RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
Date: Tue, 24 May 2011 16:20:49 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BFA33FE@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <20110524075656.GA26588@fermat.math.technion.ac.il>
References: <20110517181132.GA16262@fermat.math.technion.ac.il>
	<20110517184336.GA10394@amt.cnet>
	<20110517193030.GA21656@fermat.math.technion.ac.il>
	<20110517195253.GB11065@amt.cnet>
	<20110518055236.GA1230@fermat.math.technion.ac.il>
	<20110518120801.GA9176@amt.cnet>
	<20110522085732.GB1116@fermat.math.technion.ac.il>
	<4DDA81FD.5050203@redhat.com>
	<20110523185104.GA26899@fermat.math.technion.ac.il>
	<625BA99ED14B2D499DC4E29D8138F1505C9BEF07A9@shsmsx502.ccr.corp.intel.com>
	<20110524075656.GA26588@fermat.math.technion.ac.il>

> From: Nadav Har'El [mailto:nyh@math.technion.ac.il]
> Sent: Tuesday, May 24, 2011 3:57 PM
>
> On Tue, May 24, 2011, Tian, Kevin wrote about "RE: [PATCH 08/31] nVMX: Fix
> local_vcpus_link handling":
> > > +static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> > > +{
> > > +	vmcs_clear(loaded_vmcs->vmcs);
> > > +	loaded_vmcs->cpu = -1;
> > > +	loaded_vmcs->launched = 0;
> > > +}
> > > +
> >
> > call it vmcs_init instead since you now remove original vmcs_init invocation,
> > which more reflect the necessity of adding VMCLEAR for a new vmcs?
>
> The best name for this function, I think, would have been loaded_vmcs_clear,
> because this function isn't necessarily used to "init" - it's also called to
> VMCLEAR an old vmcs (and flush its content back to memory) - in that sense
> it is definitely not a "vmcs_init".
>
> Unfortunately, I already have a whole chain of functions with this name :(
> the existing loaded_vmcs_clear() does an IPI to the CPU which has this VMCS
> loaded, and causes it to run __loaded_vmcs_clear(), which in turn calls
> the above loaded_vmcs_init(). I wish I could call all three functions
> loaded_vmcs_clear(), but of course I can't. If anyone reading this has a good
> suggestion on how to name these three functions, please let me know.

How about loaded_vmcs_reset?

>
> > > +static void __loaded_vmcs_clear(void *arg)
> > >  {
> > > -	struct vcpu_vmx *vmx = arg;
> > > +	struct loaded_vmcs *loaded_vmcs = arg;
> > >  	int cpu = raw_smp_processor_id();
> > >
> > > -	if (vmx->vcpu.cpu == cpu)
> > > -		vmcs_clear(vmx->vmcs);
> > > -	if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
> > > +	if (loaded_vmcs->cpu != cpu)
> > > +		return; /* cpu migration can race with cpu offline */
> >
> > what do you mean by "cpu migration" here? why does 'cpu offline'
> > matter here regarding to the cpu change?
>
> __loaded_vmcs_clear() is typically called in one of two cases: "cpu migration"
> means that a guest that used to run on one CPU, and had its VMCS loaded there,
> suddenly needs to run on a different CPU, so we need to clear the VMCS on
> the old CPU. "cpu offline" means that we want to take a certain CPU offline,
"cpu offline" means that we want to take a certain CPU offline, > and before we do that we should VMCLEAR all VMCSs which were loaded on it. So here you need explicitly differentiate a vcpu and a real cpu. For the 1st case it's just 'vcpu migration", and the latter it's the real 'cpu offline'. 'cpu migration' is generally a RAS feature in mission critical world. :-) > > The (vmx->cpu.cpu != cpu) case in __loaded_vmcs_clear should ideally never > happen: In the cpu offline path, we only call it for the loaded_vmcss which > we know for sure are loaded on the current cpu. In the cpu migration path, > loaded_vmcs_clear runs __loaded_vmcs_clear on the right CPU, which ensures > that > equality. > > But, there can be a race condition (this was actually explained to me a while > back by Avi - I never seen this happening in practice): Imagine that cpu > migration calls loaded_vmcs_clear, which tells the old cpu (via IPI) to > VMCLEAR this vmcs. But before that old CPU gets a chance to act on that IPI, > a decision is made to take it offline, and all loaded_vmcs loaded on it > (including the one in question) are cleared. When that CPU acts on this IPI, > it notices that vmx->cpu.cpu==-1, i.e., != cpu, so it doesn't need to do > anything (in the new version of the code, I made this more explicit, by > returning immediately in this case). the reverse also holds true. Right between the point where cpu_offline hits a loaded_vmcs and the point where it calls __loaded_vmcs_clear, it's possible that the vcpu is migrated to another cpu, and it's likely that migration path (vmx_vcpu_load) has invoked loaded_vmcs_clear but hasn't delete this vmcs from old cpu's linked list. This way later when __loaded_vmcs_clear is invoked on the offlined cpu, there's still chance to observe cpu as -1. > > At least this is the theory. As I said, I didn't see this problem in practice > (unsurprising, since I never offlined any CPU). Maybe Avi or someone else can > comment more about this (vmx->cpu.cpu == cpu) check, which existed before > my patch - in __vcpu_clear(). I agree this check is necessary, but just want you to make the comment clear with right term. > > > > +static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) > > > +{ > > > + if (!loaded_vmcs->vmcs) > > > + return; > > > + loaded_vmcs_clear(loaded_vmcs); > > > + free_vmcs(loaded_vmcs->vmcs); > > > + loaded_vmcs->vmcs = NULL; > > > +} > > > > prefer to not do cleanup work through loaded_vmcs since it's just a pointer > > to a loaded vmcs structure. Though you can carefully arrange the nested > > vmcs cleanup happening before it, it's not very clean and a bit error prone > > simply from this function itself. It's clearer to directly cleanup vmcs01, and > > if you want an assertion could be added to make sure loaded_vmcs doesn't > > point to any stale vmcs02 structure after nested cleanup step. > > I'm afraid I didn't understand what you meant here... Basically, this > free_loaded_vmcs() is just a shortcut for loaded_vmcs_clear() and free_vmcs(), > as doing both is needed in 3 places: nested_free_vmcs02, > nested_free_all_saved_vmcss, vmx_free_vcpu. The same function is needed > for both vmcs01 and vmcs02 VMCSs - in both cases when we don't need them > any > more we need to VMCLEAR them and then free the VMCS memory. Note that > this > function does *not* free the loaded_vmcs structure itself. > > What's wrong with this? > Would you prefer that I remove this function and explictly call > loaded_vmcs_clear() and then free_vmcs() in all three places? 
>

Forget about it. I originally thought this was only used to free vmcs01, and
thus wanted to make that purpose obvious.

Thanks
Kevin
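
P.S. to make the race discussion above easier to follow, here is a rough
sketch of the three-level chain as I read it from your patch - not the exact
code, and with the lowest-level helper renamed to the loaded_vmcs_reset
suggested above:

/*
 * Rough sketch only, based on my reading of this patch series; the
 * loaded_vmcs_reset name is my suggestion, not what the patch uses.
 */
struct loaded_vmcs {
	struct vmcs *vmcs;
	int cpu;		/* CPU this VMCS is loaded on, or -1 */
	int launched;
	struct list_head loaded_vmcss_on_cpu_link;
};

/* Level 3: VMCLEAR the region and reset the bookkeeping. */
static inline void loaded_vmcs_reset(struct loaded_vmcs *loaded_vmcs)
{
	vmcs_clear(loaded_vmcs->vmcs);
	loaded_vmcs->cpu = -1;
	loaded_vmcs->launched = 0;
}

/*
 * Level 2: runs on the CPU that supposedly still has the VMCS loaded.
 * The cpu check covers both directions of the race:
 *  - vcpu migration sent the IPI, but cpu offline already cleared the VMCS,
 *    so loaded_vmcs->cpu is -1 by the time the IPI is handled; or
 *  - cpu offline walks its per-cpu list, but the vcpu migrated away in the
 *    meantime and loaded_vmcs->cpu already names the new CPU.
 */
static void __loaded_vmcs_clear(void *arg)
{
	struct loaded_vmcs *loaded_vmcs = arg;
	int cpu = raw_smp_processor_id();

	if (loaded_vmcs->cpu != cpu)
		return; /* vcpu migration can race with cpu offline */
	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
		per_cpu(current_vmcs, cpu) = NULL;
	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
	loaded_vmcs_reset(loaded_vmcs);
}

/* Level 1: IPI the CPU that has this VMCS loaded and wait for the clear. */
static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
{
	int cpu = loaded_vmcs->cpu;

	if (cpu != -1)
		smp_call_function_single(cpu,
			__loaded_vmcs_clear, loaded_vmcs, 1);
}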