From: "Tian, Kevin"
Subject: RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
Date: Tue, 24 May 2011 19:20:13 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BFA34F7@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <4DDB9105.9050900@redhat.com>
References: <20110517181132.GA16262@fermat.math.technion.ac.il>
 <20110517184336.GA10394@amt.cnet>
 <20110517193030.GA21656@fermat.math.technion.ac.il>
 <20110517195253.GB11065@amt.cnet>
 <20110518055236.GA1230@fermat.math.technion.ac.il>
 <20110518120801.GA9176@amt.cnet>
 <20110522085732.GB1116@fermat.math.technion.ac.il>
 <4DDA81FD.5050203@redhat.com>
 <20110523185104.GA26899@fermat.math.technion.ac.il>
 <625BA99ED14B2D499DC4E29D8138F1505C9BEF07A9@shsmsx502.ccr.corp.intel.com>
 <20110524075656.GA26588@fermat.math.technion.ac.il>
 <625BA99ED14B2D499DC4E29D8138F1505C9BFA33FE@shsmsx502.ccr.corp.intel.com>
 <4DDB9105.9050900@redhat.com>
To: Avi Kivity
Cc: Nadav Har'El, Marcelo Tosatti, kvm@vger.kernel.org, gleb@redhat.com,
 "Roedel, Joerg"

> From: Avi Kivity [mailto:avi@redhat.com]
> Sent: Tuesday, May 24, 2011 7:06 PM
>
> On 05/24/2011 11:20 AM, Tian, Kevin wrote:
> > > The (vmx->cpu.cpu != cpu) case in __loaded_vmcs_clear should
> > > ideally never happen: in the cpu offline path, we only call it for
> > > the loaded_vmcss which we know for sure are loaded on the current
> > > cpu. In the cpu migration path, loaded_vmcs_clear runs
> > > __loaded_vmcs_clear on the right CPU, which ensures that equality.
> > >
> > > But there can be a race condition (this was actually explained to
> > > me a while back by Avi - I have never seen it happen in practice):
> > > imagine that cpu migration calls loaded_vmcs_clear, which tells the
> > > old cpu (via IPI) to VMCLEAR this vmcs. But before that old CPU
> > > gets a chance to act on the IPI, a decision is made to take it
> > > offline, and all loaded_vmcss loaded on it (including the one in
> > > question) are cleared. When that CPU acts on the IPI, it notices
> > > that vmx->cpu.cpu == -1, i.e. != cpu, so it doesn't need to do
> > > anything (in the new version of the code, I made this more explicit
> > > by returning immediately in this case).
> >
> > The reverse also holds true. Right between the point where cpu
> > offline hits a loaded_vmcs and the point where it calls
> > __loaded_vmcs_clear, it's possible that the vcpu is migrated to
> > another cpu, and it's likely that the migration path (vmx_vcpu_load)
> > has invoked loaded_vmcs_clear but hasn't yet deleted this vmcs from
> > the old cpu's linked list. This way, when __loaded_vmcs_clear is
> > later invoked on the offlined cpu, there's still a chance to observe
> > cpu as -1.
>
> I don't think it's possible. Both calls are done with interrupts
> disabled.

If that's the case, then there's another potential issue: a deadlock
may happen when calling smp_call_function_single with interrupts
disabled.

Thanks
Kevin
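
For readers following the thread without the patch series in front of
them, the sketch below shows roughly the clearing logic under
discussion. The function names, the cpu == -1 sentinel, and the early
return are taken from the messages above; the struct layout, the
per-cpu list housekeeping, and the vmcs_clear helper are assumptions
about the surrounding code, not a quote of the actual patch.

	#include <linux/list.h>
	#include <linux/smp.h>

	struct vmcs;			/* hardware VMCS region (opaque here) */

	/* Assumed helper that issues VMCLEAR on the given region. */
	static void vmcs_clear(struct vmcs *vmcs);

	struct loaded_vmcs {
		struct vmcs *vmcs;
		int cpu;		/* cpu the vmcs is loaded on, or -1 */
		int launched;
		struct list_head loaded_vmcss_on_cpu_link;
	};

	/* Runs on the cpu that (supposedly) has the vmcs loaded. */
	static void __loaded_vmcs_clear(void *arg)
	{
		struct loaded_vmcs *loaded_vmcs = arg;
		int cpu = raw_smp_processor_id();

		/*
		 * The race discussed above: cpu offline may already have
		 * cleared this vmcs (setting cpu to -1) before the
		 * migration IPI is serviced, so just return in that case.
		 */
		if (loaded_vmcs->cpu != cpu)
			return;

		vmcs_clear(loaded_vmcs->vmcs);	/* VMCLEAR on this cpu */
		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
		loaded_vmcs->cpu = -1;
		loaded_vmcs->launched = 0;
	}

	static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
	{
		int cpu = loaded_vmcs->cpu;

		/*
		 * Kevin's point applies here: smp_call_function_single
		 * waits for the target cpu to service the IPI, so calling
		 * it with interrupts disabled risks deadlock (the generic
		 * smp code warns about exactly this combination).
		 */
		if (cpu != -1)
			smp_call_function_single(cpu, __loaded_vmcs_clear,
						 loaded_vmcs, 1);
	}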