From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757295AbbCCWcA (ORCPT );
	Tue, 3 Mar 2015 17:32:00 -0500
Received: from e31.co.us.ibm.com ([32.97.110.149]:42614 "EHLO e31.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756309AbbCCWb7
	(ORCPT ); Tue, 3 Mar 2015 17:31:59 -0500
Date: Tue, 3 Mar 2015 14:31:51 -0800
From: "Paul E. McKenney"
To: Boris Ostrovsky
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, laijs@cn.fujitsu.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com,
	fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com,
	x86@kernel.org, Konrad Rzeszutek Wilk, David Vrabel,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
Message-ID: <20150303223151.GC15405@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150303174144.GA13139@linux.vnet.ibm.com>
	<1425404595-17816-1-git-send-email-paulmck@linux.vnet.ibm.com>
	<1425404595-17816-2-git-send-email-paulmck@linux.vnet.ibm.com>
	<54F608C4.40405@oracle.com>
	<20150303194223.GR15405@linux.vnet.ibm.com>
	<54F615D3.2040802@oracle.com>
	<20150303212647.GZ15405@linux.vnet.ibm.com>
	<54F6307A.8040003@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <54F6307A.8040003@oracle.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15030322-8236-0000-0000-000009EA92F3
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> >>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>>> 	}
> >>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>> 		schedule_timeout(HZ/10);
> >>>>> 	}
> >>>>>-	cpu_die_common(cpu);
> >>>>>+	(void)cpu_wait_death(cpu, 5);
> >>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>
> >>>>Not for HVM guests (PV guests will only reach this point after
> >>>>target cpu has been marked as down by the hypervisor).
> >>>>
> >>>>We need at least to have a message similar to what native_cpu_die()
> >>>>prints on cpu_wait_death() failure. And I think we should not call
> >>>>the two routines below (three, actually --- there is also
> >>>>xen_teardown_timer() below, which is not part of the diff).
> >>>>
> >>>>-boris
> >>>>
> >>>>
> >>>>> 	xen_smp_intr_free(cpu);
> >>>>> 	xen_uninit_lock_cpu(cpu);
> >>>
> >>>So something like this, then?
> >>>
> >>>	if (cpu_wait_death(cpu, 5)) {
> >>>		xen_smp_intr_free(cpu);
> >>>		xen_uninit_lock_cpu(cpu);
> >>>		xen_teardown_timer(cpu);
> >>>	}
> >>	else
> >>		pr_err("CPU %u didn't die...\n", cpu);
> >>
> >>
> >>>Easy change for me to make if so!
> >>>
> >>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >>>would that check be?  And also if so, is it OK to online a PV guest's
> >>>CPU that timed out during its previous offline?
> >>
> >>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> >>since we are (indirectly) waiting for this in the loop at the
> >>beginning of xen_cpu_die():
> >>
> >>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> >>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> >>smp_processor_id()' in xen_play_dead(). Which happens after
> >>play_dead_common() has marked the cpu as CPU_DEAD.
> >>
> >>So no test is needed.
> >
> >OK, so I have the following patch on top of my previous patch, which
> >I will merge if testing goes well.  So if a CPU times out going offline,
> >the above three functions will not be called, the "didn't die" message
> >will be printed, and any future attempt to online that CPU will fail.
> >Is that the correct semantics?
>
> Yes.
>
> I am not sure whether not ever onlining the CPU is the best outcome
> but then I don't think trying to online it again with all interrupts
> and such still set up will work well. And it's an improvement over
> what we have now anyway (with current code we may clean up things
> for a non-dead cpu).

Another strategy is to key off of the return value of
cpu_check_up_prepare().  If it returns -EBUSY, then the outgoing CPU
finished up after the surviving CPU timed out.  The CPU trying to
bring the new CPU online could (in theory, anyway) do the
xen_smp_intr_free(), xen_uninit_lock_cpu(), and xen_teardown_timer()
at that point.  But I must defer to you on this sort of thing.

							Thanx, Paul

> Thanks.
> -boris
>
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >index e2c7389c58c5..f2a06ff0614d 100644
> >--- a/arch/x86/xen/smp.c
> >+++ b/arch/x86/xen/smp.c
> >@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> > 		schedule_timeout(HZ/10);
> > 	}
> >-	(void)cpu_wait_death(cpu, 5);
> >-	/* FIXME: Are the below calls really safe in case of timeout? */
> >-
> >-	xen_smp_intr_free(cpu);
> >-	xen_uninit_lock_cpu(cpu);
> >-	xen_teardown_timer(cpu);
> >+	if (cpu_wait_death(cpu, 5)) {
> >+		xen_smp_intr_free(cpu);
> >+		xen_uninit_lock_cpu(cpu);
> >+		xen_teardown_timer(cpu);
> >+	} else {
> >+		pr_err("CPU %u didn't die...\n", cpu);
> >+	}
> > }
> >
> > static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> >