From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756644AbbCCV06 (ORCPT ); Tue, 3 Mar 2015 16:26:58 -0500
Received: from e37.co.us.ibm.com ([32.97.110.158]:37209 "EHLO
	e37.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751617AbbCCV04 (ORCPT );
	Tue, 3 Mar 2015 16:26:56 -0500
Date: Tue, 3 Mar 2015 13:26:47 -0800
From: "Paul E. McKenney"
To: Boris Ostrovsky
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, laijs@cn.fujitsu.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com,
	fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com,
	x86@kernel.org, Konrad Rzeszutek Wilk , David Vrabel ,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH tip/core/rcu 02/20] x86: Use common
	outgoing-CPU-notification code
Message-ID: <20150303212647.GZ15405@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150303174144.GA13139@linux.vnet.ibm.com>
	<1425404595-17816-1-git-send-email-paulmck@linux.vnet.ibm.com>
	<1425404595-17816-2-git-send-email-paulmck@linux.vnet.ibm.com>
	<54F608C4.40405@oracle.com>
	<20150303194223.GR15405@linux.vnet.ibm.com>
	<54F615D3.2040802@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <54F615D3.2040802@oracle.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15030321-0025-0000-0000-00000903A721
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>  }
> >>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>  		schedule_timeout(HZ/10);
> >>>  	}
> >>>-	cpu_die_common(cpu);
> >>>+	(void)cpu_wait_death(cpu, 5);
> >>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>
> >>
> >>Not for HVM guests (PV guests will only reach this point after
> >>target cpu has been marked as down by the hypervisor).
> >>
> >>We need at least to have a message similar to what native_cpu_die()
> >>prints on cpu_wait_death() failure. And I think we should not call
> >>the two routines below (three, actually --- there is also
> >>xen_teardown_timer() below, which is not part of the diff).
> >>
> >>-boris
> >>
> >>
> >>>  	xen_smp_intr_free(cpu);
> >>>  	xen_uninit_lock_cpu(cpu);
> >
> >So something like this, then?
> >
> >	if (cpu_wait_death(cpu, 5)) {
> >		xen_smp_intr_free(cpu);
> >		xen_uninit_lock_cpu(cpu);
> >		xen_teardown_timer(cpu);
> >	}
>
> 	else
> 		pr_err("CPU %u didn't die...\n", cpu);
>
> >
> >
> >Easy change for me to make if so!
> >
> >Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >would that check be?  And also if so, is it OK to online a PV guest's
> >CPU that timed out during its previous offline?
>
>
> I believe PV VCPUs will always be CPU_DEAD by the time we get here
> since we are (indirectly) waiting for this in the loop at the
> beginning of xen_cpu_die():
>
> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> smp_processor_id()' in xen_play_dead(). Which happens after
> play_dead_common() has marked the cpu as CPU_DEAD.
>
> So no test is needed.

OK, so I have the following patch on top of my previous patch, which
I will merge if testing goes well.  So if a CPU times out going offline,
the above three functions will not be called, the "didn't die" message
will be printed, and any future attempt to online that CPU will fail.
Is that the correct semantics?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e2c7389c58c5..f2a06ff0614d 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
 
-	(void)cpu_wait_death(cpu, 5);
-	/* FIXME: Are the below calls really safe in case of timeout? */
-
-	xen_smp_intr_free(cpu);
-	xen_uninit_lock_cpu(cpu);
-	xen_teardown_timer(cpu);
+	if (cpu_wait_death(cpu, 5)) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+		xen_teardown_timer(cpu);
+	} else {
+		pr_err("CPU %u didn't die...\n", cpu);
+	}
 }
 
 static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
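
For anyone who wants to poke at the semantics being asked about (teardown
only on confirmed death, an error message on timeout, and a later online
attempt refused), here is a rough user-space model.  It is only a sketch:
every model_* name, the three states, and the poll-count "timeout" are
hypothetical stand-ins invented for illustration, not the kernel's
cpu_wait_death() implementation or any Xen interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical states for this model only; not the kernel's definitions. */
enum model_cpu_state { MODEL_CPU_ONLINE, MODEL_CPU_DEAD, MODEL_CPU_BROKEN };

struct model_cpu {
	enum model_cpu_state state;
	int polls_until_dead;	/* negative: the CPU never reports death */
	bool resources_freed;	/* stands in for the three teardown calls */
};

/*
 * Assumed contract, modeled on the discussion: return true if the CPU
 * reports death within max_polls polls, false on timeout.
 */
static bool model_wait_death(struct model_cpu *c, int max_polls)
{
	assert(c != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (c->polls_until_dead >= 0 && i >= c->polls_until_dead) {
			c->state = MODEL_CPU_DEAD;
			return true;
		}
	}
	c->state = MODEL_CPU_BROKEN;	/* timed out: remember the failure */
	return false;
}

/* Mirrors the shape of the patched xen_cpu_die(): tear down only on success. */
static void model_cpu_die(struct model_cpu *c, int max_polls)
{
	if (model_wait_death(c, max_polls))
		c->resources_freed = true;
	else
		fprintf(stderr, "CPU didn't die...\n");
}

/* A CPU may come back online only if its previous offline completed. */
static bool model_cpu_online(struct model_cpu *c)
{
	assert(c != NULL);
	if (c->state != MODEL_CPU_DEAD)
		return false;
	c->state = MODEL_CPU_ONLINE;
	c->resources_freed = false;
	return true;
}
```

Calling model_cpu_die() on a CPU whose polls_until_dead is negative leaves
resources_freed false and makes model_cpu_online() return false, which is
the behavior the question above describes.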