From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Srivatsa S. Bhat"
Subject: [UPDATED][PATCH v2 46/52] xen, balloon: Fix CPU hotplug callback registration
Date: Sat, 15 Feb 2014 22:21:32 +0530
Message-ID: <52FF9B14.8000308__12051.4291259044$1392483562$gmane$org@linux.vnet.ibm.com>
References: <20140214074750.22701.47330.stgit@srivatsabhat.in.ibm.com>
	<20140214075935.22701.71000.stgit@srivatsabhat.in.ibm.com>
	<52FE490B.8000908@oracle.com>
	<52FE493A.2030206@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <52FE493A.2030206@linux.vnet.ibm.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Boris Ostrovsky
Cc: linux-arch@vger.kernel.org, ego@linux.vnet.ibm.com, walken@google.com,
	linux@arm.linux.org.uk, akpm@linux-foundation.org, peterz@infradead.org,
	rusty@rustcorp.com.au, rjw@rjwysocki.net, oleg@redhat.com,
	linux-kernel@vger.kernel.org, paulus@samba.org, David Vrabel,
	tj@kernel.org, xen-devel@lists.xenproject.org, tglx@linutronix.de,
	paulmck@linux.vnet.ibm.com, mingo@kernel.org
List-Id: xen-devel@lists.xenproject.org

On 02/14/2014 10:20 PM, Srivatsa S. Bhat wrote:
> On 02/14/2014 10:19 PM, Boris Ostrovsky wrote:
>> On 02/14/2014 02:59 AM, Srivatsa S. Bhat wrote:
>>> Subsystems that want to register CPU hotplug callbacks, as well as perform
>>> initialization for the CPUs that are already online, often do it as shown
>>> below:
>>> [...]
>> This looks exactly like the earlier version (i.e the notifier is still
>> kept registered on allocation failure and commit message doesn't exactly
>> reflect the change).
>>
>
> Sorry, your earlier reply (for some unknown reason) missed the email-threading
> and landed elsewhere in my inbox, and hence unfortunately I forgot to take
> your suggestions into account while sending out the v2.
>
> I'll send out an updated version of just this patch, as a reply.

Here is the updated patch. Please let me know what you think!

----------------------------------------------------------------------------

From: Srivatsa S. Bhat
Subject: [PATCH] xen, balloon: Fix CPU hotplug callback registration

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

The xen balloon driver doesn't take get/put_online_cpus() around this code,
but that is also buggy, since it can miss CPU hotplug events in between the
initialization and callback registration:

	for_each_online_cpu(cpu)
		init_cpu(cpu);
		   ^
		   |  Race window; Can miss CPU hotplug events here.
		   v
	register_cpu_notifier(&foobar_cpu_notifier);

Interestingly, the balloon code in xen can simply be reorganized as shown
below, to have a race-free method to register hotplug callbacks, without even
taking get/put_online_cpus(). This is because the initialization performed for
already online CPUs is exactly the same as that performed for CPUs that come
online later. Moreover, the code has checks in place to avoid double
initialization.

	register_cpu_notifier(&foobar_cpu_notifier);

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	put_online_cpus();

A hotplug operation that occurs between registering the notifier and calling
get_online_cpus(), won't disrupt anything, because the code takes care to
perform the memory allocations only once.

So reorganize the balloon code in xen this way to fix the issues with CPU
hotplug callback registration.

Cc: Konrad Rzeszutek Wilk
Cc: Boris Ostrovsky
Cc: David Vrabel
Cc: Ingo Molnar
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Srivatsa S. Bhat
---

 drivers/xen/balloon.c |   36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 37d06ea..dd79549 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
 	}
 }
 
+static int alloc_balloon_scratch_page(int cpu)
+{
+	if (per_cpu(balloon_scratch_page, cpu) != NULL)
+		return 0;
+
+	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
+	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
+		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+
 static int balloon_cpu_notify(struct notifier_block *self,
 				    unsigned long action, void *hcpu)
 {
 	int cpu = (long)hcpu;
 	switch (action) {
 	case CPU_UP_PREPARE:
-		if (per_cpu(balloon_scratch_page, cpu) != NULL)
-			break;
-		per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
-		if (per_cpu(balloon_scratch_page, cpu) == NULL) {
-			pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
+		if (alloc_balloon_scratch_page(cpu))
 			return NOTIFY_BAD;
-		}
 		break;
 	default:
 		break;
@@ -624,15 +634,17 @@ static int __init balloon_init(void)
 		return -ENODEV;
 
 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		for_each_online_cpu(cpu)
-		{
-			per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
-			if (per_cpu(balloon_scratch_page, cpu) == NULL) {
-				pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
+		register_cpu_notifier(&balloon_cpu_notifier);
+
+		get_online_cpus();
+		for_each_online_cpu(cpu) {
+			if (alloc_balloon_scratch_page(cpu)) {
+				put_online_cpus();
+				unregister_cpu_notifier(&balloon_cpu_notifier);
 				return -ENOMEM;
 			}
 		}
-		register_cpu_notifier(&balloon_cpu_notifier);
+		put_online_cpus();
 	}
 
 	pr_info("Initialising balloon driver\n");
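
For anyone who wants to poke at the ordering outside the kernel, here is a rough userspace model of the "register first, then run an idempotent init over the online CPUs" pattern from the changelog. It is only a sketch under stated assumptions: every model_* name, MODEL_NR_CPUS, MODEL_PAGE_SIZE and the online bitmask are made up for illustration, malloc() stands in for alloc_page(), a plain flag stands in for register_cpu_notifier()/unregister_cpu_notifier(), and there is no real locking or concurrency here, so get/put_online_cpus() are not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS   4	/* hypothetical CPU count for the model */
#define MODEL_PAGE_SIZE 4096	/* hypothetical page size */

static void *model_scratch_page[MODEL_NR_CPUS];	/* stand-in for the per-cpu variable */
static bool model_notifier_registered;		/* stand-in for notifier registration */
static int model_alloc_calls;			/* counts real allocations, for checking */

/* Idempotent per-CPU init: allocates at most once per CPU. */
int model_alloc_scratch_page(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (model_scratch_page[cpu] != NULL)
		return 0;	/* already initialized, nothing to do */

	model_scratch_page[cpu] = malloc(MODEL_PAGE_SIZE);
	if (model_scratch_page[cpu] == NULL)
		return -ENOMEM;

	model_alloc_calls++;
	return 0;
}

/* Stand-in for the CPU_UP_PREPARE callback: a no-op unless registered. */
int model_cpu_up_prepare(int cpu)
{
	if (!model_notifier_registered)
		return 0;	/* event would be missed */
	return model_alloc_scratch_page(cpu);
}

/*
 * Register the callback first, then walk the CPUs marked online in the
 * bitmask. On failure, drop the registration again before returning.
 * Pages are never freed in this model; it only illustrates the ordering.
 */
int model_balloon_init(unsigned int online_mask)
{
	int cpu;

	model_notifier_registered = true;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(online_mask & (1u << cpu)))
			continue;
		if (model_alloc_scratch_page(cpu)) {
			model_notifier_registered = false;
			return -ENOMEM;
		}
	}

	return 0;
}
```

Calling model_balloon_init(0x3u) and then model_cpu_up_prepare() for a CPU that was already covered by the loop should leave model_alloc_calls unchanged, which is the double-initialization check the changelog relies on.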