From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: paulus@samba.org, oleg@redhat.com, rusty@rustcorp.com.au,
	peterz@infradead.org, tglx@linutronix.de,
	akpm@linux-foundation.org
Cc: mingo@kernel.org, paulmck@linux.vnet.ibm.com, tj@kernel.org,
	walken@google.com, ego@linux.vnet.ibm.com,
	linux@arm.linux.org.uk, linux-kernel@vger.kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com,
	Thomas Gleixner <tglx@linutronix.de>,
	Toshi Kani <toshi.kani@hp.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH 01/51] CPU hotplug: Provide lockless versions of callback registration functions
Date: Thu, 06 Feb 2014 03:34:48 +0530
Message-ID: <20140205220447.19080.9460.stgit@srivatsabhat.in.ibm.com>
In-Reply-To: <20140205220251.19080.92336.stgit@srivatsabhat.in.ibm.com>

The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

The deadlock is shown below:

          CPU 0                                         CPU 1
          -----                                         -----

   Acquire cpu_hotplug.lock
   [via get_online_cpus()]

                                              CPU online/offline operation
                                              takes cpu_add_remove_lock
                                              [via cpu_maps_update_begin()]


   Try to acquire
   cpu_add_remove_lock
   [via register_cpu_notifier()]


                                              CPU online/offline operation
                                              tries to acquire cpu_hotplug.lock
                                              [via cpu_hotplug_begin()]


                            *** DEADLOCK! ***

The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without taking the cpu_add_remove_lock themselves. That way, we can avoid the
ABBA scenario. However, the caller must then hold the cpu_add_remove_lock
throughout the entire critical section, to protect updates to the
callback/notifier chain.

This can be achieved by writing the callback registration code as follows:

	cpu_maps_update_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* This doesn't take the cpu_add_remove_lock */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_maps_update_done();

Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.

Also, since cpu_maps_update_begin/done() is effectively a superset of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.
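
The lock-ordering argument above can be illustrated with a small userspace
model. This is only a sketch using pthreads, not kernel code: the two mutexes
are stand-ins named after the kernel locks, cpu_hotplug.lock is reduced to a
plain mutex (its refcount is ignored), and run_model(), hotplug_thread(),
register_thread() and the counters are hypothetical names invented for the
illustration. The hotplug path takes cpu_add_remove_lock and then
cpu_hotplug_lock; the registration path takes only cpu_add_remove_lock, so no
thread ever waits for the first lock while holding the second.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins for the two kernel locks (assumption: plain mutexes). */
static pthread_mutex_t cpu_add_remove_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpu_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

static int notifier_count;	/* models the notifier chain; guarded by cpu_add_remove_lock */
static int hotplug_ops;		/* completed hotplug operations; guarded by both locks */

/* Models a CPU online/offline operation: add_remove lock first, then hotplug lock. */
static void *hotplug_thread(void *arg)
{
	int iterations = *(const int *)arg;

	for (int i = 0; i < iterations; i++) {
		pthread_mutex_lock(&cpu_add_remove_lock);	/* cpu_maps_update_begin() */
		pthread_mutex_lock(&cpu_hotplug_lock);		/* cpu_hotplug_begin() */
		hotplug_ops++;
		pthread_mutex_unlock(&cpu_hotplug_lock);	/* cpu_hotplug_done() */
		pthread_mutex_unlock(&cpu_add_remove_lock);	/* cpu_maps_update_done() */
	}
	return NULL;
}

/* Models the registration pattern from this patch: only the add_remove lock. */
static void *register_thread(void *arg)
{
	int iterations = *(const int *)arg;

	for (int i = 0; i < iterations; i++) {
		pthread_mutex_lock(&cpu_add_remove_lock);	/* cpu_maps_update_begin() */
		notifier_count++;	/* stands in for __register_cpu_notifier() */
		pthread_mutex_unlock(&cpu_add_remove_lock);	/* cpu_maps_update_done() */
	}
	return NULL;
}

/* Runs both paths concurrently; returns 0 if both completed every iteration. */
int run_model(int iterations)
{
	pthread_t hotplug, registration;

	assert(iterations >= 0);
	notifier_count = 0;
	hotplug_ops = 0;

	if (pthread_create(&hotplug, NULL, hotplug_thread, &iterations))
		return -1;
	if (pthread_create(&registration, NULL, register_thread, &iterations)) {
		pthread_join(hotplug, NULL);
		return -1;
	}
	pthread_join(hotplug, NULL);
	pthread_join(registration, NULL);

	return (notifier_count == iterations && hotplug_ops == iterations) ? 0 : -1;
}
```

Build with -pthread and call run_model() from a small harness. If
register_thread() instead took cpu_hotplug_lock before cpu_add_remove_lock
(the get_online_cpus() + register_cpu_notifier() pattern), the two threads
could block each other forever, which is the ABBA case in the diagram above.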

So, introduce the lockless variants of un/register_cpu_notifier() and also
export the cpu_maps_update_begin/done() APIs for use by modules. This way,
we provide a race-free way to register hotplug callbacks as well as perform
initialization for the CPUs that are already online.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h |   36 ++++++++++++++++++++++++++++++++++++
 kernel/cpu.c        |   20 ++++++++++++++++++--
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 03e235ad..eb97e37 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -122,26 +122,46 @@ enum {
 		{ .notifier_call = fn, .priority = pri };	\
 	register_cpu_notifier(&fn##_nb);			\
 }
+
+#define __cpu_notifier(fn, pri) {				\
+	static struct notifier_block fn##_nb =			\
+		{ .notifier_call = fn, .priority = pri };	\
+	__register_cpu_notifier(&fn##_nb);			\
+}
 #else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */
 #define cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
+#define __cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 #endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */
+
 #ifdef CONFIG_HOTPLUG_CPU
 extern int register_cpu_notifier(struct notifier_block *nb);
+extern int __register_cpu_notifier(struct notifier_block *nb);
 extern void unregister_cpu_notifier(struct notifier_block *nb);
+extern void __unregister_cpu_notifier(struct notifier_block *nb);
 #else
 
 #ifndef MODULE
 extern int register_cpu_notifier(struct notifier_block *nb);
+extern int __register_cpu_notifier(struct notifier_block *nb);
 #else
 static inline int register_cpu_notifier(struct notifier_block *nb)
 {
 	return 0;
 }
+
+static inline int __register_cpu_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
 #endif
 
 static inline void unregister_cpu_notifier(struct notifier_block *nb)
 {
 }
+
+static inline void __unregister_cpu_notifier(struct notifier_block *nb)
+{
+}
 #endif
 
 int cpu_up(unsigned int cpu);
@@ -152,16 +172,26 @@ extern void cpu_maps_update_done(void);
 #else	/* CONFIG_SMP */
 
 #define cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
+#define __cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 
 static inline int register_cpu_notifier(struct notifier_block *nb)
 {
 	return 0;
 }
 
+static inline int __register_cpu_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
 static inline void unregister_cpu_notifier(struct notifier_block *nb)
 {
 }
 
+static inline void __unregister_cpu_notifier(struct notifier_block *nb)
+{
+}
+
 static inline void cpu_maps_update_begin(void)
 {
 }
@@ -183,8 +213,11 @@ extern void put_online_cpus(void);
 extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
+#define __hotcpu_notifier(fn, pri)	__cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
+#define __register_hotcpu_notifier(nb)	__register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
+#define __unregister_hotcpu_notifier(nb)	__unregister_cpu_notifier(nb)
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
 
@@ -197,9 +230,12 @@ static inline void cpu_hotplug_done(void) {}
 #define cpu_hotplug_disable()	do { } while (0)
 #define cpu_hotplug_enable()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
+#define __hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
+#define __register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
 #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
+#define __unregister_hotcpu_notifier(nb)	({ (void)(nb); })
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_PM_SLEEP_SMP
diff --git a/kernel/cpu.c b/kernel/cpu.c
index deff2e6..12a3a74 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -27,18 +27,22 @@
 static DEFINE_MUTEX(cpu_add_remove_lock);
 
 /*
- * The following two API's must be used when attempting
- * to serialize the updates to cpu_online_mask, cpu_present_mask.
+ * The following two API's must be used when attempting to serialize
+ * the updates to cpu_online_mask, cpu_present_mask. Also, they must
+ * be used to protect CPU hotplug callback (un)registration performed
+ * using __register_cpu_notifier() or __unregister_cpu_notifier().
  */
 void cpu_maps_update_begin(void)
 {
 	mutex_lock(&cpu_add_remove_lock);
 }
+EXPORT_SYMBOL(cpu_maps_update_begin);
 
 void cpu_maps_update_done(void)
 {
 	mutex_unlock(&cpu_add_remove_lock);
 }
+EXPORT_SYMBOL(cpu_maps_update_done);
 
 static RAW_NOTIFIER_HEAD(cpu_chain);
 
@@ -166,6 +170,11 @@ int __ref register_cpu_notifier(struct notifier_block *nb)
 	return ret;
 }
 
+int __ref __register_cpu_notifier(struct notifier_block *nb)
+{
+	return raw_notifier_chain_register(&cpu_chain, nb);
+}
+
 static int __cpu_notify(unsigned long val, void *v, int nr_to_call,
 			int *nr_calls)
 {
@@ -189,6 +198,7 @@ static void cpu_notify_nofail(unsigned long val, void *v)
 	BUG_ON(cpu_notify(val, v));
 }
 EXPORT_SYMBOL(register_cpu_notifier);
+EXPORT_SYMBOL(__register_cpu_notifier);
 
 void __ref unregister_cpu_notifier(struct notifier_block *nb)
 {
@@ -198,6 +208,12 @@ void __ref unregister_cpu_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_cpu_notifier);
 
+void __ref __unregister_cpu_notifier(struct notifier_block *nb)
+{
+	raw_notifier_chain_unregister(&cpu_chain, nb);
+}
+EXPORT_SYMBOL(__unregister_cpu_notifier);
+
 /**
  * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
  * @cpu: a CPU id


