All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/20] CPU hotplug updates for v4.1
@ 2015-03-03 17:41 Paul E. McKenney
  2015-03-03 17:42   ` Paul E. McKenney
  0 siblings, 1 reply; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

Hello!

This series updates RCU's handling of CPU hotplug offline operations,
allowing RCU to have a precise notification of when it should start
ignoring a CPU.  This allowed detection of some illegal use of RCU by
offline CPUs, and this series contains fixes for these.  A similar
problem exists for CPU onlining, but will be addressed later.  One
CPU-hotplug dragon at a time.

1.	Add common code for notification from dying CPU.  This is
	part of the fix for issues uncovered by improved detection,
	but is placed first to avoid messing up bisection.

2-4.	Use #1 for x86, blackfin, and metag.  (ARM also has this problem,
	but ARM's maintainers are working on their own fix.)

5.	Remove duplicate offline-CPU callback-list initialization.
	This simplifies later changes to RCU's handling of offlining
	operations.

6.	Improve code readability in rcu_cleanup_dead_cpu().  Simple
	code motion, no semantic change.

7.	Eliminate a boolean variable and "if" statement by rearranging
	sync_rcu_preempt_exp_init()'s checks for CPUs not having blocked
	tasks.

8.	Eliminate empty CONFIG_HOTPLUG_CPU #ifdef.

9.	Add diagnostics to detect when RCU CPU stall warnings have been
	caused by failure to propagate quiescent states up the rcu_node
	combining tree.

10.	Provide CONFIG_RCU_TORTURE_TEST_SLOW_INIT Kconfig option to
	artificially slow down grace-period initialization, thus increasing
	the probability of detecting races with this initialization process.

11.	Update data files to enable CONFIG_RCU_TORTURE_TEST_SLOW_INIT
	by default during rcutorture testing, but leave the default
	time at zero.  This default may be overridden by passing
	"--bootargs rcutree.gp_init_delay=1" to kvm.sh.

12.	Remove event tracing from rcu_cpu_notify(), which is invoked
	by offline CPUs.  (Event tracing uses RCU.)

13.	Change meaning of ->expmask bitmasks to track blocked tasks
	rather than online CPUs.

14.	Move rcu_report_unblock_qs_rnp() to common code.  This will
	make it easier to provide proper locking protection.

15.	Avoid races between CPU-hotplug operations and RCU grace-period
	initialization by processing CPU-hotplug changes only at the
	start of each grace period.  This works because RCU need not
	wait on a CPU that came online after the start of the current
	grace period.

16.	Eliminate the no-longer-needed ->onoff_mutex from the rcu_node
	structure.  This is the only sleeplock acquired during RCU's
	CPU-hotplug processing, which in turn allows rcu_cpu_notify()
	to be invoked from the preemption-disabled idle loop.

17.	Use a per-CPU variable to make the CPU-offline idle-loop
	transition point precise.  (RCU's magic one-jiffy grace-period
	wait for offline CPUs must remain until the analogous online
	issue is addressed.)

18.	Invoke rcu_cpu_notify() with a new CPU_DYING_IDLE op just before
	the idle-loop invocation of arch_cpu_idle_dead().

19.	Now that CPU-hotplug events are applied only during grace-period
	initialization, it is safe to unconditionally enable slow
	grace-period initialization for rcutorture testing.  Note
	that this delay is applied randomly in order to get a good
	mix of fast and slow grace-period initialization.

20.	Add checks that all quiescent states were received at grace-period
	cleanup time.

							Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/kernel-parameters.txt                     |    6 
 b/arch/blackfin/mach-common/smp.c                         |    6 
 b/arch/metag/kernel/smp.c                                 |    5 
 b/arch/x86/include/asm/cpu.h                              |    2 
 b/arch/x86/include/asm/smp.h                              |    1 
 b/arch/x86/kernel/smpboot.c                               |   29 -
 b/arch/x86/xen/smp.c                                      |   18 
 b/include/linux/cpu.h                                     |   14 
 b/include/linux/rcupdate.h                                |    2 
 b/kernel/cpu.c                                            |    4 
 b/kernel/rcu/tree.c                                       |  279 ++++++++++----
 b/kernel/rcu/tree.h                                       |   11 
 b/kernel/rcu/tree_plugin.h                                |  121 +-----
 b/kernel/rcu/tree_trace.c                                 |    4 
 b/kernel/sched/idle.c                                     |    9 
 b/kernel/smpboot.c                                        |  156 +++++++
 b/lib/Kconfig.debug                                       |   26 +
 b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon |    1 
 18 files changed, 492 insertions(+), 202 deletions(-)


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 01/20] smpboot: Add common code for notification from dying CPU
  2015-03-03 17:41 [PATCH tip/core/rcu 0/20] CPU hotplug updates for v4.1 Paul E. McKenney
@ 2015-03-03 17:42   ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney, linux-api, linux-arch

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
(They -can- use SRCU, but not RCU.)  This means that any use of RCU
during or after the call to arch_cpu_idle_dead().  Unfortunately,
commit 2ed53c0d6cc99 added a complete() call, which will contain RCU
read-side critical sections if there is a task waiting to be awakened.

Which, as it turns out, there almost never is.  In my qemu/KVM testing,
the to-be-awakened task is not yet asleep more than 99.5% of the time.
In current mainline, failure is even harder to reproduce, requiring a
virtualized environment that delays the outgoing CPU by at least three
jiffies between the time it exits its stop_machine() task at CPU_DYING
time and the time it calls arch_cpu_idle_dead() from the idle loop.
However, this problem really can occur, especially in virtualized
environments, and therefore really does need to be fixed

This suggests moving back to the polling loop, but using a much shorter
wait, with gentle exponential backoff instead of the old 100-millisecond
wait.  Most of the time, the loop will exit without waiting at all,
and almost all of the remaining uses will wait only five microseconds.
If the outgoing CPU is preempted, a loop will wait one jiffy, then
increase the wait by a factor of 11/10ths, rounding up.  As before, there
is a five-second timeout.

This commit therefore provides common-code infrastructure to do the
dying-to-surviving CPU handoff in a safe manner.  This code also
provides an indication at CPU-online of whether the CPU to be onlined
previously timed out on offline.  The new cpu_check_up_prepare() function
returns -EBUSY if this CPU previously took more than five seconds to
go offline, or -EAGAIN if it has not yet managed to go offline.  The
rationale for -EAGAIN is that it might still be preempted, so an additional
wait might well find it correctly offlined.  Architecture-specific code
can decide how to handle these conditions.  Systems in which CPUs take
themselves completely offline might respond to an -EBUSY return as if
it was a zero (success) return.  Systems in which the surviving CPU must
take some action might take it at this time, or might simply mark the
other CPU as unusable.

Note that architectures that take the easy way out and simply pass the
-EBUSY and -EAGAIN upwards will change the sysfs API.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
---
 include/linux/cpu.h |  12 ++++
 kernel/smpboot.c    | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 168 insertions(+)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4260e8594bd7..4744ef915acd 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -95,6 +95,8 @@ enum {
 					* Called on the new cpu, just before
 					* enabling interrupts. Must not sleep,
 					* must not fail */
+#define CPU_BROKEN		0x000C /* CPU (unsigned)v did not die properly,
+					* perhaps due to preemption. */
 
 /* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
  * operation in progress
@@ -271,4 +273,14 @@ void arch_cpu_idle_enter(void);
 void arch_cpu_idle_exit(void);
 void arch_cpu_idle_dead(void);
 
+DECLARE_PER_CPU(bool, cpu_dead_idle);
+
+int cpu_report_state(int cpu);
+int cpu_check_up_prepare(int cpu);
+void cpu_set_state_online(int cpu);
+#ifdef CONFIG_HOTPLUG_CPU
+bool cpu_wait_death(unsigned int cpu, int seconds);
+bool cpu_report_death(void);
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
+
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 40190f28db35..065499f432ec 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -4,6 +4,7 @@
 #include <linux/cpu.h>
 #include <linux/err.h>
 #include <linux/smp.h>
+#include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/slab.h>
@@ -314,3 +315,158 @@ void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	put_online_cpus();
 }
 EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
+
+static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
+
+/*
+ * Called to poll specified CPU's state, for example, when waiting for
+ * a CPU to come online.
+ */
+int cpu_report_state(int cpu)
+{
+	return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
+}
+
+/*
+ * If CPU has died properly, set its state to CPU_UP_PREPARE and
+ * return success.  Otherwise, return -EBUSY if the CPU died after
+ * cpu_wait_death() timed out.  And yet otherwise again, return -EAGAIN
+ * if cpu_wait_death() timed out and the CPU still hasn't gotten around
+ * to dying.  In the latter two cases, the CPU might not be set up
+ * properly, but it is up to the arch-specific code to decide.
+ * Finally, -EIO indicates an unanticipated problem.
+ *
+ * Note that it is permissible to omit this call entirely, as is
+ * done in architectures that do no CPU-hotplug error checking.
+ */
+int cpu_check_up_prepare(int cpu)
+{
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
+		return 0;
+	}
+
+	switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
+
+	case CPU_POST_DEAD:
+
+		/* The CPU died properly, so just start it up again. */
+		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
+		return 0;
+
+	case CPU_TASKS_FROZEN | CPU_DEAD:
+
+		/*
+		 * Timeout during CPU death, so let caller know.
+		 * The outgoing CPU completed its processing, but after
+		 * cpu_wait_death() timed out and reported the error. The
+		 * caller is free to proceed, in which case the state
+		 * will be reset properly by cpu_set_state_online().
+		 * Proceeding despite this -EBUSY return makes sense
+		 * for systems where the outgoing CPUs take themselves
+		 * offline, with no post-death manipulation required from
+		 * a surviving CPU.
+		 */
+		return -EBUSY;
+
+	case CPU_BROKEN:
+
+		/*
+		 * The most likely reason we got here is that there was
+		 * a timeout during CPU death, and the outgoing CPU never
+		 * did complete its processing.  This could happen on
+		 * a virtualized system if the outgoing VCPU gets preempted
+		 * for more than five seconds, and the user attempts to
+		 * immediately online that same CPU.  Trying again later
+		 * might return -EBUSY above, hence -EAGAIN.
+		 */
+		return -EAGAIN;
+
+	default:
+
+		/* Should not happen.  Famous last words. */
+		return -EIO;
+	}
+}
+
+/*
+ * Mark the specified CPU online.
+ *
+ * Note that it is permissible to omit this call entirely, as is
+ * done in architectures that do no CPU-hotplug error checking.
+ */
+void cpu_set_state_online(int cpu)
+{
+	(void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+
+/*
+ * Wait for the specified CPU to exit the idle loop and die.
+ */
+bool cpu_wait_death(unsigned int cpu, int seconds)
+{
+	int jf_left = seconds * HZ;
+	int oldstate;
+	bool ret = true;
+	int sleep_jf = 1;
+
+	might_sleep();
+
+	/* The outgoing CPU will normally get done quite quickly. */
+	if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
+		goto update_state;
+	udelay(5);
+
+	/* But if the outgoing CPU dawdles, wait increasingly long times. */
+	while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
+		schedule_timeout_uninterruptible(sleep_jf);
+		jf_left -= sleep_jf;
+		if (jf_left <= 0)
+			break;
+		sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10);
+	}
+update_state:
+	oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
+	if (oldstate == CPU_DEAD) {
+		/* Outgoing CPU died normally, update state. */
+		smp_mb(); /* atomic_read() before update. */
+		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
+	} else {
+		/* Outgoing CPU still hasn't died, set state accordingly. */
+		if (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
+				   oldstate, CPU_BROKEN) != oldstate)
+			goto update_state;
+		ret = false;
+	}
+	return ret;
+}
+
+/*
+ * Called by the outgoing CPU to report its successful death.  Return
+ * false if this report follows the surviving CPU's timing out.
+ *
+ * A separate "CPU_TASKS_FROZEN | CPU_DEAD" is used when the surviving
+ * CPU timed out.  This approach allows architectures to omit calls to
+ * cpu_check_up_prepare() and cpu_set_state_online() without defeating
+ * the next cpu_wait_death()'s polling loop.
+ */
+bool cpu_report_death(void)
+{
+	int oldstate;
+	int newstate;
+	int cpu = smp_processor_id();
+
+	do {
+		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
+		if (oldstate == CPU_ONLINE)
+			newstate = CPU_DEAD;
+		else
+			newstate = CPU_TASKS_FROZEN | CPU_DEAD;
+	} while (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
+				oldstate, newstate) != oldstate);
+	return newstate == CPU_DEAD;
+}
+
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 01/20] smpboot: Add common code for notification from dying CPU
@ 2015-03-03 17:42   ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney, linux-api, linux-arch

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
(They -can- use SRCU, but not RCU.)  This means that any use of RCU
during or after the call to arch_cpu_idle_dead().  Unfortunately,
commit 2ed53c0d6cc99 added a complete() call, which will contain RCU
read-side critical sections if there is a task waiting to be awakened.

Which, as it turns out, there almost never is.  In my qemu/KVM testing,
the to-be-awakened task is not yet asleep more than 99.5% of the time.
In current mainline, failure is even harder to reproduce, requiring a
virtualized environment that delays the outgoing CPU by at least three
jiffies between the time it exits its stop_machine() task at CPU_DYING
time and the time it calls arch_cpu_idle_dead() from the idle loop.
However, this problem really can occur, especially in virtualized
environments, and therefore really does need to be fixed

This suggests moving back to the polling loop, but using a much shorter
wait, with gentle exponential backoff instead of the old 100-millisecond
wait.  Most of the time, the loop will exit without waiting at all,
and almost all of the remaining uses will wait only five microseconds.
If the outgoing CPU is preempted, a loop will wait one jiffy, then
increase the wait by a factor of 11/10ths, rounding up.  As before, there
is a five-second timeout.

This commit therefore provides common-code infrastructure to do the
dying-to-surviving CPU handoff in a safe manner.  This code also
provides an indication at CPU-online of whether the CPU to be onlined
previously timed out on offline.  The new cpu_check_up_prepare() function
returns -EBUSY if this CPU previously took more than five seconds to
go offline, or -EAGAIN if it has not yet managed to go offline.  The
rationale for -EAGAIN is that it might still be preempted, so an additional
wait might well find it correctly offlined.  Architecture-specific code
can decide how to handle these conditions.  Systems in which CPUs take
themselves completely offline might respond to an -EBUSY return as if
it was a zero (success) return.  Systems in which the surviving CPU must
take some action might take it at this time, or might simply mark the
other CPU as unusable.

Note that architectures that take the easy way out and simply pass the
-EBUSY and -EAGAIN upwards will change the sysfs API.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
---
 include/linux/cpu.h |  12 ++++
 kernel/smpboot.c    | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 168 insertions(+)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4260e8594bd7..4744ef915acd 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -95,6 +95,8 @@ enum {
 					* Called on the new cpu, just before
 					* enabling interrupts. Must not sleep,
 					* must not fail */
+#define CPU_BROKEN		0x000C /* CPU (unsigned)v did not die properly,
+					* perhaps due to preemption. */
 
 /* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
  * operation in progress
@@ -271,4 +273,14 @@ void arch_cpu_idle_enter(void);
 void arch_cpu_idle_exit(void);
 void arch_cpu_idle_dead(void);
 
+DECLARE_PER_CPU(bool, cpu_dead_idle);
+
+int cpu_report_state(int cpu);
+int cpu_check_up_prepare(int cpu);
+void cpu_set_state_online(int cpu);
+#ifdef CONFIG_HOTPLUG_CPU
+bool cpu_wait_death(unsigned int cpu, int seconds);
+bool cpu_report_death(void);
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
+
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 40190f28db35..065499f432ec 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -4,6 +4,7 @@
 #include <linux/cpu.h>
 #include <linux/err.h>
 #include <linux/smp.h>
+#include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/slab.h>
@@ -314,3 +315,158 @@ void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	put_online_cpus();
 }
 EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
+
+static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
+
+/*
+ * Called to poll specified CPU's state, for example, when waiting for
+ * a CPU to come online.
+ */
+int cpu_report_state(int cpu)
+{
+	return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
+}
+
+/*
+ * If CPU has died properly, set its state to CPU_UP_PREPARE and
+ * return success.  Otherwise, return -EBUSY if the CPU died after
+ * cpu_wait_death() timed out.  And yet otherwise again, return -EAGAIN
+ * if cpu_wait_death() timed out and the CPU still hasn't gotten around
+ * to dying.  In the latter two cases, the CPU might not be set up
+ * properly, but it is up to the arch-specific code to decide.
+ * Finally, -EIO indicates an unanticipated problem.
+ *
+ * Note that it is permissible to omit this call entirely, as is
+ * done in architectures that do no CPU-hotplug error checking.
+ */
+int cpu_check_up_prepare(int cpu)
+{
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
+		return 0;
+	}
+
+	switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
+
+	case CPU_POST_DEAD:
+
+		/* The CPU died properly, so just start it up again. */
+		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
+		return 0;
+
+	case CPU_TASKS_FROZEN | CPU_DEAD:
+
+		/*
+		 * Timeout during CPU death, so let caller know.
+		 * The outgoing CPU completed its processing, but after
+		 * cpu_wait_death() timed out and reported the error. The
+		 * caller is free to proceed, in which case the state
+		 * will be reset properly by cpu_set_state_online().
+		 * Proceeding despite this -EBUSY return makes sense
+		 * for systems where the outgoing CPUs take themselves
+		 * offline, with no post-death manipulation required from
+		 * a surviving CPU.
+		 */
+		return -EBUSY;
+
+	case CPU_BROKEN:
+
+		/*
+		 * The most likely reason we got here is that there was
+		 * a timeout during CPU death, and the outgoing CPU never
+		 * did complete its processing.  This could happen on
+		 * a virtualized system if the outgoing VCPU gets preempted
+		 * for more than five seconds, and the user attempts to
+		 * immediately online that same CPU.  Trying again later
+		 * might return -EBUSY above, hence -EAGAIN.
+		 */
+		return -EAGAIN;
+
+	default:
+
+		/* Should not happen.  Famous last words. */
+		return -EIO;
+	}
+}
+
+/*
+ * Mark the specified CPU online.
+ *
+ * Note that it is permissible to omit this call entirely, as is
+ * done in architectures that do no CPU-hotplug error checking.
+ */
+void cpu_set_state_online(int cpu)
+{
+	(void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+
+/*
+ * Wait for the specified CPU to exit the idle loop and die.
+ */
+bool cpu_wait_death(unsigned int cpu, int seconds)
+{
+	int jf_left = seconds * HZ;
+	int oldstate;
+	bool ret = true;
+	int sleep_jf = 1;
+
+	might_sleep();
+
+	/* The outgoing CPU will normally get done quite quickly. */
+	if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
+		goto update_state;
+	udelay(5);
+
+	/* But if the outgoing CPU dawdles, wait increasingly long times. */
+	while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
+		schedule_timeout_uninterruptible(sleep_jf);
+		jf_left -= sleep_jf;
+		if (jf_left <= 0)
+			break;
+		sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10);
+	}
+update_state:
+	oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
+	if (oldstate == CPU_DEAD) {
+		/* Outgoing CPU died normally, update state. */
+		smp_mb(); /* atomic_read() before update. */
+		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
+	} else {
+		/* Outgoing CPU still hasn't died, set state accordingly. */
+		if (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
+				   oldstate, CPU_BROKEN) != oldstate)
+			goto update_state;
+		ret = false;
+	}
+	return ret;
+}
+
+/*
+ * Called by the outgoing CPU to report its successful death.  Return
+ * false if this report follows the surviving CPU's timing out.
+ *
+ * A separate "CPU_TASKS_FROZEN | CPU_DEAD" is used when the surviving
+ * CPU timed out.  This approach allows architectures to omit calls to
+ * cpu_check_up_prepare() and cpu_set_state_online() without defeating
+ * the next cpu_wait_death()'s polling loop.
+ */
+bool cpu_report_death(void)
+{
+	int oldstate;
+	int newstate;
+	int cpu = smp_processor_id();
+
+	do {
+		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
+		if (oldstate == CPU_ONLINE)
+			newstate = CPU_DEAD;
+		else
+			newstate = CPU_TASKS_FROZEN | CPU_DEAD;
+	} while (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
+				oldstate, newstate) != oldstate);
+	return newstate == CPU_DEAD;
+}
+
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 17:42   ` Paul E. McKenney
  (?)
@ 2015-03-03 17:42   ` Paul E. McKenney
  2015-03-03 19:17     ` Boris Ostrovsky
  2015-03-03 19:17     ` Boris Ostrovsky
  -1 siblings, 2 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney, x86, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Vrabel, xen-devel

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit removes the open-coded CPU-offline notification with new
common code.  Among other things, this change avoids calling scheduler
code using RCU from an offline CPU that RCU is ignoring.  It also allows
Xen to notice at online time that the CPU did not go offline correctly.
Note that Xen has the surviving CPU carry out some cleanup operations,
so if the surviving CPU times out, these cleanup operations might have
been carried out while the outgoing CPU was still running.  It might
therefore be unwise to bring this CPU back online, and this commit
avoids doing so.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <x86@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
---
 arch/x86/include/asm/cpu.h |  2 --
 arch/x86/include/asm/smp.h |  1 -
 arch/x86/kernel/smpboot.c  | 29 ++++++++---------------------
 arch/x86/xen/smp.c         | 18 ++++++++----------
 4 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index d2b12988d2ed..bf2caa1dedc5 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -34,8 +34,6 @@ extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
 #endif
 
-DECLARE_PER_CPU(int, cpu_state);
-
 int mwait_usable(const struct cpuinfo_x86 *);
 
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 8cd1cc3bc835..8cd27e08e23c 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -150,7 +150,6 @@ static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 }
 
 void cpu_disable_common(void);
-void cpu_die_common(unsigned int cpu);
 void native_smp_prepare_boot_cpu(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void native_smp_cpus_done(unsigned int max_cpus);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index febc6aabc72e..ff24fbd17fe7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -77,9 +77,6 @@
 #include <asm/realmode.h>
 #include <asm/misc.h>
 
-/* State of each CPU */
-DEFINE_PER_CPU(int, cpu_state) = { 0 };
-
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
@@ -257,7 +254,7 @@ static void notrace start_secondary(void *unused)
 	lock_vector_lock();
 	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
-	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+	cpu_set_state_online(smp_processor_id());
 	x86_platform.nmi_init();
 
 	/* enable local interrupts */
@@ -948,7 +945,10 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	 */
 	mtrr_save_state();
 
-	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+	/* x86 CPUs take themselves offline, so delayed offline is OK. */
+	err = cpu_check_up_prepare(cpu);
+	if (err && err != -EBUSY)
+		return err;
 
 	/* the FPU context is blank, nobody can own it */
 	__cpu_disable_lazy_restore(cpu);
@@ -1191,7 +1191,7 @@ void __init native_smp_prepare_boot_cpu(void)
 	switch_to_new_gdt(me);
 	/* already set me in cpu_online_mask in boot_cpu_init() */
 	cpumask_set_cpu(me, cpu_callout_mask);
-	per_cpu(cpu_state, me) = CPU_ONLINE;
+	cpu_set_state_online(me);
 }
 
 void __init native_smp_cpus_done(unsigned int max_cpus)
@@ -1318,14 +1318,10 @@ static void __ref remove_cpu_from_maps(int cpu)
 	numa_remove_cpu(cpu);
 }
 
-static DEFINE_PER_CPU(struct completion, die_complete);
-
 void cpu_disable_common(void)
 {
 	int cpu = smp_processor_id();
 
-	init_completion(&per_cpu(die_complete, smp_processor_id()));
-
 	remove_siblinginfo(cpu);
 
 	/* It's now safe to remove this processor from the online map */
@@ -1349,19 +1345,12 @@ int native_cpu_disable(void)
 	return 0;
 }
 
-void cpu_die_common(unsigned int cpu)
-{
-	wait_for_completion_timeout(&per_cpu(die_complete, cpu), HZ);
-}
-
 void native_cpu_die(unsigned int cpu)
 {
 	/* We don't do anything here: idle task is faking death itself. */
 
-	cpu_die_common(cpu);
-
 	/* They ack this in play_dead() by setting CPU_DEAD */
-	if (per_cpu(cpu_state, cpu) == CPU_DEAD) {
+	if (cpu_wait_death(cpu, 5)) {
 		if (system_state == SYSTEM_RUNNING)
 			pr_info("CPU %u is now offline\n", cpu);
 	} else {
@@ -1375,10 +1364,8 @@ void play_dead_common(void)
 	reset_lazy_tlbstate();
 	amd_e400_remove_cpu(raw_smp_processor_id());
 
-	mb();
 	/* Ack it */
-	__this_cpu_write(cpu_state, CPU_DEAD);
-	complete(&per_cpu(die_complete, smp_processor_id()));
+	(void)cpu_report_death();
 
 	/*
 	 * With physical CPU hotplug, we should halt the cpu
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 08e8489c47f1..e2c7389c58c5 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -90,14 +90,10 @@ static void cpu_bringup(void)
 
 	set_cpu_online(cpu, true);
 
-	this_cpu_write(cpu_state, CPU_ONLINE);
-
-	wmb();
+	cpu_set_state_online(cpu);  /* Implies full memory barrier. */
 
 	/* We can take interrupts now: we're officially "up". */
 	local_irq_enable();
-
-	wmb();			/* make sure everything is out */
 }
 
 /*
@@ -459,7 +455,10 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
 	xen_setup_timer(cpu);
 	xen_init_lock_cpu(cpu);
 
-	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+	/* Xen outgoing CPUs need help cleaning up, so -EBUSY is an error. */
+	rc = cpu_check_up_prepare(cpu);
+	if (rc)
+		return rc;
 
 	/* make sure interrupts start blocked */
 	per_cpu(xen_vcpu, cpu)->evtchn_upcall_mask = 1;
@@ -479,10 +478,8 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
 	rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
 	BUG_ON(rc);
 
-	while(per_cpu(cpu_state, cpu) != CPU_ONLINE) {
+	while (cpu_report_state(cpu) != CPU_ONLINE)
 		HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
-		barrier();
-	}
 
 	return 0;
 }
@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
 
-	cpu_die_common(cpu);
+	(void)cpu_wait_death(cpu, 5);
+	/* FIXME: Are the below calls really safe in case of timeout? */
 
 	xen_smp_intr_free(cpu);
 	xen_uninit_lock_cpu(cpu);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 17:42   ` Paul E. McKenney
  (?)
  (?)
@ 2015-03-03 17:42   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Boris Ostrovsky, tglx, laijs, bobby.prani, peterz, fweisbec,
	dvhart, x86, josh, rostedt, oleg, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	Paul E. McKenney, mingo

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit removes the open-coded CPU-offline notification with new
common code.  Among other things, this change avoids calling scheduler
code using RCU from an offline CPU that RCU is ignoring.  It also allows
Xen to notice at online time that the CPU did not go offline correctly.
Note that Xen has the surviving CPU carry out some cleanup operations,
so if the surviving CPU times out, these cleanup operations might have
been carried out while the outgoing CPU was still running.  It might
therefore be unwise to bring this CPU back online, and this commit
avoids doing so.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <x86@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
---
 arch/x86/include/asm/cpu.h |  2 --
 arch/x86/include/asm/smp.h |  1 -
 arch/x86/kernel/smpboot.c  | 29 ++++++++---------------------
 arch/x86/xen/smp.c         | 18 ++++++++----------
 4 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index d2b12988d2ed..bf2caa1dedc5 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -34,8 +34,6 @@ extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
 #endif
 
-DECLARE_PER_CPU(int, cpu_state);
-
 int mwait_usable(const struct cpuinfo_x86 *);
 
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 8cd1cc3bc835..8cd27e08e23c 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -150,7 +150,6 @@ static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 }
 
 void cpu_disable_common(void);
-void cpu_die_common(unsigned int cpu);
 void native_smp_prepare_boot_cpu(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void native_smp_cpus_done(unsigned int max_cpus);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index febc6aabc72e..ff24fbd17fe7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -77,9 +77,6 @@
 #include <asm/realmode.h>
 #include <asm/misc.h>
 
-/* State of each CPU */
-DEFINE_PER_CPU(int, cpu_state) = { 0 };
-
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
@@ -257,7 +254,7 @@ static void notrace start_secondary(void *unused)
 	lock_vector_lock();
 	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
-	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+	cpu_set_state_online(smp_processor_id());
 	x86_platform.nmi_init();
 
 	/* enable local interrupts */
@@ -948,7 +945,10 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	 */
 	mtrr_save_state();
 
-	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+	/* x86 CPUs take themselves offline, so delayed offline is OK. */
+	err = cpu_check_up_prepare(cpu);
+	if (err && err != -EBUSY)
+		return err;
 
 	/* the FPU context is blank, nobody can own it */
 	__cpu_disable_lazy_restore(cpu);
@@ -1191,7 +1191,7 @@ void __init native_smp_prepare_boot_cpu(void)
 	switch_to_new_gdt(me);
 	/* already set me in cpu_online_mask in boot_cpu_init() */
 	cpumask_set_cpu(me, cpu_callout_mask);
-	per_cpu(cpu_state, me) = CPU_ONLINE;
+	cpu_set_state_online(me);
 }
 
 void __init native_smp_cpus_done(unsigned int max_cpus)
@@ -1318,14 +1318,10 @@ static void __ref remove_cpu_from_maps(int cpu)
 	numa_remove_cpu(cpu);
 }
 
-static DEFINE_PER_CPU(struct completion, die_complete);
-
 void cpu_disable_common(void)
 {
 	int cpu = smp_processor_id();
 
-	init_completion(&per_cpu(die_complete, smp_processor_id()));
-
 	remove_siblinginfo(cpu);
 
 	/* It's now safe to remove this processor from the online map */
@@ -1349,19 +1345,12 @@ int native_cpu_disable(void)
 	return 0;
 }
 
-void cpu_die_common(unsigned int cpu)
-{
-	wait_for_completion_timeout(&per_cpu(die_complete, cpu), HZ);
-}
-
 void native_cpu_die(unsigned int cpu)
 {
 	/* We don't do anything here: idle task is faking death itself. */
 
-	cpu_die_common(cpu);
-
 	/* They ack this in play_dead() by setting CPU_DEAD */
-	if (per_cpu(cpu_state, cpu) == CPU_DEAD) {
+	if (cpu_wait_death(cpu, 5)) {
 		if (system_state == SYSTEM_RUNNING)
 			pr_info("CPU %u is now offline\n", cpu);
 	} else {
@@ -1375,10 +1364,8 @@ void play_dead_common(void)
 	reset_lazy_tlbstate();
 	amd_e400_remove_cpu(raw_smp_processor_id());
 
-	mb();
 	/* Ack it */
-	__this_cpu_write(cpu_state, CPU_DEAD);
-	complete(&per_cpu(die_complete, smp_processor_id()));
+	(void)cpu_report_death();
 
 	/*
 	 * With physical CPU hotplug, we should halt the cpu
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 08e8489c47f1..e2c7389c58c5 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -90,14 +90,10 @@ static void cpu_bringup(void)
 
 	set_cpu_online(cpu, true);
 
-	this_cpu_write(cpu_state, CPU_ONLINE);
-
-	wmb();
+	cpu_set_state_online(cpu);  /* Implies full memory barrier. */
 
 	/* We can take interrupts now: we're officially "up". */
 	local_irq_enable();
-
-	wmb();			/* make sure everything is out */
 }
 
 /*
@@ -459,7 +455,10 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
 	xen_setup_timer(cpu);
 	xen_init_lock_cpu(cpu);
 
-	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+	/* Xen outgoing CPUs need help cleaning up, so -EBUSY is an error. */
+	rc = cpu_check_up_prepare(cpu);
+	if (rc)
+		return rc;
 
 	/* make sure interrupts start blocked */
 	per_cpu(xen_vcpu, cpu)->evtchn_upcall_mask = 1;
@@ -479,10 +478,8 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
 	rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
 	BUG_ON(rc);
 
-	while(per_cpu(cpu_state, cpu) != CPU_ONLINE) {
+	while (cpu_report_state(cpu) != CPU_ONLINE)
 		HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
-		barrier();
-	}
 
 	return 0;
 }
@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
 
-	cpu_die_common(cpu);
+	(void)cpu_wait_death(cpu, 5);
+	/* FIXME: Are the below calls really safe in case of timeout? */
 
 	xen_smp_intr_free(cpu);
 	xen_uninit_lock_cpu(cpu);
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 03/20] blackfin: Use common outgoing-CPU-notification code
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (2 preceding siblings ...)
  (?)
@ 2015-03-03 17:42   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney, Steven Miao, adi-buildroot-devel

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit removes the open-coded CPU-offline notification with new
common code.  This change avoids calling scheduler code using RCU from
an offline CPU that RCU is ignoring.  This commit is compatible with
the existing code in not checking for timeout during a prior offline
for a given CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: <adi-buildroot-devel@lists.sourceforge.net>
---
 arch/blackfin/mach-common/smp.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/blackfin/mach-common/smp.c b/arch/blackfin/mach-common/smp.c
index 8ad3e90cc8fc..1c7259597395 100644
--- a/arch/blackfin/mach-common/smp.c
+++ b/arch/blackfin/mach-common/smp.c
@@ -413,16 +413,14 @@ int __cpu_disable(void)
 	return 0;
 }
 
-static DECLARE_COMPLETION(cpu_killed);
-
 int __cpu_die(unsigned int cpu)
 {
-	return wait_for_completion_timeout(&cpu_killed, 5000);
+	return cpu_wait_death(cpu, 5);
 }
 
 void cpu_die(void)
 {
-	complete(&cpu_killed);
+	(void)cpu_report_death();
 
 	atomic_dec(&init_mm.mm_users);
 	atomic_dec(&init_mm.mm_count);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
  2015-03-03 17:42   ` Paul E. McKenney
@ 2015-03-03 17:42     ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney, James Hogan, linux-metag

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit removes the open-coded CPU-offline notification with new
common code.  This change avoids calling scheduler code using RCU from
an offline CPU that RCU is ignoring.  This commit is compatible with
the existing code in not checking for timeout during a prior offline
for a given CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: <linux-metag@vger.kernel.org>
---
 arch/metag/kernel/smp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
index f006d2276f40..ac3a199e33e7 100644
--- a/arch/metag/kernel/smp.c
+++ b/arch/metag/kernel/smp.c
@@ -261,7 +261,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static DECLARE_COMPLETION(cpu_killed);
 
 /*
  * __cpu_disable runs on the processor to be shutdown.
@@ -299,7 +298,7 @@ int __cpu_disable(void)
  */
 void __cpu_die(unsigned int cpu)
 {
-	if (!wait_for_completion_timeout(&cpu_killed, msecs_to_jiffies(1)))
+	if (!cpu_wait_death(cpu, 1))
 		pr_err("CPU%u: unable to kill\n", cpu);
 }
 
@@ -314,7 +313,7 @@ void cpu_die(void)
 	local_irq_disable();
 	idle_task_exit();
 
-	complete(&cpu_killed);
+	(void)cpu_report_death();
 
 	asm ("XOR	TXENABLE, D0Re0,D0Re0\n");
 }
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-03 17:42     ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney, James Hogan, linux-metag

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit removes the open-coded CPU-offline notification with new
common code.  This change avoids calling scheduler code using RCU from
an offline CPU that RCU is ignoring.  This commit is compatible with
the existing code in not checking for timeout during a prior offline
for a given CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: <linux-metag@vger.kernel.org>
---
 arch/metag/kernel/smp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
index f006d2276f40..ac3a199e33e7 100644
--- a/arch/metag/kernel/smp.c
+++ b/arch/metag/kernel/smp.c
@@ -261,7 +261,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static DECLARE_COMPLETION(cpu_killed);
 
 /*
  * __cpu_disable runs on the processor to be shutdown.
@@ -299,7 +298,7 @@ int __cpu_disable(void)
  */
 void __cpu_die(unsigned int cpu)
 {
-	if (!wait_for_completion_timeout(&cpu_killed, msecs_to_jiffies(1)))
+	if (!cpu_wait_death(cpu, 1))
 		pr_err("CPU%u: unable to kill\n", cpu);
 }
 
@@ -314,7 +313,7 @@ void cpu_die(void)
 	local_irq_disable();
 	idle_task_exit();
 
-	complete(&cpu_killed);
+	(void)cpu_report_death();
 
 	asm ("XOR	TXENABLE, D0Re0,D0Re0\n");
 }
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 05/20] rcu: Consolidate offline-CPU callback initialization
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (4 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Currently, both rcu_cleanup_dead_cpu() and rcu_send_cbs_to_orphanage()
initialize the outgoing CPU's callback list.  However, only
rcu_cleanup_dead_cpu() invokes rcu_send_cbs_to_orphanage(), and
it does so unconditionally, which means that only one of these
initializations is required.  This commit therefore consolidates the
callback-list initialization with the rest of the callback handling in
rcu_send_cbs_to_orphanage().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 48d640ca1a05..8e020c59ecfd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2256,8 +2256,12 @@ rcu_send_cbs_to_orphanage(int cpu, struct rcu_state *rsp,
 		rsp->orphan_donetail = rdp->nxttail[RCU_DONE_TAIL];
 	}
 
-	/* Finally, initialize the rcu_data structure's list to empty.  */
+	/*
+	 * Finally, initialize the rcu_data structure's list to empty and
+	 * disallow further callbacks on this CPU.
+	 */
 	init_callback_list(rdp);
+	rdp->nxttail[RCU_NEXT_TAIL] = NULL;
 }
 
 /*
@@ -2398,9 +2402,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	WARN_ONCE(rdp->qlen != 0 || rdp->nxtlist != NULL,
 		  "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, nxtlist=%p\n",
 		  cpu, rdp->qlen, rdp->nxtlist);
-	init_callback_list(rdp);
-	/* Disallow further callbacks on this CPU. */
-	rdp->nxttail[RCU_NEXT_TAIL] = NULL;
 	mutex_unlock(&rsp->onoff_mutex);
 }
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 06/20] rcu: Put all orphan-callback-related code under same comment
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (5 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8e020c59ecfd..98da632d1d49 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2385,9 +2385,9 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 
 	/* Exclude any attempts to start a new grace period. */
 	mutex_lock(&rsp->onoff_mutex);
-	raw_spin_lock_irqsave(&rsp->orphan_lock, flags);
 
 	/* Orphan the dead CPU's callbacks, and adopt them if appropriate. */
+	raw_spin_lock_irqsave(&rsp->orphan_lock, flags);
 	rcu_send_cbs_to_orphanage(cpu, rsp, rnp, rdp);
 	rcu_adopt_orphan_cbs(rsp, flags);
 	raw_spin_unlock_irqrestore(&rsp->orphan_lock, flags);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 07/20] rcu: Simplify sync_rcu_preempt_exp_init()
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (6 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit eliminates a boolean and associated "if" statement by
rearranging the code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0a571e9a0f1d..d37c9fbdba71 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -677,19 +677,16 @@ static void
 sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 	unsigned long flags;
-	int must_wait = 0;
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();
 	if (!rcu_preempt_has_tasks(rnp)) {
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		rcu_report_exp_rnp(rsp, rnp, false); /* No tasks, report. */
 	} else {
 		rnp->exp_tasks = rnp->blkd_tasks.next;
 		rcu_initiate_boost(rnp, flags);  /* releases rnp->lock */
-		must_wait = 1;
 	}
-	if (!must_wait)
-		rcu_report_exp_rnp(rsp, rnp, false); /* Don't wake self. */
 }
 
 /**
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 08/20] rcu: Eliminate empty HOTPLUG_CPU ifdef
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (7 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d37c9fbdba71..79376e2461c9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -520,10 +520,6 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
 	WARN_ON_ONCE(rnp->qsmask);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-
 /*
  * Check for a quiescent state from the current CPU.  When a task blocks,
  * the task is recorded in the corresponding CPU's rcu_node structure,
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 09/20] rcu: Detect stalls caused by failure to propagate up rcu_node tree
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (8 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

If all CPUs have passed through quiescent states, then stalls might be
due to starvation of the grace-period kthread or to failure to propagate
the quiescent states up the rcu_node combining tree.  The current stall
warning messages do not differentiate, so this commit adds a printout
of the root rcu_node structure's ->qsmask field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 98da632d1d49..3b7e4133ca99 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1196,9 +1196,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 		} else {
 			j = jiffies;
 			gpa = ACCESS_ONCE(rsp->gp_activity);
-			pr_err("All QSes seen, last %s kthread activity %ld (%ld-%ld), jiffies_till_next_fqs=%ld\n",
+			pr_err("All QSes seen, last %s kthread activity %ld (%ld-%ld), jiffies_till_next_fqs=%ld, root ->qsmask %#lx\n",
 			       rsp->name, j - gpa, j, gpa,
-			       jiffies_till_next_fqs);
+			       jiffies_till_next_fqs,
+			       rcu_get_root(rsp)->qsmask);
 			/* In this case, the current CPU might be at fault. */
 			sched_show_task(current);
 		}
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 10/20] rcu: Provide diagnostic option to slow down grace-period initialization
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (9 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  2015-03-04 10:54     ` Paul Bolle
  -1 siblings, 1 reply; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Grace-period initialization normally proceeds quite quickly, so
that it is very difficult to reproduce races against grace-period
initialization.  This commit therefore allows grace-period
initialization to be artificially slowed down, increasing
race-reproduction probability.  A pair of new Kconfig parameters are
provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and
CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies
of slowdown to apply.  A boot-time parameter named rcutree.gp_init_delay
allows boot-time delay to be specified.  By default, no delay will be
applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt |  6 ++++++
 kernel/rcu/tree.c                   | 10 ++++++++++
 lib/Kconfig.debug                   | 24 ++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a62a7b4..94de410ec341 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2968,6 +2968,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Set maximum number of finished RCU callbacks to
 			process in one batch.
 
+	rcutree.gp_init_delay=	[KNL]
+			Set the number of jiffies to delay each step of
+			RCU grace-period initialization.  This only has
+			effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT is
+			set.
+
 	rcutree.rcu_fanout_leaf= [KNL]
 			Increase the number of CPUs assigned to each
 			leaf rcu_node structure.  Useful for very large
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3b7e4133ca99..b42001fd55fb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -160,6 +160,12 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 static int kthread_prio = CONFIG_RCU_KTHREAD_PRIO;
 module_param(kthread_prio, int, 0644);
 
+/* Delay in jiffies for grace-period initialization delays. */
+static int gp_init_delay = IS_ENABLED(CONFIG_RCU_TORTURE_TEST_SLOW_INIT)
+				? CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY
+				: 0;
+module_param(gp_init_delay, int, 0644);
+
 /*
  * Track the rcutorture test sequence number and the update version
  * number within a given test.  The rcutorture_testseq is incremented
@@ -1769,6 +1775,10 @@ static int rcu_gp_init(struct rcu_state *rsp)
 		raw_spin_unlock_irq(&rnp->lock);
 		cond_resched_rcu_qs();
 		ACCESS_ONCE(rsp->gp_activity) = jiffies;
+		if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_SLOW_INIT) &&
+		    gp_init_delay > 0 &&
+		    !(rsp->gpnum % (rcu_num_nodes * 10)))
+			schedule_timeout_uninterruptible(gp_init_delay);
 	}
 
 	mutex_unlock(&rsp->onoff_mutex);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3c009c..4ddfc5d09161 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1257,6 +1257,30 @@ config RCU_TORTURE_TEST_RUNNABLE
 	  Say N here if you want the RCU torture tests to start only
 	  after being manually enabled via /proc.
 
+config RCU_TORTURE_TEST_SLOW_INIT
+	bool "Slow down RCU grace-period initialization to expose races"
+	depends on RCU_TORTURE_TEST
+	help
+	  This option makes grace-period initialization block for a
+	  few jiffies between initializing each pair of consecutive
+	  rcu_node structures.	This helps to expose races involving
+	  grace-period initialization, in other words, it makes your
+	  kernel less stable.  It can also greatly increase grace-period
+	  latency, especially on systems with large numbers of CPUs.
+	  This is useful when torture-testing RCU, but in almost no
+	  other circumstance.
+
+	  Say Y here if you want your system to crash and hang more often.
+	  Say N if you want a sane system.
+
+config RCU_TORTURE_TEST_SLOW_INIT_DELAY
+	int "How must to slow down RCU grace-period initialization"
+	range 0 5
+	default 0
+	help
+	  This option specifies the number of jiffies to wait between
+	  each rcu_node structure initialization.
+
 config RCU_CPU_STALL_TIMEOUT
 	int "RCU CPU stall timeout in seconds"
 	depends on RCU_STALL_COMMON
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 11/20] rcutorture: Enable slow grace-period initializations
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (10 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit sets CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y, but leaves the
default time zero.  This can be overridden by passing the
"--bootargs rcutree.gp_init_delay=1" argument to kvm.sh.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 tools/testing/selftests/rcutorture/configs/rcu/CFcommon | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
index d2d2a86139db..49701218dc62 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
@@ -1,2 +1,3 @@
 CONFIG_RCU_TORTURE_TEST=y
 CONFIG_PRINTK_TIME=y
+CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 12/20] rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (11 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Offline CPUs cannot safely invoke trace events, but such CPUs do execute
within rcu_cpu_notify().  Therefore, this commit removes the trace events
from rcu_cpu_notify().  These trace events are for utilization, against
which rcu_cpu_notify() execution time should be negligible.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b42001fd55fb..a7151d26b940 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3629,7 +3629,6 @@ static int rcu_cpu_notify(struct notifier_block *self,
 	struct rcu_node *rnp = rdp->mynode;
 	struct rcu_state *rsp;
 
-	trace_rcu_utilization(TPS("Start CPU hotplug"));
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
@@ -3661,7 +3660,6 @@ static int rcu_cpu_notify(struct notifier_block *self,
 	default:
 		break;
 	}
-	trace_rcu_utilization(TPS("End CPU hotplug"));
 	return NOTIFY_OK;
 }
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 13/20] rcu: Rework preemptible expedited bitmask handling
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (12 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Currently, the rcu_node tree ->expmask bitmasks are initially set to
reflect the online CPUs.  This is pointless, because only the CPUs
preempted within RCU read-side critical sections by the preceding
synchronize_sched_expedited() need to be tracked.  This commit therefore
instead sets up these bitmasks based on the state of the ->blkd_tasks
lists.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 50 ++++++++++++++++++++++++++----------------------
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 79376e2461c9..ce8fb810770c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -98,8 +98,7 @@ RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu);
 static struct rcu_state *rcu_state_p = &rcu_preempt_state;
 
 static int rcu_preempted_readers_exp(struct rcu_node *rnp);
-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
-			       bool wake);
+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp);
 
 /*
  * Tell them what RCU they are running.
@@ -415,7 +414,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 		 * then we need to report up the rcu_node hierarchy.
 		 */
 		if (!empty_exp && empty_exp_now)
-			rcu_report_exp_rnp(&rcu_preempt_state, rnp, true);
+			rcu_report_exp_rnp(&rcu_preempt_state, rnp);
 	} else {
 		local_irq_restore(flags);
 	}
@@ -626,13 +625,9 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
  * recursively up the tree.  (Calm down, calm down, we do the recursion
  * iteratively!)
  *
- * Most callers will set the "wake" flag, but the task initiating the
- * expedited grace period need not wake itself.
- *
  * Caller must hold sync_rcu_preempt_exp_mutex.
  */
-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
-			       bool wake)
+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 	unsigned long flags;
 	unsigned long mask;
@@ -646,10 +641,8 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 		}
 		if (rnp->parent == NULL) {
 			raw_spin_unlock_irqrestore(&rnp->lock, flags);
-			if (wake) {
-				smp_mb(); /* EGP done before wake_up(). */
-				wake_up(&sync_rcu_preempt_exp_wq);
-			}
+			smp_mb(); /* EGP done before wake_up(). */
+			wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
 		mask = rnp->grpmask;
@@ -663,8 +656,11 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 
 /*
  * Snapshot the tasks blocking the newly started preemptible-RCU expedited
- * grace period for the specified rcu_node structure.  If there are no such
- * tasks, report it up the rcu_node hierarchy.
+ * grace period for the specified rcu_node structure.  If there are such
+ * tasks, set the ->exp_mask bits up the rcu_node tree.  Note that we
+ * do not set the ->exp_mask bits on the leaf rcu_node structures, because:
+ * (1) No particular CPU is blocking us and (2) A non-NULL ->exp_tasks
+ * pointer tells us that a given leaf rcu_node is blocked by some tasks.
  *
  * Caller must hold sync_rcu_preempt_exp_mutex and must exclude
  * CPU hotplug operations.
@@ -673,14 +669,27 @@ static void
 sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 	unsigned long flags;
+	unsigned long mask;
+	struct rcu_node *rnp_up;
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();
 	if (!rcu_preempt_has_tasks(rnp)) {
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		rcu_report_exp_rnp(rsp, rnp, false); /* No tasks, report. */
 	} else {
+		/* Propagate ->expmask bits up the rcu_node tree. */
 		rnp->exp_tasks = rnp->blkd_tasks.next;
+		rnp_up = rnp;
+		while (rnp_up->parent) {
+			mask = rnp_up->grpmask;
+			rnp_up = rnp_up->parent;
+			if (ACCESS_ONCE(rnp_up->expmask) & mask)
+				break;
+			raw_spin_lock(&rnp_up->lock); /* irqs already off */
+			smp_mb__after_unlock_lock();
+			rnp_up->expmask |= mask;
+			raw_spin_unlock(&rnp_up->lock); /* irqs still off */
+		}
 		rcu_initiate_boost(rnp, flags);  /* releases rnp->lock */
 	}
 }
@@ -699,7 +708,6 @@ sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 void synchronize_rcu_expedited(void)
 {
-	unsigned long flags;
 	struct rcu_node *rnp;
 	struct rcu_state *rsp = &rcu_preempt_state;
 	unsigned long snap;
@@ -750,13 +758,9 @@ void synchronize_rcu_expedited(void)
 	/* force all RCU readers onto ->blkd_tasks lists. */
 	synchronize_sched_expedited();
 
-	/* Initialize ->expmask for all non-leaf rcu_node structures. */
-	rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) {
-		raw_spin_lock_irqsave(&rnp->lock, flags);
-		smp_mb__after_unlock_lock();
-		rnp->expmask = rnp->qsmaskinit;
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-	}
+	/* Check initial state.  Later under CONFIG_PROVE_RCU. */
+	rcu_for_each_node_breadth_first(rsp, rnp)
+		WARN_ON_ONCE(rnp->expmask);
 
 	/* Snapshot current state of ->blkd_tasks lists. */
 	rcu_for_each_leaf_node(rsp, rnp)
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 14/20] rcu: Move rcu_report_unblock_qs_rnp() to common code
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (13 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The rcu_report_unblock_qs_rnp() function is invoked when the
last task blocking the current grace period exits its outermost
RCU read-side critical section.  Previously, this was called only
from rcu_read_unlock_special(), and was therefore defined only when
CONFIG_RCU_PREEMPT=y.  However, this function will be invoked even when
CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at
the beginnings of RCU grace periods.  The reason for this change is that
the last task on a given leaf rcu_node structure's ->blkd_tasks list
might well exit its RCU read-side critical section between the time that
recent CPU-hotplug operations were applied and when the new grace period
was initialized.  This situation could result in RCU waiting forever on
that leaf rcu_node structure, because if all that structure's CPUs were
already offline, there would be no quiescent-state events to drive that
structure's part of the grace period.

This commit therefore moves rcu_report_unblock_qs_rnp() to common code
that is built unconditionally so that the quiescent-state-forcing code
can clean up after this situation, avoiding the grace-period stall.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 39 +++++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree_plugin.h | 40 ++--------------------------------------
 2 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a7151d26b940..5b5cb1ff73ed 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2127,6 +2127,45 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
 }
 
 /*
+ * Record a quiescent state for all tasks that were previously queued
+ * on the specified rcu_node structure and that were blocking the current
+ * RCU grace period.  The caller must hold the specified rnp->lock with
+ * irqs disabled, and this lock is released upon return, but irqs remain
+ * disabled.
+ */
+static void __maybe_unused rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
+				      struct rcu_node *rnp, unsigned long flags)
+	__releases(rnp->lock)
+{
+	unsigned long mask;
+	struct rcu_node *rnp_p;
+
+	WARN_ON_ONCE(rsp == &rcu_bh_state || rsp == &rcu_sched_state);
+	if (rnp->qsmask != 0 || rcu_preempt_blocked_readers_cgp(rnp)) {
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		return;  /* Still need more quiescent states! */
+	}
+
+	rnp_p = rnp->parent;
+	if (rnp_p == NULL) {
+		/*
+		 * Either there is only one rcu_node in the tree,
+		 * or tasks were kicked up to root rcu_node due to
+		 * CPUs going offline.
+		 */
+		rcu_report_qs_rsp(rsp, flags);
+		return;
+	}
+
+	/* Report up the rest of the hierarchy. */
+	mask = rnp->grpmask;
+	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
+	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
+	smp_mb__after_unlock_lock();
+	rcu_report_qs_rnp(mask, rsp, rnp_p, flags);
+}
+
+/*
  * Record a quiescent state for the specified CPU to that CPU's rcu_data
  * structure.  This must be either called from the specified CPU, or
  * called when the specified CPU is known to be offline (and when it is
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index ce8fb810770c..abccde8b893c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -232,43 +232,6 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 }
 
 /*
- * Record a quiescent state for all tasks that were previously queued
- * on the specified rcu_node structure and that were blocking the current
- * RCU grace period.  The caller must hold the specified rnp->lock with
- * irqs disabled, and this lock is released upon return, but irqs remain
- * disabled.
- */
-static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags)
-	__releases(rnp->lock)
-{
-	unsigned long mask;
-	struct rcu_node *rnp_p;
-
-	if (rnp->qsmask != 0 || rcu_preempt_blocked_readers_cgp(rnp)) {
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		return;  /* Still need more quiescent states! */
-	}
-
-	rnp_p = rnp->parent;
-	if (rnp_p == NULL) {
-		/*
-		 * Either there is only one rcu_node in the tree,
-		 * or tasks were kicked up to root rcu_node due to
-		 * CPUs going offline.
-		 */
-		rcu_report_qs_rsp(&rcu_preempt_state, flags);
-		return;
-	}
-
-	/* Report up the rest of the hierarchy. */
-	mask = rnp->grpmask;
-	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
-	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
-	smp_mb__after_unlock_lock();
-	rcu_report_qs_rnp(mask, &rcu_preempt_state, rnp_p, flags);
-}
-
-/*
  * Advance a ->blkd_tasks-list pointer to the next entry, instead
  * returning NULL if at the end of the list.
  */
@@ -398,7 +361,8 @@ void rcu_read_unlock_special(struct task_struct *t)
 							 rnp->grplo,
 							 rnp->grphi,
 							 !!rnp->gp_tasks);
-			rcu_report_unblock_qs_rnp(rnp, flags);
+			rcu_report_unblock_qs_rnp(&rcu_preempt_state,
+						  rnp, flags);
 		} else {
 			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		}
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 15/20] rcu: Process offlining and onlining only at grace-period start
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (14 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Races between CPU hotplug and grace periods can be difficult to resolve,
so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
this means that it is impossible for an outgoing CPU to perform the
last bits of its offlining from its last pass through the idle loop,
because sleeplocks cannot be acquired in that context.

This commit avoids these problems by buffering online and offline events
in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
grace period starts, the events accumulated in this mask are applied to
the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
case of all CPUs corresponding to a given leaf rcu_node structure being
offline while there are still elements in that structure's ->blkd_tasks
list is handled using a new ->wait_blkd_tasks field.  In this case,
propagating the offline bits up the tree is deferred until the beginning
of the grace period after all of the tasks have exited their RCU read-side
critical sections and removed themselves from the list, at which point
the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
structure's CPUs comes back online before the list empties, then the
->wait_blkd_tasks flag is simply cleared.

This of course means that RCU's notion of which CPUs are offline can be
out of date.  This is OK because RCU need only wait on CPUs that were
online at the time that the grace period started.  In addition, RCU's
force-quiescent-state actions will handle the case where a CPU goes
offline after the grace period starts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 154 +++++++++++++++++++++++++++++++++++++----------
 kernel/rcu/tree.h        |   9 +++
 kernel/rcu/tree_plugin.h |  22 ++-----
 kernel/rcu/tree_trace.c  |   4 +-
 4 files changed, 136 insertions(+), 53 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5b5cb1ff73ed..f0f4d3510d24 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -152,6 +152,8 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
  */
 static int rcu_scheduler_fully_active __read_mostly;
 
+static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
+static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
 static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
@@ -179,6 +181,17 @@ unsigned long rcutorture_testseq;
 unsigned long rcutorture_vernum;
 
 /*
+ * Compute the mask of online CPUs for the specified rcu_node structure.
+ * This will not be stable unless the rcu_node structure's ->lock is
+ * held, but the bit corresponding to the current CPU will be stable
+ * in most contexts.
+ */
+unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp)
+{
+	return ACCESS_ONCE(rnp->qsmaskinitnext);
+}
+
+/*
  * Return true if an RCU grace period is in progress.  The ACCESS_ONCE()s
  * permit this function to be invoked without holding the root rcu_node
  * structure's ->lock, but of course results can be subject to change.
@@ -960,7 +973,7 @@ bool rcu_lockdep_current_cpu_online(void)
 	preempt_disable();
 	rdp = this_cpu_ptr(&rcu_sched_data);
 	rnp = rdp->mynode;
-	ret = (rdp->grpmask & rnp->qsmaskinit) ||
+	ret = (rdp->grpmask & rcu_rnp_online_cpus(rnp)) ||
 	      !rcu_scheduler_fully_active;
 	preempt_enable();
 	return ret;
@@ -1710,6 +1723,7 @@ static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
  */
 static int rcu_gp_init(struct rcu_state *rsp)
 {
+	unsigned long oldmask;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
@@ -1745,6 +1759,55 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	smp_mb__after_unlock_lock(); /* ->gpnum increment before GP! */
 
 	/*
+	 * Apply per-leaf buffered online and offline operations to the
+	 * rcu_node tree.  Note that this new grace period need not wait
+	 * for subsequent online CPUs, and that quiescent-state forcing
+	 * will handle subsequent offline CPUs.
+	 */
+	rcu_for_each_leaf_node(rsp, rnp) {
+		raw_spin_lock_irq(&rnp->lock);
+		smp_mb__after_unlock_lock();
+		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
+		    !rnp->wait_blkd_tasks) {
+			/* Nothing to do on this leaf rcu_node structure. */
+			raw_spin_unlock_irq(&rnp->lock);
+			continue;
+		}
+
+		/* Record old state, apply changes to ->qsmaskinit field. */
+		oldmask = rnp->qsmaskinit;
+		rnp->qsmaskinit = rnp->qsmaskinitnext;
+
+		/* If zero-ness of ->qsmaskinit changed, propagate up tree. */
+		if (!oldmask != !rnp->qsmaskinit) {
+			if (!oldmask) /* First online CPU for this rcu_node. */
+				rcu_init_new_rnp(rnp);
+			else if (rcu_preempt_has_tasks(rnp)) /* blocked tasks */
+				rnp->wait_blkd_tasks = true;
+			else /* Last offline CPU and can propagate. */
+				rcu_cleanup_dead_rnp(rnp);
+		}
+
+		/*
+		 * If all waited-on tasks from prior grace period are
+		 * done, and if all this rcu_node structure's CPUs are
+		 * still offline, propagate up the rcu_node tree and
+		 * clear ->wait_blkd_tasks.  Otherwise, if one of this
+		 * rcu_node structure's CPUs has since come back online,
+		 * simply clear ->wait_blkd_tasks (but rcu_cleanup_dead_rnp()
+		 * checks for this, so just call it unconditionally).
+		 */
+		if (rnp->wait_blkd_tasks &&
+		    (!rcu_preempt_has_tasks(rnp) ||
+		     rnp->qsmaskinit)) {
+			rnp->wait_blkd_tasks = false;
+			rcu_cleanup_dead_rnp(rnp);
+		}
+
+		raw_spin_unlock_irq(&rnp->lock);
+	}
+
+	/*
 	 * Set the quiescent-state-needed bits in all the rcu_node
 	 * structures for all currently online CPUs in breadth-first order,
 	 * starting from the root rcu_node structure, relying on the layout
@@ -2133,7 +2196,7 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
  * irqs disabled, and this lock is released upon return, but irqs remain
  * disabled.
  */
-static void __maybe_unused rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
+static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
 				      struct rcu_node *rnp, unsigned long flags)
 	__releases(rnp->lock)
 {
@@ -2409,6 +2472,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
 		smp_mb__after_unlock_lock(); /* GP memory ordering. */
 		rnp->qsmaskinit &= ~mask;
+		rnp->qsmask &= ~mask;
 		if (rnp->qsmaskinit) {
 			raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
 			return;
@@ -2427,6 +2491,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 {
 	unsigned long flags;
+	unsigned long mask;
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
@@ -2443,12 +2508,12 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	raw_spin_unlock_irqrestore(&rsp->orphan_lock, flags);
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
+	mask = rdp->grpmask;
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	smp_mb__after_unlock_lock();	/* Enforce GP memory-order guarantee. */
-	rnp->qsmaskinit &= ~rdp->grpmask;
-	if (rnp->qsmaskinit == 0 && !rcu_preempt_has_tasks(rnp))
-		rcu_cleanup_dead_rnp(rnp);
-	rcu_report_qs_rnp(rdp->grpmask, rsp, rnp, flags); /* Rlses rnp->lock. */
+	rnp->qsmaskinitnext &= ~mask;
+	raw_spin_unlock_irqrestore(&rnp->lock, flags);
+
 	WARN_ONCE(rdp->qlen != 0 || rdp->nxtlist != NULL,
 		  "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, nxtlist=%p\n",
 		  cpu, rdp->qlen, rdp->nxtlist);
@@ -2654,12 +2719,21 @@ static void force_qs_rnp(struct rcu_state *rsp,
 			}
 		}
 		if (mask != 0) {
-
-			/* rcu_report_qs_rnp() releases rnp->lock. */
+			/* Idle/offline CPUs, report. */
 			rcu_report_qs_rnp(mask, rsp, rnp, flags);
-			continue;
+		} else if (rnp->parent &&
+			 list_empty(&rnp->blkd_tasks) &&
+			 !rnp->qsmask &&
+			 (rnp->parent->qsmask & rnp->grpmask)) {
+			/*
+			 * Race between grace-period initialization and task
+			 * existing RCU read-side critical section, report.
+			 */
+			rcu_report_unblock_qs_rnp(rsp, rnp, flags);
+		} else {
+			/* Nothing to do here, so just drop the lock. */
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		}
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	}
 }
 
@@ -3569,6 +3643,28 @@ void rcu_barrier_sched(void)
 EXPORT_SYMBOL_GPL(rcu_barrier_sched);
 
 /*
+ * Propagate ->qsinitmask bits up the rcu_node tree to account for the
+ * first CPU in a given leaf rcu_node structure coming online.  The caller
+ * must hold the corresponding leaf rcu_node ->lock with interrrupts
+ * disabled.
+ */
+static void rcu_init_new_rnp(struct rcu_node *rnp_leaf)
+{
+	long mask;
+	struct rcu_node *rnp = rnp_leaf;
+
+	for (;;) {
+		mask = rnp->grpmask;
+		rnp = rnp->parent;
+		if (rnp == NULL)
+			return;
+		raw_spin_lock(&rnp->lock); /* Interrupts already disabled. */
+		rnp->qsmaskinit |= mask;
+		raw_spin_unlock(&rnp->lock); /* Interrupts remain disabled. */
+	}
+}
+
+/*
  * Do boot-time initialization of a CPU's per-CPU RCU data.
  */
 static void __init
@@ -3620,31 +3716,23 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 		   (atomic_read(&rdp->dynticks->dynticks) & ~0x1) + 1);
 	raw_spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
-	/* Add CPU to rcu_node bitmasks. */
+	/*
+	 * Add CPU to leaf rcu_node pending-online bitmask.  Any needed
+	 * propagation up the rcu_node tree will happen at the beginning
+	 * of the next grace period.
+	 */
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
-	do {
-		/* Exclude any attempts to start a new GP on small systems. */
-		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
-		rnp->qsmaskinit |= mask;
-		mask = rnp->grpmask;
-		if (rnp == rdp->mynode) {
-			/*
-			 * If there is a grace period in progress, we will
-			 * set up to wait for it next time we run the
-			 * RCU core code.
-			 */
-			rdp->gpnum = rnp->completed;
-			rdp->completed = rnp->completed;
-			rdp->passed_quiesce = 0;
-			rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
-			rdp->qs_pending = 0;
-			trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
-		}
-		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */
-		rnp = rnp->parent;
-	} while (rnp != NULL && !(rnp->qsmaskinit & mask));
-	local_irq_restore(flags);
+	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
+	smp_mb__after_unlock_lock();
+	rnp->qsmaskinitnext |= mask;
+	rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
+	rdp->completed = rnp->completed;
+	rdp->passed_quiesce = false;
+	rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
+	rdp->qs_pending = false;
+	trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
+	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
 	mutex_unlock(&rsp->onoff_mutex);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 119de399eb2f..aa42562ff5b2 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -141,12 +141,20 @@ struct rcu_node {
 				/*  complete (only for PREEMPT_RCU). */
 	unsigned long qsmaskinit;
 				/* Per-GP initial value for qsmask & expmask. */
+				/*  Initialized from ->qsmaskinitnext at the */
+				/*  beginning of each grace period. */
+	unsigned long qsmaskinitnext;
+				/* Online CPUs for next grace period. */
 	unsigned long grpmask;	/* Mask to apply to parent qsmask. */
 				/*  Only one bit will be set in this mask. */
 	int	grplo;		/* lowest-numbered CPU or group here. */
 	int	grphi;		/* highest-numbered CPU or group here. */
 	u8	grpnum;		/* CPU/group number for next level up. */
 	u8	level;		/* root is at level 0. */
+	bool	wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
+				/*  exit RCU read-side critical sections */
+				/*  before propagating offline up the */
+				/*  rcu_node tree? */
 	struct rcu_node *parent;
 	struct list_head blkd_tasks;
 				/* Tasks blocked in RCU read-side critical */
@@ -559,6 +567,7 @@ static void rcu_prepare_kthreads(int cpu);
 static void rcu_cleanup_after_idle(void);
 static void rcu_prepare_for_idle(void);
 static void rcu_idle_count_callbacks_posted(void);
+static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
 static void print_cpu_stall_info_begin(void);
 static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index abccde8b893c..5c12914b6c41 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -179,7 +179,7 @@ static void rcu_preempt_note_context_switch(void)
 		 * But first, note that the current CPU must still be
 		 * on line!
 		 */
-		WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
+		WARN_ON_ONCE((rdp->grpmask & rcu_rnp_online_cpus(rnp)) == 0);
 		WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
 		if ((rnp->qsmask & rdp->grpmask) && rnp->gp_tasks != NULL) {
 			list_add(&t->rcu_node_entry, rnp->gp_tasks->prev);
@@ -262,7 +262,6 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  */
 void rcu_read_unlock_special(struct task_struct *t)
 {
-	bool empty;
 	bool empty_exp;
 	bool empty_norm;
 	bool empty_exp_now;
@@ -318,7 +317,6 @@ void rcu_read_unlock_special(struct task_struct *t)
 				break;
 			raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
 		}
-		empty = !rcu_preempt_has_tasks(rnp);
 		empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
 		empty_exp = !rcu_preempted_readers_exp(rnp);
 		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
@@ -339,14 +337,6 @@ void rcu_read_unlock_special(struct task_struct *t)
 #endif /* #ifdef CONFIG_RCU_BOOST */
 
 		/*
-		 * If this was the last task on the list, go see if we
-		 * need to propagate ->qsmaskinit bit clearing up the
-		 * rcu_node tree.
-		 */
-		if (!empty && !rcu_preempt_has_tasks(rnp))
-			rcu_cleanup_dead_rnp(rnp);
-
-		/*
 		 * If this was the last task on the current list, and if
 		 * we aren't waiting on any CPUs, report the quiescent state.
 		 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
@@ -820,8 +810,6 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 	return 0;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
 /*
  * Because there is no preemptible RCU, there can be no readers blocked.
  */
@@ -830,8 +818,6 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 	return false;
 }
 
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-
 /*
  * Because preemptible RCU does not exist, we never have to check for
  * tasks blocked within RCU read-side critical sections.
@@ -1131,7 +1117,7 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
  * Returns zero if all is well, a negated errno otherwise.
  */
 static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
-						 struct rcu_node *rnp)
+				       struct rcu_node *rnp)
 {
 	int rnp_index = rnp - &rsp->node[0];
 	unsigned long flags;
@@ -1141,7 +1127,7 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	if (&rcu_preempt_state != rsp)
 		return 0;
 
-	if (!rcu_scheduler_fully_active || rnp->qsmaskinit == 0)
+	if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
 		return 0;
 
 	rsp->boost = 1;
@@ -1234,7 +1220,7 @@ static void rcu_cpu_kthread(unsigned int cpu)
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 {
 	struct task_struct *t = rnp->boost_kthread_task;
-	unsigned long mask = rnp->qsmaskinit;
+	unsigned long mask = rcu_rnp_online_cpus(rnp);
 	cpumask_var_t cm;
 	int cpu;
 
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index fbb6240509ea..f92361efd0f5 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -283,8 +283,8 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 			seq_puts(m, "\n");
 			level = rnp->level;
 		}
-		seq_printf(m, "%lx/%lx %c%c>%c %d:%d ^%d    ",
-			   rnp->qsmask, rnp->qsmaskinit,
+		seq_printf(m, "%lx/%lx->%lx %c%c>%c %d:%d ^%d    ",
+			   rnp->qsmask, rnp->qsmaskinit, rnp->qsmaskinitnext,
 			   ".G"[rnp->gp_tasks != NULL],
 			   ".E"[rnp->exp_tasks != NULL],
 			   ".T"[!list_empty(&rnp->blkd_tasks)],
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 16/20] rcu: Eliminate ->onoff_mutex from rcu_node structure
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (15 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Because that RCU grace-period initialization need no longer exclude
CPU-hotplug operations, this commit eliminates the ->onoff_mutex and
its uses.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 15 ---------------
 kernel/rcu/tree.h |  2 --
 2 files changed, 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f0f4d3510d24..79d53399247e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -101,7 +101,6 @@ struct rcu_state sname##_state = { \
 	.orphan_nxttail = &sname##_state.orphan_nxtlist, \
 	.orphan_donetail = &sname##_state.orphan_donelist, \
 	.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
-	.onoff_mutex = __MUTEX_INITIALIZER(sname##_state.onoff_mutex), \
 	.name = RCU_STATE_NAME(sname), \
 	.abbr = sabbr, \
 }; \
@@ -1754,10 +1753,6 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	trace_rcu_grace_period(rsp->name, rsp->gpnum, TPS("start"));
 	raw_spin_unlock_irq(&rnp->lock);
 
-	/* Exclude any concurrent CPU-hotplug operations. */
-	mutex_lock(&rsp->onoff_mutex);
-	smp_mb__after_unlock_lock(); /* ->gpnum increment before GP! */
-
 	/*
 	 * Apply per-leaf buffered online and offline operations to the
 	 * rcu_node tree.  Note that this new grace period need not wait
@@ -1844,7 +1839,6 @@ static int rcu_gp_init(struct rcu_state *rsp)
 			schedule_timeout_uninterruptible(gp_init_delay);
 	}
 
-	mutex_unlock(&rsp->onoff_mutex);
 	return 1;
 }
 
@@ -2498,9 +2492,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	/* Adjust any no-longer-needed kthreads. */
 	rcu_boost_kthread_setaffinity(rnp, -1);
 
-	/* Exclude any attempts to start a new grace period. */
-	mutex_lock(&rsp->onoff_mutex);
-
 	/* Orphan the dead CPU's callbacks, and adopt them if appropriate. */
 	raw_spin_lock_irqsave(&rsp->orphan_lock, flags);
 	rcu_send_cbs_to_orphanage(cpu, rsp, rnp, rdp);
@@ -2517,7 +2508,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	WARN_ONCE(rdp->qlen != 0 || rdp->nxtlist != NULL,
 		  "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, nxtlist=%p\n",
 		  cpu, rdp->qlen, rdp->nxtlist);
-	mutex_unlock(&rsp->onoff_mutex);
 }
 
 #else /* #ifdef CONFIG_HOTPLUG_CPU */
@@ -3700,9 +3690,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
-	/* Exclude new grace periods. */
-	mutex_lock(&rsp->onoff_mutex);
-
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	rdp->beenonline = 1;	 /* We have now been online. */
@@ -3733,8 +3720,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 	rdp->qs_pending = false;
 	trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-
-	mutex_unlock(&rsp->onoff_mutex);
 }
 
 static void rcu_prepare_cpu(int cpu)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index aa42562ff5b2..a69d3dab2ec4 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -456,8 +456,6 @@ struct rcu_state {
 	long qlen;				/* Total number of callbacks. */
 	/* End of fields guarded by orphan_lock. */
 
-	struct mutex onoff_mutex;		/* Coordinate hotplug & GPs. */
-
 	struct mutex barrier_mutex;		/* Guards barrier fields. */
 	atomic_t barrier_cpu_count;		/* # CPUs waiting on. */
 	struct completion barrier_completion;	/* Wake at barrier end. */
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 17/20] cpu: Make CPU-offline idle-loop transition point more precise
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (16 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit uses a per-CPU variable to make the CPU-offline code path
through the idle loop more precise, so that the outgoing CPU is
guaranteed to make it into the idle loop before it is powered off.
This commit is in preparation for putting the RCU offline-handling
code on this code path, which will eliminate the magic one-jiffy
wait that RCU uses as the maximum time for an outgoing CPU to get
all the way through the scheduler.

The magic one-jiffy wait for incoming CPUs remains a separate issue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/cpu.c        | 4 +++-
 kernel/sched/idle.c | 7 ++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1972b161c61e..d46b4dae0ca0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -408,8 +408,10 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	 *
 	 * Wait for the stop thread to go away.
 	 */
-	while (!idle_cpu(cpu))
+	while (!per_cpu(cpu_dead_idle, cpu))
 		cpu_relax();
+	smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
+	per_cpu(cpu_dead_idle, cpu) = false;
 
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 94b2d7b88a27..e99e361ade20 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -198,6 +198,8 @@ exit_idle:
 	start_critical_timings();
 }
 
+DEFINE_PER_CPU(bool, cpu_dead_idle);
+
 /*
  * Generic idle loop implementation
  *
@@ -222,8 +224,11 @@ static void cpu_idle_loop(void)
 			check_pgt_cache();
 			rmb();
 
-			if (cpu_is_offline(smp_processor_id()))
+			if (cpu_is_offline(smp_processor_id())) {
+				smp_mb(); /* all activity before dead. */
+				this_cpu_write(cpu_dead_idle, true);
 				arch_cpu_idle_dead();
+			}
 
 			local_irq_disable();
 			arch_cpu_idle_enter();
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 18/20] rcu: Handle outgoing CPUs on exit from idle loop
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (17 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit informs RCU of an outgoing CPU just before that CPU invokes
arch_cpu_idle_dead() during its last pass through the idle loop (via a
new CPU_DYING_IDLE notifier value).  This change means that RCU need not
deal with outgoing CPUs passing through the scheduler after informing
RCU that they are no longer online.  Note that removing the CPU from
the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
and orphaning callbacks is still done at CPU_DEAD time, the reason being
that at CPU_DEAD time we have another CPU that can adopt them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/cpu.h      |  2 ++
 include/linux/rcupdate.h |  2 ++
 kernel/rcu/tree.c        | 41 +++++++++++++++++++++++++++++++----------
 kernel/sched/idle.c      |  2 ++
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4744ef915acd..d028721748d4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -95,6 +95,8 @@ enum {
 					* Called on the new cpu, just before
 					* enabling interrupts. Must not sleep,
 					* must not fail */
+#define CPU_DYING_IDLE		0x000B /* CPU (unsigned)v dying, reached
+					* idle loop. */
 #define CPU_BROKEN		0x000C /* CPU (unsigned)v did not die properly,
 					* perhaps due to preemption. */
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 78097491cd99..762022f07afd 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -266,6 +266,8 @@ void rcu_idle_enter(void);
 void rcu_idle_exit(void);
 void rcu_irq_enter(void);
 void rcu_irq_exit(void);
+int rcu_cpu_notify(struct notifier_block *self,
+		   unsigned long action, void *hcpu);
 
 #ifdef CONFIG_RCU_STALL_COMMON
 void rcu_sysrq_start(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79d53399247e..d5247ed44004 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2476,6 +2476,26 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 }
 
 /*
+ * The CPU is exiting the idle loop into the arch_cpu_idle_dead()
+ * function.  We now remove it from the rcu_node tree's ->qsmaskinit
+ * bit masks.
+ */
+static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
+{
+	unsigned long flags;
+	unsigned long mask;
+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
+
+	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
+	mask = rdp->grpmask;
+	raw_spin_lock_irqsave(&rnp->lock, flags);
+	smp_mb__after_unlock_lock();	/* Enforce GP memory-order guarantee. */
+	rnp->qsmaskinitnext &= ~mask;
+	raw_spin_unlock_irqrestore(&rnp->lock, flags);
+}
+
+/*
  * The CPU has been completely removed, and some other CPU is reporting
  * this fact from process context.  Do the remainder of the cleanup,
  * including orphaning the outgoing CPU's RCU callbacks, and also
@@ -2485,7 +2505,6 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 {
 	unsigned long flags;
-	unsigned long mask;
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
@@ -2498,13 +2517,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	rcu_adopt_orphan_cbs(rsp, flags);
 	raw_spin_unlock_irqrestore(&rsp->orphan_lock, flags);
 
-	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
-	mask = rdp->grpmask;
-	raw_spin_lock_irqsave(&rnp->lock, flags);
-	smp_mb__after_unlock_lock();	/* Enforce GP memory-order guarantee. */
-	rnp->qsmaskinitnext &= ~mask;
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-
 	WARN_ONCE(rdp->qlen != 0 || rdp->nxtlist != NULL,
 		  "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, nxtlist=%p\n",
 		  cpu, rdp->qlen, rdp->nxtlist);
@@ -2520,6 +2532,10 @@ static void __maybe_unused rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 {
 }
 
+static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
+{
+}
+
 static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 {
 }
@@ -3733,8 +3749,8 @@ static void rcu_prepare_cpu(int cpu)
 /*
  * Handle CPU online/offline notification events.
  */
-static int rcu_cpu_notify(struct notifier_block *self,
-				    unsigned long action, void *hcpu)
+int rcu_cpu_notify(struct notifier_block *self,
+		   unsigned long action, void *hcpu)
 {
 	long cpu = (long)hcpu;
 	struct rcu_data *rdp = per_cpu_ptr(rcu_state_p->rda, cpu);
@@ -3760,6 +3776,11 @@ static int rcu_cpu_notify(struct notifier_block *self,
 		for_each_rcu_flavor(rsp)
 			rcu_cleanup_dying_cpu(rsp);
 		break;
+	case CPU_DYING_IDLE:
+		for_each_rcu_flavor(rsp) {
+			rcu_cleanup_dying_idle_cpu(cpu, rsp);
+		}
+		break;
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 	case CPU_UP_CANCELED:
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index e99e361ade20..b0090accfb5b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -225,6 +225,8 @@ static void cpu_idle_loop(void)
 			rmb();
 
 			if (cpu_is_offline(smp_processor_id())) {
+				rcu_cpu_notify(NULL, CPU_DYING_IDLE,
+					       (void *)(long)smp_processor_id());
 				smp_mb(); /* all activity before dead. */
 				this_cpu_write(cpu_dead_idle, true);
 				arch_cpu_idle_dead();
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 19/20] rcutorture: Default to grace-period-initialization delays
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (18 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Given that CPU-hotplug events are now applied only at the starts of
grace periods, it makes sense to unconditionally enable slow grace-period
initialization for rcutorture testing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4ddfc5d09161..0ae13a384cfb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1276,7 +1276,7 @@ config RCU_TORTURE_TEST_SLOW_INIT
 config RCU_TORTURE_TEST_SLOW_INIT_DELAY
 	int "How must to slow down RCU grace-period initialization"
 	range 0 5
-	default 0
+	default 3
 	help
 	  This option specifies the number of jiffies to wait between
 	  each rcu_node structure initialization.
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH tip/core/rcu 20/20] rcu: Add diagnostics to grace-period cleanup
  2015-03-03 17:42   ` Paul E. McKenney
                     ` (19 preceding siblings ...)
  (?)
@ 2015-03-03 17:43   ` Paul E. McKenney
  -1 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

At grace-period initialization time, RCU checks that all quiescent
states were really reported for the previous grace period.  Now that
grace-period cleanup has been split out of grace-period initialization,
this commit also performs those checks at grace-period cleanup time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d5247ed44004..17b5abf999ca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1920,6 +1920,8 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	rcu_for_each_node_breadth_first(rsp, rnp) {
 		raw_spin_lock_irq(&rnp->lock);
 		smp_mb__after_unlock_lock();
+		WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
+		WARN_ON_ONCE(rnp->qsmask);
 		ACCESS_ONCE(rnp->completed) = rsp->gpnum;
 		rdp = this_cpu_ptr(rsp->rda);
 		if (rnp == rdp->mynode)
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 17:42   ` [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code Paul E. McKenney
@ 2015-03-03 19:17     ` Boris Ostrovsky
  2015-03-03 19:42       ` Paul E. McKenney
  2015-03-03 19:42       ` Paul E. McKenney
  2015-03-03 19:17     ` Boris Ostrovsky
  1 sibling, 2 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-03 19:17 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, x86, Konrad Rzeszutek Wilk, David Vrabel, xen-devel

On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>   }
> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>   		schedule_timeout(HZ/10);
>   	}
>   
> -	cpu_die_common(cpu);
> +	(void)cpu_wait_death(cpu, 5);
> +	/* FIXME: Are the below calls really safe in case of timeout? */


Not for HVM guests (PV guests will only reach this point after target 
cpu has been marked as down by the hypervisor).

We need at least to have a message similar to what native_cpu_die() 
prints on cpu_wait_death() failure. And I think we should not call the 
two routines below (three, actually --- there is also 
xen_teardown_timer() below, which is not part of the diff).

-boris


>   
>   	xen_smp_intr_free(cpu);
>   	xen_uninit_lock_cpu(cpu);


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 17:42   ` [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code Paul E. McKenney
  2015-03-03 19:17     ` Boris Ostrovsky
@ 2015-03-03 19:17     ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-03 19:17 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, josh,
	rostedt, oleg, dhowells, edumazet, mathieu.desnoyers,
	David Vrabel, dipankar, xen-devel, akpm, mingo

On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>   }
> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>   		schedule_timeout(HZ/10);
>   	}
>   
> -	cpu_die_common(cpu);
> +	(void)cpu_wait_death(cpu, 5);
> +	/* FIXME: Are the below calls really safe in case of timeout? */


Not for HVM guests (PV guests will only reach this point after target 
cpu has been marked as down by the hypervisor).

We need at least to have a message similar to what native_cpu_die() 
prints on cpu_wait_death() failure. And I think we should not call the 
two routines below (three, actually --- there is also 
xen_teardown_timer() below, which is not part of the diff).

-boris


>   
>   	xen_smp_intr_free(cpu);
>   	xen_uninit_lock_cpu(cpu);

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 19:17     ` Boris Ostrovsky
  2015-03-03 19:42       ` Paul E. McKenney
@ 2015-03-03 19:42       ` Paul E. McKenney
  2015-03-03 20:13         ` Boris Ostrovsky
  2015-03-03 20:13         ` Boris Ostrovsky
  1 sibling, 2 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 19:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >  }
> >@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >  		schedule_timeout(HZ/10);
> >  	}
> >-	cpu_die_common(cpu);
> >+	(void)cpu_wait_death(cpu, 5);
> >+	/* FIXME: Are the below calls really safe in case of timeout? */
> 
> 
> Not for HVM guests (PV guests will only reach this point after
> target cpu has been marked as down by the hypervisor).
> 
> We need at least to have a message similar to what native_cpu_die()
> prints on cpu_wait_death() failure. And I think we should not call
> the two routines below (three, actually --- there is also
> xen_teardown_timer() below, which is not part of the diff).
> 
> -boris
> 
> 
> >  	xen_smp_intr_free(cpu);
> >  	xen_uninit_lock_cpu(cpu);

So something like this, then?

	if (cpu_wait_death(cpu, 5)) {
		xen_smp_intr_free(cpu);
		xen_uninit_lock_cpu(cpu);
		xen_teardown_timer(cpu);
	}

Easy change for me to make if so!

Or do I need some other check for HVM-vs.-PV guests, and, if so, what
would that check be?  And also if so, is it OK to online a PV guest's
CPU that timed out during its previous offline?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 19:17     ` Boris Ostrovsky
@ 2015-03-03 19:42       ` Paul E. McKenney
  2015-03-03 19:42       ` Paul E. McKenney
  1 sibling, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 19:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >  }
> >@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >  		schedule_timeout(HZ/10);
> >  	}
> >-	cpu_die_common(cpu);
> >+	(void)cpu_wait_death(cpu, 5);
> >+	/* FIXME: Are the below calls really safe in case of timeout? */
> 
> 
> Not for HVM guests (PV guests will only reach this point after
> target cpu has been marked as down by the hypervisor).
> 
> We need at least to have a message similar to what native_cpu_die()
> prints on cpu_wait_death() failure. And I think we should not call
> the two routines below (three, actually --- there is also
> xen_teardown_timer() below, which is not part of the diff).
> 
> -boris
> 
> 
> >  	xen_smp_intr_free(cpu);
> >  	xen_uninit_lock_cpu(cpu);

So something like this, then?

	if (cpu_wait_death(cpu, 5)) {
		xen_smp_intr_free(cpu);
		xen_uninit_lock_cpu(cpu);
		xen_teardown_timer(cpu);
	}

Easy change for me to make if so!

Or do I need some other check for HVM-vs.-PV guests, and, if so, what
would that check be?  And also if so, is it OK to online a PV guest's
CPU that timed out during its previous offline?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 19:42       ` Paul E. McKenney
  2015-03-03 20:13         ` Boris Ostrovsky
@ 2015-03-03 20:13         ` Boris Ostrovsky
  2015-03-03 21:26           ` Paul E. McKenney
  2015-03-03 21:26           ` Paul E. McKenney
  1 sibling, 2 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-03 20:13 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
>> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>>>   }
>>> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>>>   		schedule_timeout(HZ/10);
>>>   	}
>>> -	cpu_die_common(cpu);
>>> +	(void)cpu_wait_death(cpu, 5);
>>> +	/* FIXME: Are the below calls really safe in case of timeout? */
>>
>>
>> Not for HVM guests (PV guests will only reach this point after
>> target cpu has been marked as down by the hypervisor).
>>
>> We need at least to have a message similar to what native_cpu_die()
>> prints on cpu_wait_death() failure. And I think we should not call
>> the two routines below (three, actually --- there is also
>> xen_teardown_timer() below, which is not part of the diff).
>>
>> -boris
>>
>>
>>>   	xen_smp_intr_free(cpu);
>>>   	xen_uninit_lock_cpu(cpu);
>
> So something like this, then?
>
> 	if (cpu_wait_death(cpu, 5)) {
> 		xen_smp_intr_free(cpu);
> 		xen_uninit_lock_cpu(cpu);
> 		xen_teardown_timer(cpu);
> 	}

	else
		pr_err("CPU %u didn't die...\n", cpu);


>
> Easy change for me to make if so!
>
> Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> would that check be?  And also if so, is it OK to online a PV guest's
> CPU that timed out during its previous offline?


I believe PV VCPUs will always be CPU_DEAD by the time we get here since 
we are (indirectly) waiting for this in the loop at the beginning of 
xen_cpu_die():

'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL))' 
will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down, 
smp_processor_id()' in xen_play_dead(). Which happens after 
play_dead_common() has marked the cpu as CPU_DEAD.

So no test is needed.

Thanks.
-boris



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 19:42       ` Paul E. McKenney
@ 2015-03-03 20:13         ` Boris Ostrovsky
  2015-03-03 20:13         ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-03 20:13 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
>> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>>>   }
>>> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>>>   		schedule_timeout(HZ/10);
>>>   	}
>>> -	cpu_die_common(cpu);
>>> +	(void)cpu_wait_death(cpu, 5);
>>> +	/* FIXME: Are the below calls really safe in case of timeout? */
>>
>>
>> Not for HVM guests (PV guests will only reach this point after
>> target cpu has been marked as down by the hypervisor).
>>
>> We need at least to have a message similar to what native_cpu_die()
>> prints on cpu_wait_death() failure. And I think we should not call
>> the two routines below (three, actually --- there is also
>> xen_teardown_timer() below, which is not part of the diff).
>>
>> -boris
>>
>>
>>>   	xen_smp_intr_free(cpu);
>>>   	xen_uninit_lock_cpu(cpu);
>
> So something like this, then?
>
> 	if (cpu_wait_death(cpu, 5)) {
> 		xen_smp_intr_free(cpu);
> 		xen_uninit_lock_cpu(cpu);
> 		xen_teardown_timer(cpu);
> 	}

	else
		pr_err("CPU %u didn't die...\n", cpu);


>
> Easy change for me to make if so!
>
> Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> would that check be?  And also if so, is it OK to online a PV guest's
> CPU that timed out during its previous offline?


I believe PV VCPUs will always be CPU_DEAD by the time we get here since 
we are (indirectly) waiting for this in the loop at the beginning of 
xen_cpu_die():

'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL))' 
will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down, 
smp_processor_id()' in xen_play_dead(). Which happens after 
play_dead_common() has marked the cpu as CPU_DEAD.

So no test is needed.

Thanks.
-boris

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 20:13         ` Boris Ostrovsky
@ 2015-03-03 21:26           ` Paul E. McKenney
  2015-03-03 22:06             ` Boris Ostrovsky
  2015-03-03 22:06             ` Boris Ostrovsky
  2015-03-03 21:26           ` Paul E. McKenney
  1 sibling, 2 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 21:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>  }
> >>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>  		schedule_timeout(HZ/10);
> >>>  	}
> >>>-	cpu_die_common(cpu);
> >>>+	(void)cpu_wait_death(cpu, 5);
> >>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>
> >>
> >>Not for HVM guests (PV guests will only reach this point after
> >>target cpu has been marked as down by the hypervisor).
> >>
> >>We need at least to have a message similar to what native_cpu_die()
> >>prints on cpu_wait_death() failure. And I think we should not call
> >>the two routines below (three, actually --- there is also
> >>xen_teardown_timer() below, which is not part of the diff).
> >>
> >>-boris
> >>
> >>
> >>>  	xen_smp_intr_free(cpu);
> >>>  	xen_uninit_lock_cpu(cpu);
> >
> >So something like this, then?
> >
> >	if (cpu_wait_death(cpu, 5)) {
> >		xen_smp_intr_free(cpu);
> >		xen_uninit_lock_cpu(cpu);
> >		xen_teardown_timer(cpu);
> >	}
> 
> 	else
> 		pr_err("CPU %u didn't die...\n", cpu);
> 
> 
> >
> >Easy change for me to make if so!
> >
> >Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >would that check be?  And also if so, is it OK to online a PV guest's
> >CPU that timed out during its previous offline?
> 
> 
> I believe PV VCPUs will always be CPU_DEAD by the time we get here
> since we are (indirectly) waiting for this in the loop at the
> beginning of xen_cpu_die():
> 
> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> smp_processor_id()' in xen_play_dead(). Which happens after
> play_dead_common() has marked the cpu as CPU_DEAD.
> 
> So no test is needed.

OK, so I have the following patch on top of my previous patch, which
I will merge if testing goes well.  So if a CPU times out going offline,
the above three functions will not be called, the "didn't die" message
will be printed, and any future attempt to online that CPU will fail.
Is that the correct semantics?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e2c7389c58c5..f2a06ff0614d 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
 
-	(void)cpu_wait_death(cpu, 5);
-	/* FIXME: Are the below calls really safe in case of timeout? */
-
-	xen_smp_intr_free(cpu);
-	xen_uninit_lock_cpu(cpu);
-	xen_teardown_timer(cpu);
+	if (cpu_wait_death(cpu, 5)) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+		xen_teardown_timer(cpu);
+	} else {
+		pr_err("CPU %u didn't die...\n", cpu);
+	}
 }
 
 static void xen_play_dead(void) /* used only with HOTPLUG_CPU */


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 20:13         ` Boris Ostrovsky
  2015-03-03 21:26           ` Paul E. McKenney
@ 2015-03-03 21:26           ` Paul E. McKenney
  1 sibling, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 21:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>  }
> >>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>  		schedule_timeout(HZ/10);
> >>>  	}
> >>>-	cpu_die_common(cpu);
> >>>+	(void)cpu_wait_death(cpu, 5);
> >>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>
> >>
> >>Not for HVM guests (PV guests will only reach this point after
> >>target cpu has been marked as down by the hypervisor).
> >>
> >>We need at least to have a message similar to what native_cpu_die()
> >>prints on cpu_wait_death() failure. And I think we should not call
> >>the two routines below (three, actually --- there is also
> >>xen_teardown_timer() below, which is not part of the diff).
> >>
> >>-boris
> >>
> >>
> >>>  	xen_smp_intr_free(cpu);
> >>>  	xen_uninit_lock_cpu(cpu);
> >
> >So something like this, then?
> >
> >	if (cpu_wait_death(cpu, 5)) {
> >		xen_smp_intr_free(cpu);
> >		xen_uninit_lock_cpu(cpu);
> >		xen_teardown_timer(cpu);
> >	}
> 
> 	else
> 		pr_err("CPU %u didn't die...\n", cpu);
> 
> 
> >
> >Easy change for me to make if so!
> >
> >Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >would that check be?  And also if so, is it OK to online a PV guest's
> >CPU that timed out during its previous offline?
> 
> 
> I believe PV VCPUs will always be CPU_DEAD by the time we get here
> since we are (indirectly) waiting for this in the loop at the
> beginning of xen_cpu_die():
> 
> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> smp_processor_id()' in xen_play_dead(). Which happens after
> play_dead_common() has marked the cpu as CPU_DEAD.
> 
> So no test is needed.

OK, so I have the following patch on top of my previous patch, which
I will merge if testing goes well.  So if a CPU times out going offline,
the above three functions will not be called, the "didn't die" message
will be printed, and any future attempt to online that CPU will fail.
Is that the correct semantics?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e2c7389c58c5..f2a06ff0614d 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
 
-	(void)cpu_wait_death(cpu, 5);
-	/* FIXME: Are the below calls really safe in case of timeout? */
-
-	xen_smp_intr_free(cpu);
-	xen_uninit_lock_cpu(cpu);
-	xen_teardown_timer(cpu);
+	if (cpu_wait_death(cpu, 5)) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+		xen_teardown_timer(cpu);
+	} else {
+		pr_err("CPU %u didn't die...\n", cpu);
+	}
 }
 
 static void xen_play_dead(void) /* used only with HOTPLUG_CPU */

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 21:26           ` Paul E. McKenney
@ 2015-03-03 22:06             ` Boris Ostrovsky
  2015-03-03 22:31               ` Paul E. McKenney
  2015-03-03 22:31               ` Paul E. McKenney
  2015-03-03 22:06             ` Boris Ostrovsky
  1 sibling, 2 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-03 22:06 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
>> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
>>> On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
>>>> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>>>>>   }
>>>>> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>>>>>   		schedule_timeout(HZ/10);
>>>>>   	}
>>>>> -	cpu_die_common(cpu);
>>>>> +	(void)cpu_wait_death(cpu, 5);
>>>>> +	/* FIXME: Are the below calls really safe in case of timeout? */
>>>>
>>>> Not for HVM guests (PV guests will only reach this point after
>>>> target cpu has been marked as down by the hypervisor).
>>>>
>>>> We need at least to have a message similar to what native_cpu_die()
>>>> prints on cpu_wait_death() failure. And I think we should not call
>>>> the two routines below (three, actually --- there is also
>>>> xen_teardown_timer() below, which is not part of the diff).
>>>>
>>>> -boris
>>>>
>>>>
>>>>>   	xen_smp_intr_free(cpu);
>>>>>   	xen_uninit_lock_cpu(cpu);
>>> So something like this, then?
>>>
>>> 	if (cpu_wait_death(cpu, 5)) {
>>> 		xen_smp_intr_free(cpu);
>>> 		xen_uninit_lock_cpu(cpu);
>>> 		xen_teardown_timer(cpu);
>>> 	}
>> 	else
>> 		pr_err("CPU %u didn't die...\n", cpu);
>>
>>
>>> Easy change for me to make if so!
>>>
>>> Or do I need some other check for HVM-vs.-PV guests, and, if so, what
>>> would that check be?  And also if so, is it OK to online a PV guest's
>>> CPU that timed out during its previous offline?
>>
>> I believe PV VCPUs will always be CPU_DEAD by the time we get here
>> since we are (indirectly) waiting for this in the loop at the
>> beginning of xen_cpu_die():
>>
>> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
>> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
>> smp_processor_id()' in xen_play_dead(). Which happens after
>> play_dead_common() has marked the cpu as CPU_DEAD.
>>
>> So no test is needed.
> OK, so I have the following patch on top of my previous patch, which
> I will merge if testing goes well.  So if a CPU times out going offline,
> the above three functions will not be called, the "didn't die" message
> will be printed, and any future attempt to online that CPU will fail.
> Is that the correct semantics?

Yes.

I am not sure whether not ever onlining the CPU is the best outcome but 
then I don't think trying to online it again with all interrupts and 
such still set up will work well. And it's an improvement over what we 
have now anyway (with current code we may clean up things for a non-dead 
cpu).

Thanks.
-boris


>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index e2c7389c58c5..f2a06ff0614d 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
>   		schedule_timeout(HZ/10);
>   	}
>   
> -	(void)cpu_wait_death(cpu, 5);
> -	/* FIXME: Are the below calls really safe in case of timeout? */
> -
> -	xen_smp_intr_free(cpu);
> -	xen_uninit_lock_cpu(cpu);
> -	xen_teardown_timer(cpu);
> +	if (cpu_wait_death(cpu, 5)) {
> +		xen_smp_intr_free(cpu);
> +		xen_uninit_lock_cpu(cpu);
> +		xen_teardown_timer(cpu);
> +	} else {
> +		pr_err("CPU %u didn't die...\n", cpu);
> +	}
>   }
>   
>   static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 21:26           ` Paul E. McKenney
  2015-03-03 22:06             ` Boris Ostrovsky
@ 2015-03-03 22:06             ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-03 22:06 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
>> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
>>> On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
>>>> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>>>>>   }
>>>>> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>>>>>   		schedule_timeout(HZ/10);
>>>>>   	}
>>>>> -	cpu_die_common(cpu);
>>>>> +	(void)cpu_wait_death(cpu, 5);
>>>>> +	/* FIXME: Are the below calls really safe in case of timeout? */
>>>>
>>>> Not for HVM guests (PV guests will only reach this point after
>>>> target cpu has been marked as down by the hypervisor).
>>>>
>>>> We need at least to have a message similar to what native_cpu_die()
>>>> prints on cpu_wait_death() failure. And I think we should not call
>>>> the two routines below (three, actually --- there is also
>>>> xen_teardown_timer() below, which is not part of the diff).
>>>>
>>>> -boris
>>>>
>>>>
>>>>>   	xen_smp_intr_free(cpu);
>>>>>   	xen_uninit_lock_cpu(cpu);
>>> So something like this, then?
>>>
>>> 	if (cpu_wait_death(cpu, 5)) {
>>> 		xen_smp_intr_free(cpu);
>>> 		xen_uninit_lock_cpu(cpu);
>>> 		xen_teardown_timer(cpu);
>>> 	}
>> 	else
>> 		pr_err("CPU %u didn't die...\n", cpu);
>>
>>
>>> Easy change for me to make if so!
>>>
>>> Or do I need some other check for HVM-vs.-PV guests, and, if so, what
>>> would that check be?  And also if so, is it OK to online a PV guest's
>>> CPU that timed out during its previous offline?
>>
>> I believe PV VCPUs will always be CPU_DEAD by the time we get here
>> since we are (indirectly) waiting for this in the loop at the
>> beginning of xen_cpu_die():
>>
>> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
>> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
>> smp_processor_id()' in xen_play_dead(). Which happens after
>> play_dead_common() has marked the cpu as CPU_DEAD.
>>
>> So no test is needed.
> OK, so I have the following patch on top of my previous patch, which
> I will merge if testing goes well.  So if a CPU times out going offline,
> the above three functions will not be called, the "didn't die" message
> will be printed, and any future attempt to online that CPU will fail.
> Is that the correct semantics?

Yes.

I am not sure whether not ever onlining the CPU is the best outcome but 
then I don't think trying to online it again with all interrupts and 
such still set up will work well. And it's an improvement over what we 
have now anyway (with current code we may clean up things for a non-dead 
cpu).

Thanks.
-boris


>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index e2c7389c58c5..f2a06ff0614d 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
>   		schedule_timeout(HZ/10);
>   	}
>   
> -	(void)cpu_wait_death(cpu, 5);
> -	/* FIXME: Are the below calls really safe in case of timeout? */
> -
> -	xen_smp_intr_free(cpu);
> -	xen_uninit_lock_cpu(cpu);
> -	xen_teardown_timer(cpu);
> +	if (cpu_wait_death(cpu, 5)) {
> +		xen_smp_intr_free(cpu);
> +		xen_uninit_lock_cpu(cpu);
> +		xen_teardown_timer(cpu);
> +	} else {
> +		pr_err("CPU %u didn't die...\n", cpu);
> +	}
>   }
>   
>   static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 22:06             ` Boris Ostrovsky
  2015-03-03 22:31               ` Paul E. McKenney
@ 2015-03-03 22:31               ` Paul E. McKenney
  2015-03-04 14:43                 ` Paul E. McKenney
  2015-03-04 14:43                 ` Paul E. McKenney
  1 sibling, 2 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 22:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> >>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>>>  }
> >>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>>  		schedule_timeout(HZ/10);
> >>>>>  	}
> >>>>>-	cpu_die_common(cpu);
> >>>>>+	(void)cpu_wait_death(cpu, 5);
> >>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>
> >>>>Not for HVM guests (PV guests will only reach this point after
> >>>>target cpu has been marked as down by the hypervisor).
> >>>>
> >>>>We need at least to have a message similar to what native_cpu_die()
> >>>>prints on cpu_wait_death() failure. And I think we should not call
> >>>>the two routines below (three, actually --- there is also
> >>>>xen_teardown_timer() below, which is not part of the diff).
> >>>>
> >>>>-boris
> >>>>
> >>>>
> >>>>>  	xen_smp_intr_free(cpu);
> >>>>>  	xen_uninit_lock_cpu(cpu);
> >>>So something like this, then?
> >>>
> >>>	if (cpu_wait_death(cpu, 5)) {
> >>>		xen_smp_intr_free(cpu);
> >>>		xen_uninit_lock_cpu(cpu);
> >>>		xen_teardown_timer(cpu);
> >>>	}
> >>	else
> >>		pr_err("CPU %u didn't die...\n", cpu);
> >>
> >>
> >>>Easy change for me to make if so!
> >>>
> >>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >>>would that check be?  And also if so, is it OK to online a PV guest's
> >>>CPU that timed out during its previous offline?
> >>
> >>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> >>since we are (indirectly) waiting for this in the loop at the
> >>beginning of xen_cpu_die():
> >>
> >>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> >>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> >>smp_processor_id()' in xen_play_dead(). Which happens after
> >>play_dead_common() has marked the cpu as CPU_DEAD.
> >>
> >>So no test is needed.
> >OK, so I have the following patch on top of my previous patch, which
> >I will merge if testing goes well.  So if a CPU times out going offline,
> >the above three functions will not be called, the "didn't die" message
> >will be printed, and any future attempt to online that CPU will fail.
> >Is that the correct semantics?
> 
> Yes.
> 
> I am not sure whether not ever onlining the CPU is the best outcome
> but then I don't think trying to online it again with all interrupts
> and such still set up will work well. And it's an improvement over
> what we have now anyway (with current code we may clean up things
> for a non-dead cpu).

Another strategy is to key off of the return value of cpu_check_up_prepare().
If it returns -EBUSY, then the outgoing CPU finished up after the
surviving CPU timed out.  The CPU trying to bring the new CPU online
could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
and xen_teardown_timer() at that point.

But I must defer to you on this sort of thing.

							Thanx, Paul

> Thanks.
> -boris
> 
> 
> >
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >index e2c7389c58c5..f2a06ff0614d 100644
> >--- a/arch/x86/xen/smp.c
> >+++ b/arch/x86/xen/smp.c
> >@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> >  		schedule_timeout(HZ/10);
> >  	}
> >-	(void)cpu_wait_death(cpu, 5);
> >-	/* FIXME: Are the below calls really safe in case of timeout? */
> >-
> >-	xen_smp_intr_free(cpu);
> >-	xen_uninit_lock_cpu(cpu);
> >-	xen_teardown_timer(cpu);
> >+	if (cpu_wait_death(cpu, 5)) {
> >+		xen_smp_intr_free(cpu);
> >+		xen_uninit_lock_cpu(cpu);
> >+		xen_teardown_timer(cpu);
> >+	} else {
> >+		pr_err("CPU %u didn't die...\n", cpu);
> >+	}
> >  }
> >  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> >
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 22:06             ` Boris Ostrovsky
@ 2015-03-03 22:31               ` Paul E. McKenney
  2015-03-03 22:31               ` Paul E. McKenney
  1 sibling, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-03 22:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> >>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>>>  }
> >>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>>  		schedule_timeout(HZ/10);
> >>>>>  	}
> >>>>>-	cpu_die_common(cpu);
> >>>>>+	(void)cpu_wait_death(cpu, 5);
> >>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>
> >>>>Not for HVM guests (PV guests will only reach this point after
> >>>>target cpu has been marked as down by the hypervisor).
> >>>>
> >>>>We need at least to have a message similar to what native_cpu_die()
> >>>>prints on cpu_wait_death() failure. And I think we should not call
> >>>>the two routines below (three, actually --- there is also
> >>>>xen_teardown_timer() below, which is not part of the diff).
> >>>>
> >>>>-boris
> >>>>
> >>>>
> >>>>>  	xen_smp_intr_free(cpu);
> >>>>>  	xen_uninit_lock_cpu(cpu);
> >>>So something like this, then?
> >>>
> >>>	if (cpu_wait_death(cpu, 5)) {
> >>>		xen_smp_intr_free(cpu);
> >>>		xen_uninit_lock_cpu(cpu);
> >>>		xen_teardown_timer(cpu);
> >>>	}
> >>	else
> >>		pr_err("CPU %u didn't die...\n", cpu);
> >>
> >>
> >>>Easy change for me to make if so!
> >>>
> >>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >>>would that check be?  And also if so, is it OK to online a PV guest's
> >>>CPU that timed out during its previous offline?
> >>
> >>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> >>since we are (indirectly) waiting for this in the loop at the
> >>beginning of xen_cpu_die():
> >>
> >>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> >>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> >>smp_processor_id()' in xen_play_dead(). Which happens after
> >>play_dead_common() has marked the cpu as CPU_DEAD.
> >>
> >>So no test is needed.
> >OK, so I have the following patch on top of my previous patch, which
> >I will merge if testing goes well.  So if a CPU times out going offline,
> >the above three functions will not be called, the "didn't die" message
> >will be printed, and any future attempt to online that CPU will fail.
> >Is that the correct semantics?
> 
> Yes.
> 
> I am not sure whether not ever onlining the CPU is the best outcome
> but then I don't think trying to online it again with all interrupts
> and such still set up will work well. And it's an improvement over
> what we have now anyway (with current code we may clean up things
> for a non-dead cpu).

Another strategy is to key off of the return value of cpu_check_up_prepare().
If it returns -EBUSY, then the outgoing CPU finished up after the
surviving CPU timed out.  The CPU trying to bring the new CPU online
could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
and xen_teardown_timer() at that point.

But I must defer to you on this sort of thing.

							Thanx, Paul

> Thanks.
> -boris
> 
> 
> >
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >index e2c7389c58c5..f2a06ff0614d 100644
> >--- a/arch/x86/xen/smp.c
> >+++ b/arch/x86/xen/smp.c
> >@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> >  		schedule_timeout(HZ/10);
> >  	}
> >-	(void)cpu_wait_death(cpu, 5);
> >-	/* FIXME: Are the below calls really safe in case of timeout? */
> >-
> >-	xen_smp_intr_free(cpu);
> >-	xen_uninit_lock_cpu(cpu);
> >-	xen_teardown_timer(cpu);
> >+	if (cpu_wait_death(cpu, 5)) {
> >+		xen_smp_intr_free(cpu);
> >+		xen_uninit_lock_cpu(cpu);
> >+		xen_teardown_timer(cpu);
> >+	} else {
> >+		pr_err("CPU %u didn't die...\n", cpu);
> >+	}
> >  }
> >  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> >
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 10/20] rcu: Provide diagnostic option to slow down grace-period initialization
  2015-03-03 17:43   ` [PATCH tip/core/rcu 10/20] rcu: Provide diagnostic option to slow down grace-period initialization Paul E. McKenney
@ 2015-03-04 10:54     ` Paul Bolle
  2015-03-04 14:59       ` Paul E. McKenney
  0 siblings, 1 reply; 59+ messages in thread
From: Paul Bolle @ 2015-03-04 10:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

All I spotted is a silly typo.

Paul E. McKenney schreef op di 03-03-2015 om 09:43 [-0800]:
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1257,6 +1257,30 @@ config RCU_TORTURE_TEST_RUNNABLE
>  	  Say N here if you want the RCU torture tests to start only
>  	  after being manually enabled via /proc.
>  
> +config RCU_TORTURE_TEST_SLOW_INIT
> +	bool "Slow down RCU grace-period initialization to expose races"
> +	depends on RCU_TORTURE_TEST
> +	help
> +	  This option makes grace-period initialization block for a
> +	  few jiffies between initializing each pair of consecutive
> +	  rcu_node structures.	This helps to expose races involving
> +	  grace-period initialization, in other words, it makes your
> +	  kernel less stable.  It can also greatly increase grace-period
> +	  latency, especially on systems with large numbers of CPUs.
> +	  This is useful when torture-testing RCU, but in almost no
> +	  other circumstance.
> +
> +	  Say Y here if you want your system to crash and hang more often.

(Did you ever consider going into marketing?)

> +	  Say N if you want a sane system.
> +
> +config RCU_TORTURE_TEST_SLOW_INIT_DELAY
> +	int "How must to slow down RCU grace-period initialization"

s/must/much/

> +	range 0 5
> +	default 0
> +	help
> +	  This option specifies the number of jiffies to wait between
> +	  each rcu_node structure initialization.
> +
>  config RCU_CPU_STALL_TIMEOUT
>  	int "RCU CPU stall timeout in seconds"
>  	depends on RCU_STALL_COMMON


Paul Bolle


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 22:31               ` Paul E. McKenney
  2015-03-04 14:43                 ` Paul E. McKenney
@ 2015-03-04 14:43                 ` Paul E. McKenney
  2015-03-04 14:55                   ` Boris Ostrovsky
  2015-03-04 14:55                   ` Boris Ostrovsky
  1 sibling, 2 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-04 14:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> > On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> > >On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> > >>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> > >>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> > >>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> > >>>>>  }
> > >>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> > >>>>>  		schedule_timeout(HZ/10);
> > >>>>>  	}
> > >>>>>-	cpu_die_common(cpu);
> > >>>>>+	(void)cpu_wait_death(cpu, 5);
> > >>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> > >>>>
> > >>>>Not for HVM guests (PV guests will only reach this point after
> > >>>>target cpu has been marked as down by the hypervisor).
> > >>>>
> > >>>>We need at least to have a message similar to what native_cpu_die()
> > >>>>prints on cpu_wait_death() failure. And I think we should not call
> > >>>>the two routines below (three, actually --- there is also
> > >>>>xen_teardown_timer() below, which is not part of the diff).
> > >>>>
> > >>>>-boris
> > >>>>
> > >>>>
> > >>>>>  	xen_smp_intr_free(cpu);
> > >>>>>  	xen_uninit_lock_cpu(cpu);
> > >>>So something like this, then?
> > >>>
> > >>>	if (cpu_wait_death(cpu, 5)) {
> > >>>		xen_smp_intr_free(cpu);
> > >>>		xen_uninit_lock_cpu(cpu);
> > >>>		xen_teardown_timer(cpu);
> > >>>	}
> > >>	else
> > >>		pr_err("CPU %u didn't die...\n", cpu);
> > >>
> > >>
> > >>>Easy change for me to make if so!
> > >>>
> > >>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> > >>>would that check be?  And also if so, is it OK to online a PV guest's
> > >>>CPU that timed out during its previous offline?
> > >>
> > >>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> > >>since we are (indirectly) waiting for this in the loop at the
> > >>beginning of xen_cpu_die():
> > >>
> > >>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> > >>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> > >>smp_processor_id()' in xen_play_dead(). Which happens after
> > >>play_dead_common() has marked the cpu as CPU_DEAD.
> > >>
> > >>So no test is needed.
> > >OK, so I have the following patch on top of my previous patch, which
> > >I will merge if testing goes well.  So if a CPU times out going offline,
> > >the above three functions will not be called, the "didn't die" message
> > >will be printed, and any future attempt to online that CPU will fail.
> > >Is that the correct semantics?
> > 
> > Yes.
> > 
> > I am not sure whether not ever onlining the CPU is the best outcome
> > but then I don't think trying to online it again with all interrupts
> > and such still set up will work well. And it's an improvement over
> > what we have now anyway (with current code we may clean up things
> > for a non-dead cpu).
> 
> Another strategy is to key off of the return value of cpu_check_up_prepare().
> If it returns -EBUSY, then the outgoing CPU finished up after the
> surviving CPU timed out.  The CPU trying to bring the new CPU online
> could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
> and xen_teardown_timer() at that point.

And the code for this, in xen_cpu_up(), might look something like the
following:

	rc = cpu_check_up_prepare(cpu);
	if (rc && rc != -EBUSY)
		return rc;
	if (rc == EBUSY) {
		xen_smp_intr_free(cpu);
		xen_uninit_lock_cpu(cpu);
		xen_teardown_timer(cpu);
	}

The idea is that we detect when the CPU eventually took itself offline,
but only did so after the surviving CPU timed out.  (Of course, it
would probably be best to put those three statements into a small
function that is called from both places.)

I have no idea whether this approach would really work, especially given
your earlier statement that CPU_DEAD happens early on.  But in case it
is helpful or sparks some better idea.

							Thanx, Paul

> But I must defer to you on this sort of thing.
> 
> 							Thanx, Paul
> 
> > Thanks.
> > -boris
> > 
> > 
> > >
> > >							Thanx, Paul
> > >
> > >------------------------------------------------------------------------
> > >
> > >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> > >index e2c7389c58c5..f2a06ff0614d 100644
> > >--- a/arch/x86/xen/smp.c
> > >+++ b/arch/x86/xen/smp.c
> > >@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> > >  		schedule_timeout(HZ/10);
> > >  	}
> > >-	(void)cpu_wait_death(cpu, 5);
> > >-	/* FIXME: Are the below calls really safe in case of timeout? */
> > >-
> > >-	xen_smp_intr_free(cpu);
> > >-	xen_uninit_lock_cpu(cpu);
> > >-	xen_teardown_timer(cpu);
> > >+	if (cpu_wait_death(cpu, 5)) {
> > >+		xen_smp_intr_free(cpu);
> > >+		xen_uninit_lock_cpu(cpu);
> > >+		xen_teardown_timer(cpu);
> > >+	} else {
> > >+		pr_err("CPU %u didn't die...\n", cpu);
> > >+	}
> > >  }
> > >  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> > >
> > 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-03 22:31               ` Paul E. McKenney
@ 2015-03-04 14:43                 ` Paul E. McKenney
  2015-03-04 14:43                 ` Paul E. McKenney
  1 sibling, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-04 14:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> > On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> > >On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> > >>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> > >>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> > >>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> > >>>>>  }
> > >>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> > >>>>>  		schedule_timeout(HZ/10);
> > >>>>>  	}
> > >>>>>-	cpu_die_common(cpu);
> > >>>>>+	(void)cpu_wait_death(cpu, 5);
> > >>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> > >>>>
> > >>>>Not for HVM guests (PV guests will only reach this point after
> > >>>>target cpu has been marked as down by the hypervisor).
> > >>>>
> > >>>>We need at least to have a message similar to what native_cpu_die()
> > >>>>prints on cpu_wait_death() failure. And I think we should not call
> > >>>>the two routines below (three, actually --- there is also
> > >>>>xen_teardown_timer() below, which is not part of the diff).
> > >>>>
> > >>>>-boris
> > >>>>
> > >>>>
> > >>>>>  	xen_smp_intr_free(cpu);
> > >>>>>  	xen_uninit_lock_cpu(cpu);
> > >>>So something like this, then?
> > >>>
> > >>>	if (cpu_wait_death(cpu, 5)) {
> > >>>		xen_smp_intr_free(cpu);
> > >>>		xen_uninit_lock_cpu(cpu);
> > >>>		xen_teardown_timer(cpu);
> > >>>	}
> > >>	else
> > >>		pr_err("CPU %u didn't die...\n", cpu);
> > >>
> > >>
> > >>>Easy change for me to make if so!
> > >>>
> > >>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> > >>>would that check be?  And also if so, is it OK to online a PV guest's
> > >>>CPU that timed out during its previous offline?
> > >>
> > >>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> > >>since we are (indirectly) waiting for this in the loop at the
> > >>beginning of xen_cpu_die():
> > >>
> > >>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> > >>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> > >>smp_processor_id()' in xen_play_dead(). Which happens after
> > >>play_dead_common() has marked the cpu as CPU_DEAD.
> > >>
> > >>So no test is needed.
> > >OK, so I have the following patch on top of my previous patch, which
> > >I will merge if testing goes well.  So if a CPU times out going offline,
> > >the above three functions will not be called, the "didn't die" message
> > >will be printed, and any future attempt to online that CPU will fail.
> > >Is that the correct semantics?
> > 
> > Yes.
> > 
> > I am not sure whether not ever onlining the CPU is the best outcome
> > but then I don't think trying to online it again with all interrupts
> > and such still set up will work well. And it's an improvement over
> > what we have now anyway (with current code we may clean up things
> > for a non-dead cpu).
> 
> Another strategy is to key off of the return value of cpu_check_up_prepare().
> If it returns -EBUSY, then the outgoing CPU finished up after the
> surviving CPU timed out.  The CPU trying to bring the new CPU online
> could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
> and xen_teardown_timer() at that point.

And the code for this, in xen_cpu_up(), might look something like the
following:

	rc = cpu_check_up_prepare(cpu);
	if (rc && rc != -EBUSY)
		return rc;
	if (rc == EBUSY) {
		xen_smp_intr_free(cpu);
		xen_uninit_lock_cpu(cpu);
		xen_teardown_timer(cpu);
	}

The idea is that we detect when the CPU eventually took itself offline,
but only did so after the surviving CPU timed out.  (Of course, it
would probably be best to put those three statements into a small
function that is called from both places.)

I have no idea whether this approach would really work, especially given
your earlier statement that CPU_DEAD happens early on.  But in case it
is helpful or sparks some better idea.

							Thanx, Paul

> But I must defer to you on this sort of thing.
> 
> 							Thanx, Paul
> 
> > Thanks.
> > -boris
> > 
> > 
> > >
> > >							Thanx, Paul
> > >
> > >------------------------------------------------------------------------
> > >
> > >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> > >index e2c7389c58c5..f2a06ff0614d 100644
> > >--- a/arch/x86/xen/smp.c
> > >+++ b/arch/x86/xen/smp.c
> > >@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> > >  		schedule_timeout(HZ/10);
> > >  	}
> > >-	(void)cpu_wait_death(cpu, 5);
> > >-	/* FIXME: Are the below calls really safe in case of timeout? */
> > >-
> > >-	xen_smp_intr_free(cpu);
> > >-	xen_uninit_lock_cpu(cpu);
> > >-	xen_teardown_timer(cpu);
> > >+	if (cpu_wait_death(cpu, 5)) {
> > >+		xen_smp_intr_free(cpu);
> > >+		xen_uninit_lock_cpu(cpu);
> > >+		xen_teardown_timer(cpu);
> > >+	} else {
> > >+		pr_err("CPU %u didn't die...\n", cpu);
> > >+	}
> > >  }
> > >  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> > >
> > 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 14:43                 ` Paul E. McKenney
  2015-03-04 14:55                   ` Boris Ostrovsky
@ 2015-03-04 14:55                   ` Boris Ostrovsky
  2015-03-04 15:25                     ` Paul E. McKenney
                                       ` (3 more replies)
  1 sibling, 4 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-04 14:55 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On 03/04/2015 09:43 AM, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
>> On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
>>> On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
>>>> On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
>>>>> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
>>>>>> On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
>>>>>>> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>>>>>>>>   }
>>>>>>>> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>>>>>>>>   		schedule_timeout(HZ/10);
>>>>>>>>   	}
>>>>>>>> -	cpu_die_common(cpu);
>>>>>>>> +	(void)cpu_wait_death(cpu, 5);
>>>>>>>> +	/* FIXME: Are the below calls really safe in case of timeout? */
>>>>>>> Not for HVM guests (PV guests will only reach this point after
>>>>>>> target cpu has been marked as down by the hypervisor).
>>>>>>>
>>>>>>> We need at least to have a message similar to what native_cpu_die()
>>>>>>> prints on cpu_wait_death() failure. And I think we should not call
>>>>>>> the two routines below (three, actually --- there is also
>>>>>>> xen_teardown_timer() below, which is not part of the diff).
>>>>>>>
>>>>>>> -boris
>>>>>>>
>>>>>>>
>>>>>>>>   	xen_smp_intr_free(cpu);
>>>>>>>>   	xen_uninit_lock_cpu(cpu);
>>>>>> So something like this, then?
>>>>>>
>>>>>> 	if (cpu_wait_death(cpu, 5)) {
>>>>>> 		xen_smp_intr_free(cpu);
>>>>>> 		xen_uninit_lock_cpu(cpu);
>>>>>> 		xen_teardown_timer(cpu);
>>>>>> 	}
>>>>> 	else
>>>>> 		pr_err("CPU %u didn't die...\n", cpu);
>>>>>
>>>>>
>>>>>> Easy change for me to make if so!
>>>>>>
>>>>>> Or do I need some other check for HVM-vs.-PV guests, and, if so, what
>>>>>> would that check be?  And also if so, is it OK to online a PV guest's
>>>>>> CPU that timed out during its previous offline?
>>>>> I believe PV VCPUs will always be CPU_DEAD by the time we get here
>>>>> since we are (indirectly) waiting for this in the loop at the
>>>>> beginning of xen_cpu_die():
>>>>>
>>>>> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
>>>>> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
>>>>> smp_processor_id()' in xen_play_dead(). Which happens after
>>>>> play_dead_common() has marked the cpu as CPU_DEAD.
>>>>>
>>>>> So no test is needed.
>>>> OK, so I have the following patch on top of my previous patch, which
>>>> I will merge if testing goes well.  So if a CPU times out going offline,
>>>> the above three functions will not be called, the "didn't die" message
>>>> will be printed, and any future attempt to online that CPU will fail.
>>>> Is that the correct semantics?
>>> Yes.
>>>
>>> I am not sure whether not ever onlining the CPU is the best outcome
>>> but then I don't think trying to online it again with all interrupts
>>> and such still set up will work well. And it's an improvement over
>>> what we have now anyway (with current code we may clean up things
>>> for a non-dead cpu).
>> Another strategy is to key off of the return value of cpu_check_up_prepare().
>> If it returns -EBUSY, then the outgoing CPU finished up after the
>> surviving CPU timed out.  The CPU trying to bring the new CPU online
>> could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
>> and xen_teardown_timer() at that point.
> And the code for this, in xen_cpu_up(), might look something like the
> following:
>
> 	rc = cpu_check_up_prepare(cpu);
> 	if (rc && rc != -EBUSY)
> 		return rc;
> 	if (rc == EBUSY) {
> 		xen_smp_intr_free(cpu);
> 		xen_uninit_lock_cpu(cpu);
> 		xen_teardown_timer(cpu);
> 	}
>
> The idea is that we detect when the CPU eventually took itself offline,
> but only did so after the surviving CPU timed out.  (Of course, it
> would probably be best to put those three statements into a small
> function that is called from both places.)
>
> I have no idea whether this approach would really work, especially given
> your earlier statement that CPU_DEAD happens early on.  But in case it
> is helpful or sparks some better idea.

Let me test this, I think it may work.

In the meantime, it turned out that HVM guests are broken by this patch 
(with our without changes that we've been discussing), because HVM CPUs 
die with

static void xen_hvm_cpu_die(unsigned int cpu)
{
         xen_cpu_die(cpu);
         native_cpu_die(cpu);
}

Which means that cpu_wait_death() is called twice, and second call moves 
the CPU to CPU_BROKEN.

The simple solution is to stop calling native_cpu_die() above but I'd 
like to use common code in native_cpu_die(). I'll see if I can carve it 
out without too much damage to x86.

Thanks.
-boris


>
> 							Thanx, Paul
>
>> But I must defer to you on this sort of thing.
>>
>> 							Thanx, Paul
>>
>>> Thanks.
>>> -boris
>>>
>>>
>>>> 							Thanx, Paul
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>>>> index e2c7389c58c5..f2a06ff0614d 100644
>>>> --- a/arch/x86/xen/smp.c
>>>> +++ b/arch/x86/xen/smp.c
>>>> @@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
>>>>   		schedule_timeout(HZ/10);
>>>>   	}
>>>> -	(void)cpu_wait_death(cpu, 5);
>>>> -	/* FIXME: Are the below calls really safe in case of timeout? */
>>>> -
>>>> -	xen_smp_intr_free(cpu);
>>>> -	xen_uninit_lock_cpu(cpu);
>>>> -	xen_teardown_timer(cpu);
>>>> +	if (cpu_wait_death(cpu, 5)) {
>>>> +		xen_smp_intr_free(cpu);
>>>> +		xen_uninit_lock_cpu(cpu);
>>>> +		xen_teardown_timer(cpu);
>>>> +	} else {
>>>> +		pr_err("CPU %u didn't die...\n", cpu);
>>>> +	}
>>>>   }
>>>>   static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
>>>>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 14:43                 ` Paul E. McKenney
@ 2015-03-04 14:55                   ` Boris Ostrovsky
  2015-03-04 14:55                   ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-04 14:55 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On 03/04/2015 09:43 AM, Paul E. McKenney wrote:
> On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
>> On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
>>> On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
>>>> On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
>>>>> On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
>>>>>> On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
>>>>>>> On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
>>>>>>>>   }
>>>>>>>> @@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
>>>>>>>>   		schedule_timeout(HZ/10);
>>>>>>>>   	}
>>>>>>>> -	cpu_die_common(cpu);
>>>>>>>> +	(void)cpu_wait_death(cpu, 5);
>>>>>>>> +	/* FIXME: Are the below calls really safe in case of timeout? */
>>>>>>> Not for HVM guests (PV guests will only reach this point after
>>>>>>> target cpu has been marked as down by the hypervisor).
>>>>>>>
>>>>>>> We need at least to have a message similar to what native_cpu_die()
>>>>>>> prints on cpu_wait_death() failure. And I think we should not call
>>>>>>> the two routines below (three, actually --- there is also
>>>>>>> xen_teardown_timer() below, which is not part of the diff).
>>>>>>>
>>>>>>> -boris
>>>>>>>
>>>>>>>
>>>>>>>>   	xen_smp_intr_free(cpu);
>>>>>>>>   	xen_uninit_lock_cpu(cpu);
>>>>>> So something like this, then?
>>>>>>
>>>>>> 	if (cpu_wait_death(cpu, 5)) {
>>>>>> 		xen_smp_intr_free(cpu);
>>>>>> 		xen_uninit_lock_cpu(cpu);
>>>>>> 		xen_teardown_timer(cpu);
>>>>>> 	}
>>>>> 	else
>>>>> 		pr_err("CPU %u didn't die...\n", cpu);
>>>>>
>>>>>
>>>>>> Easy change for me to make if so!
>>>>>>
>>>>>> Or do I need some other check for HVM-vs.-PV guests, and, if so, what
>>>>>> would that check be?  And also if so, is it OK to online a PV guest's
>>>>>> CPU that timed out during its previous offline?
>>>>> I believe PV VCPUs will always be CPU_DEAD by the time we get here
>>>>> since we are (indirectly) waiting for this in the loop at the
>>>>> beginning of xen_cpu_die():
>>>>>
>>>>> 'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
>>>>> NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
>>>>> smp_processor_id()' in xen_play_dead(). Which happens after
>>>>> play_dead_common() has marked the cpu as CPU_DEAD.
>>>>>
>>>>> So no test is needed.
>>>> OK, so I have the following patch on top of my previous patch, which
>>>> I will merge if testing goes well.  So if a CPU times out going offline,
>>>> the above three functions will not be called, the "didn't die" message
>>>> will be printed, and any future attempt to online that CPU will fail.
>>>> Is that the correct semantics?
>>> Yes.
>>>
>>> I am not sure whether not ever onlining the CPU is the best outcome
>>> but then I don't think trying to online it again with all interrupts
>>> and such still set up will work well. And it's an improvement over
>>> what we have now anyway (with current code we may clean up things
>>> for a non-dead cpu).
>> Another strategy is to key off of the return value of cpu_check_up_prepare().
>> If it returns -EBUSY, then the outgoing CPU finished up after the
>> surviving CPU timed out.  The CPU trying to bring the new CPU online
>> could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
>> and xen_teardown_timer() at that point.
> And the code for this, in xen_cpu_up(), might look something like the
> following:
>
> 	rc = cpu_check_up_prepare(cpu);
> 	if (rc && rc != -EBUSY)
> 		return rc;
> 	if (rc == EBUSY) {
> 		xen_smp_intr_free(cpu);
> 		xen_uninit_lock_cpu(cpu);
> 		xen_teardown_timer(cpu);
> 	}
>
> The idea is that we detect when the CPU eventually took itself offline,
> but only did so after the surviving CPU timed out.  (Of course, it
> would probably be best to put those three statements into a small
> function that is called from both places.)
>
> I have no idea whether this approach would really work, especially given
> your earlier statement that CPU_DEAD happens early on.  But in case it
> is helpful or sparks some better idea.

Let me test this, I think it may work.

In the meantime, it turned out that HVM guests are broken by this patch 
(with our without changes that we've been discussing), because HVM CPUs 
die with

static void xen_hvm_cpu_die(unsigned int cpu)
{
         xen_cpu_die(cpu);
         native_cpu_die(cpu);
}

Which means that cpu_wait_death() is called twice, and second call moves 
the CPU to CPU_BROKEN.

The simple solution is to stop calling native_cpu_die() above but I'd 
like to use common code in native_cpu_die(). I'll see if I can carve it 
out without too much damage to x86.

Thanks.
-boris


>
> 							Thanx, Paul
>
>> But I must defer to you on this sort of thing.
>>
>> 							Thanx, Paul
>>
>>> Thanks.
>>> -boris
>>>
>>>
>>>> 							Thanx, Paul
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>>>> index e2c7389c58c5..f2a06ff0614d 100644
>>>> --- a/arch/x86/xen/smp.c
>>>> +++ b/arch/x86/xen/smp.c
>>>> @@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
>>>>   		schedule_timeout(HZ/10);
>>>>   	}
>>>> -	(void)cpu_wait_death(cpu, 5);
>>>> -	/* FIXME: Are the below calls really safe in case of timeout? */
>>>> -
>>>> -	xen_smp_intr_free(cpu);
>>>> -	xen_uninit_lock_cpu(cpu);
>>>> -	xen_teardown_timer(cpu);
>>>> +	if (cpu_wait_death(cpu, 5)) {
>>>> +		xen_smp_intr_free(cpu);
>>>> +		xen_uninit_lock_cpu(cpu);
>>>> +		xen_teardown_timer(cpu);
>>>> +	} else {
>>>> +		pr_err("CPU %u didn't die...\n", cpu);
>>>> +	}
>>>>   }
>>>>   static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
>>>>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 10/20] rcu: Provide diagnostic option to slow down grace-period initialization
  2015-03-04 10:54     ` Paul Bolle
@ 2015-03-04 14:59       ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-04 14:59 UTC (permalink / raw)
  To: Paul Bolle
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

On Wed, Mar 04, 2015 at 11:54:37AM +0100, Paul Bolle wrote:
> All I spotted is a silly typo.
> 
> Paul E. McKenney schreef op di 03-03-2015 om 09:43 [-0800]:
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1257,6 +1257,30 @@ config RCU_TORTURE_TEST_RUNNABLE
> >  	  Say N here if you want the RCU torture tests to start only
> >  	  after being manually enabled via /proc.
> >  
> > +config RCU_TORTURE_TEST_SLOW_INIT
> > +	bool "Slow down RCU grace-period initialization to expose races"
> > +	depends on RCU_TORTURE_TEST
> > +	help
> > +	  This option makes grace-period initialization block for a
> > +	  few jiffies between initializing each pair of consecutive
> > +	  rcu_node structures.	This helps to expose races involving
> > +	  grace-period initialization, in other words, it makes your
> > +	  kernel less stable.  It can also greatly increase grace-period
> > +	  latency, especially on systems with large numbers of CPUs.
> > +	  This is useful when torture-testing RCU, but in almost no
> > +	  other circumstance.
> > +
> > +	  Say Y here if you want your system to crash and hang more often.
> 
> (Did you ever consider going into marketing?)

"Get your cold dead fish here!!!  And what is this 'sashimi' you speak of?"

But yes, I really do not want distros to enable this one.  Unless it
is some sort of Ridiculously Unstable Linux distro.  ;-)

> > +	  Say N if you want a sane system.
> > +
> > +config RCU_TORTURE_TEST_SLOW_INIT_DELAY
> > +	int "How must to slow down RCU grace-period initialization"
> 
> s/must/much/

Good catch, fixed!

							Thanx, Paul

> > +	range 0 5
> > +	default 0
> > +	help
> > +	  This option specifies the number of jiffies to wait between
> > +	  each rcu_node structure initialization.
> > +
> >  config RCU_CPU_STALL_TIMEOUT
> >  	int "RCU CPU stall timeout in seconds"
> >  	depends on RCU_STALL_COMMON
> 
> 
> Paul Bolle
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 14:55                   ` Boris Ostrovsky
@ 2015-03-04 15:25                     ` Paul E. McKenney
  2015-03-05 21:17                       ` Boris Ostrovsky
  2015-03-05 21:17                       ` Boris Ostrovsky
  2015-03-04 15:25                     ` Paul E. McKenney
                                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-04 15:25 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On Wed, Mar 04, 2015 at 09:55:11AM -0500, Boris Ostrovsky wrote:
> On 03/04/2015 09:43 AM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
> >>On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> >>>On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> >>>>On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> >>>>>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >>>>>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>>>>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>>>>>>  }
> >>>>>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>>>>>  		schedule_timeout(HZ/10);
> >>>>>>>>  	}
> >>>>>>>>-	cpu_die_common(cpu);
> >>>>>>>>+	(void)cpu_wait_death(cpu, 5);
> >>>>>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>>>>Not for HVM guests (PV guests will only reach this point after
> >>>>>>>target cpu has been marked as down by the hypervisor).
> >>>>>>>
> >>>>>>>We need at least to have a message similar to what native_cpu_die()
> >>>>>>>prints on cpu_wait_death() failure. And I think we should not call
> >>>>>>>the two routines below (three, actually --- there is also
> >>>>>>>xen_teardown_timer() below, which is not part of the diff).
> >>>>>>>
> >>>>>>>-boris
> >>>>>>>
> >>>>>>>
> >>>>>>>>  	xen_smp_intr_free(cpu);
> >>>>>>>>  	xen_uninit_lock_cpu(cpu);
> >>>>>>So something like this, then?
> >>>>>>
> >>>>>>	if (cpu_wait_death(cpu, 5)) {
> >>>>>>		xen_smp_intr_free(cpu);
> >>>>>>		xen_uninit_lock_cpu(cpu);
> >>>>>>		xen_teardown_timer(cpu);
> >>>>>>	}
> >>>>>	else
> >>>>>		pr_err("CPU %u didn't die...\n", cpu);
> >>>>>
> >>>>>
> >>>>>>Easy change for me to make if so!
> >>>>>>
> >>>>>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >>>>>>would that check be?  And also if so, is it OK to online a PV guest's
> >>>>>>CPU that timed out during its previous offline?
> >>>>>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> >>>>>since we are (indirectly) waiting for this in the loop at the
> >>>>>beginning of xen_cpu_die():
> >>>>>
> >>>>>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> >>>>>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> >>>>>smp_processor_id()' in xen_play_dead(). Which happens after
> >>>>>play_dead_common() has marked the cpu as CPU_DEAD.
> >>>>>
> >>>>>So no test is needed.
> >>>>OK, so I have the following patch on top of my previous patch, which
> >>>>I will merge if testing goes well.  So if a CPU times out going offline,
> >>>>the above three functions will not be called, the "didn't die" message
> >>>>will be printed, and any future attempt to online that CPU will fail.
> >>>>Is that the correct semantics?
> >>>Yes.
> >>>
> >>>I am not sure whether not ever onlining the CPU is the best outcome
> >>>but then I don't think trying to online it again with all interrupts
> >>>and such still set up will work well. And it's an improvement over
> >>>what we have now anyway (with current code we may clean up things
> >>>for a non-dead cpu).
> >>Another strategy is to key off of the return value of cpu_check_up_prepare().
> >>If it returns -EBUSY, then the outgoing CPU finished up after the
> >>surviving CPU timed out.  The CPU trying to bring the new CPU online
> >>could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
> >>and xen_teardown_timer() at that point.
> >And the code for this, in xen_cpu_up(), might look something like the
> >following:
> >
> >	rc = cpu_check_up_prepare(cpu);
> >	if (rc && rc != -EBUSY)
> >		return rc;
> >	if (rc == EBUSY) {
> >		xen_smp_intr_free(cpu);
> >		xen_uninit_lock_cpu(cpu);
> >		xen_teardown_timer(cpu);
> >	}
> >
> >The idea is that we detect when the CPU eventually took itself offline,
> >but only did so after the surviving CPU timed out.  (Of course, it
> >would probably be best to put those three statements into a small
> >function that is called from both places.)
> >
> >I have no idea whether this approach would really work, especially given
> >your earlier statement that CPU_DEAD happens early on.  But in case it
> >is helpful or sparks some better idea.
> 
> Let me test this, I think it may work.
> 
> In the meantime, it turned out that HVM guests are broken by this
> patch (with our without changes that we've been discussing), because
> HVM CPUs die with
> 
> static void xen_hvm_cpu_die(unsigned int cpu)
> {
>         xen_cpu_die(cpu);
>         native_cpu_die(cpu);
> }
> 
> Which means that cpu_wait_death() is called twice, and second call
> moves the CPU to CPU_BROKEN.

Yikes!  I did miss this one.  :-(

> The simple solution is to stop calling native_cpu_die() above but
> I'd like to use common code in native_cpu_die(). I'll see if I can
> carve it out without too much damage to x86.

Very good, thank you!  I look forward to seeing your patch.

							Thanx, Paul

> Thanks.
> -boris
> 
> 
> >
> >							Thanx, Paul
> >
> >>But I must defer to you on this sort of thing.
> >>
> >>							Thanx, Paul
> >>
> >>>Thanks.
> >>>-boris
> >>>
> >>>
> >>>>							Thanx, Paul
> >>>>
> >>>>------------------------------------------------------------------------
> >>>>
> >>>>diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >>>>index e2c7389c58c5..f2a06ff0614d 100644
> >>>>--- a/arch/x86/xen/smp.c
> >>>>+++ b/arch/x86/xen/smp.c
> >>>>@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>  		schedule_timeout(HZ/10);
> >>>>  	}
> >>>>-	(void)cpu_wait_death(cpu, 5);
> >>>>-	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>-
> >>>>-	xen_smp_intr_free(cpu);
> >>>>-	xen_uninit_lock_cpu(cpu);
> >>>>-	xen_teardown_timer(cpu);
> >>>>+	if (cpu_wait_death(cpu, 5)) {
> >>>>+		xen_smp_intr_free(cpu);
> >>>>+		xen_uninit_lock_cpu(cpu);
> >>>>+		xen_teardown_timer(cpu);
> >>>>+	} else {
> >>>>+		pr_err("CPU %u didn't die...\n", cpu);
> >>>>+	}
> >>>>  }
> >>>>  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> >>>>
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 14:55                   ` Boris Ostrovsky
  2015-03-04 15:25                     ` Paul E. McKenney
@ 2015-03-04 15:25                     ` Paul E. McKenney
  2015-03-04 15:45                     ` David Vrabel
  2015-03-04 15:45                     ` David Vrabel
  3 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-04 15:25 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On Wed, Mar 04, 2015 at 09:55:11AM -0500, Boris Ostrovsky wrote:
> On 03/04/2015 09:43 AM, Paul E. McKenney wrote:
> >On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
> >>On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
> >>>On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
> >>>>On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
> >>>>>On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
> >>>>>>On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
> >>>>>>>On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
> >>>>>>>>  }
> >>>>>>>>@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>>>>>  		schedule_timeout(HZ/10);
> >>>>>>>>  	}
> >>>>>>>>-	cpu_die_common(cpu);
> >>>>>>>>+	(void)cpu_wait_death(cpu, 5);
> >>>>>>>>+	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>>>>Not for HVM guests (PV guests will only reach this point after
> >>>>>>>target cpu has been marked as down by the hypervisor).
> >>>>>>>
> >>>>>>>We need at least to have a message similar to what native_cpu_die()
> >>>>>>>prints on cpu_wait_death() failure. And I think we should not call
> >>>>>>>the two routines below (three, actually --- there is also
> >>>>>>>xen_teardown_timer() below, which is not part of the diff).
> >>>>>>>
> >>>>>>>-boris
> >>>>>>>
> >>>>>>>
> >>>>>>>>  	xen_smp_intr_free(cpu);
> >>>>>>>>  	xen_uninit_lock_cpu(cpu);
> >>>>>>So something like this, then?
> >>>>>>
> >>>>>>	if (cpu_wait_death(cpu, 5)) {
> >>>>>>		xen_smp_intr_free(cpu);
> >>>>>>		xen_uninit_lock_cpu(cpu);
> >>>>>>		xen_teardown_timer(cpu);
> >>>>>>	}
> >>>>>	else
> >>>>>		pr_err("CPU %u didn't die...\n", cpu);
> >>>>>
> >>>>>
> >>>>>>Easy change for me to make if so!
> >>>>>>
> >>>>>>Or do I need some other check for HVM-vs.-PV guests, and, if so, what
> >>>>>>would that check be?  And also if so, is it OK to online a PV guest's
> >>>>>>CPU that timed out during its previous offline?
> >>>>>I believe PV VCPUs will always be CPU_DEAD by the time we get here
> >>>>>since we are (indirectly) waiting for this in the loop at the
> >>>>>beginning of xen_cpu_die():
> >>>>>
> >>>>>'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
> >>>>>NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
> >>>>>smp_processor_id()' in xen_play_dead(). Which happens after
> >>>>>play_dead_common() has marked the cpu as CPU_DEAD.
> >>>>>
> >>>>>So no test is needed.
> >>>>OK, so I have the following patch on top of my previous patch, which
> >>>>I will merge if testing goes well.  So if a CPU times out going offline,
> >>>>the above three functions will not be called, the "didn't die" message
> >>>>will be printed, and any future attempt to online that CPU will fail.
> >>>>Is that the correct semantics?
> >>>Yes.
> >>>
> >>>I am not sure whether not ever onlining the CPU is the best outcome
> >>>but then I don't think trying to online it again with all interrupts
> >>>and such still set up will work well. And it's an improvement over
> >>>what we have now anyway (with current code we may clean up things
> >>>for a non-dead cpu).
> >>Another strategy is to key off of the return value of cpu_check_up_prepare().
> >>If it returns -EBUSY, then the outgoing CPU finished up after the
> >>surviving CPU timed out.  The CPU trying to bring the new CPU online
> >>could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
> >>and xen_teardown_timer() at that point.
> >And the code for this, in xen_cpu_up(), might look something like the
> >following:
> >
> >	rc = cpu_check_up_prepare(cpu);
> >	if (rc && rc != -EBUSY)
> >		return rc;
> >	if (rc == EBUSY) {
> >		xen_smp_intr_free(cpu);
> >		xen_uninit_lock_cpu(cpu);
> >		xen_teardown_timer(cpu);
> >	}
> >
> >The idea is that we detect when the CPU eventually took itself offline,
> >but only did so after the surviving CPU timed out.  (Of course, it
> >would probably be best to put those three statements into a small
> >function that is called from both places.)
> >
> >I have no idea whether this approach would really work, especially given
> >your earlier statement that CPU_DEAD happens early on.  But in case it
> >is helpful or sparks some better idea.
> 
> Let me test this, I think it may work.
> 
> In the meantime, it turned out that HVM guests are broken by this
> patch (with our without changes that we've been discussing), because
> HVM CPUs die with
> 
> static void xen_hvm_cpu_die(unsigned int cpu)
> {
>         xen_cpu_die(cpu);
>         native_cpu_die(cpu);
> }
> 
> Which means that cpu_wait_death() is called twice, and second call
> moves the CPU to CPU_BROKEN.

Yikes!  I did miss this one.  :-(

> The simple solution is to stop calling native_cpu_die() above but
> I'd like to use common code in native_cpu_die(). I'll see if I can
> carve it out without too much damage to x86.

Very good, thank you!  I look forward to seeing your patch.

							Thanx, Paul

> Thanks.
> -boris
> 
> 
> >
> >							Thanx, Paul
> >
> >>But I must defer to you on this sort of thing.
> >>
> >>							Thanx, Paul
> >>
> >>>Thanks.
> >>>-boris
> >>>
> >>>
> >>>>							Thanx, Paul
> >>>>
> >>>>------------------------------------------------------------------------
> >>>>
> >>>>diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >>>>index e2c7389c58c5..f2a06ff0614d 100644
> >>>>--- a/arch/x86/xen/smp.c
> >>>>+++ b/arch/x86/xen/smp.c
> >>>>@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
> >>>>  		schedule_timeout(HZ/10);
> >>>>  	}
> >>>>-	(void)cpu_wait_death(cpu, 5);
> >>>>-	/* FIXME: Are the below calls really safe in case of timeout? */
> >>>>-
> >>>>-	xen_smp_intr_free(cpu);
> >>>>-	xen_uninit_lock_cpu(cpu);
> >>>>-	xen_teardown_timer(cpu);
> >>>>+	if (cpu_wait_death(cpu, 5)) {
> >>>>+		xen_smp_intr_free(cpu);
> >>>>+		xen_uninit_lock_cpu(cpu);
> >>>>+		xen_teardown_timer(cpu);
> >>>>+	} else {
> >>>>+		pr_err("CPU %u didn't die...\n", cpu);
> >>>>+	}
> >>>>  }
> >>>>  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> >>>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 14:55                   ` Boris Ostrovsky
                                       ` (2 preceding siblings ...)
  2015-03-04 15:45                     ` David Vrabel
@ 2015-03-04 15:45                     ` David Vrabel
  2015-03-04 16:10                       ` Boris Ostrovsky
  2015-03-04 16:10                       ` Boris Ostrovsky
  3 siblings, 2 replies; 59+ messages in thread
From: David Vrabel @ 2015-03-04 15:45 UTC (permalink / raw)
  To: Boris Ostrovsky, paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	xen-devel

On 04/03/15 14:55, Boris Ostrovsky wrote:
> 
> In the meantime, it turned out that HVM guests are broken by this patch
> (with our without changes that we've been discussing), because HVM CPUs
> die with
> 
> static void xen_hvm_cpu_die(unsigned int cpu)
> {
>         xen_cpu_die(cpu);
>         native_cpu_die(cpu);
> }
> 
> Which means that cpu_wait_death() is called twice, and second call moves
> the CPU to CPU_BROKEN.
> 
> The simple solution is to stop calling native_cpu_die() above but I'd
> like to use common code in native_cpu_die(). I'll see if I can carve it
> out without too much damage to x86.

If not really been following this thread but...

Would it be preferable to refactor xen_cpu_die() instead to factor out
its the cpu_wait_death() call?

David

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 14:55                   ` Boris Ostrovsky
  2015-03-04 15:25                     ` Paul E. McKenney
  2015-03-04 15:25                     ` Paul E. McKenney
@ 2015-03-04 15:45                     ` David Vrabel
  2015-03-04 15:45                     ` David Vrabel
  3 siblings, 0 replies; 59+ messages in thread
From: David Vrabel @ 2015-03-04 15:45 UTC (permalink / raw)
  To: Boris Ostrovsky, paulmck
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, dipankar, xen-devel, akpm, mingo

On 04/03/15 14:55, Boris Ostrovsky wrote:
> 
> In the meantime, it turned out that HVM guests are broken by this patch
> (with our without changes that we've been discussing), because HVM CPUs
> die with
> 
> static void xen_hvm_cpu_die(unsigned int cpu)
> {
>         xen_cpu_die(cpu);
>         native_cpu_die(cpu);
> }
> 
> Which means that cpu_wait_death() is called twice, and second call moves
> the CPU to CPU_BROKEN.
> 
> The simple solution is to stop calling native_cpu_die() above but I'd
> like to use common code in native_cpu_die(). I'll see if I can carve it
> out without too much damage to x86.

If not really been following this thread but...

Would it be preferable to refactor xen_cpu_die() instead to factor out
its the cpu_wait_death() call?

David

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 15:45                     ` David Vrabel
  2015-03-04 16:10                       ` Boris Ostrovsky
@ 2015-03-04 16:10                       ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-04 16:10 UTC (permalink / raw)
  To: David Vrabel, paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	xen-devel

On 03/04/2015 10:45 AM, David Vrabel wrote:
> On 04/03/15 14:55, Boris Ostrovsky wrote:
>> In the meantime, it turned out that HVM guests are broken by this patch
>> (with our without changes that we've been discussing), because HVM CPUs
>> die with
>>
>> static void xen_hvm_cpu_die(unsigned int cpu)
>> {
>>          xen_cpu_die(cpu);
>>          native_cpu_die(cpu);
>> }
>>
>> Which means that cpu_wait_death() is called twice, and second call moves
>> the CPU to CPU_BROKEN.
>>
>> The simple solution is to stop calling native_cpu_die() above but I'd
>> like to use common code in native_cpu_die(). I'll see if I can carve it
>> out without too much damage to x86.
> If not really been following this thread but...
>
> Would it be preferable to refactor xen_cpu_die() instead to factor out
> its the cpu_wait_death() call?

That's essentially what I was going to do. Except that native_cpu_die() 
returns void so I'll need some common non-void code that lives in x86.

And then we can drop xen_hvm_cpu_die() and use xen_cpu_die() for all guests.

-boris


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 15:45                     ` David Vrabel
@ 2015-03-04 16:10                       ` Boris Ostrovsky
  2015-03-04 16:10                       ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-04 16:10 UTC (permalink / raw)
  To: David Vrabel, paulmck
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, dipankar, xen-devel, akpm, mingo

On 03/04/2015 10:45 AM, David Vrabel wrote:
> On 04/03/15 14:55, Boris Ostrovsky wrote:
>> In the meantime, it turned out that HVM guests are broken by this patch
>> (with our without changes that we've been discussing), because HVM CPUs
>> die with
>>
>> static void xen_hvm_cpu_die(unsigned int cpu)
>> {
>>          xen_cpu_die(cpu);
>>          native_cpu_die(cpu);
>> }
>>
>> Which means that cpu_wait_death() is called twice, and second call moves
>> the CPU to CPU_BROKEN.
>>
>> The simple solution is to stop calling native_cpu_die() above but I'd
>> like to use common code in native_cpu_die(). I'll see if I can carve it
>> out without too much damage to x86.
> If not really been following this thread but...
>
> Would it be preferable to refactor xen_cpu_die() instead to factor out
> its the cpu_wait_death() call?

That's essentially what I was going to do. Except that native_cpu_die() 
returns void so I'll need some common non-void code that lives in x86.

And then we can drop xen_hvm_cpu_die() and use xen_cpu_die() for all guests.

-boris

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 15:25                     ` Paul E. McKenney
  2015-03-05 21:17                       ` Boris Ostrovsky
@ 2015-03-05 21:17                       ` Boris Ostrovsky
  2015-03-05 22:00                         ` Paul E. McKenney
  2015-03-05 22:00                         ` Paul E. McKenney
  1 sibling, 2 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-05 21:17 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On 03/04/2015 10:25 AM, Paul E. McKenney wrote:
> On Wed, Mar 04, 2015 at 09:55:11AM -0500, Boris Ostrovsky wrote:

>> The simple solution is to stop calling native_cpu_die() above but
>> I'd like to use common code in native_cpu_die(). I'll see if I can
>> carve it out without too much damage to x86.
>
> Very good, thank you!  I look forward to seeing your patch.



How about something like this, on top of your original 02/20 patch (this 
is copy-paste but hopefully it can be applied):


  arch/x86/include/asm/smp.h |    1 +
  arch/x86/kernel/smpboot.c  |   12 +++++++++++-
  arch/x86/xen/smp.c         |   34 ++++++++++++++++++++--------------
  3 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 8cd27e0..a5cb4f6 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -155,6 +155,7 @@ void native_smp_prepare_cpus(unsigned int max_cpus);
  void native_smp_cpus_done(unsigned int max_cpus);
  int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
  int native_cpu_disable(void);
+int common_cpu_die(unsigned int cpu);
  void native_cpu_die(unsigned int cpu);
  void native_play_dead(void);
  void play_dead_common(void);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ff24fbd..c8fa349 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1345,8 +1345,10 @@ int native_cpu_disable(void)
  	return 0;
  }

-void native_cpu_die(unsigned int cpu)
+int common_cpu_die(unsigned int cpu)
  {
+	int ret = 0;
+
  	/* We don't do anything here: idle task is faking death itself. */

  	/* They ack this in play_dead() by setting CPU_DEAD */
@@ -1355,7 +1357,15 @@ void native_cpu_die(unsigned int cpu)
  			pr_info("CPU %u is now offline\n", cpu);
  	} else {
  		pr_err("CPU %u didn't die...\n", cpu);
+		ret = -1;
  	}
+
+	return ret;
+}
+
+void native_cpu_die(unsigned int cpu)
+{
+	common_cpu_die(cpu);
  }

  void play_dead_common(void)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e2c7389..1c5e760 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -455,7 +455,10 @@ static int xen_cpu_up(unsigned int cpu, struct 
task_struct *idle)
  	xen_setup_timer(cpu);
  	xen_init_lock_cpu(cpu);

-	/* Xen outgoing CPUs need help cleaning up, so -EBUSY is an error. */
+	/*
+	 * PV VCPUs are always successfully taken down (see 'while' loop
+	 * in xen_cpu_die()), so -EBUSY is an error.
+	 */
  	rc = cpu_check_up_prepare(cpu);
  	if (rc)
  		return rc;
@@ -508,12 +511,11 @@ static void xen_cpu_die(unsigned int cpu)
  		schedule_timeout(HZ/10);
  	}

-	(void)cpu_wait_death(cpu, 5);
-	/* FIXME: Are the below calls really safe in case of timeout? */
-
-	xen_smp_intr_free(cpu);
-	xen_uninit_lock_cpu(cpu);
-	xen_teardown_timer(cpu);
+	if (common_cpu_die(cpu) == 0) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+		xen_teardown_timer(cpu);
+	}
  }

  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
@@ -745,6 +747,16 @@ static void __init 
xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
  static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
  {
  	int rc;
+
+	/*
+	 * This can happen if CPU was offlined earlier and
+	 * offlining timed out in common_cpu_die().
+	 */
+	if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+	}
+
  	/*
  	 * xen_smp_intr_init() needs to run before native_cpu_up()
  	 * so that IPI vectors are set up on the booting CPU before
@@ -766,12 +778,6 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct 
task_struct *tidle)
  	return rc;
  }

-static void xen_hvm_cpu_die(unsigned int cpu)
-{
-	xen_cpu_die(cpu);
-	native_cpu_die(cpu);
-}
-
  void __init xen_hvm_smp_init(void)
  {
  	if (!xen_have_vector_callback)
@@ -779,7 +785,7 @@ void __init xen_hvm_smp_init(void)
  	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
  	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
  	smp_ops.cpu_up = xen_hvm_cpu_up;
-	smp_ops.cpu_die = xen_hvm_cpu_die;
+	smp_ops.cpu_die = xen_cpu_die;
  	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
  	smp_ops.send_call_func_single_ipi = 
xen_smp_send_call_function_single_ipi;
  	smp_ops.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu;
-- 
1.7.7.6



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-04 15:25                     ` Paul E. McKenney
@ 2015-03-05 21:17                       ` Boris Ostrovsky
  2015-03-05 21:17                       ` Boris Ostrovsky
  1 sibling, 0 replies; 59+ messages in thread
From: Boris Ostrovsky @ 2015-03-05 21:17 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On 03/04/2015 10:25 AM, Paul E. McKenney wrote:
> On Wed, Mar 04, 2015 at 09:55:11AM -0500, Boris Ostrovsky wrote:

>> The simple solution is to stop calling native_cpu_die() above but
>> I'd like to use common code in native_cpu_die(). I'll see if I can
>> carve it out without too much damage to x86.
>
> Very good, thank you!  I look forward to seeing your patch.



How about something like this, on top of your original 02/20 patch (this 
is copy-paste but hopefully it can be applied):


  arch/x86/include/asm/smp.h |    1 +
  arch/x86/kernel/smpboot.c  |   12 +++++++++++-
  arch/x86/xen/smp.c         |   34 ++++++++++++++++++++--------------
  3 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 8cd27e0..a5cb4f6 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -155,6 +155,7 @@ void native_smp_prepare_cpus(unsigned int max_cpus);
  void native_smp_cpus_done(unsigned int max_cpus);
  int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
  int native_cpu_disable(void);
+int common_cpu_die(unsigned int cpu);
  void native_cpu_die(unsigned int cpu);
  void native_play_dead(void);
  void play_dead_common(void);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ff24fbd..c8fa349 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1345,8 +1345,10 @@ int native_cpu_disable(void)
  	return 0;
  }

-void native_cpu_die(unsigned int cpu)
+int common_cpu_die(unsigned int cpu)
  {
+	int ret = 0;
+
  	/* We don't do anything here: idle task is faking death itself. */

  	/* They ack this in play_dead() by setting CPU_DEAD */
@@ -1355,7 +1357,15 @@ void native_cpu_die(unsigned int cpu)
  			pr_info("CPU %u is now offline\n", cpu);
  	} else {
  		pr_err("CPU %u didn't die...\n", cpu);
+		ret = -1;
  	}
+
+	return ret;
+}
+
+void native_cpu_die(unsigned int cpu)
+{
+	common_cpu_die(cpu);
  }

  void play_dead_common(void)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e2c7389..1c5e760 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -455,7 +455,10 @@ static int xen_cpu_up(unsigned int cpu, struct 
task_struct *idle)
  	xen_setup_timer(cpu);
  	xen_init_lock_cpu(cpu);

-	/* Xen outgoing CPUs need help cleaning up, so -EBUSY is an error. */
+	/*
+	 * PV VCPUs are always successfully taken down (see 'while' loop
+	 * in xen_cpu_die()), so -EBUSY is an error.
+	 */
  	rc = cpu_check_up_prepare(cpu);
  	if (rc)
  		return rc;
@@ -508,12 +511,11 @@ static void xen_cpu_die(unsigned int cpu)
  		schedule_timeout(HZ/10);
  	}

-	(void)cpu_wait_death(cpu, 5);
-	/* FIXME: Are the below calls really safe in case of timeout? */
-
-	xen_smp_intr_free(cpu);
-	xen_uninit_lock_cpu(cpu);
-	xen_teardown_timer(cpu);
+	if (common_cpu_die(cpu) == 0) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+		xen_teardown_timer(cpu);
+	}
  }

  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
@@ -745,6 +747,16 @@ static void __init 
xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
  static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
  {
  	int rc;
+
+	/*
+	 * This can happen if CPU was offlined earlier and
+	 * offlining timed out in common_cpu_die().
+	 */
+	if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+	}
+
  	/*
  	 * xen_smp_intr_init() needs to run before native_cpu_up()
  	 * so that IPI vectors are set up on the booting CPU before
@@ -766,12 +778,6 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct 
task_struct *tidle)
  	return rc;
  }

-static void xen_hvm_cpu_die(unsigned int cpu)
-{
-	xen_cpu_die(cpu);
-	native_cpu_die(cpu);
-}
-
  void __init xen_hvm_smp_init(void)
  {
  	if (!xen_have_vector_callback)
@@ -779,7 +785,7 @@ void __init xen_hvm_smp_init(void)
  	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
  	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
  	smp_ops.cpu_up = xen_hvm_cpu_up;
-	smp_ops.cpu_die = xen_hvm_cpu_die;
+	smp_ops.cpu_die = xen_cpu_die;
  	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
  	smp_ops.send_call_func_single_ipi = 
xen_smp_send_call_function_single_ipi;
  	smp_ops.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-05 21:17                       ` Boris Ostrovsky
  2015-03-05 22:00                         ` Paul E. McKenney
@ 2015-03-05 22:00                         ` Paul E. McKenney
  1 sibling, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-05 22:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, x86, Konrad Rzeszutek Wilk,
	David Vrabel, xen-devel

On Thu, Mar 05, 2015 at 04:17:59PM -0500, Boris Ostrovsky wrote:
> On 03/04/2015 10:25 AM, Paul E. McKenney wrote:
> >On Wed, Mar 04, 2015 at 09:55:11AM -0500, Boris Ostrovsky wrote:
> 
> >>The simple solution is to stop calling native_cpu_die() above but
> >>I'd like to use common code in native_cpu_die(). I'll see if I can
> >>carve it out without too much damage to x86.
> >
> >Very good, thank you!  I look forward to seeing your patch.
> 
> How about something like this, on top of your original 02/20 patch
> (this is copy-paste but hopefully it can be applied):

Unfortunately, no joy.  :-(

I was able to fix a couple of whitespace problems, but it still didn't
like this hunk: "@@ -766,12 +778,6 @@".

Could you please send the patch as an attachment, post it on the web
somewhere, or otherwise get me a clean copy it it?

							Thanx, Paul

>  arch/x86/include/asm/smp.h |    1 +
>  arch/x86/kernel/smpboot.c  |   12 +++++++++++-
>  arch/x86/xen/smp.c         |   34 ++++++++++++++++++++--------------
>  3 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 8cd27e0..a5cb4f6 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -155,6 +155,7 @@ void native_smp_prepare_cpus(unsigned int max_cpus);
>  void native_smp_cpus_done(unsigned int max_cpus);
>  int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
>  int native_cpu_disable(void);
> +int common_cpu_die(unsigned int cpu);
>  void native_cpu_die(unsigned int cpu);
>  void native_play_dead(void);
>  void play_dead_common(void);
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ff24fbd..c8fa349 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1345,8 +1345,10 @@ int native_cpu_disable(void)
>  	return 0;
>  }
> 
> -void native_cpu_die(unsigned int cpu)
> +int common_cpu_die(unsigned int cpu)
>  {
> +	int ret = 0;
> +
>  	/* We don't do anything here: idle task is faking death itself. */
> 
>  	/* They ack this in play_dead() by setting CPU_DEAD */
> @@ -1355,7 +1357,15 @@ void native_cpu_die(unsigned int cpu)
>  			pr_info("CPU %u is now offline\n", cpu);
>  	} else {
>  		pr_err("CPU %u didn't die...\n", cpu);
> +		ret = -1;
>  	}
> +
> +	return ret;
> +}
> +
> +void native_cpu_die(unsigned int cpu)
> +{
> +	common_cpu_die(cpu);
>  }
> 
>  void play_dead_common(void)
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index e2c7389..1c5e760 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -455,7 +455,10 @@ static int xen_cpu_up(unsigned int cpu, struct
> task_struct *idle)
>  	xen_setup_timer(cpu);
>  	xen_init_lock_cpu(cpu);
> 
> -	/* Xen outgoing CPUs need help cleaning up, so -EBUSY is an error. */
> +	/*
> +	 * PV VCPUs are always successfully taken down (see 'while' loop
> +	 * in xen_cpu_die()), so -EBUSY is an error.
> +	 */
>  	rc = cpu_check_up_prepare(cpu);
>  	if (rc)
>  		return rc;
> @@ -508,12 +511,11 @@ static void xen_cpu_die(unsigned int cpu)
>  		schedule_timeout(HZ/10);
>  	}
> 
> -	(void)cpu_wait_death(cpu, 5);
> -	/* FIXME: Are the below calls really safe in case of timeout? */
> -
> -	xen_smp_intr_free(cpu);
> -	xen_uninit_lock_cpu(cpu);
> -	xen_teardown_timer(cpu);
> +	if (common_cpu_die(cpu) == 0) {
> +		xen_smp_intr_free(cpu);
> +		xen_uninit_lock_cpu(cpu);
> +		xen_teardown_timer(cpu);
> +	}
>  }
> 
>  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> @@ -745,6 +747,16 @@ static void __init
> xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
>  static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  {
>  	int rc;
> +
> +	/*
> +	 * This can happen if CPU was offlined earlier and
> +	 * offlining timed out in common_cpu_die().
> +	 */
> +	if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
> +		xen_smp_intr_free(cpu);
> +		xen_uninit_lock_cpu(cpu);
> +	}
> +
>  	/*
>  	 * xen_smp_intr_init() needs to run before native_cpu_up()
>  	 * so that IPI vectors are set up on the booting CPU before
> @@ -766,12 +778,6 @@ static int xen_hvm_cpu_up(unsigned int cpu,
> struct task_struct *tidle)
>  	return rc;
>  }
> 
> -static void xen_hvm_cpu_die(unsigned int cpu)
> -{
> -	xen_cpu_die(cpu);
> -	native_cpu_die(cpu);
> -}
> -
>  void __init xen_hvm_smp_init(void)
>  {
>  	if (!xen_have_vector_callback)
> @@ -779,7 +785,7 @@ void __init xen_hvm_smp_init(void)
>  	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
>  	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
>  	smp_ops.cpu_up = xen_hvm_cpu_up;
> -	smp_ops.cpu_die = xen_hvm_cpu_die;
> +	smp_ops.cpu_die = xen_cpu_die;
>  	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
>  	smp_ops.send_call_func_single_ipi =
> xen_smp_send_call_function_single_ipi;
>  	smp_ops.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu;
> -- 
> 1.7.7.6
> 
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code
  2015-03-05 21:17                       ` Boris Ostrovsky
@ 2015-03-05 22:00                         ` Paul E. McKenney
  2015-03-05 22:00                         ` Paul E. McKenney
  1 sibling, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-05 22:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tglx, laijs, bobby.prani, peterz, fweisbec, dvhart, x86, oleg,
	linux-kernel, rostedt, josh, dhowells, edumazet,
	mathieu.desnoyers, David Vrabel, dipankar, xen-devel, akpm,
	mingo

On Thu, Mar 05, 2015 at 04:17:59PM -0500, Boris Ostrovsky wrote:
> On 03/04/2015 10:25 AM, Paul E. McKenney wrote:
> >On Wed, Mar 04, 2015 at 09:55:11AM -0500, Boris Ostrovsky wrote:
> 
> >>The simple solution is to stop calling native_cpu_die() above but
> >>I'd like to use common code in native_cpu_die(). I'll see if I can
> >>carve it out without too much damage to x86.
> >
> >Very good, thank you!  I look forward to seeing your patch.
> 
> How about something like this, on top of your original 02/20 patch
> (this is copy-paste but hopefully it can be applied):

Unfortunately, no joy.  :-(

I was able to fix a couple of whitespace problems, but it still didn't
like this hunk: "@@ -766,12 +778,6 @@".

Could you please send the patch as an attachment, post it on the web
somewhere, or otherwise get me a clean copy it it?

							Thanx, Paul

>  arch/x86/include/asm/smp.h |    1 +
>  arch/x86/kernel/smpboot.c  |   12 +++++++++++-
>  arch/x86/xen/smp.c         |   34 ++++++++++++++++++++--------------
>  3 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 8cd27e0..a5cb4f6 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -155,6 +155,7 @@ void native_smp_prepare_cpus(unsigned int max_cpus);
>  void native_smp_cpus_done(unsigned int max_cpus);
>  int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
>  int native_cpu_disable(void);
> +int common_cpu_die(unsigned int cpu);
>  void native_cpu_die(unsigned int cpu);
>  void native_play_dead(void);
>  void play_dead_common(void);
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ff24fbd..c8fa349 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1345,8 +1345,10 @@ int native_cpu_disable(void)
>  	return 0;
>  }
> 
> -void native_cpu_die(unsigned int cpu)
> +int common_cpu_die(unsigned int cpu)
>  {
> +	int ret = 0;
> +
>  	/* We don't do anything here: idle task is faking death itself. */
> 
>  	/* They ack this in play_dead() by setting CPU_DEAD */
> @@ -1355,7 +1357,15 @@ void native_cpu_die(unsigned int cpu)
>  			pr_info("CPU %u is now offline\n", cpu);
>  	} else {
>  		pr_err("CPU %u didn't die...\n", cpu);
> +		ret = -1;
>  	}
> +
> +	return ret;
> +}
> +
> +void native_cpu_die(unsigned int cpu)
> +{
> +	common_cpu_die(cpu);
>  }
> 
>  void play_dead_common(void)
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index e2c7389..1c5e760 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -455,7 +455,10 @@ static int xen_cpu_up(unsigned int cpu, struct
> task_struct *idle)
>  	xen_setup_timer(cpu);
>  	xen_init_lock_cpu(cpu);
> 
> -	/* Xen outgoing CPUs need help cleaning up, so -EBUSY is an error. */
> +	/*
> +	 * PV VCPUs are always successfully taken down (see 'while' loop
> +	 * in xen_cpu_die()), so -EBUSY is an error.
> +	 */
>  	rc = cpu_check_up_prepare(cpu);
>  	if (rc)
>  		return rc;
> @@ -508,12 +511,11 @@ static void xen_cpu_die(unsigned int cpu)
>  		schedule_timeout(HZ/10);
>  	}
> 
> -	(void)cpu_wait_death(cpu, 5);
> -	/* FIXME: Are the below calls really safe in case of timeout? */
> -
> -	xen_smp_intr_free(cpu);
> -	xen_uninit_lock_cpu(cpu);
> -	xen_teardown_timer(cpu);
> +	if (common_cpu_die(cpu) == 0) {
> +		xen_smp_intr_free(cpu);
> +		xen_uninit_lock_cpu(cpu);
> +		xen_teardown_timer(cpu);
> +	}
>  }
> 
>  static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
> @@ -745,6 +747,16 @@ static void __init
> xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
>  static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  {
>  	int rc;
> +
> +	/*
> +	 * This can happen if CPU was offlined earlier and
> +	 * offlining timed out in common_cpu_die().
> +	 */
> +	if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
> +		xen_smp_intr_free(cpu);
> +		xen_uninit_lock_cpu(cpu);
> +	}
> +
>  	/*
>  	 * xen_smp_intr_init() needs to run before native_cpu_up()
>  	 * so that IPI vectors are set up on the booting CPU before
> @@ -766,12 +778,6 @@ static int xen_hvm_cpu_up(unsigned int cpu,
> struct task_struct *tidle)
>  	return rc;
>  }
> 
> -static void xen_hvm_cpu_die(unsigned int cpu)
> -{
> -	xen_cpu_die(cpu);
> -	native_cpu_die(cpu);
> -}
> -
>  void __init xen_hvm_smp_init(void)
>  {
>  	if (!xen_have_vector_callback)
> @@ -779,7 +785,7 @@ void __init xen_hvm_smp_init(void)
>  	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
>  	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
>  	smp_ops.cpu_up = xen_hvm_cpu_up;
> -	smp_ops.cpu_die = xen_hvm_cpu_die;
> +	smp_ops.cpu_die = xen_cpu_die;
>  	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
>  	smp_ops.send_call_func_single_ipi =
> xen_smp_send_call_function_single_ipi;
>  	smp_ops.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu;
> -- 
> 1.7.7.6
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-10 15:30       ` James Hogan
  0 siblings, 0 replies; 59+ messages in thread
From: James Hogan @ 2015-03-10 15:30 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, linux-metag

[-- Attachment #1: Type: text/plain, Size: 2864 bytes --]

Hi Paul,

On 03/03/15 17:42, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> This commit removes the open-coded CPU-offline notification with new
> common code.  This change avoids calling scheduler code using RCU from
> an offline CPU that RCU is ignoring.  This commit is compatible with
> the existing code in not checking for timeout during a prior offline
> for a given CPU.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: James Hogan <james.hogan@imgtec.com>
> Cc: <linux-metag@vger.kernel.org>

I gave this a try via linux-next, but unfortunately it causes the
following warning every time a CPU goes down:
META213-Thread0 DSP [LogF] CPU1: unable to kill

If I add printks, I see that the state on entry to both cpu_wait_death
and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
changed from its initial value.

Should arches other than x86 now be calling cpu_set_state_online()? The
patchlet below seems to resolve it for Meta (not sure if that is the
best place in the startup sequence to do it, perhaps it doesn't matter).

diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
index ac3a199e33e7..430e379ec71f 100644
--- a/arch/metag/kernel/smp.c
+++ b/arch/metag/kernel/smp.c
@@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * OK, now it's safe to let the boot CPU continue
 	 */
 	set_cpu_online(cpu, true);
+	cpu_set_state_online(cpu);
 	complete(&cpu_running);
 
 	/*

Looking at the comment before cpu_set_state_online:
> /*
>  * Mark the specified CPU online.
>  *
>  * Note that it is permissible to omit this call entirely, as is
>  * done in architectures that do no CPU-hotplug error checking.
>  */

Which suggests it wasn't wrong to omit it before your patches came
along.

Cheers
James


> ---
>  arch/metag/kernel/smp.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> index f006d2276f40..ac3a199e33e7 100644
> --- a/arch/metag/kernel/smp.c
> +++ b/arch/metag/kernel/smp.c
> @@ -261,7 +261,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> -static DECLARE_COMPLETION(cpu_killed);
>  
>  /*
>   * __cpu_disable runs on the processor to be shutdown.
> @@ -299,7 +298,7 @@ int __cpu_disable(void)
>   */
>  void __cpu_die(unsigned int cpu)
>  {
> -	if (!wait_for_completion_timeout(&cpu_killed, msecs_to_jiffies(1)))
> +	if (!cpu_wait_death(cpu, 1))
>  		pr_err("CPU%u: unable to kill\n", cpu);
>  }
>  
> @@ -314,7 +313,7 @@ void cpu_die(void)
>  	local_irq_disable();
>  	idle_task_exit();
>  
> -	complete(&cpu_killed);
> +	(void)cpu_report_death();
>  
>  	asm ("XOR	TXENABLE, D0Re0,D0Re0\n");
>  }
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-10 15:30       ` James Hogan
  0 siblings, 0 replies; 59+ messages in thread
From: James Hogan @ 2015-03-10 15:30 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: mingo-DgEjT+Ai2ygdnm+yROfE0A, laijs-BthXqXjhjHXQFUHtdCDX3A,
	dipankar-xthvdsQ13ZrQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w,
	josh-iaAMLnmF4UmaiuxdJuQwMA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
	dvhart-VuQAYsv1563Yd54FQh9/CA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, bobby.prani-Re5JQEeQqe8AvxtiuMwx3w,
	linux-metag-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2979 bytes --]

Hi Paul,

On 03/03/15 17:42, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> This commit removes the open-coded CPU-offline notification with new
> common code.  This change avoids calling scheduler code using RCU from
> an offline CPU that RCU is ignoring.  This commit is compatible with
> the existing code in not checking for timeout during a prior offline
> for a given CPU.
> 
> Signed-off-by: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Cc: James Hogan <james.hogan-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
> Cc: <linux-metag-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>

I gave this a try via linux-next, but unfortunately it causes the
following warning every time a CPU goes down:
META213-Thread0 DSP [LogF] CPU1: unable to kill

If I add printks, I see that the state on entry to both cpu_wait_death
and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
changed from its initial value.

Should arches other than x86 now be calling cpu_set_state_online()? The
patchlet below seems to resolve it for Meta (not sure if that is the
best place in the startup sequence to do it, perhaps it doesn't matter).

diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
index ac3a199e33e7..430e379ec71f 100644
--- a/arch/metag/kernel/smp.c
+++ b/arch/metag/kernel/smp.c
@@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * OK, now it's safe to let the boot CPU continue
 	 */
 	set_cpu_online(cpu, true);
+	cpu_set_state_online(cpu);
 	complete(&cpu_running);
 
 	/*

Looking at the comment before cpu_set_state_online:
> /*
>  * Mark the specified CPU online.
>  *
>  * Note that it is permissible to omit this call entirely, as is
>  * done in architectures that do no CPU-hotplug error checking.
>  */

Which suggests it wasn't wrong to omit it before your patches came
along.

Cheers
James


> ---
>  arch/metag/kernel/smp.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> index f006d2276f40..ac3a199e33e7 100644
> --- a/arch/metag/kernel/smp.c
> +++ b/arch/metag/kernel/smp.c
> @@ -261,7 +261,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> -static DECLARE_COMPLETION(cpu_killed);
>  
>  /*
>   * __cpu_disable runs on the processor to be shutdown.
> @@ -299,7 +298,7 @@ int __cpu_disable(void)
>   */
>  void __cpu_die(unsigned int cpu)
>  {
> -	if (!wait_for_completion_timeout(&cpu_killed, msecs_to_jiffies(1)))
> +	if (!cpu_wait_death(cpu, 1))
>  		pr_err("CPU%u: unable to kill\n", cpu);
>  }
>  
> @@ -314,7 +313,7 @@ void cpu_die(void)
>  	local_irq_disable();
>  	idle_task_exit();
>  
> -	complete(&cpu_killed);
> +	(void)cpu_report_death();
>  
>  	asm ("XOR	TXENABLE, D0Re0,D0Re0\n");
>  }
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
  2015-03-10 15:30       ` James Hogan
  (?)
@ 2015-03-10 16:59       ` Paul E. McKenney
  2015-03-11 11:03           ` James Hogan
  -1 siblings, 1 reply; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-10 16:59 UTC (permalink / raw)
  To: James Hogan
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, linux-metag

On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote:
> Hi Paul,
> 
> On 03/03/15 17:42, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > This commit removes the open-coded CPU-offline notification with new
> > common code.  This change avoids calling scheduler code using RCU from
> > an offline CPU that RCU is ignoring.  This commit is compatible with
> > the existing code in not checking for timeout during a prior offline
> > for a given CPU.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: James Hogan <james.hogan@imgtec.com>
> > Cc: <linux-metag@vger.kernel.org>
> 
> I gave this a try via linux-next, but unfortunately it causes the
> following warning every time a CPU goes down:
> META213-Thread0 DSP [LogF] CPU1: unable to kill

That is certainly not what I had in mind, thank you for finding this!

> If I add printks, I see that the state on entry to both cpu_wait_death
> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
> changed from its initial value.
> 
> Should arches other than x86 now be calling cpu_set_state_online()? The
> patchlet below seems to resolve it for Meta (not sure if that is the
> best place in the startup sequence to do it, perhaps it doesn't matter).
> 
> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> index ac3a199e33e7..430e379ec71f 100644
> --- a/arch/metag/kernel/smp.c
> +++ b/arch/metag/kernel/smp.c
> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
>  	 * OK, now it's safe to let the boot CPU continue
>  	 */
>  	set_cpu_online(cpu, true);
> +	cpu_set_state_online(cpu);
>  	complete(&cpu_running);
>  
>  	/*
> 
> Looking at the comment before cpu_set_state_online:
> > /*
> >  * Mark the specified CPU online.
> >  *
> >  * Note that it is permissible to omit this call entirely, as is
> >  * done in architectures that do no CPU-hotplug error checking.
> >  */
> 
> Which suggests it wasn't wrong to omit it before your patches came
> along.

And that suggestion is quite correct.  The idea was indeed to accommodate
architectures that do not do error checking.

Does the following patch (on top of current -next) remove the need for
your addition of cpu_set_state_online() above?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 18688e0b0422..80400e019c86 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -460,7 +460,7 @@ bool cpu_report_death(void)
 
 	do {
 		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-		if (oldstate == CPU_ONLINE)
+		if (oldstate == CPU_ONLINE || CPU_POST_DEAD)
 			newstate = CPU_DEAD;
 		else
 			newstate = CPU_DEAD_FROZEN;


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-11 11:03           ` James Hogan
  0 siblings, 0 replies; 59+ messages in thread
From: James Hogan @ 2015-03-11 11:03 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, linux-metag

[-- Attachment #1: Type: text/plain, Size: 4034 bytes --]

On 10/03/15 16:59, Paul E. McKenney wrote:
> On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote:
>> Hi Paul,
>>
>> On 03/03/15 17:42, Paul E. McKenney wrote:
>>> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>>>
>>> This commit removes the open-coded CPU-offline notification with new
>>> common code.  This change avoids calling scheduler code using RCU from
>>> an offline CPU that RCU is ignoring.  This commit is compatible with
>>> the existing code in not checking for timeout during a prior offline
>>> for a given CPU.
>>>
>>> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> Cc: James Hogan <james.hogan@imgtec.com>
>>> Cc: <linux-metag@vger.kernel.org>
>>
>> I gave this a try via linux-next, but unfortunately it causes the
>> following warning every time a CPU goes down:
>> META213-Thread0 DSP [LogF] CPU1: unable to kill
> 
> That is certainly not what I had in mind, thank you for finding this!
> 
>> If I add printks, I see that the state on entry to both cpu_wait_death
>> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
>> changed from its initial value.
>>
>> Should arches other than x86 now be calling cpu_set_state_online()? The
>> patchlet below seems to resolve it for Meta (not sure if that is the
>> best place in the startup sequence to do it, perhaps it doesn't matter).
>>
>> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
>> index ac3a199e33e7..430e379ec71f 100644
>> --- a/arch/metag/kernel/smp.c
>> +++ b/arch/metag/kernel/smp.c
>> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
>>  	 * OK, now it's safe to let the boot CPU continue
>>  	 */
>>  	set_cpu_online(cpu, true);
>> +	cpu_set_state_online(cpu);
>>  	complete(&cpu_running);
>>  
>>  	/*
>>
>> Looking at the comment before cpu_set_state_online:
>>> /*
>>>  * Mark the specified CPU online.
>>>  *
>>>  * Note that it is permissible to omit this call entirely, as is
>>>  * done in architectures that do no CPU-hotplug error checking.
>>>  */
>>
>> Which suggests it wasn't wrong to omit it before your patches came
>> along.
> 
> And that suggestion is quite correct.  The idea was indeed to accommodate
> architectures that do not do error checking.
> 
> Does the following patch (on top of current -next) remove the need for
> your addition of cpu_set_state_online() above?

Don't forget the "oldstate == ", otherwise it'll work for the wrong
reason :-/

Checking for CPU_POST_DEAD does seem to fix the immediate problem,
however this still leaves open the possibility of a single timeout
propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've
confirmed that by adding a delay loop only on the second
cpu_report_death() call, and sure enough the 2nd and further offlines
all fail even though the CPU stops immediately after the 2nd one.

If this check is primarily so that CPU_DEAD_FROZEN is set if
cpu_wait_death timed out, would it be better to instead check explicitly
for CPU_BROKEN?

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 18688e0b0422..c697f73d82d6 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -460,7 +460,7 @@ bool cpu_report_death(void)
 
 	do {
 		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-		if (oldstate == CPU_ONLINE)
+		if (oldstate != CPU_BROKEN)
 			newstate = CPU_DEAD;
 		else
 			newstate = CPU_DEAD_FROZEN;

Cheers
James

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 18688e0b0422..80400e019c86 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -460,7 +460,7 @@ bool cpu_report_death(void)
>  
>  	do {
>  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> -		if (oldstate == CPU_ONLINE)
> +		if (oldstate == CPU_ONLINE || CPU_POST_DEAD)
>  			newstate = CPU_DEAD;
>  		else
>  			newstate = CPU_DEAD_FROZEN;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-11 11:03           ` James Hogan
  0 siblings, 0 replies; 59+ messages in thread
From: James Hogan @ 2015-03-11 11:03 UTC (permalink / raw)
  To: paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mingo-DgEjT+Ai2ygdnm+yROfE0A, laijs-BthXqXjhjHXQFUHtdCDX3A,
	dipankar-xthvdsQ13ZrQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w,
	josh-iaAMLnmF4UmaiuxdJuQwMA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
	dvhart-VuQAYsv1563Yd54FQh9/CA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, bobby.prani-Re5JQEeQqe8AvxtiuMwx3w,
	linux-metag-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 4149 bytes --]

On 10/03/15 16:59, Paul E. McKenney wrote:
> On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote:
>> Hi Paul,
>>
>> On 03/03/15 17:42, Paul E. McKenney wrote:
>>> From: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>>
>>> This commit removes the open-coded CPU-offline notification with new
>>> common code.  This change avoids calling scheduler code using RCU from
>>> an offline CPU that RCU is ignoring.  This commit is compatible with
>>> the existing code in not checking for timeout during a prior offline
>>> for a given CPU.
>>>
>>> Signed-off-by: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>> Cc: James Hogan <james.hogan-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
>>> Cc: <linux-metag-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
>>
>> I gave this a try via linux-next, but unfortunately it causes the
>> following warning every time a CPU goes down:
>> META213-Thread0 DSP [LogF] CPU1: unable to kill
> 
> That is certainly not what I had in mind, thank you for finding this!
> 
>> If I add printks, I see that the state on entry to both cpu_wait_death
>> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
>> changed from its initial value.
>>
>> Should arches other than x86 now be calling cpu_set_state_online()? The
>> patchlet below seems to resolve it for Meta (not sure if that is the
>> best place in the startup sequence to do it, perhaps it doesn't matter).
>>
>> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
>> index ac3a199e33e7..430e379ec71f 100644
>> --- a/arch/metag/kernel/smp.c
>> +++ b/arch/metag/kernel/smp.c
>> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
>>  	 * OK, now it's safe to let the boot CPU continue
>>  	 */
>>  	set_cpu_online(cpu, true);
>> +	cpu_set_state_online(cpu);
>>  	complete(&cpu_running);
>>  
>>  	/*
>>
>> Looking at the comment before cpu_set_state_online:
>>> /*
>>>  * Mark the specified CPU online.
>>>  *
>>>  * Note that it is permissible to omit this call entirely, as is
>>>  * done in architectures that do no CPU-hotplug error checking.
>>>  */
>>
>> Which suggests it wasn't wrong to omit it before your patches came
>> along.
> 
> And that suggestion is quite correct.  The idea was indeed to accommodate
> architectures that do not do error checking.
> 
> Does the following patch (on top of current -next) remove the need for
> your addition of cpu_set_state_online() above?

Don't forget the "oldstate == ", otherwise it'll work for the wrong
reason :-/

Checking for CPU_POST_DEAD does seem to fix the immediate problem,
however this still leaves open the possibility of a single timeout
propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've
confirmed that by adding a delay loop only on the second
cpu_report_death() call, and sure enough the 2nd and further offlines
all fail even though the CPU stops immediately after the 2nd one.

If this check is primarily so that CPU_DEAD_FROZEN is set if
cpu_wait_death timed out, would it be better to instead check explicitly
for CPU_BROKEN?

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 18688e0b0422..c697f73d82d6 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -460,7 +460,7 @@ bool cpu_report_death(void)
 
 	do {
 		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-		if (oldstate == CPU_ONLINE)
+		if (oldstate != CPU_BROKEN)
 			newstate = CPU_DEAD;
 		else
 			newstate = CPU_DEAD_FROZEN;

Cheers
James

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 18688e0b0422..80400e019c86 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -460,7 +460,7 @@ bool cpu_report_death(void)
>  
>  	do {
>  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> -		if (oldstate == CPU_ONLINE)
> +		if (oldstate == CPU_ONLINE || CPU_POST_DEAD)
>  			newstate = CPU_DEAD;
>  		else
>  			newstate = CPU_DEAD_FROZEN;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-11 18:58             ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-11 18:58 UTC (permalink / raw)
  To: James Hogan
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, linux-metag

On Wed, Mar 11, 2015 at 11:03:18AM +0000, James Hogan wrote:
> On 10/03/15 16:59, Paul E. McKenney wrote:
> > On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote:
> >> Hi Paul,
> >>
> >> On 03/03/15 17:42, Paul E. McKenney wrote:
> >>> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >>>
> >>> This commit removes the open-coded CPU-offline notification with new
> >>> common code.  This change avoids calling scheduler code using RCU from
> >>> an offline CPU that RCU is ignoring.  This commit is compatible with
> >>> the existing code in not checking for timeout during a prior offline
> >>> for a given CPU.
> >>>
> >>> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >>> Cc: James Hogan <james.hogan@imgtec.com>
> >>> Cc: <linux-metag@vger.kernel.org>
> >>
> >> I gave this a try via linux-next, but unfortunately it causes the
> >> following warning every time a CPU goes down:
> >> META213-Thread0 DSP [LogF] CPU1: unable to kill
> > 
> > That is certainly not what I had in mind, thank you for finding this!
> > 
> >> If I add printks, I see that the state on entry to both cpu_wait_death
> >> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
> >> changed from its initial value.
> >>
> >> Should arches other than x86 now be calling cpu_set_state_online()? The
> >> patchlet below seems to resolve it for Meta (not sure if that is the
> >> best place in the startup sequence to do it, perhaps it doesn't matter).
> >>
> >> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> >> index ac3a199e33e7..430e379ec71f 100644
> >> --- a/arch/metag/kernel/smp.c
> >> +++ b/arch/metag/kernel/smp.c
> >> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
> >>  	 * OK, now it's safe to let the boot CPU continue
> >>  	 */
> >>  	set_cpu_online(cpu, true);
> >> +	cpu_set_state_online(cpu);
> >>  	complete(&cpu_running);
> >>  
> >>  	/*
> >>
> >> Looking at the comment before cpu_set_state_online:
> >>> /*
> >>>  * Mark the specified CPU online.
> >>>  *
> >>>  * Note that it is permissible to omit this call entirely, as is
> >>>  * done in architectures that do no CPU-hotplug error checking.
> >>>  */
> >>
> >> Which suggests it wasn't wrong to omit it before your patches came
> >> along.
> > 
> > And that suggestion is quite correct.  The idea was indeed to accommodate
> > architectures that do not do error checking.
> > 
> > Does the following patch (on top of current -next) remove the need for
> > your addition of cpu_set_state_online() above?
> 
> Don't forget the "oldstate == ", otherwise it'll work for the wrong
> reason :-/

I clearly wasn't doing well yesterday, was I?  :-/

> Checking for CPU_POST_DEAD does seem to fix the immediate problem,
> however this still leaves open the possibility of a single timeout
> propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've
> confirmed that by adding a delay loop only on the second
> cpu_report_death() call, and sure enough the 2nd and further offlines
> all fail even though the CPU stops immediately after the 2nd one.
> 
> If this check is primarily so that CPU_DEAD_FROZEN is set if
> cpu_wait_death timed out, would it be better to instead check explicitly
> for CPU_BROKEN?
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 18688e0b0422..c697f73d82d6 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -460,7 +460,7 @@ bool cpu_report_death(void)
>  
>  	do {
>  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> -		if (oldstate == CPU_ONLINE)
> +		if (oldstate != CPU_BROKEN)
>  			newstate = CPU_DEAD;
>  		else
>  			newstate = CPU_DEAD_FROZEN;

This does look much better!  I will incorporate this with attribution.

The idea is to support two use cases.  The first use case provides full
checking, and the second provides minimal checking.

Full checking is used by architectures that require that one of the
surviving CPUs so something to help the offlined CPU go offline,
Xen being one example.  In this case, the architecture invokes
cpu_check_up_prepare(), which returns an error code if the CPU did not
go offline properly.  The architecture can choose to return an error or
to provide the offlining help at that point.  The CPU being onlined then
calls cpu_set_state_online().  When the CPU goes offline, it invokes
cpu_report_death(), which can race with the timing out of one of the
surviving CPUs invoking cpu_wait_death().  If cpu_wait_death() times
out first, or if cpu_report_death() is never called, state is set so
that the next call to cpu_check_up_prepare() can react accordingly.

Minimal checking is what metag does.  The cpu_check_up_prepare() and
cpu_set_state_online() functions are never called, just cpu_report_death()
and cpu_wait_death().

And yes, this time I drew state diagrams.  Which I should have done in
the first place.

						Thanx, Paul

> Cheers
> James
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > index 18688e0b0422..80400e019c86 100644
> > --- a/kernel/smpboot.c
> > +++ b/kernel/smpboot.c
> > @@ -460,7 +460,7 @@ bool cpu_report_death(void)
> >  
> >  	do {
> >  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> > -		if (oldstate == CPU_ONLINE)
> > +		if (oldstate == CPU_ONLINE || CPU_POST_DEAD)
> >  			newstate = CPU_DEAD;
> >  		else
> >  			newstate = CPU_DEAD_FROZEN;
> > 
> 



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification code
@ 2015-03-11 18:58             ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2015-03-11 18:58 UTC (permalink / raw)
  To: James Hogan
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mingo-DgEjT+Ai2ygdnm+yROfE0A, laijs-BthXqXjhjHXQFUHtdCDX3A,
	dipankar-xthvdsQ13ZrQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w,
	josh-iaAMLnmF4UmaiuxdJuQwMA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
	dvhart-VuQAYsv1563Yd54FQh9/CA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, bobby.prani-Re5JQEeQqe8AvxtiuMwx3w,
	linux-metag-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 11, 2015 at 11:03:18AM +0000, James Hogan wrote:
> On 10/03/15 16:59, Paul E. McKenney wrote:
> > On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote:
> >> Hi Paul,
> >>
> >> On 03/03/15 17:42, Paul E. McKenney wrote:
> >>> From: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> >>>
> >>> This commit removes the open-coded CPU-offline notification with new
> >>> common code.  This change avoids calling scheduler code using RCU from
> >>> an offline CPU that RCU is ignoring.  This commit is compatible with
> >>> the existing code in not checking for timeout during a prior offline
> >>> for a given CPU.
> >>>
> >>> Signed-off-by: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> >>> Cc: James Hogan <james.hogan-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
> >>> Cc: <linux-metag-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> >>
> >> I gave this a try via linux-next, but unfortunately it causes the
> >> following warning every time a CPU goes down:
> >> META213-Thread0 DSP [LogF] CPU1: unable to kill
> > 
> > That is certainly not what I had in mind, thank you for finding this!
> > 
> >> If I add printks, I see that the state on entry to both cpu_wait_death
> >> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
> >> changed from its initial value.
> >>
> >> Should arches other than x86 now be calling cpu_set_state_online()? The
> >> patchlet below seems to resolve it for Meta (not sure if that is the
> >> best place in the startup sequence to do it, perhaps it doesn't matter).
> >>
> >> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> >> index ac3a199e33e7..430e379ec71f 100644
> >> --- a/arch/metag/kernel/smp.c
> >> +++ b/arch/metag/kernel/smp.c
> >> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
> >>  	 * OK, now it's safe to let the boot CPU continue
> >>  	 */
> >>  	set_cpu_online(cpu, true);
> >> +	cpu_set_state_online(cpu);
> >>  	complete(&cpu_running);
> >>  
> >>  	/*
> >>
> >> Looking at the comment before cpu_set_state_online:
> >>> /*
> >>>  * Mark the specified CPU online.
> >>>  *
> >>>  * Note that it is permissible to omit this call entirely, as is
> >>>  * done in architectures that do no CPU-hotplug error checking.
> >>>  */
> >>
> >> Which suggests it wasn't wrong to omit it before your patches came
> >> along.
> > 
> > And that suggestion is quite correct.  The idea was indeed to accommodate
> > architectures that do not do error checking.
> > 
> > Does the following patch (on top of current -next) remove the need for
> > your addition of cpu_set_state_online() above?
> 
> Don't forget the "oldstate == ", otherwise it'll work for the wrong
> reason :-/

I clearly wasn't doing well yesterday, was I?  :-/

> Checking for CPU_POST_DEAD does seem to fix the immediate problem,
> however this still leaves open the possibility of a single timeout
> propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've
> confirmed that by adding a delay loop only on the second
> cpu_report_death() call, and sure enough the 2nd and further offlines
> all fail even though the CPU stops immediately after the 2nd one.
> 
> If this check is primarily so that CPU_DEAD_FROZEN is set if
> cpu_wait_death timed out, would it be better to instead check explicitly
> for CPU_BROKEN?
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 18688e0b0422..c697f73d82d6 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -460,7 +460,7 @@ bool cpu_report_death(void)
>  
>  	do {
>  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> -		if (oldstate == CPU_ONLINE)
> +		if (oldstate != CPU_BROKEN)
>  			newstate = CPU_DEAD;
>  		else
>  			newstate = CPU_DEAD_FROZEN;

This does look much better!  I will incorporate this with attribution.

The idea is to support two use cases.  The first use case provides full
checking, and the second provides minimal checking.

Full checking is used by architectures that require that one of the
surviving CPUs so something to help the offlined CPU go offline,
Xen being one example.  In this case, the architecture invokes
cpu_check_up_prepare(), which returns an error code if the CPU did not
go offline properly.  The architecture can choose to return an error or
to provide the offlining help at that point.  The CPU being onlined then
calls cpu_set_state_online().  When the CPU goes offline, it invokes
cpu_report_death(), which can race with the timing out of one of the
surviving CPUs invoking cpu_wait_death().  If cpu_wait_death() times
out first, or if cpu_report_death() is never called, state is set so
that the next call to cpu_check_up_prepare() can react accordingly.

Minimal checking is what metag does.  The cpu_check_up_prepare() and
cpu_set_state_online() functions are never called, just cpu_report_death()
and cpu_wait_death().

And yes, this time I drew state diagrams.  Which I should have done in
the first place.

						Thanx, Paul

> Cheers
> James
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > index 18688e0b0422..80400e019c86 100644
> > --- a/kernel/smpboot.c
> > +++ b/kernel/smpboot.c
> > @@ -460,7 +460,7 @@ bool cpu_report_death(void)
> >  
> >  	do {
> >  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> > -		if (oldstate == CPU_ONLINE)
> > +		if (oldstate == CPU_ONLINE || CPU_POST_DEAD)
> >  			newstate = CPU_DEAD;
> >  		else
> >  			newstate = CPU_DEAD_FROZEN;
> > 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-metag" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2015-03-11 18:58 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-03 17:41 [PATCH tip/core/rcu 0/20] CPU hotplug updates for v4.1 Paul E. McKenney
2015-03-03 17:42 ` [PATCH tip/core/rcu 01/20] smpboot: Add common code for notification from dying CPU Paul E. McKenney
2015-03-03 17:42   ` Paul E. McKenney
2015-03-03 17:42   ` [PATCH tip/core/rcu 02/20] x86: Use common outgoing-CPU-notification code Paul E. McKenney
2015-03-03 19:17     ` Boris Ostrovsky
2015-03-03 19:42       ` Paul E. McKenney
2015-03-03 19:42       ` Paul E. McKenney
2015-03-03 20:13         ` Boris Ostrovsky
2015-03-03 20:13         ` Boris Ostrovsky
2015-03-03 21:26           ` Paul E. McKenney
2015-03-03 22:06             ` Boris Ostrovsky
2015-03-03 22:31               ` Paul E. McKenney
2015-03-03 22:31               ` Paul E. McKenney
2015-03-04 14:43                 ` Paul E. McKenney
2015-03-04 14:43                 ` Paul E. McKenney
2015-03-04 14:55                   ` Boris Ostrovsky
2015-03-04 14:55                   ` Boris Ostrovsky
2015-03-04 15:25                     ` Paul E. McKenney
2015-03-05 21:17                       ` Boris Ostrovsky
2015-03-05 21:17                       ` Boris Ostrovsky
2015-03-05 22:00                         ` Paul E. McKenney
2015-03-05 22:00                         ` Paul E. McKenney
2015-03-04 15:25                     ` Paul E. McKenney
2015-03-04 15:45                     ` David Vrabel
2015-03-04 15:45                     ` David Vrabel
2015-03-04 16:10                       ` Boris Ostrovsky
2015-03-04 16:10                       ` Boris Ostrovsky
2015-03-03 22:06             ` Boris Ostrovsky
2015-03-03 21:26           ` Paul E. McKenney
2015-03-03 19:17     ` Boris Ostrovsky
2015-03-03 17:42   ` Paul E. McKenney
2015-03-03 17:42   ` [PATCH tip/core/rcu 03/20] blackfin: " Paul E. McKenney
2015-03-03 17:42   ` [PATCH tip/core/rcu 04/20] metag: " Paul E. McKenney
2015-03-03 17:42     ` Paul E. McKenney
2015-03-10 15:30     ` James Hogan
2015-03-10 15:30       ` James Hogan
2015-03-10 16:59       ` Paul E. McKenney
2015-03-11 11:03         ` James Hogan
2015-03-11 11:03           ` James Hogan
2015-03-11 18:58           ` Paul E. McKenney
2015-03-11 18:58             ` Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 05/20] rcu: Consolidate offline-CPU callback initialization Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 06/20] rcu: Put all orphan-callback-related code under same comment Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 07/20] rcu: Simplify sync_rcu_preempt_exp_init() Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 08/20] rcu: Eliminate empty HOTPLUG_CPU ifdef Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 09/20] rcu: Detect stalls caused by failure to propagate up rcu_node tree Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 10/20] rcu: Provide diagnostic option to slow down grace-period initialization Paul E. McKenney
2015-03-04 10:54     ` Paul Bolle
2015-03-04 14:59       ` Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 11/20] rcutorture: Enable slow grace-period initializations Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 12/20] rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 13/20] rcu: Rework preemptible expedited bitmask handling Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 14/20] rcu: Move rcu_report_unblock_qs_rnp() to common code Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 15/20] rcu: Process offlining and onlining only at grace-period start Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 16/20] rcu: Eliminate ->onoff_mutex from rcu_node structure Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 17/20] cpu: Make CPU-offline idle-loop transition point more precise Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 18/20] rcu: Handle outgoing CPUs on exit from idle loop Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 19/20] rcutorture: Default to grace-period-initialization delays Paul E. McKenney
2015-03-03 17:43   ` [PATCH tip/core/rcu 20/20] rcu: Add diagnostics to grace-period cleanup Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.