[PATCH v2 tip/core/rcu 0/22] CPU hotplug updates for v4.1

* [PATCH v2 tip/core/rcu 0/22] CPU hotplug updates for v4.1
@ 2015-03-16 18:37 Paul E. McKenney
  2015-03-16 18:37   ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2015-03-16 18:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

Hello!

This series updates RCU's handling of CPU hotplug offline operations,
allowing RCU to have a precise notification of when it should start
ignoring a CPU.  This allowed detection of some illegal use of RCU by
offline CPUs, and this series contains fixes for these.  A similar
problem exists for CPU onlining, but will be addressed later.  One
CPU-hotplug dragon at a time.

1.	Add common code for notification from dying CPU.  This is
	part of the fix for issues uncovered by improved detection,
	but is placed first to avoid messing up bisection.

2-4.	Use #1 for x86, blackfin, and metag.  (ARM also has this problem,
	but ARM's maintainers are working on their own fix.)

5.	Remove duplicate offline-CPU callback-list initialization.
	This simplifies later changes to RCU's handling of offlining
	operations.

6.	Improve code readability in rcu_cleanup_dead_cpu().  Simple
	code motion, no semantic change.

7.	Eliminate a boolean variable and "if" statement by rearranging
	sync_rcu_preempt_exp_init()'s checks for CPUs not having blocked
	tasks.

8.	Eliminate empty CONFIG_HOTPLUG_CPU #ifdef.

9.	Add diagnostics to detect when RCU CPU stall warnings have been
	caused by failure to propagate quiescent states up the rcu_node
	combining tree.

10.	Provide CONFIG_RCU_TORTURE_TEST_SLOW_INIT Kconfig option to
	artificially slow down grace-period initialization, thus increasing
	the probability of detecting races with this initialization process.

11.	Update data files to enable CONFIG_RCU_TORTURE_TEST_SLOW_INIT
	by default during rcutorture testing, but leave the default
	time at zero.  This default may be overridden by passing
	"--bootargs rcutree.gp_init_delay=1" to kvm.sh.

12.	Remove event tracing from rcu_cpu_notify(), which is invoked
	by offline CPUs.  (Event tracing uses RCU.)

13.	Change meaning of ->expmask bitmasks to track blocked tasks
	rather than online CPUs.

14.	Move rcu_report_unblock_qs_rnp() to common code.  This will
	make it easier to provide proper locking protection.

15.	Avoid races between CPU-hotplug operations and RCU grace-period
	initialization by processing CPU-hotplug changes only at the
	start of each grace period.  This works because RCU need not
	wait on a CPU that came online after the start of the current
	grace period.

16.	Eliminate the no-longer-needed ->onoff_mutex from the rcu_node
	structure.  This is the only sleeplock acquired during RCU's
	CPU-hotplug processing, which in turn allows rcu_cpu_notify()
	to be invoked from the preemption-disabled idle loop.

17.	Use a per-CPU variable to make the CPU-offline idle-loop
	transition point precise.  (RCU's magic one-jiffy grace-period
	wait for offline CPUs must remain until the analogous online
	issue is addressed.)

18.	Invoke rcu_cpu_notify() with a new CPU_DYING_IDLE op just before
	the idle-loop invocation of arch_cpu_idle_dead().

19.	Now that CPU-hotplug events are applied only during grace-period
	initialization, it is safe to unconditionally enable slow
	grace-period initialization for rcutorture testing.  Note
	that this delay is applied randomly in order to get a good
	mix of fast and slow grace-period initialization.

20.	Add checks that all quiescent states were received at grace-period
	cleanup time.

21.	Add a check for the last task on a given RCU-node structure
	leaving its RCU read-side critical section between the time
	that hotplug information is propagated up the tree and the
	time that the grace period starts.

22.	Add checks for grace-period number to all propagations of
	quiescent states up the rcu_node combining tree.  These are
	required because a new grace period could start during this
	propagation due to the resolution of #21 above.  (Thanks
	to Sasha Levin for exposing this bug during the course of
	his testing.)

Changes since v1:

o	Fixed per-CPU state mechine to work correctly for architectures
	that online CPUs without needing to check whether or not previous
	offline operations completed correctly and on time, thanks to
	James Hogan.

o	Fixed Xen's interfacing to the common-code notifications, thanks
	to Boris Ostrovsky.

o	Added two fixes for handling of quiescent states and grace periods
	given the updated handling of CPU hotplug.

							Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/kernel-parameters.txt                     |    6 
 b/arch/blackfin/mach-common/smp.c                         |    6 
 b/arch/metag/kernel/smp.c                                 |    5 
 b/arch/x86/include/asm/cpu.h                              |    2 
 b/arch/x86/include/asm/smp.h                              |    2 
 b/arch/x86/kernel/smpboot.c                               |   39 -
 b/arch/x86/xen/smp.c                                      |   46 -
 b/include/linux/cpu.h                                     |   14 
 b/include/linux/rcupdate.h                                |    2 
 b/kernel/cpu.c                                            |    4 
 b/kernel/rcu/tree.c                                       |  356 ++++++++++----
 b/kernel/rcu/tree.h                                       |   11 
 b/kernel/rcu/tree_plugin.h                                |  169 +++---
 b/kernel/rcu/tree_trace.c                                 |    4 
 b/kernel/sched/idle.c                                     |    9 
 b/kernel/smpboot.c                                        |  156 ++++++
 b/lib/Kconfig.debug                                       |   26 -
 b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon |    1 
 18 files changed, 617 insertions(+), 241 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread