* [PATCH tip/core/rcu 0/21] Contention reduction for v4.18
@ 2018-04-23  3:02 Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy Paul E. McKenney
                   ` (21 more replies)
  0 siblings, 22 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin

Hello!

This series reduces lock contention on the root rcu_node structure,
and is also the first precursor to TBD changes to consolidate the three
RCU flavors (RCU-bh, RCU-preempt, and RCU-sched) into one.

1.	Improve non-root rcu_cbs_completed() accuracy, thus reducing the
	need to acquire the root rcu_node structure's ->lock.  This also
	eliminates the need to reassign callbacks to an earlier grace
	period, which enables introduction of funnel locking in a later
	commit, which further reduces contention.

2.	Make rcu_start_future_gp()'s grace-period check more precise,
	eliminating one need for forward-progress failsafe checks
	that acquire the root rcu_node structure's ->lock.

3.	Create (and make use of) accessors for the ->need_future_gp[]
	array to enable easy changes in size.

4.	Make rcu_gp_kthread() check for early-boot activity, which was
	another situation needing failsafe checks.

5.	Make rcu_gp_cleanup() more accurately predict need for new GP.
	This eliminates the need for both failsafe checks and extra
	grace-period kthread wakeups.

6.	Avoid losing ->need_future_gp[] values due to GP start/end races
	by expanding this array from two elements to four.

7.	Make rcu_future_needs_gp() check all ->need_future_gp[] elements,
	again to eliminate a need for both failsafe checks and extra
	grace-period kthread wakeups.

8.	Convert ->need_future_gp[] array to boolean, given that there
	is no longer a need to count the number of requests for a
	future grace period.

9.	Make rcu_migrate_callbacks() wake the GP kthread when needed, which
	again eliminates a need for failsafe checks.

10.	Avoid __call_rcu_core() root rcu_node ->lock acquisition, which
	was one of the failsafe checks that many of the above patches
	were making safe to remove.

11.	Switch __rcu_process_callbacks() to rcu_accelerate_cbs(), which
	was one of the failsafe checks that many of the above patches
	were making safe to remove.  (Yes, this one also acquired the
	root rcu_node structure's ->lock, and was in fact the lock
	acquisition that was showing up in Nick Piggin's traces.)

12.	Put ->completed into an unsigned long instead of an int.  (The
	"int" was harmless because only the low-order bits were used,
	but it was still an accident waiting to happen.)

13.	Clear requests other than RCU_GP_FLAG_INIT at grace-period end.
	This prevents premature quiescent-state forcing that might
	otherwise occur due to requests posted when the grace period
	was already almost done.

14.	Inline rcu_start_gp_advanced() into rcu_start_future_gp().
	This brings RCU down to only one function to start a grace
	period, in happy contrast to the need to choose correctly
	between three of them before this patch series.

15.	Make rcu_start_future_gp() caller select grace period to avoid
	duplicate grace-period selection.  (We are going to like this
	grace period so much that we selected it twice!)

16.	Add funnel locking to rcu_start_this_gp(), the point being to
	reduce lock contention, especially on large systems.

17.	Make rcu_start_this_gp() check for out-of-range requests.
	If this check triggers, that indicates a bug in a caller of
	rcu_start_this_gp() or that the ->need_future_gp[] array needs
	to be even bigger, most likely the former.  More importantly, it
	avoids one possible cause of otherwise silent grace-period hangs.

18.	The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
	because funnel locking summarizes the need for future
	grace periods in the root rcu_node structure's ->lock,
	which rcu_gp_cleanup() already holds for other reasons.

19.	Simplify and inline cpu_needs_another_gp(), which used to be a key
	part of the no-longer-required forward-progress failsafe checks.

20.	Drop early GP request check from rcu_gp_kthread().  Yes, it
	was added above in order to avoid grace-period hangs, but at
	this point in the series it is no longer needed.  All in the
	name of bisectability.

21.	Update list of rcu_future_grace_period() trace events to reflect
	strings added above.

							Thanx, Paul

------------------------------------------------------------------------

 include/trace/events/rcu.h |   13 -
 kernel/rcu/rcu_segcblist.c |   18 -
 kernel/rcu/rcu_segcblist.h |    2 
 kernel/rcu/tree.c          |  406 ++++++++++++++++-----------------------------
 kernel/rcu/tree.h          |   24 ++
 kernel/rcu/tree_plugin.h   |   28 ---
 6 files changed, 182 insertions(+), 309 deletions(-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 02/21] rcu: Make rcu_start_future_gp()'s grace-period check more precise Paul E. McKenney
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

When rcu_cbs_completed() is invoked on a non-root rcu_node structure,
it unconditionally assumes that two grace periods must complete before
the callbacks at hand can be invoked.  This is overly conservative because
if that non-root rcu_node structure believes that no grace period is in
progress, and if the corresponding rcu_state structure's ->gpnum field
has not yet been incremented, then these callbacks may safely be invoked
after only one grace period has completed.

This change is required to permit grace-period start requests to use
funnel locking, which in turn permits reducing root rcu_node ->lock
contention, which has been observed by Nick Piggin.  Furthermore, such
contention will likely be increased by the merging of RCU-bh, RCU-preempt,
and RCU-sched, so it makes sense to take steps to decrease it.

This commit therefore improves the accuracy of rcu_cbs_completed() when
invoked on a non-root rcu_node structure as described above.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a734692a581..f5ca72f2ed43 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1642,6 +1642,21 @@ static unsigned long rcu_cbs_completed(struct rcu_state *rsp,
 		return rnp->completed + 1;
 
 	/*
+	 * If the current rcu_node structure believes that RCU is
+	 * idle, and if the rcu_state structure does not yet reflect
+	 * the start of a new grace period, then the next grace period
+	 * will suffice.  The memory barrier is needed to accurately
+	 * sample the rsp->gpnum, and pairs with the second lock
+	 * acquisition in rcu_gp_init(), which is augmented with
+	 * smp_mb__after_unlock_lock() for this purpose.
+	 */
+	if (rnp->gpnum == rnp->completed) {
+		smp_mb(); /* See above block comment. */
+		if (READ_ONCE(rsp->gpnum) == rnp->completed)
+			return rnp->completed + 1;
+	}
+
+	/*
 	 * Otherwise, wait for a possible partial grace period and
 	 * then the subsequent full grace period.
 	 */
-- 
2.5.2

* [PATCH tip/core/rcu 02/21] rcu: Make rcu_start_future_gp()'s grace-period check more precise
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 03/21] rcu: Add accessor macros for the ->need_future_gp[] array Paul E. McKenney
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_start_future_gp() function uses a sloppy check for a grace
period being in progress, which works today because there are a number
of code sequences that resolve the resulting races.  However, some of
these race-resolution code sequences must acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes the rcu_start_future_gp() check more precise,
eliminating the sloppy lockless check of the rcu_state structure's ->gpnum
and ->completed fields.  The effect is that rcu_start_future_gp() will
sometimes unnecessarily attempt to start a new grace period, but this
overhead will be reduced later using funnel locking.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f5ca72f2ed43..4bbba17422cd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1705,20 +1705,12 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	}
 
 	/*
-	 * If either this rcu_node structure or the root rcu_node structure
-	 * believe that a grace period is in progress, then we must wait
-	 * for the one following, which is in "c".  Because our request
-	 * will be noticed at the end of the current grace period, we don't
-	 * need to explicitly start one.  We only do the lockless check
-	 * of rnp_root's fields if the current rcu_node structure thinks
-	 * there is no grace period in flight, and because we hold rnp->lock,
-	 * the only possible change is when rnp_root's two fields are
-	 * equal, in which case rnp_root->gpnum might be concurrently
-	 * incremented.  But that is OK, as it will just result in our
-	 * doing some extra useless work.
+	 * If this rcu_node structure believes that a grace period is in
+	 * progress, then we must wait for the one following, which is in
+	 * "c".  Because our request will be noticed at the end of the
+	 * current grace period, we don't need to explicitly start one.
 	 */
-	if (rnp->gpnum != rnp->completed ||
-	    READ_ONCE(rnp_root->gpnum) != READ_ONCE(rnp_root->completed)) {
+	if (rnp->gpnum != rnp->completed) {
 		rnp->need_future_gp[c & 0x1]++;
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;
-- 
2.5.2

* [PATCH tip/core/rcu 03/21] rcu: Add accessor macros for the ->need_future_gp[] array
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 02/21] rcu: Make rcu_start_future_gp()'s grace-period check more precise Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 04/21] rcu: Make rcu_gp_kthread() check for early-boot activity Paul E. McKenney
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Accessors for the ->need_future_gp[] array are currently open-coded,
which makes them difficult to change.  To improve maintainability, this
commit adds need_future_gp_mask() to compute the indexing mask from the
array size, need_future_gp_element() to access the element corresponding
to the specified grace-period number, and need_any_future_gp() to
determine if any future grace period is needed.  This commit also applies
need_future_gp_element() to existing open-coded single-element accesses.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 16 +++++++---------
 kernel/rcu/tree.h        | 15 +++++++++++++++
 kernel/rcu/tree_plugin.h |  2 +-
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4bbba17422cd..79fb99951a0c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -718,11 +718,9 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
 static int rcu_future_needs_gp(struct rcu_state *rsp)
 {
 	struct rcu_node *rnp = rcu_get_root(rsp);
-	int idx = (READ_ONCE(rnp->completed) + 1) & 0x1;
-	int *fp = &rnp->need_future_gp[idx];
 
 	lockdep_assert_irqs_disabled();
-	return READ_ONCE(*fp);
+	return READ_ONCE(need_future_gp_element(rnp, rnp->completed));
 }
 
 /*
@@ -1699,7 +1697,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 */
 	c = rcu_cbs_completed(rdp->rsp, rnp);
 	trace_rcu_future_gp(rnp, rdp, c, TPS("Startleaf"));
-	if (rnp->need_future_gp[c & 0x1]) {
+	if (need_future_gp_element(rnp, c)) {
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Prestartleaf"));
 		goto out;
 	}
@@ -1711,7 +1709,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * current grace period, we don't need to explicitly start one.
 	 */
 	if (rnp->gpnum != rnp->completed) {
-		rnp->need_future_gp[c & 0x1]++;
+		need_future_gp_element(rnp, c)++;
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;
 	}
@@ -1737,13 +1735,13 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * If the needed for the required grace period is already
 	 * recorded, trace and leave.
 	 */
-	if (rnp_root->need_future_gp[c & 0x1]) {
+	if (need_future_gp_element(rnp_root, c)) {
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Prestartedroot"));
 		goto unlock_out;
 	}
 
 	/* Record the need for the future grace period. */
-	rnp_root->need_future_gp[c & 0x1]++;
+	need_future_gp_element(rnp_root, c)++;
 
 	/* If a grace period is not already in progress, start one. */
 	if (rnp_root->gpnum != rnp_root->completed) {
@@ -1771,8 +1769,8 @@ static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 	int needmore;
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
 
-	rnp->need_future_gp[c & 0x1] = 0;
-	needmore = rnp->need_future_gp[(c + 1) & 0x1];
+	need_future_gp_element(rnp, c) = 0;
+	needmore = need_future_gp_element(rnp, c + 1);
 	trace_rcu_future_gp(rnp, rdp, c,
 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
 	return needmore;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index f491ab4f2e8e..18b091474ffa 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -159,6 +159,21 @@ struct rcu_node {
 	wait_queue_head_t exp_wq[4];
 } ____cacheline_internodealigned_in_smp;
 
+/* Accessors for ->need_future_gp[] array. */
+#define need_future_gp_mask() \
+	(ARRAY_SIZE(((struct rcu_node *)NULL)->need_future_gp) - 1)
+#define need_future_gp_element(rnp, c) \
+	((rnp)->need_future_gp[(c) & need_future_gp_mask()])
+#define need_any_future_gp(rnp)						\
+({									\
+	int __i;							\
+	bool __nonzero = false;						\
+									\
+	for (__i = 0; __i < ARRAY_SIZE((rnp)->need_future_gp); __i++)	\
+		__nonzero = __nonzero || (rnp)->need_future_gp[__i];	\
+	__nonzero;							\
+})
+
 /*
  * Bitmasks in an rcu_node cover the interval [grplo, grphi] of CPU IDs, and
  * are indexed relative to this interval rather than the global CPU ID space.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 84fbee4686d3..640ea927d8a4 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1790,7 +1790,7 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
  */
 static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq)
 {
-	rnp->need_future_gp[(rnp->completed + 1) & 0x1] += nrq;
+	need_future_gp_element(rnp, rnp->completed + 1) += nrq;
 }
 
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
-- 
2.5.2

* [PATCH tip/core/rcu 04/21] rcu: Make rcu_gp_kthread() check for early-boot activity
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 03/21] rcu: Add accessor macros for the ->need_future_gp[] array Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Paul E. McKenney
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_gp_kthread() function immediately sleeps waiting to be notified
of the need for a new grace period, which currently works because there
are a number of code sequences that will provide the needed wakeup later.
However, some of these code sequences need to acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_gp_kthread() check for early-boot activity
when it starts up, omitting the initial sleep in that case.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79fb99951a0c..497f139056c7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,6 +2192,12 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
+	/* Check for early-boot work. */
+	raw_spin_lock_irq_rcu_node(rnp);
+	if (need_any_future_gp(rnp))
+		WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);
+	raw_spin_unlock_irq_rcu_node(rnp);
+
 	rcu_bind_gp_kthread();
 	for (;;) {
 
-- 
2.5.2

* [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 04/21] rcu: Make rcu_gp_kthread() check for early-boot activity Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-05-10  7:21   ` [tip/core/rcu, " Joel Fernandes
  2018-04-23  3:03 ` [PATCH tip/core/rcu 06/21] rcu: Avoid losing ->need_future_gp[] values due to GP start/end races Paul E. McKenney
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
state to reflect the end of the grace period.  It also checks to see
whether a new grace period is needed, but in a number of cases, rather
than directly cause the new grace period to be immediately started, it
instead leaves the grace-period-needed state where various fail-safes
can find it.  This works fine, but results in higher contention on the
root rcu_node structure's ->lock, which is undesirable, and contention
on that lock has recently become noticeable.

This commit therefore makes rcu_gp_cleanup() immediately start a new
grace period if there is any need for one.

It is quite possible that it will later be necessary to throttle the
grace-period rate, but that can be dealt with when and if.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 16 ++++++++++------
 kernel/rcu/tree.h        |  1 -
 kernel/rcu/tree_plugin.h | 17 -----------------
 3 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 497f139056c7..afc5e32f0da4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1763,14 +1763,14 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  * Clean up any old requests for the just-ended grace period.  Also return
  * whether any additional grace periods have been requested.
  */
-static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
+static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 	int c = rnp->completed;
-	int needmore;
+	bool needmore;
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
 
 	need_future_gp_element(rnp, c) = 0;
-	needmore = need_future_gp_element(rnp, c + 1);
+	needmore = need_any_future_gp(rnp);
 	trace_rcu_future_gp(rnp, rdp, c,
 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
 	return needmore;
@@ -2113,7 +2113,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 {
 	unsigned long gp_duration;
 	bool needgp = false;
-	int nocb = 0;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp = rcu_get_root(rsp);
 	struct swait_queue_head *sq;
@@ -2152,7 +2151,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 		if (rnp == rdp->mynode)
 			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
 		/* smp_mb() provided by prior unlock-lock pair. */
-		nocb += rcu_future_gp_cleanup(rsp, rnp);
+		needgp = rcu_future_gp_cleanup(rsp, rnp) || needgp;
 		sq = rcu_nocb_gp_get(rnp);
 		raw_spin_unlock_irq_rcu_node(rnp);
 		rcu_nocb_gp_cleanup(sq);
@@ -2162,13 +2161,18 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	}
 	rnp = rcu_get_root(rsp);
 	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
-	rcu_nocb_gp_set(rnp, nocb);
 
 	/* Declare grace period done. */
 	WRITE_ONCE(rsp->completed, rsp->gpnum);
 	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
 	rsp->gp_state = RCU_GP_IDLE;
+	/* Check for GP requests since above loop. */
 	rdp = this_cpu_ptr(rsp->rda);
+	if (need_any_future_gp(rnp)) {
+		trace_rcu_future_gp(rnp, rdp, rsp->completed - 1,
+				    TPS("CleanupMore"));
+		needgp = true;
+	}
 	/* Advance CBs to reduce false positives below. */
 	needgp = rcu_advance_cbs(rsp, rnp, rdp) || needgp;
 	if (needgp || cpu_needs_another_gp(rsp, rdp)) {
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 18b091474ffa..bd1103763551 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -469,7 +469,6 @@ static void print_cpu_stall_info_end(void);
 static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static void increment_cpu_stall_ticks(void);
 static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
-static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 640ea927d8a4..313b77d9cf06 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1780,19 +1780,6 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 	swake_up_all(sq);
 }
 
-/*
- * Set the root rcu_node structure's ->need_future_gp field
- * based on the sum of those of all rcu_node structures.  This does
- * double-count the root rcu_node structure's requests, but this
- * is necessary to handle the possibility of a rcu_nocb_kthread()
- * having awakened during the time that the rcu_node structures
- * were being updated for the end of the previous grace period.
- */
-static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq)
-{
-	need_future_gp_element(rnp, rnp->completed + 1) += nrq;
-}
-
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
 {
 	return &rnp->nocb_gp_wq[rnp->completed & 0x1];
@@ -2495,10 +2482,6 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 {
 }
 
-static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq)
-{
-}
-
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
 {
 	return NULL;
-- 
2.5.2

* [PATCH tip/core/rcu 06/21] rcu: Avoid losing ->need_future_gp[] values due to GP start/end races
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 07/21] rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements Paul E. McKenney
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_cbs_completed() function provides the value of ->completed
at which new callbacks can safely be invoked.  This is recorded in
two-element ->need_future_gp[] arrays in the rcu_node structure, and
the elements of these arrays corresponding to the just-completed grace
period are zeroed at the end of that grace period.  However, the
rcu_cbs_completed() function can return the current ->completed value
plus either one or two, so it is possible for the corresponding
->need_future_gp[] entry to be cleared just after it was set, thus
losing a request for a future grace period.

This commit avoids this race by expanding ->need_future_gp[] to four
elements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index bd1103763551..952cd0c223fe 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -150,8 +150,7 @@ struct rcu_node {
 	struct swait_queue_head nocb_gp_wq[2];
 				/* Place for rcu_nocb_kthread() to wait GP. */
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
-	int need_future_gp[2];
-				/* Counts of upcoming no-CB GP requests. */
+	int need_future_gp[4];	/* Counts of upcoming no-CB GP requests. */
 	raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
 
 	spinlock_t exp_lock ____cacheline_internodealigned_in_smp;
-- 
2.5.2

* [PATCH tip/core/rcu 07/21] rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 06/21] rcu: Avoid losing ->need_future_gp[] values due to GP start/end races Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 08/21] rcu: Convert ->need_future_gp[] array to boolean Paul E. McKenney
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Currently, the rcu_future_needs_gp() function checks only the current
element of the ->need_future_gp[] array, which might miss elements that
were offset from the expected element, for example, due to races with
the start or the end of a grace period.  This commit therefore makes
rcu_future_needs_gp() use the need_any_future_gp() macro to check all
of the elements of this array.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 kernel/rcu/tree.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index afc5e32f0da4..b05ab6379562 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -720,7 +720,7 @@ static int rcu_future_needs_gp(struct rcu_state *rsp)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	lockdep_assert_irqs_disabled();
-	return READ_ONCE(need_future_gp_element(rnp, rnp->completed));
+	return need_any_future_gp(rnp);
 }
 
 /*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 952cd0c223fe..123c30eac8b5 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -169,7 +169,8 @@ struct rcu_node {
 	bool __nonzero = false;						\
 									\
 	for (__i = 0; __i < ARRAY_SIZE((rnp)->need_future_gp); __i++)	\
-		__nonzero = __nonzero || (rnp)->need_future_gp[__i];	\
+		__nonzero = __nonzero ||				\
+			    READ_ONCE((rnp)->need_future_gp[__i]);	\
 	__nonzero;							\
 })
 
-- 
2.5.2

* [PATCH tip/core/rcu 08/21] rcu: Convert ->need_future_gp[] array to boolean
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 07/21] rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 09/21] rcu: Make rcu_migrate_callbacks wake GP kthread when needed Paul E. McKenney
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

There is no longer any need for ->need_future_gp[] to count the number of
requests for future grace periods, so this commit converts the additions
to assignments to "true" and reduces the size of each element to one byte.
While we are in the area, fix an obsolete comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 6 +++---
 kernel/rcu/tree.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b05ab6379562..6ef1f2b4a6d3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1709,7 +1709,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * current grace period, we don't need to explicitly start one.
 	 */
 	if (rnp->gpnum != rnp->completed) {
-		need_future_gp_element(rnp, c)++;
+		need_future_gp_element(rnp, c) = true;
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;
 	}
@@ -1741,7 +1741,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	}
 
 	/* Record the need for the future grace period. */
-	need_future_gp_element(rnp_root, c)++;
+	need_future_gp_element(rnp_root, c) = true;
 
 	/* If a grace period is not already in progress, start one. */
 	if (rnp_root->gpnum != rnp_root->completed) {
@@ -1769,7 +1769,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 	bool needmore;
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
 
-	need_future_gp_element(rnp, c) = 0;
+	need_future_gp_element(rnp, c) = false;
 	needmore = need_any_future_gp(rnp);
 	trace_rcu_future_gp(rnp, rdp, c,
 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 123c30eac8b5..9f97fd7f648c 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -150,7 +150,7 @@ struct rcu_node {
 	struct swait_queue_head nocb_gp_wq[2];
 				/* Place for rcu_nocb_kthread() to wait GP. */
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
-	int need_future_gp[4];	/* Counts of upcoming no-CB GP requests. */
+	u8 need_future_gp[4];	/* Counts of upcoming GP requests. */
 	raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
 
 	spinlock_t exp_lock ____cacheline_internodealigned_in_smp;
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread
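[Editorial note: the counter-to-boolean conversion above can be sketched as a
toy model.  This is illustrative C only, not the kernel's code; the array size,
the need_future_gp_element() macro form, and the helper names are assumptions
made for the sketch.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the ->need_future_gp[] ring: four one-byte slots indexed
 * by grace-period number modulo the array size.  After this patch each
 * slot records only whether a request exists, not how many were made,
 * so repeated requests are idempotent and cannot overflow a counter. */
#define NEED_GP_SLOTS 4
static bool need_future_gp[NEED_GP_SLOTS];

/* Hypothetical stand-in for the kernel's need_future_gp_element(). */
#define need_future_gp_element(c) (need_future_gp[(c) & (NEED_GP_SLOTS - 1)])

static void record_request(unsigned long c)
{
	need_future_gp_element(c) = true;	/* was: ...++ */
}

/* Clear the slot for grace period c and report whether any other slot
 * still records a need for a future grace period. */
static bool cleanup_slot(unsigned long c)
{
	int i;

	need_future_gp_element(c) = false;
	for (i = 0; i < NEED_GP_SLOTS; i++)
		if (need_future_gp[i])
			return true;
	return false;
}
```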

* [PATCH tip/core/rcu 09/21] rcu: Make rcu_migrate_callbacks wake GP kthread when needed
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 08/21] rcu: Convert ->need_future_gp[] array to boolean Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 10/21] rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition Paul E. McKenney
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_migrate_callbacks() function invokes rcu_advance_cbs()
twice, ignoring the return value.  This is OK at present because of
failsafe code that does the wakeup when needed.  However, this failsafe
code acquires the root rcu_node structure's lock frequently, while
rcu_migrate_callbacks() does so only once per CPU-offline operation.

This commit therefore makes rcu_migrate_callbacks() wake up the RCU
GP kthread when either call to rcu_advance_cbs() returns true, thus
removing the need for the failsafe code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6ef1f2b4a6d3..f75eb5174021 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3876,6 +3876,7 @@ static void rcu_migrate_callbacks(int cpu, struct rcu_state *rsp)
 	struct rcu_data *my_rdp;
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp_root = rcu_get_root(rdp->rsp);
+	bool needwake;
 
 	if (rcu_is_nocb_cpu(cpu) || rcu_segcblist_empty(&rdp->cblist))
 		return;  /* No callbacks to migrate. */
@@ -3887,12 +3888,15 @@ static void rcu_migrate_callbacks(int cpu, struct rcu_state *rsp)
 		return;
 	}
 	raw_spin_lock_rcu_node(rnp_root); /* irqs already disabled. */
-	rcu_advance_cbs(rsp, rnp_root, rdp); /* Leverage recent GPs. */
-	rcu_advance_cbs(rsp, rnp_root, my_rdp); /* Assign GP to pending CBs. */
+	/* Leverage recent GPs and set GP for new callbacks. */
+	needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
+		   rcu_advance_cbs(rsp, rnp_root, my_rdp);
 	rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
 	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
 		     !rcu_segcblist_n_cbs(&my_rdp->cblist));
 	raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
+	if (needwake)
+		rcu_gp_kthread_wake(rsp);
 	WARN_ONCE(rcu_segcblist_n_cbs(&rdp->cblist) != 0 ||
 		  !rcu_segcblist_empty(&rdp->cblist),
 		  "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, 1stCB=%p\n",
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread
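[Editorial note: the patch above follows the kernel's usual "latch under lock,
wake after unlock" pattern, since waking the GP kthread while holding an
rcu_node ->lock risks deadlock against scheduler locks.  A toy sketch, with
hypothetical helper names standing in for the real functions:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the wakeup decision is recorded in a local variable while
 * the lock is held, and acted upon only after the lock is released. */
static bool lock_held;
static int wakeups;

static void node_lock(void)   { lock_held = true; }
static void node_unlock(void) { lock_held = false; }

static bool advance_cbs(bool had_work)	/* stand-in for rcu_advance_cbs() */
{
	assert(lock_held);		/* must be called under the lock */
	return had_work;		/* true: GP kthread needs waking */
}

static void gp_kthread_wake(void)	/* stand-in for rcu_gp_kthread_wake() */
{
	assert(!lock_held);		/* never wake while holding the lock */
	wakeups++;
}

static void migrate_callbacks(bool src_work, bool dst_work)
{
	bool needwake;

	node_lock();
	needwake = advance_cbs(src_work) || advance_cbs(dst_work);
	node_unlock();
	if (needwake)			/* at most one wakeup per call */
		gp_kthread_wake();
}
```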

* [PATCH tip/core/rcu 10/21] rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 09/21] rcu: Make rcu_migrate_callbacks wake GP kthread when needed Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 11/21] rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() Paul E. McKenney
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

When __call_rcu_core() notices excessive numbers of callbacks pending
on the current CPU, we know that at least one of them is not yet
classified, namely the one that was just now queued.  Therefore, it
is not necessary to invoke rcu_start_gp() and thus not necessary to
acquire the root rcu_node structure's ->lock.  This commit therefore
replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing
an acquisition of the root rcu_node structure's ->lock with that of
this CPU's leaf rcu_node structure.

This decreases contention on the root rcu_node structure's ->lock.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f75eb5174021..6396a3d10be9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2988,11 +2988,11 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
 
 		/* Start a new grace period if one not already started. */
 		if (!rcu_gp_in_progress(rsp)) {
-			struct rcu_node *rnp_root = rcu_get_root(rsp);
+			struct rcu_node *rnp = rdp->mynode;
 
-			raw_spin_lock_rcu_node(rnp_root);
-			needwake = rcu_start_gp(rsp);
-			raw_spin_unlock_rcu_node(rnp_root);
+			raw_spin_lock_rcu_node(rnp);
+			needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
+			raw_spin_unlock_rcu_node(rnp);
 			if (needwake)
 				rcu_gp_kthread_wake(rsp);
 		} else {
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH tip/core/rcu 11/21] rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs()
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 10/21] rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 12/21] rcu: Cleanup, don't put ->completed into an int Paul E. McKenney
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The __rcu_process_callbacks() function currently checks to see if
the current CPU needs a grace period and also if there is any other
reason to kick off a new grace period.  This is one of the fail-safe
checks that has been rendered unnecessary by the changes that increase
the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace
period is required.  Because this particular fail-safe involved acquiring
the root rcu_node structure's ->lock, which has seen excessive contention
in real life, this fail-safe needs to go.

However, one check must remain, namely the check for newly arrived
RCU callbacks that have not yet been associated with a grace period.
One might hope that the checks in __note_gp_changes(), which is invoked
indirectly from rcu_check_quiescent_state(), would suffice, but this
function won't be invoked at all if RCU is idle.  It is therefore necessary
to replace the fail-safe checks with a simpler check for newly arrived
callbacks during an RCU idle period, which is exactly what this commit
does.  This change removes the final call to rcu_start_gp(), so this
function is removed as well.

Note that lockless use of cpu_needs_another_gp() is racy, but that
these races are harmless in this case.  If RCU really is idle, the
values will not change, so the return value from cpu_needs_another_gp()
will be correct.  If RCU is not idle, the resulting redundant call to
rcu_accelerate_cbs() will be harmless, and might even have the benefit
of reducing grace-period latency a bit.

This commit also moves interrupt disabling into the "if" statement to
improve real-time response a bit.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 53 +++++++++++++++--------------------------------------
 1 file changed, 15 insertions(+), 38 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6396a3d10be9..fbacc486ed4c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2335,34 +2335,6 @@ rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
 }
 
 /*
- * Similar to rcu_start_gp_advanced(), but also advance the calling CPU's
- * callbacks.  Note that rcu_start_gp_advanced() cannot do this because it
- * is invoked indirectly from rcu_advance_cbs(), which would result in
- * endless recursion -- or would do so if it wasn't for the self-deadlock
- * that is encountered beforehand.
- *
- * Returns true if the grace-period kthread needs to be awakened.
- */
-static bool rcu_start_gp(struct rcu_state *rsp)
-{
-	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
-	struct rcu_node *rnp = rcu_get_root(rsp);
-	bool ret = false;
-
-	/*
-	 * If there is no grace period in progress right now, any
-	 * callbacks we have up to this point will be satisfied by the
-	 * next grace period.  Also, advancing the callbacks reduces the
-	 * probability of false positives from cpu_needs_another_gp()
-	 * resulting in pointless grace periods.  So, advance callbacks
-	 * then start the grace period!
-	 */
-	ret = rcu_advance_cbs(rsp, rnp, rdp) || ret;
-	ret = rcu_start_gp_advanced(rsp, rnp, rdp) || ret;
-	return ret;
-}
-
-/*
  * Report a full set of quiescent states to the specified rcu_state data
  * structure.  Invoke rcu_gp_kthread_wake() to awaken the grace-period
  * kthread if another grace period is required.  Whether we wake
@@ -2889,22 +2861,27 @@ __rcu_process_callbacks(struct rcu_state *rsp)
 	unsigned long flags;
 	bool needwake;
 	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
+	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(!rdp->beenonline);
 
 	/* Update RCU state based on any recent quiescent states. */
 	rcu_check_quiescent_state(rsp, rdp);
 
-	/* Does this CPU require a not-yet-started grace period? */
-	local_irq_save(flags);
-	if (cpu_needs_another_gp(rsp, rdp)) {
-		raw_spin_lock_rcu_node(rcu_get_root(rsp)); /* irqs disabled. */
-		needwake = rcu_start_gp(rsp);
-		raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), flags);
-		if (needwake)
-			rcu_gp_kthread_wake(rsp);
-	} else {
-		local_irq_restore(flags);
+	/* No grace period and unregistered callbacks? */
+	if (!rcu_gp_in_progress(rsp) &&
+	    rcu_segcblist_is_enabled(&rdp->cblist)) {
+		local_irq_save(flags);
+		if (rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) {
+			local_irq_restore(flags);
+		} else {
+			rnp = rdp->mynode;
+			raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
+			needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+			if (needwake)
+				rcu_gp_kthread_wake(rsp);
+		}
 	}
 
 	/* If there are callbacks ready, invoke them. */
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread
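[Editorial note: the racy-but-harmless check described above can be modeled in
miniature.  This is a sketch with invented names, not the kernel's code; the
point is that a false positive from the lockless filter merely causes a
redundant, idempotent acceleration pass.]

```c
#include <assert.h>
#include <stdbool.h>

static bool gp_in_progress;
static int unclassified_cbs;	/* callbacks not yet assigned a GP number */
static int classified_cbs;

/* Stand-in for rcu_accelerate_cbs(): assign all pending callbacks to a
 * grace period.  Idempotent, so a racy redundant call is harmless. */
static bool accelerate_cbs(void)
{
	bool did_work = unclassified_cbs > 0;

	classified_cbs += unclassified_cbs;
	unclassified_cbs = 0;
	return did_work;		/* true: GP kthread needs waking */
}

static bool process_callbacks(void)
{
	/* Racy lockless filter: skip the lock in the common idle case. */
	if (gp_in_progress || unclassified_cbs == 0)
		return false;
	/* In the kernel, the leaf rnp->lock would be taken here. */
	return accelerate_cbs();
}
```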

* [PATCH tip/core/rcu 12/21] rcu: Cleanup, don't put ->completed into an int
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 11/21] rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 13/21] rcu: Clear request other than RCU_GP_FLAG_INIT at GP end Paul E. McKenney
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

It is true that currently only the low-order two bits are used, so
there should be no problem given modern machines and compilers, but
good hygiene and maintainability dictate use of an unsigned long
instead of an int.  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fbacc486ed4c..c7b1e6b2a3da 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1765,7 +1765,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  */
 static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
-	int c = rnp->completed;
+	unsigned long c = rnp->completed;
 	bool needmore;
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread
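[Editorial note: the reason grace-period numbers belong in unsigned long is
that modular subtraction keeps ordering comparisons well defined across
counter wraparound.  The comparator below is a sketch in the spirit of the
kernel's ULONG_CMP_LT() macro, not a copy of it.]

```c
#include <assert.h>
#include <limits.h>

/* "a is before b" in the modular ordering of unsigned long counters:
 * a - b wraps, and a difference in the upper half of the range means
 * a is (cyclically) behind b. */
static int ulong_cmp_lt(unsigned long a, unsigned long b)
{
	return (a - b) > ULONG_MAX / 2;
}
```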

* [PATCH tip/core/rcu 13/21] rcu: Clear request other than RCU_GP_FLAG_INIT at GP end
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 12/21] rcu: Cleanup, don't put ->completed into an int Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 14/21] rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() Paul E. McKenney
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Once the grace period has ended, any RCU_GP_FLAG_FQS requests are
irrelevant:  The grace period has ended, so there is no longer any
point in forcing quiescent states in order to try to make it end sooner.
This commit therefore causes rcu_gp_cleanup() to clear any bits other
than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c7b1e6b2a3da..25dbbc753fef 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2181,6 +2181,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 				       READ_ONCE(rsp->gpnum),
 				       TPS("newreq"));
 	}
+	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags & RCU_GP_FLAG_INIT);
 	raw_spin_unlock_irq_rcu_node(rnp);
 }
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread
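[Editorial note: the one-line change above is a simple bitmask.  A toy sketch
of the semantics, using the flag values the kernel defines for these two bits;
the helper name is invented for illustration:]

```c
#include <assert.h>

/* At the end of a grace period every pending request except a freshly
 * posted RCU_GP_FLAG_INIT is stale -- in particular RCU_GP_FLAG_FQS,
 * since there is no longer a grace period whose quiescent states could
 * be forced.  Cleanup therefore masks away everything but INIT. */
#define RCU_GP_FLAG_INIT 0x1	/* Need grace-period initialization. */
#define RCU_GP_FLAG_FQS  0x2	/* Need grace-period quiescent-state forcing. */

static unsigned long gp_cleanup_flags(unsigned long gp_flags)
{
	return gp_flags & RCU_GP_FLAG_INIT;
}
```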

* [PATCH tip/core/rcu 14/21] rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp()
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 13/21] rcu: Clear request other than RCU_GP_FLAG_INIT at GP end Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 15/21] rcu: Make rcu_start_future_gp() caller select grace period Paul E. McKenney
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_start_gp_advanced() function is invoked only from rcu_start_future_gp() and
much of its code is redundant when invoked from that context.  This commit
therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(),
then removes rcu_start_gp_advanced().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 56 ++++++++++++-------------------------------------------
 1 file changed, 12 insertions(+), 44 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 25dbbc753fef..4433f68a1c7b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -524,8 +524,6 @@ module_param(rcu_kick_kthreads, bool, 0644);
 static ulong jiffies_till_sched_qs = HZ / 10;
 module_param(jiffies_till_sched_qs, ulong, 0444);
 
-static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
-				  struct rcu_data *rdp);
 static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *rsp));
 static void force_quiescent_state(struct rcu_state *rsp);
 static int rcu_pending(void);
@@ -1679,7 +1677,8 @@ static void trace_rcu_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  * rcu_node structure's ->need_future_gp field.  Returns true if there
  * is reason to awaken the grace-period kthread.
  *
- * The caller must hold the specified rcu_node structure's ->lock.
+ * The caller must hold the specified rcu_node structure's ->lock, which
+ * is why the caller is responsible for waking the grace-period kthread.
  */
 static bool __maybe_unused
 rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
@@ -1687,7 +1686,8 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 {
 	unsigned long c;
 	bool ret = false;
-	struct rcu_node *rnp_root = rcu_get_root(rdp->rsp);
+	struct rcu_state *rsp = rdp->rsp;
+	struct rcu_node *rnp_root = rcu_get_root(rsp);
 
 	raw_lockdep_assert_held_rcu_node(rnp);
 
@@ -1695,7 +1695,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * Pick up grace-period number for new callbacks.  If this
 	 * grace period is already marked as needed, return to the caller.
 	 */
-	c = rcu_cbs_completed(rdp->rsp, rnp);
+	c = rcu_cbs_completed(rsp, rnp);
 	trace_rcu_future_gp(rnp, rdp, c, TPS("Startleaf"));
 	if (need_future_gp_element(rnp, c)) {
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Prestartleaf"));
@@ -1727,7 +1727,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * period in progress, it will be smaller than the one we obtained
 	 * earlier.  Adjust callbacks as needed.
 	 */
-	c = rcu_cbs_completed(rdp->rsp, rnp_root);
+	c = rcu_cbs_completed(rsp, rnp_root);
 	if (!rcu_is_nocb_cpu(rdp->cpu))
 		(void)rcu_segcblist_accelerate(&rdp->cblist, c);
 
@@ -1748,7 +1748,12 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleafroot"));
 	} else {
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedroot"));
-		ret = rcu_start_gp_advanced(rdp->rsp, rnp_root, rdp);
+		if (!rsp->gp_kthread)
+			goto unlock_out; /* No grace-period kthread yet! */
+		WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
+		trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum),
+				       TPS("newreq"));
+		ret = true;  /* Caller must wake GP kthread. */
 	}
 unlock_out:
 	if (rnp != rnp_root)
@@ -2299,43 +2304,6 @@ static int __noreturn rcu_gp_kthread(void *arg)
 }
 
 /*
- * Start a new RCU grace period if warranted, re-initializing the hierarchy
- * in preparation for detecting the next grace period.  The caller must hold
- * the root node's ->lock and hard irqs must be disabled.
- *
- * Note that it is legal for a dying CPU (which is marked as offline) to
- * invoke this function.  This can happen when the dying CPU reports its
- * quiescent state.
- *
- * Returns true if the grace-period kthread must be awakened.
- */
-static bool
-rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
-		      struct rcu_data *rdp)
-{
-	raw_lockdep_assert_held_rcu_node(rnp);
-	if (!rsp->gp_kthread || !cpu_needs_another_gp(rsp, rdp)) {
-		/*
-		 * Either we have not yet spawned the grace-period
-		 * task, this CPU does not need another grace period,
-		 * or a grace period is already in progress.
-		 * Either way, don't start a new grace period.
-		 */
-		return false;
-	}
-	WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);
-	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum),
-			       TPS("newreq"));
-
-	/*
-	 * We can't do wakeups while holding the rnp->lock, as that
-	 * could cause possible deadlocks with the rq->lock. Defer
-	 * the wakeup to our caller.
-	 */
-	return true;
-}
-
-/*
  * Report a full set of quiescent states to the specified rcu_state data
  * structure.  Invoke rcu_gp_kthread_wake() to awaken the grace-period
  * kthread if another grace period is required.  Whether we wake
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH tip/core/rcu 15/21] rcu: Make rcu_start_future_gp() caller select grace period
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 14/21] rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp() Paul E. McKenney
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_accelerate_cbs() function selects a grace-period target, which
it uses to have rcu_segcblist_accelerate() assign numbers to recently
queued callbacks.  Then it invokes rcu_start_future_gp(), which selects
a grace-period target again, which is a bit pointless.  This commit
therefore changes rcu_start_future_gp() to take the grace-period target as
a parameter, thus avoiding double selection.  This commit also changes
the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect
this change in functionality, and also makes a similar change to the
name of trace_rcu_future_gp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 53 ++++++++++++++++++++----------------------------
 kernel/rcu/tree_plugin.h |  9 ++++----
 2 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4433f68a1c7b..94519c7d552f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1659,12 +1659,9 @@ static unsigned long rcu_cbs_completed(struct rcu_state *rsp,
 	return rnp->completed + 2;
 }
 
-/*
- * Trace-event helper function for rcu_start_future_gp() and
- * rcu_nocb_wait_gp().
- */
-static void trace_rcu_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
-				unsigned long c, const char *s)
+/* Trace-event wrapper function for trace_rcu_future_grace_period.  */
+static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
+			      unsigned long c, const char *s)
 {
 	trace_rcu_future_grace_period(rdp->rsp->name, rnp->gpnum,
 				      rnp->completed, c, rnp->level,
@@ -1672,33 +1669,27 @@ static void trace_rcu_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 }
 
 /*
- * Start some future grace period, as needed to handle newly arrived
+ * Start the specified grace period, as needed to handle newly arrived
  * callbacks.  The required future grace periods are recorded in each
- * rcu_node structure's ->need_future_gp field.  Returns true if there
+ * rcu_node structure's ->need_future_gp[] field.  Returns true if there
  * is reason to awaken the grace-period kthread.
  *
  * The caller must hold the specified rcu_node structure's ->lock, which
  * is why the caller is responsible for waking the grace-period kthread.
  */
-static bool __maybe_unused
-rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
-		    unsigned long *c_out)
+static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
+			      unsigned long c)
 {
-	unsigned long c;
 	bool ret = false;
 	struct rcu_state *rsp = rdp->rsp;
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
 
 	raw_lockdep_assert_held_rcu_node(rnp);
 
-	/*
-	 * Pick up grace-period number for new callbacks.  If this
-	 * grace period is already marked as needed, return to the caller.
-	 */
-	c = rcu_cbs_completed(rsp, rnp);
-	trace_rcu_future_gp(rnp, rdp, c, TPS("Startleaf"));
+	/* If the specified GP is already known needed, return to caller. */
+	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
 	if (need_future_gp_element(rnp, c)) {
-		trace_rcu_future_gp(rnp, rdp, c, TPS("Prestartleaf"));
+		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
 		goto out;
 	}
 
@@ -1710,7 +1701,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 */
 	if (rnp->gpnum != rnp->completed) {
 		need_future_gp_element(rnp, c) = true;
-		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
+		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;
 	}
 
@@ -1736,7 +1727,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * recorded, trace and leave.
 	 */
 	if (need_future_gp_element(rnp_root, c)) {
-		trace_rcu_future_gp(rnp, rdp, c, TPS("Prestartedroot"));
+		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartedroot"));
 		goto unlock_out;
 	}
 
@@ -1745,9 +1736,9 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 
 	/* If a grace period is not already in progress, start one. */
 	if (rnp_root->gpnum != rnp_root->completed) {
-		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleafroot"));
+		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleafroot"));
 	} else {
-		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedroot"));
+		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedroot"));
 		if (!rsp->gp_kthread)
 			goto unlock_out; /* No grace-period kthread yet! */
 		WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
@@ -1759,8 +1750,6 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	if (rnp != rnp_root)
 		raw_spin_unlock_rcu_node(rnp_root);
 out:
-	if (c_out != NULL)
-		*c_out = c;
 	return ret;
 }
 
@@ -1776,8 +1765,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 
 	need_future_gp_element(rnp, c) = false;
 	needmore = need_any_future_gp(rnp);
-	trace_rcu_future_gp(rnp, rdp, c,
-			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
+	trace_rcu_this_gp(rnp, rdp, c,
+			  needmore ? TPS("CleanupMore") : TPS("Cleanup"));
 	return needmore;
 }
 
@@ -1812,6 +1801,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
 			       struct rcu_data *rdp)
 {
+	unsigned long c;
 	bool ret = false;
 
 	raw_lockdep_assert_held_rcu_node(rnp);
@@ -1830,8 +1820,9 @@ static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
 	 * accelerating callback invocation to an earlier grace-period
 	 * number.
 	 */
-	if (rcu_segcblist_accelerate(&rdp->cblist, rcu_cbs_completed(rsp, rnp)))
-		ret = rcu_start_future_gp(rnp, rdp, NULL);
+	c = rcu_cbs_completed(rsp, rnp);
+	if (rcu_segcblist_accelerate(&rdp->cblist, c))
+		ret = rcu_start_this_gp(rnp, rdp, c);
 
 	/* Trace depending on how much we were able to accelerate. */
 	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
@@ -2174,8 +2165,8 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	/* Check for GP requests since above loop. */
 	rdp = this_cpu_ptr(rsp->rda);
 	if (need_any_future_gp(rnp)) {
-		trace_rcu_future_gp(rnp, rdp, rsp->completed - 1,
-				    TPS("CleanupMore"));
+		trace_rcu_this_gp(rnp, rdp, rsp->completed - 1,
+				  TPS("CleanupMore"));
 		needgp = true;
 	}
 	/* Advance CBs to reduce false positives below. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 313b77d9cf06..322777492fff 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2035,7 +2035,8 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 	struct rcu_node *rnp = rdp->mynode;
 
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	needwake = rcu_start_future_gp(rnp, rdp, &c);
+	c = rcu_cbs_completed(rdp->rsp, rnp);
+	needwake = rcu_start_this_gp(rnp, rdp, c);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	if (needwake)
 		rcu_gp_kthread_wake(rdp->rsp);
@@ -2044,7 +2045,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 	 * Wait for the grace period.  Do so interruptibly to avoid messing
 	 * up the load average.
 	 */
-	trace_rcu_future_gp(rnp, rdp, c, TPS("StartWait"));
+	trace_rcu_this_gp(rnp, rdp, c, TPS("StartWait"));
 	for (;;) {
 		swait_event_interruptible(
 			rnp->nocb_gp_wq[c & 0x1],
@@ -2052,9 +2053,9 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 		if (likely(d))
 			break;
 		WARN_ON(signal_pending(current));
-		trace_rcu_future_gp(rnp, rdp, c, TPS("ResumeWait"));
+		trace_rcu_this_gp(rnp, rdp, c, TPS("ResumeWait"));
 	}
-	trace_rcu_future_gp(rnp, rdp, c, TPS("EndWait"));
+	trace_rcu_this_gp(rnp, rdp, c, TPS("EndWait"));
 	smp_mb(); /* Ensure that CB invocation happens after GP end. */
 }
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 15/21] rcu: Make rcu_start_future_gp() caller select grace period Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-05-12  6:03   ` [tip/core/rcu,16/21] " Joel Fernandes
  2018-04-23  3:03 ` [PATCH tip/core/rcu 17/21] rcu: Make rcu_start_this_gp() check for out-of-range requests Paul E. McKenney
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

The rcu_start_this_gp() function had a simple form of funnel locking that
used only the leaves and root of the rcu_node tree, which is fine for
systems with only a few hundred CPUs, but sub-optimal for systems having
thousands of CPUs.  This commit therefore adds full-tree funnel locking.

This variant of funnel locking is unusual in the following ways:

1.	The leaf-level rcu_node structure's ->lock is held throughout.
	Other funnel-locking implementations drop the leaf-level lock
	before progressing to the next level of the tree.

2.	Funnel locking can be started at the root, which is convenient
	for code that already holds the root rcu_node structure's ->lock.
	Other funnel-locking implementations start at the leaves.

3.	If an rcu_node structure other than the initial one believes
	that a grace period is in progress, it is not necessary to
	go further up the tree.  This is because grace-period cleanup
	scans the full tree, so that marking the need for a subsequent
	grace period anywhere in the tree suffices -- but only if
	a grace period is currently in progress.

4.	It is possible that the RCU grace-period kthread has not yet
	started, and this case must be handled appropriately.

However, the general approach of using a tree to control lock contention
is still in place.
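As a rough illustration only (a hypothetical user-space sketch, not kernel
code: struct node, node_lock(), and request_gp() are invented stand-ins for
rcu_node, raw_spin_lock_rcu_node(), and rcu_start_this_gp()), the unusual
properties above can be modeled like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy rcu_node stand-in.  A boolean flag replaces the raw spinlock so
 * that the locking discipline can be asserted single-threaded.
 */
struct node {
	bool locked;
	struct node *parent;		/* NULL at the root */
	unsigned long gp_requested;	/* stands in for ->need_future_gp[] */
	bool gp_in_progress;		/* stands in for gpnum != completed */
};

static void node_lock(struct node *n)   { assert(!n->locked); n->locked = true; }
static void node_unlock(struct node *n) { assert(n->locked);  n->locked = false; }

/*
 * Record a request for grace period "c".  The starting node's lock is
 * held by the caller and retained throughout (property 1); the walk may
 * start at any level, including the root (property 2); if a non-initial
 * node sees a grace period in progress, marking the request there
 * suffices and the walk stops (property 3).  Returns true if the caller
 * must start a new grace period (the kthread-not-yet-started case of
 * property 4 is omitted here).
 */
static bool request_gp(struct node *start, unsigned long c)
{
	struct node *n;

	for (n = start; ; n = n->parent) {
		if (n != start)
			node_lock(n);
		/* Already requested here, or a GP already running above? */
		if (n->gp_requested >= c ||
		    (n != start && n->gp_in_progress)) {
			if (n != start)
				node_unlock(n);
			return false;
		}
		n->gp_requested = c;
		if (n != start && n->parent != NULL)
			node_unlock(n);	/* drop intermediate-level lock */
		if (!n->parent)
			break;		/* reached the root */
	}
	if (n != start)
		node_unlock(n);		/* drop the root lock */
	return true;
}
```

The real code differs in detail (wraparound-safe comparisons, tracing, the
grace-period-kthread check), but the lock hand-off pattern is the same.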

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
 1 file changed, 35 insertions(+), 57 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 94519c7d552f..d3c769502929 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 {
 	bool ret = false;
 	struct rcu_state *rsp = rdp->rsp;
-	struct rcu_node *rnp_root = rcu_get_root(rsp);
-
-	raw_lockdep_assert_held_rcu_node(rnp);
-
-	/* If the specified GP is already known needed, return to caller. */
-	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
-	if (need_future_gp_element(rnp, c)) {
-		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
-		goto out;
-	}
+	struct rcu_node *rnp_root;
 
 	/*
-	 * If this rcu_node structure believes that a grace period is in
-	 * progress, then we must wait for the one following, which is in
-	 * "c".  Because our request will be noticed at the end of the
-	 * current grace period, we don't need to explicitly start one.
+	 * Use funnel locking to either acquire the root rcu_node
+	 * structure's lock or bail out if the need for this grace period
+	 * has already been recorded -- or has already started.  If there
+	 * is already a grace period in progress in a non-leaf node, no
+	 * recording is needed because the end of the grace period will
+	 * scan the leaf rcu_node structures.  Note that rnp->lock must
+	 * not be released.
 	 */
-	if (rnp->gpnum != rnp->completed) {
-		need_future_gp_element(rnp, c) = true;
-		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
-		goto out;
+	raw_lockdep_assert_held_rcu_node(rnp);
+	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
+	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
+		if (rnp_root != rnp)
+			raw_spin_lock_rcu_node(rnp_root);
+		if (need_future_gp_element(rnp_root, c) ||
+		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
+		    (rnp != rnp_root &&
+		     rnp_root->gpnum != rnp_root->completed)) {
+			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
+			goto unlock_out;
+		}
+		need_future_gp_element(rnp_root, c) = true;
+		if (rnp_root != rnp && rnp_root->parent != NULL)
+			raw_spin_unlock_rcu_node(rnp_root);
+		if (!rnp_root->parent)
+			break;  /* At root, and perhaps also leaf. */
 	}
 
-	/*
-	 * There might be no grace period in progress.  If we don't already
-	 * hold it, acquire the root rcu_node structure's lock in order to
-	 * start one (if needed).
-	 */
-	if (rnp != rnp_root)
-		raw_spin_lock_rcu_node(rnp_root);
-
-	/*
-	 * Get a new grace-period number.  If there really is no grace
-	 * period in progress, it will be smaller than the one we obtained
-	 * earlier.  Adjust callbacks as needed.
-	 */
-	c = rcu_cbs_completed(rsp, rnp_root);
-	if (!rcu_is_nocb_cpu(rdp->cpu))
-		(void)rcu_segcblist_accelerate(&rdp->cblist, c);
-
-	/*
-	 * If the needed for the required grace period is already
-	 * recorded, trace and leave.
-	 */
-	if (need_future_gp_element(rnp_root, c)) {
-		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartedroot"));
+	/* If GP already in progress, just leave, otherwise start one. */
+	if (rnp_root->gpnum != rnp_root->completed) {
+		trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedleafroot"));
 		goto unlock_out;
 	}
-
-	/* Record the need for the future grace period. */
-	need_future_gp_element(rnp_root, c) = true;
-
-	/* If a grace period is not already in progress, start one. */
-	if (rnp_root->gpnum != rnp_root->completed) {
-		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleafroot"));
-	} else {
-		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedroot"));
-		if (!rsp->gp_kthread)
-			goto unlock_out; /* No grace-period kthread yet! */
-		WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
-		trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum),
-				       TPS("newreq"));
-		ret = true;  /* Caller must wake GP kthread. */
+	trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedroot"));
+	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
+	if (!rsp->gp_kthread) {
+		trace_rcu_this_gp(rnp_root, rdp, c, TPS("NoGPkthread"));
+		goto unlock_out;
 	}
+	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum), TPS("newreq"));
+	ret = true;  /* Caller must wake GP kthread. */
 unlock_out:
 	if (rnp != rnp_root)
 		raw_spin_unlock_rcu_node(rnp_root);
-out:
 	return ret;
 }
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH tip/core/rcu 17/21] rcu: Make rcu_start_this_gp() check for out-of-range requests
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (15 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp() Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 18/21] rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() Paul E. McKenney
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

If rcu_start_this_gp() is invoked with a request for a grace period more
than three grace periods in the future, then either the ->need_future_gp[]
array
needs to be bigger or the caller needs to be repaired.  This commit
therefore adds a WARN_ON_ONCE() checking for this condition.
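The check relies on ULONG_CMP_LT(), one of the wraparound-safe comparison
macros from kernel/rcu/rcu.h.  A stand-alone sketch (definitions copied;
the sample values below are hypothetical):

```c
#include <assert.h>
#include <limits.h>

/*
 * Wraparound-safe unsigned comparisons, as defined in kernel/rcu/rcu.h:
 * "a" counts as >= "b" when the unsigned difference a - b lands in the
 * lower half of the unsigned-long range, so the comparison keeps giving
 * the intended answer after the grace-period counter wraps.
 */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))
```

The WARN_ON_ONCE() thus fires when the requested grace period "c" is
farther ahead of rnp_root->gpnum than the ->need_future_gp[] array can
record, even if the counters are near the wraparound point.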

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d3c769502929..07bccb1f0c87 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1698,6 +1698,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
 		if (rnp_root != rnp)
 			raw_spin_lock_rcu_node(rnp_root);
+		WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
+					  need_future_gp_mask(), c));
 		if (need_future_gp_element(rnp_root, c) ||
 		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
 		    (rnp != rnp_root &&
-- 
2.5.2

* [PATCH tip/core/rcu 18/21] rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (16 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 17/21] rcu: Make rcu_start_this_gp() check for out-of-range requests Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 19/21] rcu: Simplify and inline cpu_needs_another_gp() Paul E. McKenney
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

All of the cpu_needs_another_gp() function's checks (except for
newly arrived callbacks) have been subsumed into the rcu_gp_cleanup()
function's scan of the rcu_node tree.  This commit therefore drops the
call to cpu_needs_another_gp().  The check for newly arrived callbacks
is supplied by rcu_accelerate_cbs().  Any needed advancing (as in the
earlier rcu_advance_cbs() call) will be supplied when the corresponding
CPU becomes aware of the end of the now-completed grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 07bccb1f0c87..7776d709e060 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2150,11 +2150,9 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 		needgp = true;
 	}
 	/* Advance CBs to reduce false positives below. */
-	needgp = rcu_advance_cbs(rsp, rnp, rdp) || needgp;
-	if (needgp || cpu_needs_another_gp(rsp, rdp)) {
+	if (!rcu_accelerate_cbs(rsp, rnp, rdp) && needgp) {
 		WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);
-		trace_rcu_grace_period(rsp->name,
-				       READ_ONCE(rsp->gpnum),
+		trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum),
 				       TPS("newreq"));
 	}
 	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags & RCU_GP_FLAG_INIT);
-- 
2.5.2

* [PATCH tip/core/rcu 19/21] rcu: Simplify and inline cpu_needs_another_gp()
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (17 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 18/21] rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 20/21] rcu: Drop early GP request check from rcu_gp_kthread() Paul E. McKenney
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp()
can be greatly simplified.  This simplification eliminates the last
call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(),
both of which can then be eliminated.  And then, because
cpu_needs_another_gp() is called only from __rcu_pending(), it can be
inlined and eliminated.

This commit carries out the simplification, inlining, and elimination
called out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/rcu_segcblist.c | 18 ------------------
 kernel/rcu/rcu_segcblist.h |  2 --
 kernel/rcu/tree.c          | 40 +++-------------------------------------
 3 files changed, 3 insertions(+), 57 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 88cba7c2956c..5aff271adf1e 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -404,24 +404,6 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 }
 
 /*
- * Scan the specified rcu_segcblist structure for callbacks that need
- * a grace period later than the one specified by "seq".  We don't look
- * at the RCU_DONE_TAIL or RCU_NEXT_TAIL segments because they don't
- * have a grace-period sequence number.
- */
-bool rcu_segcblist_future_gp_needed(struct rcu_segcblist *rsclp,
-				    unsigned long seq)
-{
-	int i;
-
-	for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++)
-		if (rsclp->tails[i - 1] != rsclp->tails[i] &&
-		    ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
-			return true;
-	return false;
-}
-
-/*
  * Merge the source rcu_segcblist structure into the destination
  * rcu_segcblist structure, then initialize the source.  Any pending
  * callbacks from the source get to start over.  It is best to
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 581c12b63544..948470cef385 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -134,7 +134,5 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
 				   struct rcu_cblist *rclp);
 void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
-bool rcu_segcblist_future_gp_needed(struct rcu_segcblist *rsclp,
-				    unsigned long seq);
 void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 			 struct rcu_segcblist *src_rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7776d709e060..020a0fe2dbee 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -709,42 +709,6 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
 }
 
 /*
- * Is there any need for future grace periods?
- * Interrupts must be disabled.  If the caller does not hold the root
- * rnp_node structure's ->lock, the results are advisory only.
- */
-static int rcu_future_needs_gp(struct rcu_state *rsp)
-{
-	struct rcu_node *rnp = rcu_get_root(rsp);
-
-	lockdep_assert_irqs_disabled();
-	return need_any_future_gp(rnp);
-}
-
-/*
- * Does the current CPU require a not-yet-started grace period?
- * The caller must have disabled interrupts to prevent races with
- * normal callback registry.
- */
-static bool
-cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
-{
-	lockdep_assert_irqs_disabled();
-	if (rcu_gp_in_progress(rsp))
-		return false;  /* No, a grace period is already in progress. */
-	if (rcu_future_needs_gp(rsp))
-		return true;  /* Yes, a no-CBs CPU needs one. */
-	if (!rcu_segcblist_is_enabled(&rdp->cblist))
-		return false;  /* No, this is a no-CBs (or offline) CPU. */
-	if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
-		return true;  /* Yes, CPU has newly registered callbacks. */
-	if (rcu_segcblist_future_gp_needed(&rdp->cblist,
-					   READ_ONCE(rsp->completed)))
-		return true;  /* Yes, CBs for future grace period. */
-	return false; /* No grace period needed. */
-}
-
-/*
  * Enter an RCU extended quiescent state, which can be either the
  * idle loop or adaptive-tickless usermode execution.
  *
@@ -3298,7 +3262,9 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 		return 1;
 
 	/* Has RCU gone idle with this CPU needing another grace period? */
-	if (cpu_needs_another_gp(rsp, rdp))
+	if (!rcu_gp_in_progress(rsp) &&
+	    rcu_segcblist_is_enabled(&rdp->cblist) &&
+	    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 		return 1;
 
 	/* Has another RCU grace period completed?  */
-- 
2.5.2

* [PATCH tip/core/rcu 20/21] rcu: Drop early GP request check from rcu_gp_kthread()
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (18 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 19/21] rcu: Simplify and inline cpu_needs_another_gp() Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-04-23  3:03 ` [PATCH tip/core/rcu 21/21] rcu: Update list of rcu_future_grace_period() trace events Paul E. McKenney
  2018-05-14  6:42 ` [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Nicholas Piggin
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Now that grace-period requests use funnel locking and now that they
set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period
kthread has not yet started, rcu_gp_kthread() no longer needs to check
need_any_future_gp() at startup time.  This commit therefore removes
this check.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 020a0fe2dbee..ed238886e6ca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2135,12 +2135,6 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
-	/* Check for early-boot work. */
-	raw_spin_lock_irq_rcu_node(rnp);
-	if (need_any_future_gp(rnp))
-		WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);
-	raw_spin_unlock_irq_rcu_node(rnp);
-
 	rcu_bind_gp_kthread();
 	for (;;) {
 
-- 
2.5.2

* [PATCH tip/core/rcu 21/21] rcu: Update list of rcu_future_grace_period() trace events
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (19 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 20/21] rcu: Drop early GP request check from rcu_gp_kthread() Paul E. McKenney
@ 2018-04-23  3:03 ` Paul E. McKenney
  2018-05-14  6:42 ` [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Nicholas Piggin
  21 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-04-23  3:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	joel.opensrc, torvalds, npiggin, Paul E. McKenney

Reworking grace-period initiation and funnel locking added new
rcu_future_grace_period() trace events, so this commit updates the
rcu_future_grace_period() trace event's header comment accordingly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/trace/events/rcu.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index d8c33298c153..5936aac357ab 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -84,20 +84,21 @@ TRACE_EVENT(rcu_grace_period,
 );
 
 /*
- * Tracepoint for future grace-period events, including those for no-callbacks
- * CPUs.  The caller should pull the data from the rcu_node structure,
- * other than rcuname, which comes from the rcu_state structure, and event,
- * which is one of the following:
+ * Tracepoint for future grace-period events.  The caller should pull
+ * the data from the rcu_node structure, other than rcuname, which comes
+ * from the rcu_state structure, and event, which is one of the following:
  *
- * "Startleaf": Request a nocb grace period based on leaf-node data.
+ * "Startleaf": Request a grace period based on leaf-node data.
+ * "Prestarted": Someone beat us to the request
  * "Startedleaf": Leaf-node start proved sufficient.
  * "Startedleafroot": Leaf-node start proved sufficient after checking root.
  * "Startedroot": Requested a nocb grace period based on root-node data.
+ * "NoGPkthread": The RCU grace-period kthread has not yet started.
  * "StartWait": Start waiting for the requested grace period.
  * "ResumeWait": Resume waiting after signal.
  * "EndWait": Complete wait.
  * "Cleanup": Clean up rcu_node structure after previous GP.
- * "CleanupMore": Clean up, and another no-CB GP is needed.
+ * "CleanupMore": Clean up, and another GP is needed.
  */
 TRACE_EVENT(rcu_future_grace_period,
 
-- 
2.5.2

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-04-23  3:03 ` [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Paul E. McKenney
@ 2018-05-10  7:21   ` Joel Fernandes
  2018-05-10 13:15     ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-10  7:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

Hi Paul,

On Sun, Apr 22, 2018 at 08:03:28PM -0700, Paul E. McKenney wrote:
> Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
> state to reflect the end of the grace period.  It also checks to see
> whether a new grace period is needed, but in a number of cases, rather
> than directly cause the new grace period to be immediately started, it
> instead leaves the grace-period-needed state where various fail-safes
> can find it.  This works fine, but results in higher contention on the
> root rcu_node structure's ->lock, which is undesirable, and contention
> on that lock has recently become noticeable.
> 
> This commit therefore makes rcu_gp_cleanup() immediately start a new
> grace period if there is any need for one.
> 
> It is quite possible that it will later be necessary to throttle the
> grace-period rate, but that can be dealt with when and if.
> 
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree.c        | 16 ++++++++++------
>  kernel/rcu/tree.h        |  1 -
>  kernel/rcu/tree_plugin.h | 17 -----------------
>  3 files changed, 10 insertions(+), 24 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 497f139056c7..afc5e32f0da4 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1763,14 +1763,14 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>   * Clean up any old requests for the just-ended grace period.  Also return
>   * whether any additional grace periods have been requested.
>   */
> -static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> +static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
>  {
>  	int c = rnp->completed;
> -	int needmore;
> +	bool needmore;
>  	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
>  
>  	need_future_gp_element(rnp, c) = 0;
> -	needmore = need_future_gp_element(rnp, c + 1);
> +	needmore = need_any_future_gp(rnp);
>  	trace_rcu_future_gp(rnp, rdp, c,
>  			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
>  	return needmore;
> @@ -2113,7 +2113,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  {
>  	unsigned long gp_duration;
>  	bool needgp = false;
> -	int nocb = 0;
>  	struct rcu_data *rdp;
>  	struct rcu_node *rnp = rcu_get_root(rsp);
>  	struct swait_queue_head *sq;
> @@ -2152,7 +2151,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  		if (rnp == rdp->mynode)
>  			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
>  		/* smp_mb() provided by prior unlock-lock pair. */
> -		nocb += rcu_future_gp_cleanup(rsp, rnp);
> +		needgp = rcu_future_gp_cleanup(rsp, rnp) || needgp;
>  		sq = rcu_nocb_gp_get(rnp);
>  		raw_spin_unlock_irq_rcu_node(rnp);
>  		rcu_nocb_gp_cleanup(sq);
> @@ -2162,13 +2161,18 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	}
>  	rnp = rcu_get_root(rsp);
>  	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
> -	rcu_nocb_gp_set(rnp, nocb);
>  
>  	/* Declare grace period done. */
>  	WRITE_ONCE(rsp->completed, rsp->gpnum);
>  	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
>  	rsp->gp_state = RCU_GP_IDLE;
> +	/* Check for GP requests since above loop. */
>  	rdp = this_cpu_ptr(rsp->rda);
> +	if (need_any_future_gp(rnp)) {
> +		trace_rcu_future_gp(rnp, rdp, rsp->completed - 1,
> +				    TPS("CleanupMore"));
> +		needgp = true;

Patch makes sense to me.

I didn't get the "rsp->completed - 1" bit in the call to trace_rcu_future_gp.
The grace period that just completed is in rsp->completed. The future one
should be completed + 1. What is meaning of the third argument 'c' to the
trace event?

Also in rcu_future_gp_cleanup, we call:
	trace_rcu_future_gp(rnp, rdp, c,
			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
For this case, in the final trace event record, rnp->completed and c will be
the same, since c is set to rnp->completed before calling
trace_rcu_future_gp. I was thinking they should be different, do you expect
them to be the same?

thanks!

- Joel

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-05-10  7:21   ` [tip/core/rcu, " Joel Fernandes
@ 2018-05-10 13:15     ` Paul E. McKenney
  2018-05-10 17:22       ` Joel Fernandes
  2018-05-10 17:37       ` Joel Fernandes
  0 siblings, 2 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-10 13:15 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Thu, May 10, 2018 at 12:21:33AM -0700, Joel Fernandes wrote:
> Hi Paul,
> 
> On Sun, Apr 22, 2018 at 08:03:28PM -0700, Paul E. McKenney wrote:
> > Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
> > state to reflect the end of the grace period.  It also checks to see
> > whether a new grace period is needed, but in a number of cases, rather
> > than directly cause the new grace period to be immediately started, it
> > instead leaves the grace-period-needed state where various fail-safes
> > can find it.  This works fine, but results in higher contention on the
> > root rcu_node structure's ->lock, which is undesirable, and contention
> > on that lock has recently become noticeable.
> > 
> > This commit therefore makes rcu_gp_cleanup() immediately start a new
> > grace period if there is any need for one.
> > 
> > It is quite possible that it will later be necessary to throttle the
> > grace-period rate, but that can be dealt with when and if.
> > 
> > Reported-by: Nicholas Piggin <npiggin@gmail.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree.c        | 16 ++++++++++------
> >  kernel/rcu/tree.h        |  1 -
> >  kernel/rcu/tree_plugin.h | 17 -----------------
> >  3 files changed, 10 insertions(+), 24 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 497f139056c7..afc5e32f0da4 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1763,14 +1763,14 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> >   * Clean up any old requests for the just-ended grace period.  Also return
> >   * whether any additional grace periods have been requested.
> >   */
> > -static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > +static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> >  {
> >  	int c = rnp->completed;
> > -	int needmore;
> > +	bool needmore;
> >  	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
> >  
> >  	need_future_gp_element(rnp, c) = 0;
> > -	needmore = need_future_gp_element(rnp, c + 1);
> > +	needmore = need_any_future_gp(rnp);
> >  	trace_rcu_future_gp(rnp, rdp, c,
> >  			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> >  	return needmore;
> > @@ -2113,7 +2113,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> >  {
> >  	unsigned long gp_duration;
> >  	bool needgp = false;
> > -	int nocb = 0;
> >  	struct rcu_data *rdp;
> >  	struct rcu_node *rnp = rcu_get_root(rsp);
> >  	struct swait_queue_head *sq;
> > @@ -2152,7 +2151,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> >  		if (rnp == rdp->mynode)
> >  			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
> >  		/* smp_mb() provided by prior unlock-lock pair. */
> > -		nocb += rcu_future_gp_cleanup(rsp, rnp);
> > +		needgp = rcu_future_gp_cleanup(rsp, rnp) || needgp;
> >  		sq = rcu_nocb_gp_get(rnp);
> >  		raw_spin_unlock_irq_rcu_node(rnp);
> >  		rcu_nocb_gp_cleanup(sq);
> > @@ -2162,13 +2161,18 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> >  	}
> >  	rnp = rcu_get_root(rsp);
> >  	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
> > -	rcu_nocb_gp_set(rnp, nocb);
> >  
> >  	/* Declare grace period done. */
> >  	WRITE_ONCE(rsp->completed, rsp->gpnum);
> >  	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
> >  	rsp->gp_state = RCU_GP_IDLE;
> > +	/* Check for GP requests since above loop. */
> >  	rdp = this_cpu_ptr(rsp->rda);
> > +	if (need_any_future_gp(rnp)) {
> > +		trace_rcu_future_gp(rnp, rdp, rsp->completed - 1,
> > +				    TPS("CleanupMore"));
> > +		needgp = true;
> 
> Patch makes sense to me.
> 
> I didn't get the "rsp->completed - 1" bit in the call to trace_rcu_future_gp.
> The grace period that just completed is in rsp->completed. The future one
> should be completed + 1. What is meaning of the third argument 'c' to the
> trace event?

The thought was that the grace period must have been requested while
rsp->completed was one less than it is now.

In the current code, it uses rnp->gp_seq_needed, which is instead the
grace period that is being requested.

> Also in rcu_future_gp_cleanup, we call:
> 	trace_rcu_future_gp(rnp, rdp, c,
> 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> For this case, in the final trace event record, rnp->completed and c will be
> the same, since c is set to rnp->completed before calling
> trace_rcu_future_gp. I was thinking they should be different, do you expect
> them to be the same?

Hmmm...  That does look a bit inconsistent.  And it currently uses
rnp->gp_seq instead of rnp->gp_seq_needed despite having the same
"CleanupMore" name.

Looks like a review of the calls to trace_rcu_this_gp() is in order.
Or did you have suggestions for name/gp associations for this trace
message type?

							Thanx, Paul

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-05-10 13:15     ` Paul E. McKenney
@ 2018-05-10 17:22       ` Joel Fernandes
  2018-05-11 16:22         ` Paul E. McKenney
  2018-05-10 17:37       ` Joel Fernandes
  1 sibling, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-10 17:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Thu, May 10, 2018 at 06:15:46AM -0700, Paul E. McKenney wrote:
> On Thu, May 10, 2018 at 12:21:33AM -0700, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Sun, Apr 22, 2018 at 08:03:28PM -0700, Paul E. McKenney wrote:
> > > Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
> > > state to reflect the end of the grace period.  It also checks to see
> > > whether a new grace period is needed, but in a number of cases, rather
> > > than directly cause the new grace period to be immediately started, it
> > > instead leaves the grace-period-needed state where various fail-safes
> > > can find it.  This works fine, but results in higher contention on the
> > > root rcu_node structure's ->lock, which is undesirable, and contention
> > > on that lock has recently become noticeable.
> > > 
> > > This commit therefore makes rcu_gp_cleanup() immediately start a new
> > > grace period if there is any need for one.
> > > 
> > > It is quite possible that it will later be necessary to throttle the
> > > grace-period rate, but that can be dealt with when and if.
> > > 
> > > Reported-by: Nicholas Piggin <npiggin@gmail.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > >  kernel/rcu/tree.c        | 16 ++++++++++------
> > >  kernel/rcu/tree.h        |  1 -
> > >  kernel/rcu/tree_plugin.h | 17 -----------------
> > >  3 files changed, 10 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 497f139056c7..afc5e32f0da4 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1763,14 +1763,14 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > >   * Clean up any old requests for the just-ended grace period.  Also return
> > >   * whether any additional grace periods have been requested.
> > >   */
> > > -static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > > +static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > >  {
> > >  	int c = rnp->completed;
> > > -	int needmore;
> > > +	bool needmore;
> > >  	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
> > >  
> > >  	need_future_gp_element(rnp, c) = 0;
> > > -	needmore = need_future_gp_element(rnp, c + 1);
> > > +	needmore = need_any_future_gp(rnp);
> > >  	trace_rcu_future_gp(rnp, rdp, c,
> > >  			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > >  	return needmore;
> > > @@ -2113,7 +2113,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> > >  {
> > >  	unsigned long gp_duration;
> > >  	bool needgp = false;
> > > -	int nocb = 0;
> > >  	struct rcu_data *rdp;
> > >  	struct rcu_node *rnp = rcu_get_root(rsp);
> > >  	struct swait_queue_head *sq;
> > > @@ -2152,7 +2151,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> > >  		if (rnp == rdp->mynode)
> > >  			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
> > >  		/* smp_mb() provided by prior unlock-lock pair. */
> > > -		nocb += rcu_future_gp_cleanup(rsp, rnp);
> > > +		needgp = rcu_future_gp_cleanup(rsp, rnp) || needgp;
> > >  		sq = rcu_nocb_gp_get(rnp);
> > >  		raw_spin_unlock_irq_rcu_node(rnp);
> > >  		rcu_nocb_gp_cleanup(sq);
> > > @@ -2162,13 +2161,18 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> > >  	}
> > >  	rnp = rcu_get_root(rsp);
> > >  	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
> > > -	rcu_nocb_gp_set(rnp, nocb);
> > >  
> > >  	/* Declare grace period done. */
> > >  	WRITE_ONCE(rsp->completed, rsp->gpnum);
> > >  	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
> > >  	rsp->gp_state = RCU_GP_IDLE;
> > > +	/* Check for GP requests since above loop. */
> > >  	rdp = this_cpu_ptr(rsp->rda);
> > > +	if (need_any_future_gp(rnp)) {
> > > +		trace_rcu_future_gp(rnp, rdp, rsp->completed - 1,
> > > +				    TPS("CleanupMore"));
> > > +		needgp = true;
> > 
> > Patch makes sense to me.
> > 
> > I didn't get the "rsp->completed - 1" bit in the call to trace_rcu_future_gp.
> > The grace period that just completed is in rsp->completed. The future one
> > should be completed + 1. What is the meaning of the third argument 'c' to the
> > trace event?
> 
> The thought was that the grace period must have been requested while
> rsp->completed was one less than it is now.
> 
> In the current code, it uses rnp->gp_seq_needed, which is instead the grace
> period that is being requested.

Oh ok, IIUC from the code, the 'c' parameter passed to trace_rcu_future_gp is
the future grace-period number. Perhaps we should clarify in
include/trace/events/rcu.h what this parameter means, probably by renaming it
'future_gp' or something like that.

> > Also in rcu_future_gp_cleanup, we call:
> > 	trace_rcu_future_gp(rnp, rdp, c,
> > 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > For this case, in the final trace event record, rnp->completed and c will be
> > the same, since c is set to rnp->completed before calling
> > trace_rcu_future_gp. I was thinking they should be different, do you expect
> > them to be the same?
> 
> Hmmm...  That does look a bit inconsistent.  And it currently uses
> rnp->gp_seq instead of rnp->gp_seq_needed despite having the same
> "CleanupMore" name.

Yes I was thinking in rcu_future_gp_cleanup, the call to trace_rcu_future_gp
should be trace_rcu_future_gp(rnp, rdp, c + 1, needmore...);

This is because in rcu_future_gp_cleanup, c is set to rnp->completed. Just
before this point rnp->completed was set to rsp->gpnum, which marks the end of
the GP for the node. The next gp would be c + 1 right?
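The counter relationship being described can be sketched in a few lines of
userspace C (a hypothetical model, not kernel code: the field names only
mirror the pre-gp_seq rcu_node layout, and future_gp() approximates the rule
rcu_cbs_completed() used at the root):

```c
#include <assert.h>

/* Hypothetical model: ->gpnum is the most recently started grace
 * period, ->completed the most recently finished one.  They are
 * equal exactly when no grace period is in progress. */
struct gp_model {
	unsigned long gpnum;
	unsigned long completed;
};

static void start_gp(struct gp_model *m) { m->gpnum++; }
static void end_gp(struct gp_model *m)   { m->completed = m->gpnum; }

/* Grace period a new request must wait for: completed + 1 when idle,
 * completed + 2 when one is already running (a GP already in progress
 * cannot satisfy a request made after it started). */
static unsigned long future_gp(const struct gp_model *m)
{
	return m->completed + 1 + (m->gpnum != m->completed);
}
```

In this model, right after cleanup sets rnp->completed = rsp->gpnum the node
is idle, so the next grace period is indeed c + 1 as suggested above.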

> Looks like a review of the calls to trace_rcu_this_gp() is in order.

Yes, I'll do some tracing and see if something else doesn't make sense to me
and let you know.

> Or did you have suggestions for name/gp associations for this trace
> message type?

I think the name for this one is fine, but "CleanupMore" sounds like
more cleanup is needed. It could be improved to "CleanupNeedgp" or
"CleanupAndStart" or something like that.

thanks!

- Joel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-05-10 13:15     ` Paul E. McKenney
  2018-05-10 17:22       ` Joel Fernandes
@ 2018-05-10 17:37       ` Joel Fernandes
  2018-05-11 16:24         ` Paul E. McKenney
  1 sibling, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-10 17:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Thu, May 10, 2018 at 06:15:46AM -0700, Paul E. McKenney wrote:
[...] 
> > Also in rcu_future_gp_cleanup, we call:
> > 	trace_rcu_future_gp(rnp, rdp, c,
> > 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > For this case, in the final trace event record, rnp->completed and c will be
> > the same, since c is set to rnp->completed before calling
> > trace_rcu_future_gp. I was thinking they should be different, do you expect
> > them to be the same?
> 
> Hmmm...  That does look a bit inconsistent.  And it currently uses
> rnp->gp_seq instead of rnp->gp_seq_needed despite having the same
> "CleanupMore" name.
> 
> Looks like a review of the calls to trace_rcu_this_gp() is in order.

I see you changed trace_rcu_future_gp to use trace_rcu_this_gp in 15/21. I
am not sure if the concern is still valid then since you seem to be correctly
getting the future GP in those cases, except for the naming which I suggest
be changed from 'c' to 'future_gp' just for clarity / self-documenting code.

thanks,

- Joel
 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-05-10 17:22       ` Joel Fernandes
@ 2018-05-11 16:22         ` Paul E. McKenney
  0 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-11 16:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Thu, May 10, 2018 at 10:22:40AM -0700, Joel Fernandes wrote:
> On Thu, May 10, 2018 at 06:15:46AM -0700, Paul E. McKenney wrote:
> > On Thu, May 10, 2018 at 12:21:33AM -0700, Joel Fernandes wrote:
> > > Hi Paul,
> > > 
> > > On Sun, Apr 22, 2018 at 08:03:28PM -0700, Paul E. McKenney wrote:
> > > > Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
> > > > state to reflect the end of the grace period.  It also checks to see
> > > > whether a new grace period is needed, but in a number of cases, rather
> > > > than directly cause the new grace period to be immediately started, it
> > > > instead leaves the grace-period-needed state where various fail-safes
> > > > can find it.  This works fine, but results in higher contention on the
> > > > root rcu_node structure's ->lock, which is undesirable, and contention
> > > > on that lock has recently become noticeable.
> > > > 
> > > > This commit therefore makes rcu_gp_cleanup() immediately start a new
> > > > grace period if there is any need for one.
> > > > 
> > > > It is quite possible that it will later be necessary to throttle the
> > > > grace-period rate, but that can be dealt with when and if.
> > > > 
> > > > Reported-by: Nicholas Piggin <npiggin@gmail.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > ---
> > > >  kernel/rcu/tree.c        | 16 ++++++++++------
> > > >  kernel/rcu/tree.h        |  1 -
> > > >  kernel/rcu/tree_plugin.h | 17 -----------------
> > > >  3 files changed, 10 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 497f139056c7..afc5e32f0da4 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1763,14 +1763,14 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > >   * Clean up any old requests for the just-ended grace period.  Also return
> > > >   * whether any additional grace periods have been requested.
> > > >   */
> > > > -static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > > > +static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > > >  {
> > > >  	int c = rnp->completed;
> > > > -	int needmore;
> > > > +	bool needmore;
> > > >  	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
> > > >  
> > > >  	need_future_gp_element(rnp, c) = 0;
> > > > -	needmore = need_future_gp_element(rnp, c + 1);
> > > > +	needmore = need_any_future_gp(rnp);
> > > >  	trace_rcu_future_gp(rnp, rdp, c,
> > > >  			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > > >  	return needmore;
> > > > @@ -2113,7 +2113,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> > > >  {
> > > >  	unsigned long gp_duration;
> > > >  	bool needgp = false;
> > > > -	int nocb = 0;
> > > >  	struct rcu_data *rdp;
> > > >  	struct rcu_node *rnp = rcu_get_root(rsp);
> > > >  	struct swait_queue_head *sq;
> > > > @@ -2152,7 +2151,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> > > >  		if (rnp == rdp->mynode)
> > > >  			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
> > > >  		/* smp_mb() provided by prior unlock-lock pair. */
> > > > -		nocb += rcu_future_gp_cleanup(rsp, rnp);
> > > > +		needgp = rcu_future_gp_cleanup(rsp, rnp) || needgp;
> > > >  		sq = rcu_nocb_gp_get(rnp);
> > > >  		raw_spin_unlock_irq_rcu_node(rnp);
> > > >  		rcu_nocb_gp_cleanup(sq);
> > > > @@ -2162,13 +2161,18 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> > > >  	}
> > > >  	rnp = rcu_get_root(rsp);
> > > >  	raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
> > > > -	rcu_nocb_gp_set(rnp, nocb);
> > > >  
> > > >  	/* Declare grace period done. */
> > > >  	WRITE_ONCE(rsp->completed, rsp->gpnum);
> > > >  	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
> > > >  	rsp->gp_state = RCU_GP_IDLE;
> > > > +	/* Check for GP requests since above loop. */
> > > >  	rdp = this_cpu_ptr(rsp->rda);
> > > > +	if (need_any_future_gp(rnp)) {
> > > > +		trace_rcu_future_gp(rnp, rdp, rsp->completed - 1,
> > > > +				    TPS("CleanupMore"));
> > > > +		needgp = true;
> > > 
> > > Patch makes sense to me.
> > > 
> > > I didn't get the "rsp->completed - 1" bit in the call to trace_rcu_future_gp.
> > > The grace period that just completed is in rsp->completed. The future one
> > > should be completed + 1. What is the meaning of the third argument 'c' to the
> > > trace event?
> > 
> > The thought was that the grace period must have been requested while
> > rsp->completed was one less than it is now.
> > 
> > In the current code, it uses rnp->gp_seq_needed, which is instead the grace
> > period that is being requested.
> 
> Oh ok, IIUC from the code, the 'c' parameter passed to trace_rcu_future_gp is
> the future grace-period number. Perhaps we should clarify in
> include/trace/events/rcu.h what this parameter means, probably by renaming it
> 'future_gp' or something like that.
> 
> > > Also in rcu_future_gp_cleanup, we call:
> > > 	trace_rcu_future_gp(rnp, rdp, c,
> > > 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > > For this case, in the final trace event record, rnp->completed and c will be
> > > the same, since c is set to rnp->completed before calling
> > > trace_rcu_future_gp. I was thinking they should be different, do you expect
> > > them to be the same?
> > 
> > Hmmm...  That does look a bit inconsistent.  And it currently uses
> > rnp->gp_seq instead of rnp->gp_seq_needed despite having the same
> > "CleanupMore" name.
> 
> Yes I was thinking in rcu_future_gp_cleanup, the call to trace_rcu_future_gp
> should be trace_rcu_future_gp(rnp, rdp, c + 1, needmore...);
> 
> This is because in rcu_future_gp_cleanup, c is set to rnp->completed. Just
> before this point rnp->completed was set to rsp->gpnum, which marks the end of
> the GP for the node. The next gp would be c + 1 right?
> 
> > Looks like a review of the calls to trace_rcu_this_gp() is in order.
> 
> Yes, I'll do some tracing and see if something else doesn't make sense to me
> and let you know.
> 
> > Or did you have suggestions for name/gp associations for this trace
> > message type?
> 
> I think the name for this one is fine, but "CleanupMore" sounds like
> more cleanup is needed. It could be improved to "CleanupNeedgp" or
> "CleanupAndStart" or something like that.

Would you be willing to pick a name, check the grace-period numbers, and
send a patch relative to rcu/dev?

								Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-05-10 17:37       ` Joel Fernandes
@ 2018-05-11 16:24         ` Paul E. McKenney
  2018-05-11 16:27           ` Joel Fernandes
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-11 16:24 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Thu, May 10, 2018 at 10:37:54AM -0700, Joel Fernandes wrote:
> On Thu, May 10, 2018 at 06:15:46AM -0700, Paul E. McKenney wrote:
> [...] 
> > > Also in rcu_future_gp_cleanup, we call:
> > > 	trace_rcu_future_gp(rnp, rdp, c,
> > > 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > > For this case, in the final trace event record, rnp->completed and c will be
> > > the same, since c is set to rnp->completed before calling
> > > trace_rcu_future_gp. I was thinking they should be different, do you expect
> > > them to be the same?
> > 
> > Hmmm...  That does look a bit inconsistent.  And it currently uses
> > rnp->gp_seq instead of rnp->gp_seq_needed despite having the same
> > "CleanupMore" name.
> > 
> > Looks like a review of the calls to trace_rcu_this_gp() is in order.
> 
> > I see you changed trace_rcu_future_gp to use trace_rcu_this_gp in 15/21. I
> am not sure if the concern is still valid then since you seem to be correctly
> getting the future GP in those cases, except for the naming which I suggest
> be changed from 'c' to 'future_gp' just for clarity / self-documenting code.

Indeed, "c" for "->completed" is completely outdated.  ;-)

Would you be willing to send a patch providing a better name?

								Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu, 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
  2018-05-11 16:24         ` Paul E. McKenney
@ 2018-05-11 16:27           ` Joel Fernandes
  0 siblings, 0 replies; 44+ messages in thread
From: Joel Fernandes @ 2018-05-11 16:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, torvalds, npiggin

On Fri, May 11, 2018 at 09:24:43AM -0700, Paul E. McKenney wrote:
> On Thu, May 10, 2018 at 10:37:54AM -0700, Joel Fernandes wrote:
> > On Thu, May 10, 2018 at 06:15:46AM -0700, Paul E. McKenney wrote:
> > [...] 
> > > > Also in rcu_future_gp_cleanup, we call:
> > > > 	trace_rcu_future_gp(rnp, rdp, c,
> > > > 			    needmore ? TPS("CleanupMore") : TPS("Cleanup"));
> > > > For this case, in the final trace event record, rnp->completed and c will be
> > > > the same, since c is set to rnp->completed before calling
> > > > trace_rcu_future_gp. I was thinking they should be different, do you expect
> > > > them to be the same?
> > > 
> > > Hmmm...  That does look a bit inconsistent.  And it currently uses
> > > rnp->gp_seq instead of rnp->gp_seq_needed despite having the same
> > > "CleanupMore" name.
> > > 
> > > Looks like a review of the calls to trace_rcu_this_gp() is in order.
> > 
> > I see you changed trace_rcu_future_gp to use trace_rcu_this_gp in 15/21. I
> > am not sure if the concern is still valid then since you seem to be correctly
> > getting the future GP in those cases, except for the naming which I suggest
> > be changed from 'c' to 'future_gp' just for clarity / self-documenting code.
> 
> Indeed, "c" for "->completed" is completely outdated.  ;-)
> 
> Would you be willing to send a patch providing a better name?

Yes for sure, I'll do it soon, and will also review the gp numbers.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-04-23  3:03 ` [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp() Paul E. McKenney
@ 2018-05-12  6:03   ` Joel Fernandes
  2018-05-12 14:40     ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-12  6:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> The rcu_start_this_gp() function had a simple form of funnel locking that
> used only the leaves and root of the rcu_node tree, which is fine for
> systems with only a few hundred CPUs, but sub-optimal for systems having
> thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> 
> This variant of funnel locking is unusual in the following ways:
> 
> 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> 	Other funnel-locking implementations drop the leaf-level lock
> 	before progressing to the next level of the tree.
> 
> 2.	Funnel locking can be started at the root, which is convenient
> 	for code that already holds the root rcu_node structure's ->lock.
> 	Other funnel-locking implementations start at the leaves.
> 
> 3.	If an rcu_node structure other than the initial one believes
> 	that a grace period is in progress, it is not necessary to
> 	go further up the tree.  This is because grace-period cleanup
> 	scans the full tree, so that marking the need for a subsequent
> 	grace period anywhere in the tree suffices -- but only if
> 	a grace period is currently in progress.
> 
> 4.	It is possible that the RCU grace-period kthread has not yet
> 	started, and this case must be handled appropriately.
> 
> However, the general approach of using a tree to control lock contention
> is still in place.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
>  1 file changed, 35 insertions(+), 57 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 94519c7d552f..d3c769502929 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  {
>  	bool ret = false;
>  	struct rcu_state *rsp = rdp->rsp;
> -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> -
> -	raw_lockdep_assert_held_rcu_node(rnp);
> -
> -	/* If the specified GP is already known needed, return to caller. */
> -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> -	if (need_future_gp_element(rnp, c)) {
> -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> -		goto out;
> -	}
> +	struct rcu_node *rnp_root;
>  
>  	/*
> -	 * If this rcu_node structure believes that a grace period is in
> -	 * progress, then we must wait for the one following, which is in
> -	 * "c".  Because our request will be noticed at the end of the
> -	 * current grace period, we don't need to explicitly start one.
> +	 * Use funnel locking to either acquire the root rcu_node
> +	 * structure's lock or bail out if the need for this grace period
> +	 * has already been recorded -- or has already started.  If there
> +	 * is already a grace period in progress in a non-leaf node, no
> +	 * recording is needed because the end of the grace period will
> +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> +	 * not be released.
>  	 */
> -	if (rnp->gpnum != rnp->completed) {
> -		need_future_gp_element(rnp, c) = true;
> -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> -		goto out;

Referring to the above negative diff as [1] (which I wanted to refer to later
in this message..)

> +	raw_lockdep_assert_held_rcu_node(rnp);
> +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> +		if (rnp_root != rnp)
> +			raw_spin_lock_rcu_node(rnp_root);
> +		if (need_future_gp_element(rnp_root, c) ||
> +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> +		    (rnp != rnp_root &&
> +		     rnp_root->gpnum != rnp_root->completed)) {
> +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> +			goto unlock_out;

I was a bit confused about the implementation of the above for loop:

In the previous code (which I refer to in the negative diff [1]), we were
checking the leaf, and if the leaf believed that RCU was not idle, then we
were marking the need for the future GP and quitting this function. In the
new code, it seems like even if the leaf believes RCU is not-idle, we still
go all the way up the tree.

I think the big change is, in the above new for loop, we either bail out if a
future GP need was already marked by an intermediate node, or we go marking
up the whole tree about the need for one.

If a leaf believes RCU is not idle, can we not just mark the future GP need
like before and return? It seems we would otherwise increase the lock
contention since now we lock intermediate nodes and then finally even the
root, whereas before we were not doing that if the leaf believed RCU was not
idle.

I am sorry if I missed something obvious.
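For reference, the walk being discussed can be modeled single-threaded in
userspace (a sketch only: the per-node ->lock handling is elided, need_gp
stands in for need_future_gp_element(rnp, c), and ULONG_CMP_GE() is
reproduced from the kernel's wraparound-safe comparison):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraparound-safe "a >= b" for unsigned long counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (unsigned long)((a) - (b)))

struct node {
	struct node *parent;     /* NULL at the root */
	unsigned long gpnum;     /* last GP started at this node */
	unsigned long completed; /* last GP finished at this node */
	bool need_gp;            /* stand-in for need_future_gp_element() */
};

/* Walk from leaf toward root, recording the need for GP "c" at each
 * level.  Returns true if the walk reached the root and the caller may
 * need to start the GP itself; false if it bailed ("Prestarted"). */
static bool funnel_mark(struct node *leaf, unsigned long c)
{
	struct node *n;

	for (n = leaf; ; n = n->parent) {
		if (n->need_gp ||                    /* already recorded here */
		    ULONG_CMP_GE(n->gpnum, c) ||     /* c already started */
		    (n != leaf && n->gpnum != n->completed)) /* GP running */
			return false;
		n->need_gp = true;                   /* mark, then go up */
		if (!n->parent)
			return true;
	}
}
```

Note that, as in the quoted hunk, a grace period in progress at the *leaf*
does not stop the walk; only a non-leaf node seeing one does, which is the
behavior questioned above.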

The other thing is we now don't have the 'Startedleaf' trace like we did
before. I sent a patch to remove it, but I think the removal of that is
somehow connected to what I was just talking about, and I was wondering whether we
should really remove it. Should we add the case for checking leaves only back
or is that a bad thing to do?

thanks,

- Joel


> +		}
> +		need_future_gp_element(rnp_root, c) = true;
> +		if (rnp_root != rnp && rnp_root->parent != NULL)
> +			raw_spin_unlock_rcu_node(rnp_root);
> +		if (!rnp_root->parent)
> +			break;  /* At root, and perhaps also leaf. */
>  	}
>  

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-12  6:03   ` [tip/core/rcu,16/21] " Joel Fernandes
@ 2018-05-12 14:40     ` Paul E. McKenney
  2018-05-12 14:44       ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-12 14:40 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > The rcu_start_this_gp() function had a simple form of funnel locking that
> > used only the leaves and root of the rcu_node tree, which is fine for
> > systems with only a few hundred CPUs, but sub-optimal for systems having
> > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > 
> > This variant of funnel locking is unusual in the following ways:
> > 
> > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > 	Other funnel-locking implementations drop the leaf-level lock
> > 	before progressing to the next level of the tree.
> > 
> > 2.	Funnel locking can be started at the root, which is convenient
> > 	for code that already holds the root rcu_node structure's ->lock.
> > 	Other funnel-locking implementations start at the leaves.
> > 
> > 3.	If an rcu_node structure other than the initial one believes
> > 	that a grace period is in progress, it is not necessary to
> > 	go further up the tree.  This is because grace-period cleanup
> > 	scans the full tree, so that marking the need for a subsequent
> > 	grace period anywhere in the tree suffices -- but only if
> > 	a grace period is currently in progress.
> > 
> > 4.	It is possible that the RCU grace-period kthread has not yet
> > 	started, and this case must be handled appropriately.
> > 
> > However, the general approach of using a tree to control lock contention
> > is still in place.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> >  1 file changed, 35 insertions(+), 57 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 94519c7d552f..d3c769502929 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> >  {
> >  	bool ret = false;
> >  	struct rcu_state *rsp = rdp->rsp;
> > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > -
> > -	raw_lockdep_assert_held_rcu_node(rnp);
> > -
> > -	/* If the specified GP is already known needed, return to caller. */
> > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > -	if (need_future_gp_element(rnp, c)) {
> > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > -		goto out;
> > -	}
> > +	struct rcu_node *rnp_root;
> >  
> >  	/*
> > -	 * If this rcu_node structure believes that a grace period is in
> > -	 * progress, then we must wait for the one following, which is in
> > -	 * "c".  Because our request will be noticed at the end of the
> > -	 * current grace period, we don't need to explicitly start one.
> > +	 * Use funnel locking to either acquire the root rcu_node
> > +	 * structure's lock or bail out if the need for this grace period
> > +	 * has already been recorded -- or has already started.  If there
> > +	 * is already a grace period in progress in a non-leaf node, no
> > +	 * recording is needed because the end of the grace period will
> > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > +	 * not be released.
> >  	 */
> > -	if (rnp->gpnum != rnp->completed) {
> > -		need_future_gp_element(rnp, c) = true;
> > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > -		goto out;
> 
> Referring to the above negative diff as [1] (which I wanted to refer to later
> in this message..)
> 
> > +	raw_lockdep_assert_held_rcu_node(rnp);
> > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > +		if (rnp_root != rnp)
> > +			raw_spin_lock_rcu_node(rnp_root);
> > +		if (need_future_gp_element(rnp_root, c) ||
> > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > +		    (rnp != rnp_root &&
> > +		     rnp_root->gpnum != rnp_root->completed)) {
> > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > +			goto unlock_out;
> 
> I was a bit confused about the implementation of the above for loop:
> 
> In the previous code (which I refer to in the negative diff [1]), we were
> checking the leaf, and if the leaf believed that RCU was not idle, then we
> were marking the need for the future GP and quitting this function. In the
> new code, it seems like even if the leaf believes RCU is not-idle, we still
> go all the way up the tree.
> 
> I think the big change is, in the above new for loop, we either bail out if a
> future GP need was already marked by an intermediate node, or we go marking
> up the whole tree about the need for one.
> 
> If a leaf believes RCU is not idle, can we not just mark the future GP need
> like before and return? It seems we would otherwise increase the lock
> contention since now we lock intermediate nodes and then finally even the
> root, whereas before we were not doing that if the leaf believed RCU was not
> idle.
> 
> I am sorry if I missed something obvious.

The trick is that we do the check before we have done the marking.
So if we bailed, we would not have marked at all.  If we are at an
intermediate node and a grace period is in progress, we do bail.

You are right that this means that we (perhaps unnecessarily) acquire
the lock of the parent rcu_node, which might or might not be the root.
And on systems with default fanout with 1024 CPUs or fewer, yes, it will
be the root, and yes, this is the common case.  So it might well be worth
improving.

One way to implement the old mark-and-return approach as you suggest
above would be as shown below (untested, probably doesn't build, and
against current rcu/dev).  What do you think?

> The other thing is we now don't have the 'Startedleaf' trace like we did
> before. I sent a patch to remove it, but I think the removal of that is
> somehow connected to what I was just talking about, and I was wondering whether we
> should really remove it. Should we add the case for checking leaves only back
> or is that a bad thing to do?

Suppose I got hit by a bus and you were stuck with the job of debugging
this.  What traces would you want and where would they be?  Keeping in
mind that too-frequent traces have their problems as well.

(Yes, I will be trying very hard to avoid this scenario for as long as
I can, but this might be a good way for you (and everyone else) to be
thinking about this.)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1abe29a43944..abf3195e01dc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 			goto unlock_out;
 		}
 		rnp_root->gp_seq_needed = c;
+		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
+			goto unlock_out;
 		if (rnp_root != rnp && rnp_root->parent != NULL)
 			raw_spin_unlock_rcu_node(rnp_root);
 		if (!rnp_root->parent)
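The extra check relies on the rcu_seq convention of packing grace-period
state into the low bits of ->gp_seq.  A minimal userspace model of the
mark-then-bail idea (a sketch under the assumption that the low two bits
hold the state, as in the kernel's rcu_seq helpers; the names here are
illustrative, not the kernel's API):

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_SEQ_STATE_MASK 0x3UL   /* low two bits of ->gp_seq */

/* Nonzero iff a grace period is in progress. */
static unsigned long rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;
}

struct mnode {
	unsigned long gp_seq;        /* counter << 2 | state */
	unsigned long gp_seq_needed; /* furthest-future GP requested */
};

/* Record the request at this level, then report whether the walk
 * should continue toward the root: if a GP is already running,
 * grace-period cleanup will scan the leaves, so stopping is safe. */
static bool mark_and_continue(struct mnode *n, unsigned long c)
{
	n->gp_seq_needed = c;
	return rcu_seq_state(n->gp_seq) == 0;
}
```

The key property is that marking happens before the bail decision, so even
the level that stops the walk leaves its request recorded for cleanup to
find.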

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-12 14:40     ` Paul E. McKenney
@ 2018-05-12 14:44       ` Paul E. McKenney
  2018-05-12 23:53         ` Joel Fernandes
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-12 14:44 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > used only the leaves and root of the rcu_node tree, which is fine for
> > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > 
> > > This variant of funnel locking is unusual in the following ways:
> > > 
> > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > 	Other funnel-locking implementations drop the leaf-level lock
> > > 	before progressing to the next level of the tree.
> > > 
> > > 2.	Funnel locking can be started at the root, which is convenient
> > > 	for code that already holds the root rcu_node structure's ->lock.
> > > 	Other funnel-locking implementations start at the leaves.
> > > 
> > > 3.	If an rcu_node structure other than the initial one believes
> > > 	that a grace period is in progress, it is not necessary to
> > > 	go further up the tree.  This is because grace-period cleanup
> > > 	scans the full tree, so that marking the need for a subsequent
> > > 	grace period anywhere in the tree suffices -- but only if
> > > 	a grace period is currently in progress.
> > > 
> > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > 	started, and this case must be handled appropriately.
> > > 
> > > However, the general approach of using a tree to control lock contention
> > > is still in place.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 94519c7d552f..d3c769502929 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > >  {
> > >  	bool ret = false;
> > >  	struct rcu_state *rsp = rdp->rsp;
> > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > -
> > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > -
> > > -	/* If the specified GP is already known needed, return to caller. */
> > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > -	if (need_future_gp_element(rnp, c)) {
> > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > -		goto out;
> > > -	}
> > > +	struct rcu_node *rnp_root;
> > >  
> > >  	/*
> > > -	 * If this rcu_node structure believes that a grace period is in
> > > -	 * progress, then we must wait for the one following, which is in
> > > -	 * "c".  Because our request will be noticed at the end of the
> > > -	 * current grace period, we don't need to explicitly start one.
> > > +	 * Use funnel locking to either acquire the root rcu_node
> > > +	 * structure's lock or bail out if the need for this grace period
> > > +	 * has already been recorded -- or has already started.  If there
> > > +	 * is already a grace period in progress in a non-leaf node, no
> > > +	 * recording is needed because the end of the grace period will
> > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > +	 * not be released.
> > >  	 */
> > > -	if (rnp->gpnum != rnp->completed) {
> > > -		need_future_gp_element(rnp, c) = true;
> > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > -		goto out;
> > 
> > Referring to the above negative diff as [1] (which I wanted to refer to later
> > in this message..)
> > 
> > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > +		if (rnp_root != rnp)
> > > +			raw_spin_lock_rcu_node(rnp_root);
> > > +		if (need_future_gp_element(rnp_root, c) ||
> > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > +		    (rnp != rnp_root &&
> > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > +			goto unlock_out;
> > 
> > I was a bit confused about the implementation of the above for loop:
> > 
> > In the previous code (which I refer to in the negative diff [1]), we were
> > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > were marking the need for the future GP and quitting this function. In the
> > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > go all the way up the tree.
> > 
> > I think the big change is, in the above new for loop, we either bail out if a
> > future GP need was already marked by an intermediate node, or we go marking
> > up the whole tree about the need for one.
> > 
> > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > like before and return? It seems we would otherwise increase the lock
> > contention since now we lock intermediate nodes and then finally even the
> > root. Where as before we were not doing that if the leaf believed RCU was not
> > idle.
> > 
> > I am sorry if I missed something obvious.
> 
> The trick is that we do the check before we have done the marking.
> So if we bailed, we would not have marked at all.  If we are at an
> intermediate node and a grace period is in progress, we do bail.
> 
> You are right that this means that we (perhaps unnecessarily) acquire
> the lock of the parent rcu_node, which might or might not be the root.
> And on systems with default fanout with 1024 CPUs or fewer, yes, it will
> be the root, and yes, this is the common case.  So might be well worth
> improving.
> 
> One way to implement the old mark-and-return approach as you suggest
> above would be as shown below (untested, probably doesn't build, and
> against current rcu/dev).  What do you think?
> 
> > The other thing is we now don't have the 'Startedleaf' trace like we did
> > before. I sent a patch to remove it, but I think the removal of that is
> > somehow connected to what I was just talking about.. and I was thinking if we
> > should really remove it. Should we add the case for checking leaves only back
> > or is that a bad thing to do?
> 
> Suppose I got hit by a bus and you were stuck with the job of debugging
> this.  What traces would you want and where would they be?  Keeping in
> mind that too-frequent traces have their problems as well.
> 
> (Yes, I will be trying very hard to avoid this scenario for as long as
> I can, but this might be a good way for you (and everyone else) to be
> thinking about this.)
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1abe29a43944..abf3195e01dc 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  			goto unlock_out;
>  		}
>  		rnp_root->gp_seq_needed = c;
> +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))

Right...  Make that rnp->gp_seq.  Memory locality and all that...

							Thanx, Paul

> +			goto unlock_out;
>  		if (rnp_root != rnp && rnp_root->parent != NULL)
>  			raw_spin_unlock_rcu_node(rnp_root);
>  		if (!rnp_root->parent)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-12 14:44       ` Paul E. McKenney
@ 2018-05-12 23:53         ` Joel Fernandes
  2018-05-13 15:38           ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-12 23:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sat, May 12, 2018 at 07:44:38AM -0700, Paul E. McKenney wrote:
> On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > > used only the leaves and root of the rcu_node tree, which is fine for
> > > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > > 
> > > > This variant of funnel locking is unusual in the following ways:
> > > > 
> > > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > > 	Other funnel-locking implementations drop the leaf-level lock
> > > > 	before progressing to the next level of the tree.
> > > > 
> > > > 2.	Funnel locking can be started at the root, which is convenient
> > > > 	for code that already holds the root rcu_node structure's ->lock.
> > > > 	Other funnel-locking implementations start at the leaves.
> > > > 
> > > > 3.	If an rcu_node structure other than the initial one believes
> > > > 	that a grace period is in progress, it is not necessary to
> > > > 	go further up the tree.  This is because grace-period cleanup
> > > > 	scans the full tree, so that marking the need for a subsequent
> > > > 	grace period anywhere in the tree suffices -- but only if
> > > > 	a grace period is currently in progress.
> > > > 
> > > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > > 	started, and this case must be handled appropriately.
> > > > 
> > > > However, the general approach of using a tree to control lock contention
> > > > is still in place.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > ---
> > > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 94519c7d552f..d3c769502929 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > >  {
> > > >  	bool ret = false;
> > > >  	struct rcu_state *rsp = rdp->rsp;
> > > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > > -
> > > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > > -
> > > > -	/* If the specified GP is already known needed, return to caller. */
> > > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > -	if (need_future_gp_element(rnp, c)) {
> > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > > -		goto out;
> > > > -	}
> > > > +	struct rcu_node *rnp_root;
> > > >  
> > > >  	/*
> > > > -	 * If this rcu_node structure believes that a grace period is in
> > > > -	 * progress, then we must wait for the one following, which is in
> > > > -	 * "c".  Because our request will be noticed at the end of the
> > > > -	 * current grace period, we don't need to explicitly start one.
> > > > +	 * Use funnel locking to either acquire the root rcu_node
> > > > +	 * structure's lock or bail out if the need for this grace period
> > > > +	 * has already been recorded -- or has already started.  If there
> > > > +	 * is already a grace period in progress in a non-leaf node, no
> > > > +	 * recording is needed because the end of the grace period will
> > > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > > +	 * not be released.
> > > >  	 */
> > > > -	if (rnp->gpnum != rnp->completed) {
> > > > -		need_future_gp_element(rnp, c) = true;
> > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > > -		goto out;
> > > 
> > > Referring to the above negative diff as [1] (which I wanted to refer to later
> > > in this message..)
> > > 
> > > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > +		if (rnp_root != rnp)
> > > > +			raw_spin_lock_rcu_node(rnp_root);
> > > > +		if (need_future_gp_element(rnp_root, c) ||
> > > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > +		    (rnp != rnp_root &&
> > > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > +			goto unlock_out;
> > > 
> > > I was a bit confused about the implementation of the above for loop:
> > > 
> > > In the previous code (which I refer to in the negative diff [1]), we were
> > > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > > were marking the need for the future GP and quitting this function. In the
> > > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > > go all the way up the tree.
> > > 
> > > I think the big change is, in the above new for loop, we either bail out if a
> > > future GP need was already marked by an intermediate node, or we go marking
> > > up the whole tree about the need for one.
> > > 
> > > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > > like before and return? It seems we would otherwise increase the lock
> > > contention since now we lock intermediate nodes and then finally even the
> > > root. Whereas before we were not doing that if the leaf believed RCU was not
> > > idle.
> > > 
> > > I am sorry if I missed something obvious.
> > 
> > The trick is that we do the check before we have done the marking.
> > So if we bailed, we would not have marked at all.  If we are at an
> > intermediate node and a grace period is in progress, we do bail.
> > 
> > You are right that this means that we (perhaps unnecessarily) acquire
> > the lock of the parent rcu_node, which might or might not be the root.
> > And on systems with default fanout with 1024 CPUs or fewer, yes, it will
> > be the root, and yes, this is the common case.  So might be well worth
> > improving.
> > 
> > One way to implement the old mark-and-return approach as you suggest
> > above would be as shown below (untested, probably doesn't build, and
> > against current rcu/dev).  What do you think?
> > 
> > > The other thing is we now don't have the 'Startedleaf' trace like we did
> > > before. I sent a patch to remove it, but I think the removal of that is
> > > somehow connected to what I was just talking about.. and I was thinking if we
> > > should really remove it. Should we add the case for checking leaves only back
> > > or is that a bad thing to do?
> > 
> > Suppose I got hit by a bus and you were stuck with the job of debugging
> > this.  What traces would you want and where would they be?  Keeping in
> > mind that too-frequent traces have their problems as well.
> > 
> > (Yes, I will be trying very hard to avoid this scenario for as long as
> > I can, but this might be a good way for you (and everyone else) to be
> > thinking about this.)
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 1abe29a43944..abf3195e01dc 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> >  			goto unlock_out;
> >  		}
> >  		rnp_root->gp_seq_needed = c;
> > +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
> 
> Right...  Make that rnp->gp_seq.  Memory locality and all that...
> 
> 							Thanx, Paul

Yes, I think this condition would be right to add. I could roll it into my
cleanup patch.

Also, I think it's better if we split the conditions for prestarted into
separate if conditions and comment them so it's clear; I have started to do
that in my tree.

If you don't mind going through the if conditions in the funnel locking loop
with me, it would be quite helpful so that I don't mess the code up and would
also help me add tracing correctly.

The if condition for prestarted is this:

               if (need_future_gp_element(rnp_root, c) ||
                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
                   (rnp != rnp_root &&
                    rnp_root->gpnum != rnp_root->completed)) {
                       trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
                       goto unlock_out;
		}
		need_future_gp_element(rnp_root, c) = true;

As of 16/21, the heart of the loop is the above (excluding the locking bits)

In this, what confuses me is the second and the third conditions for
pre-started.

The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c). 
AIUI the goal of this condition is to check whether the requested grace
period has already started. I believe then the above check is insufficient. 
The reason I think it's insufficient is I believe we should also check the
state of the grace period to augment this check.
IMO the condition should really be:
(ULONG_CMP_GT(rnp_root->gpnum, c) ||
  (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c),
which kind of accounts for the state, except that rcu_seq_done uses
ULONG_CMP_GE, whereas to fix this, rcu_seq_done IMO should be using
ULONG_CMP_GT to be equivalent to the above check. Do you agree?

The third condition for pre-started is:
                   (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
This, as I followed from your commit message, is: if an intermediate node
thinks RCU is non-idle, then it's not necessary to mark the tree and we can
bail out since the cleanup will scan the whole tree anyway. That makes sense
to me, but I would like to squash the diff in your previous email into this
condition as well to handle both conditions together.

thanks,

- Joel
 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-12 23:53         ` Joel Fernandes
@ 2018-05-13 15:38           ` Paul E. McKenney
  2018-05-13 16:49             ` Joel Fernandes
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-13 15:38 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sat, May 12, 2018 at 04:53:01PM -0700, Joel Fernandes wrote:
> On Sat, May 12, 2018 at 07:44:38AM -0700, Paul E. McKenney wrote:
> > On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> > > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > > > used only the leaves and root of the rcu_node tree, which is fine for
> > > > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > > > 
> > > > > This variant of funnel locking is unusual in the following ways:
> > > > > 
> > > > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > > > 	Other funnel-locking implementations drop the leaf-level lock
> > > > > 	before progressing to the next level of the tree.
> > > > > 
> > > > > 2.	Funnel locking can be started at the root, which is convenient
> > > > > 	for code that already holds the root rcu_node structure's ->lock.
> > > > > 	Other funnel-locking implementations start at the leaves.
> > > > > 
> > > > > 3.	If an rcu_node structure other than the initial one believes
> > > > > 	that a grace period is in progress, it is not necessary to
> > > > > 	go further up the tree.  This is because grace-period cleanup
> > > > > 	scans the full tree, so that marking the need for a subsequent
> > > > > 	grace period anywhere in the tree suffices -- but only if
> > > > > 	a grace period is currently in progress.
> > > > > 
> > > > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > > > 	started, and this case must be handled appropriately.
> > > > > 
> > > > > However, the general approach of using a tree to control lock contention
> > > > > is still in place.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 94519c7d552f..d3c769502929 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > >  {
> > > > >  	bool ret = false;
> > > > >  	struct rcu_state *rsp = rdp->rsp;
> > > > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > > > -
> > > > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > -
> > > > > -	/* If the specified GP is already known needed, return to caller. */
> > > > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > -	if (need_future_gp_element(rnp, c)) {
> > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > > > -		goto out;
> > > > > -	}
> > > > > +	struct rcu_node *rnp_root;
> > > > >  
> > > > >  	/*
> > > > > -	 * If this rcu_node structure believes that a grace period is in
> > > > > -	 * progress, then we must wait for the one following, which is in
> > > > > -	 * "c".  Because our request will be noticed at the end of the
> > > > > -	 * current grace period, we don't need to explicitly start one.
> > > > > +	 * Use funnel locking to either acquire the root rcu_node
> > > > > +	 * structure's lock or bail out if the need for this grace period
> > > > > +	 * has already been recorded -- or has already started.  If there
> > > > > +	 * is already a grace period in progress in a non-leaf node, no
> > > > > +	 * recording is needed because the end of the grace period will
> > > > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > > > +	 * not be released.
> > > > >  	 */
> > > > > -	if (rnp->gpnum != rnp->completed) {
> > > > > -		need_future_gp_element(rnp, c) = true;
> > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > > > -		goto out;
> > > > 
> > > > Referring to the above negative diff as [1] (which I wanted to refer to later
> > > > in this message..)
> > > > 
> > > > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > > +		if (rnp_root != rnp)
> > > > > +			raw_spin_lock_rcu_node(rnp_root);
> > > > > +		if (need_future_gp_element(rnp_root, c) ||
> > > > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > +		    (rnp != rnp_root &&
> > > > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > +			goto unlock_out;
> > > > 
> > > > I was a bit confused about the implementation of the above for loop:
> > > > 
> > > > In the previous code (which I refer to in the negative diff [1]), we were
> > > > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > > > were marking the need for the future GP and quitting this function. In the
> > > > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > > > go all the way up the tree.
> > > > 
> > > > I think the big change is, in the above new for loop, we either bail out if a
> > > > future GP need was already marked by an intermediate node, or we go marking
> > > > up the whole tree about the need for one.
> > > > 
> > > > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > > > like before and return? It seems we would otherwise increase the lock
> > > > contention since now we lock intermediate nodes and then finally even the
> > > > root. Whereas before we were not doing that if the leaf believed RCU was not
> > > > idle.
> > > > 
> > > > I am sorry if I missed something obvious.
> > > 
> > > The trick is that we do the check before we have done the marking.
> > > So if we bailed, we would not have marked at all.  If we are at an
> > > intermediate node and a grace period is in progress, we do bail.
> > > 
> > > You are right that this means that we (perhaps unnecessarily) acquire
> > > the lock of the parent rcu_node, which might or might not be the root.
> > > And on systems with default fanout with 1024 CPUs or fewer, yes, it will
> > > be the root, and yes, this is the common case.  So might be well worth
> > > improving.
> > > 
> > > One way to implement the old mark-and-return approach as you suggest
> > > above would be as shown below (untested, probably doesn't build, and
> > > against current rcu/dev).  What do you think?
> > > 
> > > > The other thing is we now don't have the 'Startedleaf' trace like we did
> > > > before. I sent a patch to remove it, but I think the removal of that is
> > > > somehow connected to what I was just talking about.. and I was thinking if we
> > > > should really remove it. Should we add the case for checking leaves only back
> > > > or is that a bad thing to do?
> > > 
> > > Suppose I got hit by a bus and you were stuck with the job of debugging
> > > this.  What traces would you want and where would they be?  Keeping in
> > > mind that too-frequent traces have their problems as well.
> > > 
> > > (Yes, I will be trying very hard to avoid this scenario for as long as
> > > I can, but this might be a good way for you (and everyone else) to be
> > > thinking about this.)
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 1abe29a43944..abf3195e01dc 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > >  			goto unlock_out;
> > >  		}
> > >  		rnp_root->gp_seq_needed = c;
> > > +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
> > 
> > Right...  Make that rnp->gp_seq.  Memory locality and all that...
> > 
> > 							Thanx, Paul
> 
> Yes, I think this condition would be right to add. I could roll it into my
> clean up patch.

I already queued it, please see below.

> Also, I think it's better if we split the conditions for prestarted into
> separate if conditions and comment them so it's clear; I have started to do
> that in my tree.

Hmmm...  Let's see how this plays out.

> If you don't mind going through the if conditions in the funnel locking loop
> with me, it would be quite helpful so that I don't mess the code up and would
> also help me add tracing correctly.
> 
> The if condition for prestarted is this:
> 
>                if (need_future_gp_element(rnp_root, c) ||
>                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
>                    (rnp != rnp_root &&
>                     rnp_root->gpnum != rnp_root->completed)) {
>                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
>                        goto unlock_out;
> 		}
> 		need_future_gp_element(rnp_root, c) = true;
> 
> As of 16/21, the heart of the loop is the above (excluding the locking bits)
> 
> In this, what confuses me is the second and the third conditions for
> pre-started.
> 
> The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c). 
> AIUI the goal of this condition is to check whether the requested grace
> period has already started. I believe then the above check is insufficient. 
> The reason I think it's insufficient is I believe we should also check the
> state of the grace period to augment this check.
> IMO the condition should really be:
> (ULONG_CMP_GT(rnp_root->gpnum, c) ||

The above asks whether the -next- grace period -after- the requested
one had started.

>   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))

This asks that the requested grace period not have completed.

What about the case where the requested grace period has completed,
but the one after has not yet started?  If you add that in, I bet you
will have something that simplifies to my original.

> In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c),
> which kind of accounts for the state, except that rcu_seq_done uses
> ULONG_CMP_GE, whereas to fix this, rcu_seq_done IMO should be using
> ULONG_CMP_GT to be equivalent to the above check. Do you agree?

I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
where the requested grace period completed, but the following grace period
has not yet started.

> The third condition for pre-started is:
>                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> This, as I followed from your commit message, is: if an intermediate node
> thinks RCU is non-idle, then it's not necessary to mark the tree and we can
> bail out since the cleanup will scan the whole tree anyway. That makes sense
> to me, but I would like to squash the diff in your previous email into this
> condition as well to handle both conditions together.

Please keep in mind that it is necessary to actually record the request
in the leaf case.  Or are you advocating use of ?: or similar to make this
happen?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-13 15:38           ` Paul E. McKenney
@ 2018-05-13 16:49             ` Joel Fernandes
  2018-05-13 19:09               ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-13 16:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, May 13, 2018 at 08:38:42AM -0700, Paul E. McKenney wrote:
> On Sat, May 12, 2018 at 04:53:01PM -0700, Joel Fernandes wrote:
> > On Sat, May 12, 2018 at 07:44:38AM -0700, Paul E. McKenney wrote:
> > > On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> > > > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > > > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > > > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > > > > used only the leaves and root of the rcu_node tree, which is fine for
> > > > > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > > > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > > > > 
> > > > > > This variant of funnel locking is unusual in the following ways:
> > > > > > 
> > > > > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > > > > 	Other funnel-locking implementations drop the leaf-level lock
> > > > > > 	before progressing to the next level of the tree.
> > > > > > 
> > > > > > 2.	Funnel locking can be started at the root, which is convenient
> > > > > > 	for code that already holds the root rcu_node structure's ->lock.
> > > > > > 	Other funnel-locking implementations start at the leaves.
> > > > > > 
> > > > > > 3.	If an rcu_node structure other than the initial one believes
> > > > > > 	that a grace period is in progress, it is not necessary to
> > > > > > 	go further up the tree.  This is because grace-period cleanup
> > > > > > 	scans the full tree, so that marking the need for a subsequent
> > > > > > 	grace period anywhere in the tree suffices -- but only if
> > > > > > 	a grace period is currently in progress.
> > > > > > 
> > > > > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > > > > 	started, and this case must be handled appropriately.
> > > > > > 
> > > > > > However, the general approach of using a tree to control lock contention
> > > > > > is still in place.
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > > > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 94519c7d552f..d3c769502929 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > > >  {
> > > > > >  	bool ret = false;
> > > > > >  	struct rcu_state *rsp = rdp->rsp;
> > > > > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > > > > -
> > > > > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > -
> > > > > > -	/* If the specified GP is already known needed, return to caller. */
> > > > > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > -	if (need_future_gp_element(rnp, c)) {
> > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > > > > -		goto out;
> > > > > > -	}
> > > > > > +	struct rcu_node *rnp_root;
> > > > > >  
> > > > > >  	/*
> > > > > > -	 * If this rcu_node structure believes that a grace period is in
> > > > > > -	 * progress, then we must wait for the one following, which is in
> > > > > > -	 * "c".  Because our request will be noticed at the end of the
> > > > > > -	 * current grace period, we don't need to explicitly start one.
> > > > > > +	 * Use funnel locking to either acquire the root rcu_node
> > > > > > +	 * structure's lock or bail out if the need for this grace period
> > > > > > +	 * has already been recorded -- or has already started.  If there
> > > > > > +	 * is already a grace period in progress in a non-leaf node, no
> > > > > > +	 * recording is needed because the end of the grace period will
> > > > > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > > > > +	 * not be released.
> > > > > >  	 */
> > > > > > -	if (rnp->gpnum != rnp->completed) {
> > > > > > -		need_future_gp_element(rnp, c) = true;
> > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > > > > -		goto out;
> > > > > 
> > > > > Referring to the above negative diff as [1] (which I wanted to refer to later
> > > > > in this message..)
> > > > > 
> > > > > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > > > +		if (rnp_root != rnp)
> > > > > > +			raw_spin_lock_rcu_node(rnp_root);
> > > > > > +		if (need_future_gp_element(rnp_root, c) ||
> > > > > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > > +		    (rnp != rnp_root &&
> > > > > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > > > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > > +			goto unlock_out;
> > > > > 
> > > > > I was a bit confused about the implementation of the above for loop:
> > > > > 
> > > > > In the previous code (which I refer to in the negative diff [1]), we were
> > > > > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > > > > were marking the need for the future GP and quitting this function. In the
> > > > > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > > > > go all the way up the tree.
> > > > > 
> > > > > I think the big change is, in the above new for loop, we either bail out if a
> > > > > future GP need was already marked by an intermediate node, or we go marking
> > > > > up the whole tree about the need for one.
> > > > > 
> > > > > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > > > > like before and return? It seems we would otherwise increase the lock
> > > > > contention since now we lock intermediate nodes and then finally even the
> > > > > root. Whereas before we were not doing that if the leaf believed RCU was not
> > > > > idle.
> > > > > 
> > > > > I am sorry if I missed something obvious.
> > > > 
> > > > The trick is that we do the check before we have done the marking.
> > > > So if we bailed, we would not have marked at all.  If we are at an
> > > > intermediate node and a grace period is in progress, we do bail.
> > > > 
> > > > You are right that this means that we (perhaps unnecessarily) acquire
> > > > the lock of the parent rcu_node, which might or might not be the root.
> > > > And on systems with default fanout with 1024 CPUs or fewer, yes, it will
> > > > be the root, and yes, this is the common case.  So might be well worth
> > > > improving.
> > > > 
> > > > One way to implement the old mark-and-return approach as you suggest
> > > > above would be as shown below (untested, probably doesn't build, and
> > > > against current rcu/dev).  What do you think?
> > > > 
> > > > > The other thing is we now don't have the 'Startedleaf' trace like we did
> > > > > before. I sent a patch to remove it, but I think the removal of that is
> > > > > somehow connected to what I was just talking about.. and I was thinking if we
> > > > > should really remove it. Should we add the case for checking leaves only back
> > > > > or is that a bad thing to do?
> > > > 
> > > > Suppose I got hit by a bus and you were stuck with the job of debugging
> > > > this.  What traces would you want and where would they be?  Keeping in
> > > > mind that too-frequent traces have their problems as well.
> > > > 
> > > > (Yes, I will be trying very hard to avoid this scenario for as long as
> > > > I can, but this might be a good way for you (and everyone else) to be
> > > > thinking about this.)
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 1abe29a43944..abf3195e01dc 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > >  			goto unlock_out;
> > > >  		}
> > > >  		rnp_root->gp_seq_needed = c;
> > > > +		if (rcu_seq_statn(rcu_seq_current(&rnp_root->gp_seq)))
> > > > +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
> > > Right...  Make that rnp->gp_seq.  Memory locality and all that...
> > > 
> > > 							Thanx, Paul
> > 
> > Yes, I think this condition would be right to add. I could roll it into my
> > clean up patch.
> 
> I already queued it, please see below.

Cool!

> > Also, I think it's better if we split the conditions for prestarted into
> > separate if conditions and comment them so it's clear; I have started to do
> > that in my tree.
> 
> Hmmm...  Let's see how this plays out.
> 

Sure.

> > If you don't mind going through the if conditions in the funnel locking loop
> > with me, it would be quite helpful so that I don't mess the code up and would
> > also help me add tracing correctly.
> > 
> > The if condition for prestarted is this:
> > 
> >                if (need_future_gp_element(rnp_root, c) ||
> >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> >                    (rnp != rnp_root &&
> >                     rnp_root->gpnum != rnp_root->completed)) {
> >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> >                        goto unlock_out;
> > 		need_future_gp_element(rnp_root, c) = true;
> > 
> > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > 
> > What confuses me here is the second and the third condition for
> > pre-started.
> > 
> > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c). 
> > AIUI the goal of this condition is to check whether the requested grace
> > period has already started. I believe then the above check is insufficient. 
> > The reason I think it's insufficient is that I believe we should also check the
> > state of the grace period to augment this check.
> > IMO the condition should really be:
> > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> 
> The above asks whether the -next- grace period -after- the requested
> one had started.
> 
> >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> 
> This asks that the requested grace period not have completed.
> 
> What about the case where the requested grace period has completed,
> but the one after has not yet started?  If you add that in, I bet you
> will have something that simplifies to my original.
> 
> > In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c) which
> > kind of accounts for the state, except that rcu_seq_done uses ULONG_CMP_GE,
> > whereas to fix this, rcu_seq_done IMO should be using ULONG_CMP_GT to be equivalent
> > to the above check. Do you agree?
> 
> I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> where the requested grace period completed, but the following grace period
> has not yet started.

Ok, thanks, that clears it up. For some reason I was thinking that if
rnp_root->gpnum == c, that could mean 'c' has not yet started unless we
also checked the state. Obviously, now I realize gpnum == c can only mean two
things:
 - c has started but not yet completed
 - c has completed

Both of these cases should cause a bail out so I agree now with your
condition ULONG_CMP_GE, thanks.

> 
> > The third condition for pre-started is:
> >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > This, as I followed from your commit message, is: if an intermediate node
> > thinks RCU is non-idle, then it's not necessary to mark the tree and we can
> > bail out, since the cleanup will scan the whole tree anyway. That makes
> > sense to me, but I would like to squash the diff in your previous email into this
> > condition as well to handle both conditions together.
> 
> Please keep in mind that it is necessary to actually record the request
> in the leaf case.  Or are you advocating use of ?: or similar to make this
> happen?

Yes, I realized yesterday you wanted to record it for the leaf that's why
you're doing things this way. I'll let you know if I find any other ways of
simplifying it once I look at your latest tree.

Btw, I checked your git tree and couldn't see the update that you mentioned
you queued above. Could you push those changes?

thanks!

- Joel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-13 16:49             ` Joel Fernandes
@ 2018-05-13 19:09               ` Paul E. McKenney
  2018-05-13 19:51                 ` Joel Fernandes
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-13 19:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, May 13, 2018 at 09:49:53AM -0700, Joel Fernandes wrote:
> On Sun, May 13, 2018 at 08:38:42AM -0700, Paul E. McKenney wrote:
> > On Sat, May 12, 2018 at 04:53:01PM -0700, Joel Fernandes wrote:
> > > On Sat, May 12, 2018 at 07:44:38AM -0700, Paul E. McKenney wrote:
> > > > On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > > > > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > > > > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > > > > > used only the leaves and root of the rcu_node tree, which is fine for
> > > > > > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > > > > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > > > > > 
> > > > > > > This variant of funnel locking is unusual in the following ways:
> > > > > > > 
> > > > > > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > > > > > 	Other funnel-locking implementations drop the leaf-level lock
> > > > > > > 	before progressing to the next level of the tree.
> > > > > > > 
> > > > > > > 2.	Funnel locking can be started at the root, which is convenient
> > > > > > > 	for code that already holds the root rcu_node structure's ->lock.
> > > > > > > 	Other funnel-locking implementations start at the leaves.
> > > > > > > 
> > > > > > > 3.	If an rcu_node structure other than the initial one believes
> > > > > > > 	that a grace period is in progress, it is not necessary to
> > > > > > > 	go further up the tree.  This is because grace-period cleanup
> > > > > > > 	scans the full tree, so that marking the need for a subsequent
> > > > > > > 	grace period anywhere in the tree suffices -- but only if
> > > > > > > 	a grace period is currently in progress.
> > > > > > > 
> > > > > > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > > > > > 	started, and this case must be handled appropriately.
> > > > > > > 
> > > > > > > However, the general approach of using a tree to control lock contention
> > > > > > > is still in place.
> > > > > > > 
> > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > > ---
> > > > > > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > > > > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 94519c7d552f..d3c769502929 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > > > >  {
> > > > > > >  	bool ret = false;
> > > > > > >  	struct rcu_state *rsp = rdp->rsp;
> > > > > > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > > > > > -
> > > > > > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > > -
> > > > > > > -	/* If the specified GP is already known needed, return to caller. */
> > > > > > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > > -	if (need_future_gp_element(rnp, c)) {
> > > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > > > > > -		goto out;
> > > > > > > -	}
> > > > > > > +	struct rcu_node *rnp_root;
> > > > > > >  
> > > > > > >  	/*
> > > > > > > -	 * If this rcu_node structure believes that a grace period is in
> > > > > > > -	 * progress, then we must wait for the one following, which is in
> > > > > > > -	 * "c".  Because our request will be noticed at the end of the
> > > > > > > -	 * current grace period, we don't need to explicitly start one.
> > > > > > > +	 * Use funnel locking to either acquire the root rcu_node
> > > > > > > +	 * structure's lock or bail out if the need for this grace period
> > > > > > > +	 * has already been recorded -- or has already started.  If there
> > > > > > > +	 * is already a grace period in progress in a non-leaf node, no
> > > > > > > +	 * recording is needed because the end of the grace period will
> > > > > > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > > > > > +	 * not be released.
> > > > > > >  	 */
> > > > > > > -	if (rnp->gpnum != rnp->completed) {
> > > > > > > -		need_future_gp_element(rnp, c) = true;
> > > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > > > > > -		goto out;
> > > > > > 
> > > > > > Referring to the above negative diff as [1] (which I wanted to refer to later
> > > > > > in this message..)
> > > > > > 
> > > > > > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > > > > +		if (rnp_root != rnp)
> > > > > > > +			raw_spin_lock_rcu_node(rnp_root);
> > > > > > > +		if (need_future_gp_element(rnp_root, c) ||
> > > > > > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > > > +		    (rnp != rnp_root &&
> > > > > > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > > > > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > > > +			goto unlock_out;
> > > > > > 
> > > > > > I was a bit confused about the implementation of the above for loop:
> > > > > > 
> > > > > > In the previous code (which I refer to in the negative diff [1]), we were
> > > > > > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > > > > > were marking the need for the future GP and quitting this function. In the
> > > > > > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > > > > > go all the way up the tree.
> > > > > > 
> > > > > > I think the big change is, in the above new for loop, we either bail out if a
> > > > > > future GP need was already marked by an intermediate node, or we go marking
> > > > > > up the whole tree about the need for one.
> > > > > > 
> > > > > > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > > > > > like before and return? It seems we would otherwise increase the lock
> > > > > > contention since now we lock intermediate nodes and then finally even the
> > > > > > root. Whereas before we were not doing that if the leaf believed RCU was not
> > > > > > idle.
> > > > > > 
> > > > > > I am sorry if I missed something obvious.
> > > > > 
> > > > > The trick is that we do the check before we have done the marking.
> > > > > So if we bailed, we would not have marked at all.  If we are at an
> > > > > intermediate node and a grace period is in progress, we do bail.
> > > > > 
> > > > > You are right that this means that we (perhaps unnecessarily) acquire
> > > > > the lock of the parent rcu_node, which might or might not be the root.
> > > > > And on systems with default fanout with 1024 CPUs or fewer, yes, it will
> > > > > be the root, and yes, this is the common case.  So might be well worth
> > > > > improving.
> > > > > 
> > > > > One way to implement the old mark-and-return approach as you suggest
> > > > > above would be as shown below (untested, probably doesn't build, and
> > > > > against current rcu/dev).  What do you think?
> > > > > 
> > > > > > The other thing is we now don't have the 'Startedleaf' trace like we did
> > > > > > before. I sent a patch to remove it, but I think the removal of that is
> > > > > > somehow connected to what I was just talking about.. and I was thinking if we
> > > > > > should really remove it. Should we add the case for checking leaves only back
> > > > > > or is that a bad thing to do?
> > > > > 
> > > > > Suppose I got hit by a bus and you were stuck with the job of debugging
> > > > > this.  What traces would you want and where would they be?  Keeping in
> > > > > mind that too-frequent traces have their problems as well.
> > > > > 
> > > > > (Yes, I will be trying very hard to avoid this scenario for as long as
> > > > > I can, but this might be a good way for you (and everyone else) to be
> > > > > thinking about this.)
> > > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 1abe29a43944..abf3195e01dc 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > >  			goto unlock_out;
> > > > >  		}
> > > > >  		rnp_root->gp_seq_needed = c;
> > > > > +		if (rcu_seq_statn(rcu_seq_current(&rnp_root->gp_seq)))
> > > > > +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
> > > > Right...  Make that rnp->gp_seq.  Memory locality and all that...
> > > > 
> > > > 							Thanx, Paul
> > > 
> > > Yes, I think this condition would be right to add. I could roll it into my
> > > clean up patch.
> > 
> > I already queued it, please see below.
> 
> Cool!
> 
> > > Also, I think it's better if we split the conditions for prestarted into
> > > separate if conditions and comment them so it's clear; I have started to do
> > > that in my tree.
> > 
> > Hmmm...  Let's see how this plays out.
> > 
> 
> Sure.
> 
> > > If you don't mind going through the if conditions in the funnel locking loop
> > > with me, it would be quite helpful so that I don't mess the code up and would
> > > also help me add tracing correctly.
> > > 
> > > The if condition for prestarted is this:
> > > 
> > >                if (need_future_gp_element(rnp_root, c) ||
> > >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > >                    (rnp != rnp_root &&
> > >                     rnp_root->gpnum != rnp_root->completed)) {
> > >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > >                        goto unlock_out;
> > > 		need_future_gp_element(rnp_root, c) = true;
> > > 
> > > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > > 
> > > What confuses me here is the second and the third condition for
> > > pre-started.
> > > 
> > > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c). 
> > > AIUI the goal of this condition is to check whether the requested grace
> > > period has already started. I believe then the above check is insufficient. 
> > > The reason I think it's insufficient is that I believe we should also check the
> > > state of the grace period to augment this check.
> > > IMO the condition should really be:
> > > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> > 
> > The above asks whether the -next- grace period -after- the requested
> > one had started.
> > 
> > >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> > 
> > This asks that the requested grace period not have completed.
> > 
> > What about the case where the requested grace period has completed,
> > but the one after has not yet started?  If you add that in, I bet you
> > will have something that simplifies to my original.
> > 
> > > In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c) which
> > > kind of accounts for the state, except that rcu_seq_done uses ULONG_CMP_GE,
> > > whereas to fix this, rcu_seq_done IMO should be using ULONG_CMP_GT to be equivalent
> > > to the above check. Do you agree?
> > 
> > I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> > where the requested grace period completed, but the following grace period
> > has not yet started.
> 
> Ok, thanks, that clears it up. For some reason I was thinking that if
> rnp_root->gpnum == c, that could mean 'c' has not yet started unless we
> also checked the state. Obviously, now I realize gpnum == c can only mean two
> things:
>  - c has started but not yet completed
>  - c has completed
> 
> Both of these cases should cause a bail out so I agree now with your
> condition ULONG_CMP_GE, thanks.
> 
> > 
> > > The third condition for pre-started is:
> > >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > > This, as I followed from your commit message, is: if an intermediate node
> > > thinks RCU is non-idle, then it's not necessary to mark the tree and we can
> > > bail out, since the cleanup will scan the whole tree anyway. That makes
> > > sense to me, but I would like to squash the diff in your previous email into this
> > > condition as well to handle both conditions together.
> > 
> > Please keep in mind that it is necessary to actually record the request
> > in the leaf case.  Or are you advocating use of ?: or similar to make this
> > happen?
> 
> Yes, I realized yesterday you wanted to record it for the leaf that's why
> you're doing things this way. I'll let you know if I find any other ways of
> simplifying it once I look at your latest tree.
> 
> Btw, I checked your git tree and couldn't see the update that you mentioned
> you queued above. Could you push those changes?

Good point, pushed now.  And the patch that I forgot to include in the
last email is below.

							Thanx, Paul

------------------------------------------------------------------------

commit 4deafc6f9c1cddc7cabd0632137368b5de21ff74
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Sat May 12 07:42:20 2018 -0700

    rcu: Don't funnel-lock above leaf node if GP in progress
    
    The old grace-period start code would acquire only the leaf's rcu_node
    structure's ->lock if that structure believed that a grace period was
    in progress.  The new code advances to the leaf's parent in this case,
    needlessly acquiring the leaf's parent's ->lock.  This commit therefore
    checks the grace-period state after marking the leaf with the need for
    the specified grace period, and if the leaf believes that a grace period
    is in progress, takes an early exit.
    
    Reported-by: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1abe29a43944..9ad931bff409 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 			goto unlock_out;
 		}
 		rnp_root->gp_seq_needed = c;
+		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
+			goto unlock_out;
 		if (rnp_root != rnp && rnp_root->parent != NULL)
 			raw_spin_unlock_rcu_node(rnp_root);
 		if (!rnp_root->parent)

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-13 19:09               ` Paul E. McKenney
@ 2018-05-13 19:51                 ` Joel Fernandes
  2018-05-14  2:22                   ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-13 19:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, May 13, 2018 at 12:09:06PM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 09:49:53AM -0700, Joel Fernandes wrote:
> > On Sun, May 13, 2018 at 08:38:42AM -0700, Paul E. McKenney wrote:
> > > On Sat, May 12, 2018 at 04:53:01PM -0700, Joel Fernandes wrote:
> > > > On Sat, May 12, 2018 at 07:44:38AM -0700, Paul E. McKenney wrote:
> > > > > On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> > > > > > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > > > > > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > > > > > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > > > > > > used only the leaves and root of the rcu_node tree, which is fine for
> > > > > > > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > > > > > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > > > > > > 
> > > > > > > > This variant of funnel locking is unusual in the following ways:
> > > > > > > > 
> > > > > > > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > > > > > > 	Other funnel-locking implementations drop the leaf-level lock
> > > > > > > > 	before progressing to the next level of the tree.
> > > > > > > > 
> > > > > > > > 2.	Funnel locking can be started at the root, which is convenient
> > > > > > > > 	for code that already holds the root rcu_node structure's ->lock.
> > > > > > > > 	Other funnel-locking implementations start at the leaves.
> > > > > > > > 
> > > > > > > > 3.	If an rcu_node structure other than the initial one believes
> > > > > > > > 	that a grace period is in progress, it is not necessary to
> > > > > > > > 	go further up the tree.  This is because grace-period cleanup
> > > > > > > > 	scans the full tree, so that marking the need for a subsequent
> > > > > > > > 	grace period anywhere in the tree suffices -- but only if
> > > > > > > > 	a grace period is currently in progress.
> > > > > > > > 
> > > > > > > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > > > > > > 	started, and this case must be handled appropriately.
> > > > > > > > 
> > > > > > > > However, the general approach of using a tree to control lock contention
> > > > > > > > is still in place.
> > > > > > > > 
> > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > > > ---
> > > > > > > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > > > > > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index 94519c7d552f..d3c769502929 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > > > > >  {
> > > > > > > >  	bool ret = false;
> > > > > > > >  	struct rcu_state *rsp = rdp->rsp;
> > > > > > > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > > > > > > -
> > > > > > > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > > > -
> > > > > > > > -	/* If the specified GP is already known needed, return to caller. */
> > > > > > > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > > > -	if (need_future_gp_element(rnp, c)) {
> > > > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > > > > > > -		goto out;
> > > > > > > > -	}
> > > > > > > > +	struct rcu_node *rnp_root;
> > > > > > > >  
> > > > > > > >  	/*
> > > > > > > > -	 * If this rcu_node structure believes that a grace period is in
> > > > > > > > -	 * progress, then we must wait for the one following, which is in
> > > > > > > > -	 * "c".  Because our request will be noticed at the end of the
> > > > > > > > -	 * current grace period, we don't need to explicitly start one.
> > > > > > > > +	 * Use funnel locking to either acquire the root rcu_node
> > > > > > > > +	 * structure's lock or bail out if the need for this grace period
> > > > > > > > +	 * has already been recorded -- or has already started.  If there
> > > > > > > > +	 * is already a grace period in progress in a non-leaf node, no
> > > > > > > > +	 * recording is needed because the end of the grace period will
> > > > > > > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > > > > > > +	 * not be released.
> > > > > > > >  	 */
> > > > > > > > -	if (rnp->gpnum != rnp->completed) {
> > > > > > > > -		need_future_gp_element(rnp, c) = true;
> > > > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > > > > > > -		goto out;
> > > > > > > 
> > > > > > > Referring to the above negative diff as [1] (which I wanted to refer to later
> > > > > > > in this message..)
> > > > > > > 
> > > > > > > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > > > > > +		if (rnp_root != rnp)
> > > > > > > > +			raw_spin_lock_rcu_node(rnp_root);
> > > > > > > > +		if (need_future_gp_element(rnp_root, c) ||
> > > > > > > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > > > > +		    (rnp != rnp_root &&
> > > > > > > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > > > > > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > > > > +			goto unlock_out;
> > > > > > > 
> > > > > > > I was a bit confused about the implementation of the above for loop:
> > > > > > > 
> > > > > > > In the previous code (which I refer to in the negative diff [1]), we were
> > > > > > > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > > > > > > were marking the need for the future GP and quitting this function. In the
> > > > > > > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > > > > > > go all the way up the tree.
> > > > > > > 
> > > > > > > I think the big change is, in the above new for loop, we either bail out if a
> > > > > > > future GP need was already marked by an intermediate node, or we go marking
> > > > > > > up the whole tree about the need for one.
> > > > > > > 
> > > > > > > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > > > > > > like before and return? It seems we would otherwise increase the lock
> > > > > > > contention since now we lock intermediate nodes and then finally even the
> > > > > > > root. Whereas before we were not doing that if the leaf believed RCU was not
> > > > > > > idle.
> > > > > > > 
> > > > > > > I am sorry if I missed something obvious.
> > > > > > 
> > > > > > The trick is that we do the check before we have done the marking.
> > > > > > So if we bailed, we would not have marked at all.  If we are at an
> > > > > > intermediate node and a grace period is in progress, we do bail.
> > > > > > 
> > > > > > You are right that this means that we (perhaps unnecessarily) acquire
> > > > > > the lock of the parent rcu_node, which might or might not be the root.
> > > > > > And on systems with default fanout and 1024 CPUs or fewer, yes, it will
> > > > > > be the root, and yes, this is the common case.  So it might well be worth
> > > > > > improving.
> > > > > > 
> > > > > > One way to implement the old mark-and-return approach as you suggest
> > > > > > above would be as shown below (untested, probably doesn't build, and
> > > > > > against current rcu/dev).  What do you think?
> > > > > > 
> > > > > > > The other thing is we now don't have the 'Startedleaf' trace like we did
> > > > > > > before. I sent a patch to remove it, but I think the removal of that is
> > > > > > > somehow connected to what I was just talking about.. and I was thinking if we
> > > > > > > should really remove it. Should we add the case for checking leaves only back
> > > > > > > or is that a bad thing to do?
> > > > > > 
> > > > > > Suppose I got hit by a bus and you were stuck with the job of debugging
> > > > > > this.  What traces would you want and where would they be?  Keeping in
> > > > > > mind that too-frequent traces have their problems as well.
> > > > > > 
> > > > > > (Yes, I will be trying very hard to avoid this scenario for as long as
> > > > > > I can, but this might be a good way for you (and everyone else) to be
> > > > > > thinking about this.)
> > > > > > 
> > > > > > 							Thanx, Paul
> > > > > > 
> > > > > > ------------------------------------------------------------------------
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 1abe29a43944..abf3195e01dc 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > > >  			goto unlock_out;
> > > > > >  		}
> > > > > >  		rnp_root->gp_seq_needed = c;
> > > > > > +		if (rcu_seq_statn(rcu_seq_current(&rnp_root->gp_seq)))
> > > > > > +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
> > > > > Right...  Make that rnp->gp_seq.  Memory locality and all that...
> > > > > 
> > > > > 							Thanx, Paul
> > > > 
> > > > Yes, I think this condition would be right to add. I could roll it into my
> > > > clean up patch.
> > > 
> > > I already queued it, please see below.
> > 
> > Cool!
> > 
> > > > Also, I think it's better if we split the conditions for prestarted into
> > > > separate if conditions and comment them so it's clear. I have started to do
> > > > that in my tree.
> > > 
> > > Hmmm...  Let's see how this plays out.
> > > 
> > 
> > Sure.
> > 
> > > > If you don't mind going through the if conditions in the funnel locking loop
> > > > with me, it would be quite helpful so that I don't mess the code up and would
> > > > also help me add tracing correctly.
> > > > 
> > > > The if condition for prestarted is this:
> > > > 
> > > >                if (need_future_gp_element(rnp_root, c) ||
> > > >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > >                    (rnp != rnp_root &&
> > > >                     rnp_root->gpnum != rnp_root->completed)) {
> > > >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > >                        goto unlock_out;
> > > > 		need_future_gp_element(rnp_root, c) = true;
> > > > 
> > > > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > > > 
> > > > In this what confuses me is the second and the third condition for
> > > > pre-started.
> > > > 
> > > > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c).
> > > > AIUI the goal of this condition is to check whether the requested grace
> > > > period has already started. If so, I believe the above check is insufficient.
> > > > The reason I think it's insufficient is that we should also check the
> > > > state of the grace period to augment this check.
> > > > IMO the condition should really be:
> > > > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> > > 
> > > The above asks whether the -next- grace period -after- the requested
> > > one had started.
> > > 
> > > >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> > > 
> > > This asks that the requested grace period not have completed.
> > > 
> > > What about the case where the requested grace period has completed,
> > > but the one after has not yet started?  If you add that in, I bet you
> > > will have something that simplifies to my original.
> > > 
> > > > In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c) which
> > > > kind of accounts for the state, except that rcu_seq_done uses ULONG_CMP_GE,
> > > > whereas to fix this, rcu_seq_done IMO should be using ULONG_CMP_GT to be
> > > > equivalent to the above check. Do you agree?
> > > 
> > > I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> > > where the requested grace period completed, but the following grace period
> > > has not yet started.
> > 
> > Ok, thanks, that clears it up. For some reason I was thinking that if
> > rnp_root->gpnum == c, that could mean 'c' has not yet started, unless we
> > also checked the state. Obviously, now I realize gpnum == c can only mean 2
> > things:
> >  - c has started but not yet completed
> >  - c has completed
> > 
> > Both of these cases should cause a bail out so I agree now with your
> > condition ULONG_CMP_GE, thanks.
> > 
> > > 
> > > > The third condition for pre-started is:
> > > >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > > > This, as I followed from your commit message, is: if an intermediate node
> > > > thinks RCU is non-idle, then it's not necessary to mark the tree and we can
> > > > bail out since the cleanup will scan the whole tree anyway. That makes sense
> > > > to me, but I would like to squash the diff in your previous email into this
> > > > condition as well to handle both conditions together.
> > > 
> > > Please keep in mind that it is necessary to actually record the request
> > > in the leaf case.  Or are you advocating use of ?: or similar to make this
> > > happen?
> > 
> > Yes, I realized yesterday you wanted to record it for the leaf that's why
> > you're doing things this way. I'll let you know if I find any other ways of
> > simplifying it once I look at your latest tree.
> > 
> > Btw, I checked your git tree and couldn't see the update that you mentioned
> > you queued above. Could you push those changes?
> 
> Good point, pushed now.  And the patch that I forgot to include in the
> last email is below.

Cool, thanks. Also, one thing I wanted to discuss: I am a bit unclear about
the if (rcu_seq_done...) condition in the loop which decides whether the
requested GP is pre-started.

Say c is 8 (0b1000), i.e., the requested gp number is 2.
I drew some tables with some examples, the result column is what the
current code will do.

Say gp_seq is 12 and it is not in progress (0b1100):

gp_seq	gp_num	state	analysis of gp_seq  result
12	3	0	gp 3 not started    pre-started
			(gp 2 completed)

For this, the "greater than" check in rcu_seq_done will work because 2 already
completed (The check essentially does 12 >= 8 which implies prestarted).

Say gp_seq is 9 and it is in progress (0b1001)
gp_seq	gp_num	state	state of gp_seq    result
9	2	1	gp 2 in progress   pre-started
			(gp 1 completed)

Here also the "greater than" check is correct (9 >= 8 which implies prestarted).

However, say gp_seq is 8
gp_seq	gp_num	state	state of gp_seq    result
8	2	0	gp 2 not started   pre-started
			(gp 1 completed)

In this case, rcu_seq_done will incorrectly say that it's pre-started when 2
has not yet started. For this reason, I feel the equal-to check in
rcu_seq_done will incorrectly predict prestarted.
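For reference, the encoding assumed by the tables above can be sketched in
userspace (an approximation of the helpers in kernel/rcu/rcu.h of this era:
the two low-order bits hold the grace-period state, the rest the number):

```c
#include <assert.h>

/* Sketch of the gp_seq decoding used in the tables above. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* The "gp_num" column: counter portion of gp_seq. */
static unsigned long rcu_seq_ctr(unsigned long s)
{
	return s >> RCU_SEQ_CTR_SHIFT;
}

/* The "state" column: low-order phase bits of gp_seq. */
static unsigned long rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;
}
```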

I think to fix this, the rcu_seq_done condition could be replaced with:
	if (ULONG_CMP_GT(rnp_root->gp_seq, c)) {
		// pre-started
	}
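(For reference, ULONG_CMP_GE() and ULONG_CMP_LT() are the kernel's
wraparound-tolerant comparisons, sketched below with bodies as in
kernel/rcu/rcu.h; the ULONG_CMP_GT() used above is written here as a
hypothetical variant in terms of ULONG_CMP_LT(), since I do not believe
rcu.h defines a GT form:)

```c
#include <assert.h>
#include <limits.h>

/* Wraparound-tolerant unsigned comparisons (GE/LT as in kernel/rcu/rcu.h);
 * "a >= b" holds when the forward distance from b to a is at most half the
 * counter space, so the comparison survives counter wrap. */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))
/* Hypothetical GT variant, for illustration of the proposal above only. */
#define ULONG_CMP_GT(a, b)	ULONG_CMP_LT(b, a)
```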

I believe the difference arises because one of the patches during the
conversion to use gp_seq in the tree replaced rcu_seq_done with ULONG_CMP_GE,
whereas such a replacement doesn't work in the gp_seq regime because of the
difference in the way a gp's start/end is accounted (vs. the old way).

Does it make sense or was I way off about something :D ?

thanks,

- Joel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-13 19:51                 ` Joel Fernandes
@ 2018-05-14  2:22                   ` Paul E. McKenney
  2018-05-14  5:00                     ` Joel Fernandes
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-14  2:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, May 13, 2018 at 12:51:20PM -0700, Joel Fernandes wrote:
> On Sun, May 13, 2018 at 12:09:06PM -0700, Paul E. McKenney wrote:
> > On Sun, May 13, 2018 at 09:49:53AM -0700, Joel Fernandes wrote:
> > > On Sun, May 13, 2018 at 08:38:42AM -0700, Paul E. McKenney wrote:
> > > > On Sat, May 12, 2018 at 04:53:01PM -0700, Joel Fernandes wrote:
> > > > > On Sat, May 12, 2018 at 07:44:38AM -0700, Paul E. McKenney wrote:
> > > > > > On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> > > > > > > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > > > > > > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > > > > > > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > > > > > > > used only the leaves and root of the rcu_node tree, which is fine for
> > > > > > > > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > > > > > > > thousands of CPUs.  This commit therefore adds full-tree funnel locking.
> > > > > > > > > 
> > > > > > > > > This variant of funnel locking is unusual in the following ways:
> > > > > > > > > 
> > > > > > > > > 1.	The leaf-level rcu_node structure's ->lock is held throughout.
> > > > > > > > > 	Other funnel-locking implementations drop the leaf-level lock
> > > > > > > > > 	before progressing to the next level of the tree.
> > > > > > > > > 
> > > > > > > > > 2.	Funnel locking can be started at the root, which is convenient
> > > > > > > > > 	for code that already holds the root rcu_node structure's ->lock.
> > > > > > > > > 	Other funnel-locking implementations start at the leaves.
> > > > > > > > > 
> > > > > > > > > 3.	If an rcu_node structure other than the initial one believes
> > > > > > > > > 	that a grace period is in progress, it is not necessary to
> > > > > > > > > 	go further up the tree.  This is because grace-period cleanup
> > > > > > > > > 	scans the full tree, so that marking the need for a subsequent
> > > > > > > > > 	grace period anywhere in the tree suffices -- but only if
> > > > > > > > > 	a grace period is currently in progress.
> > > > > > > > > 
> > > > > > > > > 4.	It is possible that the RCU grace-period kthread has not yet
> > > > > > > > > 	started, and this case must be handled appropriately.
> > > > > > > > > 
> > > > > > > > > However, the general approach of using a tree to control lock contention
> > > > > > > > > is still in place.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > > > > ---
> > > > > > > > >  kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > > > > > > >  1 file changed, 35 insertions(+), 57 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > index 94519c7d552f..d3c769502929 100644
> > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > > > > > >  {
> > > > > > > > >  	bool ret = false;
> > > > > > > > >  	struct rcu_state *rsp = rdp->rsp;
> > > > > > > > > -	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > > > > > > > -
> > > > > > > > > -	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > > > > -
> > > > > > > > > -	/* If the specified GP is already known needed, return to caller. */
> > > > > > > > > -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > > > > -	if (need_future_gp_element(rnp, c)) {
> > > > > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > > > > > > > -		goto out;
> > > > > > > > > -	}
> > > > > > > > > +	struct rcu_node *rnp_root;
> > > > > > > > >  
> > > > > > > > >  	/*
> > > > > > > > > -	 * If this rcu_node structure believes that a grace period is in
> > > > > > > > > -	 * progress, then we must wait for the one following, which is in
> > > > > > > > > -	 * "c".  Because our request will be noticed at the end of the
> > > > > > > > > -	 * current grace period, we don't need to explicitly start one.
> > > > > > > > > +	 * Use funnel locking to either acquire the root rcu_node
> > > > > > > > > +	 * structure's lock or bail out if the need for this grace period
> > > > > > > > > +	 * has already been recorded -- or has already started.  If there
> > > > > > > > > +	 * is already a grace period in progress in a non-leaf node, no
> > > > > > > > > +	 * recording is needed because the end of the grace period will
> > > > > > > > > +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> > > > > > > > > +	 * not be released.
> > > > > > > > >  	 */
> > > > > > > > > -	if (rnp->gpnum != rnp->completed) {
> > > > > > > > > -		need_future_gp_element(rnp, c) = true;
> > > > > > > > > -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > > > > > > > -		goto out;
> > > > > > > > 
> > > > > > > > Referring to the above negative diff as [1] (which I wanted to refer to later
> > > > > > > > in this message..)
> > > > > > > > 
> > > > > > > > > +	raw_lockdep_assert_held_rcu_node(rnp);
> > > > > > > > > +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > > > > > > +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > > > > > > +		if (rnp_root != rnp)
> > > > > > > > > +			raw_spin_lock_rcu_node(rnp_root);
> > > > > > > > > +		if (need_future_gp_element(rnp_root, c) ||
> > > > > > > > > +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > > > > > +		    (rnp != rnp_root &&
> > > > > > > > > +		     rnp_root->gpnum != rnp_root->completed)) {
> > > > > > > > > +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > > > > > +			goto unlock_out;
> > > > > > > > 
> > > > > > > > I was a bit confused about the implementation of the above for loop:
> > > > > > > > 
> > > > > > > > In the previous code (which I refer to in the negative diff [1]), we were
> > > > > > > > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > > > > > > > were marking the need for the future GP and quitting this function. In the
> > > > > > > > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > > > > > > > go all the way up the tree.
> > > > > > > > 
> > > > > > > > I think the big change is, in the above new for loop, we either bail out if a
> > > > > > > > future GP need was already marked by an intermediate node, or we mark the
> > > > > > > > need for one all the way up the tree.
> > > > > > > > 
> > > > > > > > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > > > > > > > like before and return? It seems we would otherwise increase the lock
> > > > > > > > contention since now we lock intermediate nodes and then finally even the
> > > > > > > > root. Whereas before we were not doing that if the leaf believed RCU was not
> > > > > > > > idle.
> > > > > > > > 
> > > > > > > > I am sorry if I missed something obvious.
> > > > > > > 
> > > > > > > The trick is that we do the check before we have done the marking.
> > > > > > > So if we bailed, we would not have marked at all.  If we are at an
> > > > > > > intermediate node and a grace period is in progress, we do bail.
> > > > > > > 
> > > > > > > You are right that this means that we (perhaps unnecessarily) acquire
> > > > > > > the lock of the parent rcu_node, which might or might not be the root.
> > > > > > > And on systems with default fanout and 1024 CPUs or fewer, yes, it will
> > > > > > > be the root, and yes, this is the common case.  So it might well be worth
> > > > > > > improving.
> > > > > > > 
> > > > > > > One way to implement the old mark-and-return approach as you suggest
> > > > > > > above would be as shown below (untested, probably doesn't build, and
> > > > > > > against current rcu/dev).  What do you think?
> > > > > > > 
> > > > > > > > The other thing is we now don't have the 'Startedleaf' trace like we did
> > > > > > > > before. I sent a patch to remove it, but I think the removal of that is
> > > > > > > > somehow connected to what I was just talking about.. and I was thinking if we
> > > > > > > > should really remove it. Should we add the case for checking leaves only back
> > > > > > > > or is that a bad thing to do?
> > > > > > > 
> > > > > > > Suppose I got hit by a bus and you were stuck with the job of debugging
> > > > > > > this.  What traces would you want and where would they be?  Keeping in
> > > > > > > mind that too-frequent traces have their problems as well.
> > > > > > > 
> > > > > > > (Yes, I will be trying very hard to avoid this scenario for as long as
> > > > > > > I can, but this might be a good way for you (and everyone else) to be
> > > > > > > thinking about this.)
> > > > > > > 
> > > > > > > 							Thanx, Paul
> > > > > > > 
> > > > > > > ------------------------------------------------------------------------
> > > > > > > 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 1abe29a43944..abf3195e01dc 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > > > >  			goto unlock_out;
> > > > > > >  		}
> > > > > > >  		rnp_root->gp_seq_needed = c;
> > > > > > > +		if (rcu_seq_statn(rcu_seq_current(&rnp_root->gp_seq)))
> > > > > > > +		if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
> > > > > > Right...  Make that rnp->gp_seq.  Memory locality and all that...
> > > > > > 
> > > > > > 							Thanx, Paul
> > > > > 
> > > > > Yes, I think this condition would be right to add. I could roll it into my
> > > > > clean up patch.
> > > > 
> > > > I already queued it, please see below.
> > > 
> > > Cool!
> > > 
> > > > > Also, I think it's better if we split the conditions for prestarted into
> > > > > separate if conditions and comment them so it's clear. I have started to do
> > > > > that in my tree.
> > > > 
> > > > Hmmm...  Let's see how this plays out.
> > > > 
> > > 
> > > Sure.
> > > 
> > > > > If you don't mind going through the if conditions in the funnel locking loop
> > > > > with me, it would be quite helpful so that I don't mess the code up and would
> > > > > also help me add tracing correctly.
> > > > > 
> > > > > The if condition for prestarted is this:
> > > > > 
> > > > >                if (need_future_gp_element(rnp_root, c) ||
> > > > >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > >                    (rnp != rnp_root &&
> > > > >                     rnp_root->gpnum != rnp_root->completed)) {
> > > > >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > >                        goto unlock_out;
> > > > > 		need_future_gp_element(rnp_root, c) = true;
> > > > > 
> > > > > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > > > > 
> > > > > In this what confuses me is the second and the third condition for
> > > > > pre-started.
> > > > > 
> > > > > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c).
> > > > > AIUI the goal of this condition is to check whether the requested grace
> > > > > period has already started. If so, I believe the above check is insufficient.
> > > > > The reason I think it's insufficient is that we should also check the
> > > > > state of the grace period to augment this check.
> > > > > IMO the condition should really be:
> > > > > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> > > > 
> > > > The above asks whether the -next- grace period -after- the requested
> > > > one had started.
> > > > 
> > > > >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> > > > 
> > > > This asks that the requested grace period not have completed.
> > > > 
> > > > What about the case where the requested grace period has completed,
> > > > but the one after has not yet started?  If you add that in, I bet you
> > > > will have something that simplifies to my original.
> > > > 
> > > > > In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c) which
> > > > > kind of accounts for the state, except that rcu_seq_done uses ULONG_CMP_GE,
> > > > > whereas to fix this, rcu_seq_done IMO should be using ULONG_CMP_GT to be
> > > > > equivalent to the above check. Do you agree?
> > > > 
> > > > I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> > > > where the requested grace period completed, but the following grace period
> > > > has not yet started.
> > > 
> > > Ok, thanks, that clears it up. For some reason I was thinking that if
> > > rnp_root->gpnum == c, that could mean 'c' has not yet started, unless we
> > > also checked the state. Obviously, now I realize gpnum == c can only mean 2
> > > things:
> > >  - c has started but not yet completed
> > >  - c has completed
> > > 
> > > Both of these cases should cause a bail out so I agree now with your
> > > condition ULONG_CMP_GE, thanks.
> > > 
> > > > 
> > > > > The third condition for pre-started is:
> > > > >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > > > > This, as I followed from your commit message, is: if an intermediate node
> > > > > thinks RCU is non-idle, then it's not necessary to mark the tree and we can
> > > > > bail out since the cleanup will scan the whole tree anyway. That makes sense
> > > > > to me, but I would like to squash the diff in your previous email into this
> > > > > condition as well to handle both conditions together.
> > > > 
> > > > Please keep in mind that it is necessary to actually record the request
> > > > in the leaf case.  Or are you advocating use of ?: or similar to make this
> > > > happen?
> > > 
> > > Yes, I realized yesterday you wanted to record it for the leaf that's why
> > > you're doing things this way. I'll let you know if I find any other ways of
> > > simplifying it once I look at your latest tree.
> > > 
> > > Btw, I checked your git tree and couldn't see the update that you mentioned
> > > you queued above. Could you push those changes?
> > 
> > Good point, pushed now.  And the patch that I forgot to include in the
> > last email is below.
> 
> Cool, thanks. Also, one thing I wanted to discuss: I am a bit unclear about
> the if (rcu_seq_done...) condition in the loop which decides whether the
> requested GP is pre-started.

Actually, rcu_seq_done() instead determines whether or not the GP has
-completed-.

> Say c is 8 (0b1000), i.e., the requested gp number is 2.
> I drew some tables with some examples, the result column is what the
> current code will do.
> 
> Say gp_seq is 12 and it is not in progress (0b1100):
> 
> gp_seq	gp_num	state	analysis of gp_seq  result
> 12	3	0	gp 3 not started    pre-started
> 			(gp 2 completed)
> 
> For this, the "greater than" check in rcu_seq_done will work because 2 already
> completed (The check essentially does 12 >= 8 which implies prestarted).

Agreed.

> Say gp_seq is 9 and it is in progress (0b1001)
> gp_seq	gp_num	state	state of gp_seq    result
> 9	2	1	gp 2 in progress   pre-started
> 			(gp 1 completed)
> 
> Here also the "greater than" check is correct (9 >= 8 which implies prestarted).

Yes, ->gp_seq of 9 implies that _snap() of 8 is done and gone.

> However, say gp_seq is 8
> gp_seq	gp_num	state	state of gp_seq    result
> 8	2	0	gp 2 not started   pre-started
> 			(gp 1 completed)
> 
> In this case, rcu_seq_done will incorrectly say that it's pre-started when 2
> has not yet started. For this reason, I feel the equal-to check in
> rcu_seq_done will incorrectly predict prestarted.

If _snap() said 8, then it meant that when ->gp_seq reached 8, the needed
grace periods had elapsed.  So ULONG_CMP_GE() really is what we want.
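To make the snapshot arithmetic concrete, here is a userspace sketch of
rcu_seq_snap() and rcu_seq_done() (bodies approximated from kernel/rcu/rcu.h
of this era, with READ_ONCE() and the memory barrier omitted): a snapshot
taken while gp_seq is 8 returns 12, not 8, so rcu_seq_done() does not report
the requested grace period complete until gp_seq actually reaches 12.

```c
#include <assert.h>
#include <limits.h>

#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))

/* Snapshot: the gp_seq value at which a full grace period starting after
 * "now" is guaranteed to have ended.  Rounds up past any GP in progress. */
static unsigned long rcu_seq_snap(const unsigned long *sp)
{
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Has the grace period requested by snapshot s elapsed? */
static int rcu_seq_done(const unsigned long *sp, unsigned long s)
{
	return ULONG_CMP_GE(*sp, s);
}
```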

> I think to fix this, the rcu_seq_done condition could be replaced with:
> 	if (ULONG_CMP_GT(rnp_root->gp_seq, c)) {
> 		// pre-started
> 	}
> 
> I believe the difference arises because one of the patches during the
> conversion to use gp_seq in the tree replaced rcu_seq_done with ULONG_CMP_GE,
> whereas such a replacement doesn't work in the gp_seq regime because of the
> difference in the way a gp's start/end is accounted (vs. the old way).
> 
> Does it make sense or was I way off about something :D ?

I believe that you need to start with where the value passed via "c"
to rcu_start_this_gp() came from.  I suggest starting with the call
from the rcu_seq_snap() in rcu_nocb_wait_gp(), whose return value is
then passed to rcu_start_this_gp(), the reason being that it doesn't
drag you through the callback lists.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-14  2:22                   ` Paul E. McKenney
@ 2018-05-14  5:00                     ` Joel Fernandes
  2018-05-14 13:23                       ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Joel Fernandes @ 2018-05-14  5:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, May 13, 2018 at 07:22:06PM -0700, Paul E. McKenney wrote:
[..]
> > > > > > If you don't mind going through the if conditions in the funnel locking loop
> > > > > > with me, it would be quite helpful so that I don't mess the code up and would
> > > > > > also help me add tracing correctly.
> > > > > > 
> > > > > > The if condition for prestarted is this:
> > > > > > 
> > > > > >                if (need_future_gp_element(rnp_root, c) ||
> > > > > >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > >                    (rnp != rnp_root &&
> > > > > >                     rnp_root->gpnum != rnp_root->completed)) {
> > > > > >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > >                        goto unlock_out;
> > > > > > 		need_future_gp_element(rnp_root, c) = true;
> > > > > > 
> > > > > > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > > > > > 
> > > > > > In this what confuses me is the second and the third condition for
> > > > > > pre-started.
> > > > > > 
> > > > > > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c).
> > > > > > AIUI the goal of this condition is to check whether the requested grace
> > > > > > period has already started. If so, I believe the above check is insufficient.
> > > > > > The reason I think it's insufficient is that we should also check the
> > > > > > state of the grace period to augment this check.
> > > > > > IMO the condition should really be:
> > > > > > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> > > > > 
> > > > > The above asks whether the -next- grace period -after- the requested
> > > > > one had started.
> > > > > 
> > > > > >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> > > > > 
> > > > > This asks that the requested grace period not have completed.
> > > > > 
> > > > > What about the case where the requested grace period has completed,
> > > > > but the one after has not yet started?  If you add that in, I bet you
> > > > > will have something that simplifies to my original.
> > > > > 
> > > > > > In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c) which
> > > > > > kind of accounts for the state, except that rcu_seq_done uses ULONG_CMP_GE,
> > > > > > whereas to fix this, rcu_seq_done IMO should be using ULONG_CMP_GT to be
> > > > > > equivalent to the above check. Do you agree?
> > > > > 
> > > > > I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> > > > > where the requested grace period completed, but the following grace period
> > > > > has not yet started.
> > > > 
> > > > Ok, thanks, that clears it up. For some reason I was thinking that if
> > > > rnp_root->gpnum == c, that could mean 'c' has not yet started, unless we
> > > > also checked the state. Obviously, now I realize gpnum == c can only mean 2
> > > > things:
> > > >  - c has started but not yet completed
> > > >  - c has completed
> > > > 
> > > > Both of these cases should cause a bail out so I agree now with your
> > > > condition ULONG_CMP_GE, thanks.
> > > > 
> > > > > 
> > > > > > The third condition for pre-started is:
> > > > > >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > > > > > This, as I followed from your commit message, is: if an intermediate node
> > > > > > thinks RCU is non-idle, then it's not necessary to mark the tree and we can
> > > > > > bail out since the cleanup will scan the whole tree anyway. That makes sense
> > > > > > to me, but I would like to squash the diff in your previous email into this
> > > > > > condition as well to handle both conditions together.
> > > > > 
> > > > > Please keep in mind that it is necessary to actually record the request
> > > > > in the leaf case.  Or are you advocating use of ?: or similar to make this
> > > > > happen?
> > > > 
> > > > Yes, I realized yesterday you wanted to record it for the leaf that's why
> > > > you're doing things this way. I'll let you know if I find any other ways of
> > > > simplifying it once I look at your latest tree.
> > > > 
> > > > Btw, I checked your git tree and couldn't see the update that you mentioned
> > > > you queued above. Could you push those changes?
> > > 
> > > Good point, pushed now.  And the patch that I forgot to include in the
> > > last email is below.
> > 
> > Cool, thanks. Also, one thing I wanted to discuss: I am a bit unclear about
> > the if (rcu_seq_done...) condition in the loop which decides whether the
> > requested GP is pre-started.
> 
> Actually, rcu_seq_done() instead determines whether or not the GP has
> -completed-.
> 
> > Say c is 8 (0b1000), i.e., the requested gp number is 2.
> > I drew some tables with some examples, the result column is what the
> > current code will do.
> > 
> > Say gp_seq is 12 and it is not in progress (0b1100):
> > 
> > gp_seq	gp_num	state	analysis of gp_seq  result
> > 12		3	0	gp 3 not started    pre-started
> > 				(gp 2 completed)
> > 
> > For this, the "greater than" check in rcu_seq_done will work because 2 already
> > completed (The check essentially does 12 >= 8 which implies prestarted).
> 
> Agreed.
> 
> > Say gp_seq is 9 and it is in progress (0b1001)
> > gp_seq	gp_num	state	state of gp_seq    result
> > 9		2	1	gp 2 in progress   pre-started
> > 				(gp 1 completed)
> > 
> > Here also the "greater than" check is correct (9 >= 8 which implies prestarted).
> 
> Yes, ->gp_seq of 9 implies that _snap() of 8 is done and gone.

According to the above table, I was trying to indicate that gp_seq = 9
implies gp_num 2 is in progress, not done. So in my view, whatever
_snap() returned is in progress now (the state bit is set).

> > However, say gp_seq is 8
> > gp_seq	gp_num	state	state of gp_seq    result
> > 8		2	0	gp 2 not started   pre-started
> > 				(gp 1 completed)
> > 
> > In this case, rcu_seq_done will incorrectly say that it's pre-started when 2
> > has not yet started. For this reason, I feel the equal-to check in
> > rcu_seq_done will incorrectly predict prestarted.
> 
> If _snap() said 8, then it meant that when ->gp_seq reached 8, the needed
> grace periods had elapsed.  So ULONG_CMP_GE() really is what we want.

I still kind of disagree, based on the reasoning below (but I'm pretty sure
I'm missing something, so I probably need to go through some more examples,
do some more tracing, etc.)

Forgetting about _snap for a second, can we not look at gp_seq independently
and determine what the grace period is currently doing? In my view, if gp_seq
reaches 8 (gp_num is 2), that means that gp_num 1 just completed. It
doesn't mean 2 completed; 2 could have either started or not yet started, and
we can't tell without looking at the state bits. This is the part I didn't get.

rcu_seq_start only sets the state bit. rcu_seq_end increments the gp_num
value.

I thought that when rcu_seq_end sets the value part of gp_seq to gp_num, it
means that gp_num - 1 just completed. Is that not true?

> 
> > I think to fix this, the rcu_seq_done condition could be replaced with:
> > 	if (ULONG_CMP_GT(rnp_root->gp_seq, c)) {
> > 		// pre-started
> > 	}
> > 
> > I believe the difference arises because one of the patches during the
> > conversion to use gp_seq in the tree replaced rcu_seq_done with ULONG_CMP_GE,
> > whereas such a replacement doesn't work in the gp_seq regime because of the
> > difference in the way a gp's start/end is accounted (vs. the old way).
> > 
> > Does it make sense or was I way off about something :D ?
> 
> I believe that you need to start with where the value passed via "c"
> to rcu_start_this_gp() came from.  I suggest starting with the call
> from the rcu_seq_snap() in rcu_nocb_wait_gp(), whose return value is
> then passed to rcu_start_this_gp(), the reason being that it doesn't
> drag you through the callback lists.

OK, I'll try to do some more tracing/analysis and think some more, following
your suggestion of starting from rcu_nocb_wait_gp. Most likely I am wrong,
but I have yet to convince myself of it :-(

thanks so much!

- Joel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH tip/core/rcu 0/21] Contention reduction for v4.18
  2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
                   ` (20 preceding siblings ...)
  2018-04-23  3:03 ` [PATCH tip/core/rcu 21/21] rcu: Update list of rcu_future_grace_period() trace events Paul E. McKenney
@ 2018-05-14  6:42 ` Nicholas Piggin
  2018-05-14 16:09   ` Paul E. McKenney
  21 siblings, 1 reply; 44+ messages in thread
From: Nicholas Piggin @ 2018-05-14  6:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, 22 Apr 2018 20:02:58 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Hello!
> 
> This series reduces lock contention on the root rcu_node structure,
> and is also the first precursor to TBD changes to consolidate the
> three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched) into one.

Hi Paul,

I've been running your rcu/dev branch and haven't noticed any problems
yet. The irqsoff latency improvement is a little hard to measure
because of the scheduler, but I've tried turning balancing parameters
right down and I'm yet to see any sign of RCU in traces (down to about
100us on a 176 CPU machine), so that's great.

(Not that RCU was ever the worst contributor to latency as I said, just
that I noticed those couple of traces where it showed up.)

Thanks very much for the fast response, sorry I've taken a while to
test.

Thanks,
Nick


* Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
  2018-05-14  5:00                     ` Joel Fernandes
@ 2018-05-14 13:23                       ` Paul E. McKenney
  0 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-14 13:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Sun, May 13, 2018 at 10:00:09PM -0700, Joel Fernandes wrote:
> On Sun, May 13, 2018 at 07:22:06PM -0700, Paul E. McKenney wrote:
> [..]
> > > > > > > If you don't mind going through the if conditions in the funnel locking loop
> > > > > > > with me, it would be quite helpful so that I don't mess the code up and would
> > > > > > > also help me add tracing correctly.
> > > > > > > 
> > > > > > > The if condition for prestarted is this:
> > > > > > > 
> > > > > > >                if (need_future_gp_element(rnp_root, c) ||
> > > > > > >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > > >                    (rnp != rnp_root &&
> > > > > > >                     rnp_root->gpnum != rnp_root->completed)) {
> > > > > > >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > > >                        goto unlock_out;
> > > > > > > 		need_future_gp_element(rnp_root, c) = true;
> > > > > > > 
> > > > > > > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > > > > > > 
> > > > > > > In this what confuses me is the second and the third condition for
> > > > > > > pre-started.
> > > > > > > 
> > > > > > > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c). 
> > > > > > > AIUI the goal of this condition is to check whether the requested grace
> > > > > > > period has already started. I believe then the above check is insufficient. 
> > > > > > > The reason I think its insufficient is I believe we should also check the
> > > > > > > state of the grace period to augment this check.
> > > > > > > IMO the condition should really be:
> > > > > > > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> > > > > > 
> > > > > > The above asks whether the -next- grace period -after- the requested
> > > > > > one had started.
> > > > > > 
> > > > > > >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> > > > > > 
> > > > > > This asks that the requested grace period not have completed.
> > > > > > 
> > > > > > What about the case where the requested grace period has completed,
> > > > > > but the one after has not yet started?  If you add that in, I bet you
> > > > > > will have something that simplifies to my original.
> > > > > > 
> > > > > > > In a later patch you replaced this with rcu_seq_done(&rnp_root->gp_seq, c) which
> > > > > > > kind of accounts for the state, except that rcu_seq_done uses ULONG_CMP_GE,
> > > > > > > whereas to fix this, rcu_seq_done IMO should be using ULONG_CMP_GT to be equivalent
> > > > > > > to the above check. Do you agree?
> > > > > > 
> > > > > > I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> > > > > > where the requested grace period completed, but the following grace period
> > > > > > has not yet started.
> > > > > 
> > > > > Ok thanks that clears it up. For some reason I was thinking if
> > > > > rnp_root->gpnum == c, that could means 'c' has not yet started, unless we
> > > > > also checked the state. Obviously, now I realize gpnum == c can only mean 2
> > > > > things:
> > > > >  - c has started but not yet completed
> > > > >  - c has completed
> > > > > 
> > > > > Both of these cases should cause a bail out so I agree now with your
> > > > > condition ULONG_CMP_GE, thanks.
> > > > > 
> > > > > > 
> > > > > > > The third condition for pre-started is:
> > > > > > >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > > > > > > This as I followed from your commit message is if an intermediate node thinks
> > > > > > > RCU is non-idle, then its not necessary to mark the tree and we can bail out
> > > > > > > since the clean up will scan the whole tree anyway. That makes sense to me
> > > > > > > but I think I will like to squash the diff in your previous email into this
> > > > > > > condition as well to handle both conditions together.
> > > > > > 
> > > > > > Please keep in mind that it is necessary to actually record the request
> > > > > > in the leaf case.  Or are you advocating use of ?: or similar to make this
> > > > > > happen?
> > > > > 
> > > > > Yes, I realized yesterday you wanted to record it for the leaf that's why
> > > > > you're doing things this way. I'll let you know if I find any other ways of
> > > > > simplifying it once I look at your latest tree.
> > > > > 
> > > > > Btw, I checked your git tree and couldn't see the update that you mentioned
> > > > > you queued above. Could you push those changes?
> > > > 
> > > > Good point, pushed now.  And the patch that I forgot to include in the
> > > > last email is below.
> > > 
> > > Cool, thanks. Also, one thing I wanted to discuss: I am a bit unclear about
> > > the if (rcu_seq_done..) condition in the loop which decides if the GP
> > > requested is pre-started.
> > 
> > Actually, rcu_seq_done() instead determines whether or not the GP has
> > -completed-.
> > 
> > > Say c is 8 (0b1000) - i.e. gp requested is 2.
> > > I drew some tables with some examples, the result column is what the
> > > current code will do.
> > > 
> > > Say gp_seq is 12 and it's not in progress (0b1100):
> > > 
> > > gp_seq	gp_num	state	analysis of gp_seq  result
> > > 12		3	0	gp 3 not started    pre-started
> > > 				(gp 2 completed)
> > > 
> > > For this, the "greater than" check in rcu_seq_done will work because 2 already
> > > completed (The check essentially does 12 >= 8 which implies prestarted).
> > 
> > Agreed.
> > 
> > > Say gp_seq is 9 and it is in progress (0b1001)
> > > gp_seq	gp_num	state	state of gp_seq    result
> > > 9		2	1	gp 2 in progress   pre-started
> > > 				(gp 1 completed)
> > > 
> > > Here also the "greater than" check is correct (9 >= 8 which implies prestarted).
> > 
> > Yes, ->gp_seq of 9 implies that _snap() of 8 is done and gone.
> 
> According to the above table, I was trying to indicate that gp_seq = 9
> implies that gp_num 2 is in progress, not done. So in my view, whatever
> _snap() returned is in progress now (the state bit is set).

Yes, ->gp_seq of 9 implies that a grace period is in progress.  But given
that rcu_seq_snap() returned 8 some time in the past, the required grace
period has in fact completed.  Similarly, a ->gp_seq of 3248324301 would
also indicate a grace period in progress, but would still indicate that
the grace period indicated by a return value of 8 from rcu_seq_snap()
had already completed.

> > > However, say gp_seq is 8
> > > gp_seq	gp_num	state	state of gp_seq    result
> > > 8		2	0	gp 2 not started   pre-started
> > > 				(gp 1 completed)
> > > 
> > > In this case, rcu_seq_done will incorrectly say that it's pre-started when 2
> > > has not yet started. For this reason, I feel the equal-to check in
> > > rcu_seq_done will incorrectly predict pre-started.
> > 
> > If _snap() said 8, then it meant that when ->gp_seq reached 8, the needed
> > grace periods had elapsed.  So ULONG_CMP_GE() really is what we want.
> 
> I still kind of disagree, based on the reasoning below (but I'm pretty sure
> I'm missing something, so I probably need to go through some more examples,
> do some more tracing, etc.)
> 
> Forgetting about _snap for a second, can we not look at gp_seq independently
> and determine what the grace period is currently doing? In my view, if gp_seq
> reaches 8 (gp_num is 2), that means that gp_num 1 just completed. It
> doesn't mean 2 completed; 2 could have either started or not yet started, and
> we can't tell without looking at the state bits. This is the part I didn't get.
> 
> rcu_seq_start only sets the state bit. rcu_seq_end increments the gp_num
> value.
> 
> I thought that when rcu_seq_end sets the value part of gp_seq to gp_num, it
> means that gp_num - 1 just completed. Is that not true?

If we are comparing a ->gp_seq value to a return value from rcu_seq_snap(),
it does not make much sense to forget about rcu_seq_snap().  But let me
suspend disbelief and instead tell you how I think about the ->gp_seq
values.

A value of 8 says that grace period #2 has not yet started.  It also says
that grace period #1 has completed.  In addition, it says that any grace
period whose number is larger than 2 has not yet started, and further that
any grace period whose number is smaller than 1 has already completed.
Given a modular definition accounting for wrap, of course -- which is
why ->gpwrap should be consulted when looking at rdp->gp_seq.

A value of 9 says that grace period #2 has started, but it also implies
that #1 and earlier have completed (as with 8) and that #3 and later
have not yet started.

So rcu_seq_snap() is given a value of ->gp_seq, and must return a later
value that will indicate that a full grace period has passed.  We can
make a table:

	->gp_seq	rcu_seq_snap() return value
	0		 4
	1		 8
	4		 8
	5		12
	8		12

And so on.  The point of returning 4 when ->gp_seq is zero has nothing
to do with grace period #1 having completed and everything to do with
grace period #0 having completed.

The values 2 and 3 cannot happen for RCU, though the value of 2 can happen
for SRCU.  So SRCU is why we have two state bits rather than just one.


As you say, rcu_seq_start() just increments.  It also verifies that
the state bits of the result are exactly 1, which means that it will
complain if invoked with non-zero state bits.

Then rcu_seq_end() rounds up to the next grace period, but with the
state bits all zero, indicating that this grace period has not yet
started.

All of this allows rcu_seq_done() to simply do a modular comparison
of the snapshot from rcu_seq_snap() to the current ->gp_seq.

Make sense?

> > > I think to fix this, the rcu_seq_done condition could be replaced with:
> > > 	if (ULONG_CMP_GT(rnp_root->gp_seq, c)) {
> > > 		// pre-started
> > > 	}
> > > 
> > > I believe the difference arises because one of the patches during the
> > > conversion to use gp_seq in the tree replaced rcu_seq_done with ULONG_CMP_GE,
> > > whereas such a replacement doesn't work in the gp_seq regime because of the
> > > difference in the way a gp's start/end is accounted (vs. the old way).
> > > 
> > > Does it make sense or was I way off about something :D ?
> > 
> > I believe that you need to start with where the value passed via "c"
> > to rcu_start_this_gp() came from.  I suggest starting with the call
> > from the rcu_seq_snap() in rcu_nocb_wait_gp(), whose return value is
> > then passed to rcu_start_this_gp(), the reason being that it doesn't
> > drag you through the callback lists.
> 
> OK, I'll try to do some more tracing/analysis and think some more, following
> your suggestion of starting from rcu_nocb_wait_gp. Most likely I am wrong,
> but I have yet to convince myself of it :-(

But of course!

							Thanx, Paul

> thanks so much!
> 
> - Joel
> 


* Re: [PATCH tip/core/rcu 0/21] Contention reduction for v4.18
  2018-05-14  6:42 ` [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Nicholas Piggin
@ 2018-05-14 16:09   ` Paul E. McKenney
  2018-05-14 22:21     ` Nicholas Piggin
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-14 16:09 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Mon, May 14, 2018 at 04:42:33PM +1000, Nicholas Piggin wrote:
> On Sun, 22 Apr 2018 20:02:58 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Hello!
> > 
> > This series reduces lock contention on the root rcu_node structure,
> > and is also the first precursor to TBD changes to consolidate the
> > three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched) into one.
> 
> Hi Paul,
> 
> I've been running your rcu/dev branch and haven't noticed any problems
> yet. The irqsoff latency improvement is a little hard to measure
> > because of the scheduler, but I've tried turning balancing parameters
> right down and I'm yet to see any sign of RCU in traces (down to about
> 100us on a 176 CPU machine), so that's great.

Good to hear!!!

> (Not that RCU was ever the worst contributor to latency as I said, just
> that I noticed those couple of traces where it showed up.)
> 
> Thanks very much for the fast response, sorry I've taken a while to
> test.

Would you be willing to give me a Tested-by on that series of patches?

							Thanx, Paul


* Re: [PATCH tip/core/rcu 0/21] Contention reduction for v4.18
  2018-05-14 16:09   ` Paul E. McKenney
@ 2018-05-14 22:21     ` Nicholas Piggin
  2018-05-14 22:42       ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Nicholas Piggin @ 2018-05-14 22:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Mon, 14 May 2018 09:09:07 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Mon, May 14, 2018 at 04:42:33PM +1000, Nicholas Piggin wrote:
> > On Sun, 22 Apr 2018 20:02:58 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > Hello!
> > > 
> > > This series reduces lock contention on the root rcu_node
> > > structure, and is also the first precursor to TBD changes to
> > > consolidate the three RCU flavors (RCU-bh, RCU-preempt, and
> > > RCU-sched) into one.  
> > 
> > Hi Paul,
> > 
> > I've been running your rcu/dev branch and haven't noticed any
> > problems yet. The irqsoff latency improvement is a little hard to
> > measure because of the scheduler, but I've tried turning balancing
> > parameters right down and I'm yet to see any sign of RCU in traces
> > (down to about 100us on a 176 CPU machine), so that's great.  
> 
> Good to hear!!!

Yep, as in, various other latencies are down to 100us, and still
no sign of RCU, so RCU must be sitting somewhere below that.


> > (Not that RCU was ever the worst contributor to latency as I said,
> > just that I noticed those couple of traces where it showed up.)
> > 
> > Thanks very much for the fast response, sorry I've taken a while to
> > test.  
> 
> Would you be willing to give me a Tested-by on that series of patches?

Yes of course, for your rcu/dev series

Tested-by: Nicholas Piggin <npiggin@gmail.com>

Let me know if you make any other changes you'd like me to test before
merge.

Thanks,
Nick


* Re: [PATCH tip/core/rcu 0/21] Contention reduction for v4.18
  2018-05-14 22:21     ` Nicholas Piggin
@ 2018-05-14 22:42       ` Paul E. McKenney
  0 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2018-05-14 22:42 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel.opensrc, torvalds, npiggin

On Tue, May 15, 2018 at 08:21:20AM +1000, Nicholas Piggin wrote:
> On Mon, 14 May 2018 09:09:07 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Mon, May 14, 2018 at 04:42:33PM +1000, Nicholas Piggin wrote:
> > > On Sun, 22 Apr 2018 20:02:58 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > Hello!
> > > > 
> > > > This series reduces lock contention on the root rcu_node
> > > > structure, and is also the first precursor to TBD changes to
> > > > consolidate the three RCU flavors (RCU-bh, RCU-preempt, and
> > > > RCU-sched) into one.  
> > > 
> > > Hi Paul,
> > > 
> > > I've been running your rcu/dev branch and haven't noticed any
> > > problems yet. The irqsoff latency improvement is a little hard to
> > > measure because of the scheduler, but I've tried turning balancing
> > > parameters right down and I'm yet to see any sign of RCU in traces
> > > (down to about 100us on a 176 CPU machine), so that's great.  
> > 
> > Good to hear!!!
> 
> Yep, as in, various other latencies are down to 100us, and still
> no sign of RCU, so RCU must be sitting somewhere below that.

Sometimes you get lucky.  ;-)

> > > (Not that RCU was ever the worst contributor to latency as I said,
> > > just that I noticed those couple of traces where it showed up.)
> > > 
> > > Thanks very much for the fast response, sorry I've taken a while to
> > > test.  
> > 
> > Would you be willing to give me a Tested-by on that series of patches?
> 
> Yes of course, for your rcu/dev series
> 
> Tested-by: Nicholas Piggin <npiggin@gmail.com>

Thank you very much!

The rcu/dev branch has been a bit dynamic of late, so I am going to
apply your Tested-by up to the merge point, which is 434533a52e8d ("Merge
branches 'exp.2018.04.20a', 'fixes.2018.04.30a', 'lock.2018.04.22a' and
'torture.2018.04.20a' into HEAD").  That will allow me to send my
pull request, and we can work out which of the more recent commits
I can apply your Tested-by to later.  ;-)

> Let me know if you make any other changes you'd like me to test before
> merge.

Will do, and thank you again!

There are some additional commits that should further reduce RCU's
latency, but they won't be going in until the v4.19 merge window, that
is, the one after this coming one.

							Thanx, Paul


end of thread, other threads:[~2018-05-14 22:40 UTC | newest]

Thread overview: 44+ messages
2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 02/21] rcu: Make rcu_start_future_gp()'s grace-period check more precise Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 03/21] rcu: Add accessor macros for the ->need_future_gp[] array Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 04/21] rcu: Make rcu_gp_kthread() check for early-boot activity Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Paul E. McKenney
2018-05-10  7:21   ` [tip/core/rcu, " Joel Fernandes
2018-05-10 13:15     ` Paul E. McKenney
2018-05-10 17:22       ` Joel Fernandes
2018-05-11 16:22         ` Paul E. McKenney
2018-05-10 17:37       ` Joel Fernandes
2018-05-11 16:24         ` Paul E. McKenney
2018-05-11 16:27           ` Joel Fernandes
2018-04-23  3:03 ` [PATCH tip/core/rcu 06/21] rcu: Avoid losing ->need_future_gp[] values due to GP start/end races Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 07/21] rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 08/21] rcu: Convert ->need_future_gp[] array to boolean Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 09/21] rcu: Make rcu_migrate_callbacks wake GP kthread when needed Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 10/21] rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 11/21] rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 12/21] rcu: Cleanup, don't put ->completed into an int Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 13/21] rcu: Clear request other than RCU_GP_FLAG_INIT at GP end Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 14/21] rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 15/21] rcu: Make rcu_start_future_gp() caller select grace period Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp() Paul E. McKenney
2018-05-12  6:03   ` [tip/core/rcu,16/21] " Joel Fernandes
2018-05-12 14:40     ` Paul E. McKenney
2018-05-12 14:44       ` Paul E. McKenney
2018-05-12 23:53         ` Joel Fernandes
2018-05-13 15:38           ` Paul E. McKenney
2018-05-13 16:49             ` Joel Fernandes
2018-05-13 19:09               ` Paul E. McKenney
2018-05-13 19:51                 ` Joel Fernandes
2018-05-14  2:22                   ` Paul E. McKenney
2018-05-14  5:00                     ` Joel Fernandes
2018-05-14 13:23                       ` Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 17/21] rcu: Make rcu_start_this_gp() check for out-of-range requests Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 18/21] rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 19/21] rcu: Simplify and inline cpu_needs_another_gp() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 20/21] rcu: Drop early GP request check from rcu_gp_kthread() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 21/21] rcu: Update list of rcu_future_grace_period() trace events Paul E. McKenney
2018-05-14  6:42 ` [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Nicholas Piggin
2018-05-14 16:09   ` Paul E. McKenney
2018-05-14 22:21     ` Nicholas Piggin
2018-05-14 22:42       ` Paul E. McKenney
