linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	rcu@vger.kernel.org
Subject: Re: [PATCH 00/19] rcu/nocb: De-offload and re-offload support v4
Date: Wed, 16 Dec 2020 08:59:30 -0800	[thread overview]
Message-ID: <20201216165930.GJ2657@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20201113121334.166723-1-frederic@kernel.org>

On Fri, Nov 13, 2020 at 01:13:15PM +0100, Frederic Weisbecker wrote:
> This keeps growing up. Rest assured, most of it is debug code and sanity
> checks.
> 
> Boqun Feng found that holding rnp lock while updating the offloaded
> state of an rdp isn't needed, and he was right despite my initial
> reaction. The sites that read the offloaded state while holding the rnp
> lock are actually protected because they read it locally in a non
> preemptible context.
> 
> So I removed the rnp lock in "rcu/nocb: De-offloading CB". And just to
> make sure I'm not missing something, I added sanity checks that ensure
> we always read the offloaded state in a safe way (3 last patches).
> 
> Still passes TREE01 (but I had to fight!)
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	rcu/nocb-toggle-v4
> 
> HEAD: 579e15efa48fb6fc4ecf14961804051f385807fe
> 
> Thanks,
> 	Frederic

Thank you both!

> ---
> 
> Frederic Weisbecker (19):
>       rcu/nocb: Turn enabled/offload states into a common flag
>       rcu/nocb: Provide basic callback offloading state machine bits
>       rcu/nocb: Always init segcblist on CPU up
>       rcu/nocb: De-offloading CB kthread
>       rcu/nocb: Don't deoffload an offline CPU with pending work
>       rcu/nocb: De-offloading GP kthread
>       rcu/nocb: Re-offload support
>       rcu/nocb: Shutdown nocb timer on de-offloading
>       rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY
>       rcu/nocb: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading
>       rcu/nocb: Only cond_resched() from actual offloaded batch processing
>       rcu/nocb: Process batch locally as long as offloading isn't complete
>       rcu/nocb: Locally accelerate callbacks as long as offloading isn't complete
>       tools/rcutorture: Support nocb toggle in TREE01

I applied the above, with the usual commit-log wordsmithing.

>       rcutorture: Remove weak nocb declarations
>       rcutorture: Export nocb (de)offloading functions

These I folded into the rcutorture commit, as you suggested.

>       cpu/hotplug: Add lockdep_is_cpus_held()
>       timer: Add timer_curr_running()

I applied these two.

>       rcu/nocb: Detect unsafe checks for offloaded rdp

This one didn't apply, probably due to recent changes in -rcu.  Could
you please take a look?

I do get the occasional rcutorture writer stall in TREE01.  I sent you
the console log offlist, but I suspect that this is related to some
of your questions.  I am retesting with the new TREE01 boot parameters
removed (but with the new rcu_nocbs= layout).  These stalls are temporary,
with torturing progressing normally after the warnings are printed.

This warning usually indicates that grace periods are progressing

I believe that the following enhancements will be needed:

o	Forbid toggling of offlined CPUs.  This is a simple rule that
	results in deterministic success or failure.  The current setup
	(which I freely admit that I suggested) will fail very rarely,
	only if a newly offlined offloaded CPU is toggled before its
	remaining callbacks are invoked.  The current state is therefore
	an accident waiting to happen.

	This will entail fixing a few comments as well.

o	Drain the bypass early in the de-offloading process and prohibit
	queuing onto the bypass anywhere in the toggling process.
	Then the only CPUs permitted to use the bypass are those that
	are fully offloaded.  This approach allows rcu_core() and the
	rcuog kthreads to safely manipulate callbacks and grace periods
	concurrently.  (Famous last words!)  This might address the
	rcutorture writer stall warnings I am seeing.

	This will entail fixing a few comments as well.

o	Avoid double write to rdp->nocb_cb_sleep in nocb_cb_wait().
	Unless there is some reason why this is absolutely required.
	It will cause confusion as it is.

o	What bad thing happens if we de-offload the CPU corresponding to
	the nocb GP kthread?  It does work fine when that CPU is offlined.

	Coincidence or not, all but one of the rcutorture writer stalls
	is followed by a message refusing to de-offload this CPU.

o	The nocb_gp_update_state() function's return value seems backwards.
	What am I missing here?

o	__rcu_nocb_rdp_deoffload(): Why move rcu_nocb_lock_irqsave()
	across a comment?

o	We need better debug.  For example, "Can't deoffload an rdp GP
	leader (yet)".	Well, which CPU?  For another example, we need
	deoffload state in rcutorture writer-stall dumps.

I intend to do the following, and, if feasible, fold them into the
original commits:

o	Make rcu_segcblist_is_offloaded() use "&&" instead of a
	nested "if" statement.

o	nocb_cb_wait() needs the "^^^ Ensure CB invocation follows
	_sleep test" to be adjusted so that it is properly attached to
	its smp_load_acquire().

o	nocb_cb_wait() -- alphabetical order for local variables, please!
	Yeah, I know, others like inverted tree for reasons that escape me
	completely.

							Thanx, Paul

>  include/linux/cpu.h                                |   1 +
>  include/linux/rcu_segcblist.h                      | 119 +++++-
>  include/linux/rcupdate.h                           |   4 +
>  include/linux/timer.h                              |   2 +
>  kernel/cpu.c                                       |   7 +
>  kernel/rcu/rcu_segcblist.c                         |  13 +-
>  kernel/rcu/rcu_segcblist.h                         |  45 ++-
>  kernel/rcu/rcutorture.c                            |   3 -
>  kernel/rcu/tree.c                                  |  49 ++-
>  kernel/rcu/tree.h                                  |   2 +
>  kernel/rcu/tree_plugin.h                           | 416 +++++++++++++++++++--
>  kernel/time/timer.c                                |  13 +
>  .../selftests/rcutorture/configs/rcu/TREE01.boot   |   4 +-
>  13 files changed, 614 insertions(+), 64 deletions(-)

  parent reply	other threads:[~2020-12-16 17:00 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-13 12:13 [PATCH 00/19] rcu/nocb: De-offload and re-offload support v4 Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 01/19] rcu/nocb: Turn enabled/offload states into a common flag Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 02/19] rcu/nocb: Provide basic callback offloading state machine bits Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 03/19] rcu/nocb: Always init segcblist on CPU up Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 04/19] rcu/nocb: De-offloading CB kthread Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 05/19] rcu/nocb: Don't deoffload an offline CPU with pending work Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 06/19] rcu/nocb: De-offloading GP kthread Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 07/19] rcu/nocb: Re-offload support Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 08/19] rcu/nocb: Shutdown nocb timer on de-offloading Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 09/19] rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 10/19] rcu/nocb: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 11/19] rcu/nocb: Only cond_resched() from actual offloaded batch processing Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 12/19] rcu/nocb: Process batch locally as long as offloading isn't complete Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 13/19] rcu/nocb: Locally accelerate callbacks " Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 14/19] tools/rcutorture: Support nocb toggle in TREE01 Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 15/19] rcutorture: Remove weak nocb declarations Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 16/19] rcutorture: Export nocb (de)offloading functions Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 17/19] cpu/hotplug: Add lockdep_is_cpus_held() Frederic Weisbecker
2020-12-15 23:49   ` Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 18/19] timer: Add timer_curr_running() Frederic Weisbecker
2020-11-13 12:13 ` [PATCH 19/19] rcu/nocb: Detect unsafe checks for offloaded rdp Frederic Weisbecker
2020-12-08  2:41 ` [PATCH 00/19] rcu/nocb: De-offload and re-offload support v4 Boqun Feng
2020-12-08 12:51   ` Frederic Weisbecker
2020-12-10  1:21     ` Paul E. McKenney
2020-12-10  3:08       ` Boqun Feng
2020-12-16 16:59 ` Paul E. McKenney [this message]
2020-12-29 14:21   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201216165930.GJ2657@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=neeraju@codeaurora.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).