linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>
Subject: Re: [PATCH 05/16] rcu: De-offloading CB kthread
Date: Wed, 4 Nov 2020 15:31:35 +0100	[thread overview]
Message-ID: <20201104143135.GB467220@lothringen> (raw)
In-Reply-To: <20201102133824.GA2661878@boqun-archlinux>

On Mon, Nov 02, 2020 at 09:38:24PM +0800, Boqun Feng wrote:
> Hi Frederic,
> 
> Could you copy the rcu@vger.kernel.org if you have another version, it
> will help RCU hobbyists like me to catch up news in RCU, thanks! ;-)

Sure! Will do!

> > +static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	struct rcu_segcblist *cblist = &rdp->cblist;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +	bool wake_cb = false;
> > +	unsigned long flags;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> > +
> > +	rcu_nocb_lock_irqsave(rdp, flags);
> > +	raw_spin_lock_rcu_node(rnp);
> 
> Could you explain why both nocb_lock and node lock are needed here?

So here is the common scheme with rcu nocb locking:

rcu_nocb_lock(rdp) {
   if (rcu_segcblist_is_offloaded(rdp->cblist))
       raw_spin_lock(rdp->nocb_lock)
}
....

rcu_nocb_unlock(rdp) {
   if (rcu_segcblist_is_offloaded(rdp->cblist))
       raw_spin_unlock(rdp->nocb_lock)
}

Here we first need to check rdp->cblist offloaded state locklessly.
This is dangerous because we now can switch that offloaded state remotely.
So we may run some code under rcu_nocb_lock() without actually locking while at
the same time we switch to offloaded mode and we must run things under the
lock.

Fortunately, rcu_nocb_lock(), and rcu_segcblist_is_offloaded() in general,
is only called under either of the following situations (for which I need to
add debug assertions, but that will be in a subsequent patchset):

   1) hotplug write lock held
   2) rcu_barrier mutex held
   3) nocb lock held
   4) rnp lock held
   5) rdp is local
   6) we run the nocb timer

1) and 2) are covered because we hold them while calling work_on_cpu().
3) and 4) are held when we change the flags of the rdp->cblist.
5) is covered by the fact we are modifying the rdp->cblist state only
   locally (using work_on_cpu()) with irqs disabled.
6) we cancel/rearm the timer before it can see half states.

But you make me realize one thing: we must protect against these situations
especially when we switch the flags:

   a) From SEGCBLIST_SOFTIRQ_ONLY to SEGCBLIST_OFFLOADED (switch from no locking to locking)
   b) From 0 to SEGCBLIST_SOFTIRQ_ONLY (switch from locking to no locking)

Those are the critical state changes that modify the locking behaviour
and we have to hold both rdp->nocb_lock and the rnp lock on these state changes.

Unfortunately in the case b) I omitted the rnp lock. So I need to fix that.

> Besides rcu_nocb_{un}lock_* can skip the lock part based on the offload
> state, which gets modified right here (in a rcu_nocb_{un}lock_* critical
> secion), and I think this is error-prone, because you may get a unpaired
> lock-unlock (not in this patch though). Maybe just use node lock?

So that's why case 1 to 6 must be covered.

> 
> > +	rcu_segcblist_offload(cblist, false);
> > +	raw_spin_unlock_rcu_node(rnp);
> > +
> > +	if (rdp->nocb_cb_sleep) {
> > +		rdp->nocb_cb_sleep = false;
> > +		wake_cb = true;
> > +	}
> > +	rcu_nocb_unlock_irqrestore(rdp, flags);
> > +
> > +	if (wake_cb)
> > +		swake_up_one(&rdp->nocb_cb_wq);
> > +
> > +	swait_event_exclusive(rdp->nocb_state_wq,
> > +			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> > +
> > +	return 0;
> > +}
> > +
> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> 
> I think this warning can actually happen, if I understand how workqueue
> works correctly. Consider that the corresponding cpu gets offlined right
> after the rcu_nocb_cpu_deoffloaed(), and the workqueue of that cpu
> becomes unbound, and IIUC, workqueues don't do migration during
> cpu-offlining, which means the worker can be scheduled to other CPUs,
> and the work gets executed on another cpu. Am I missing something here?.

We are holding cpus_read_lock() in rcu_nocb_cpu_offload(), this should
prevent from that.

Thanks!

  reply	other threads:[~2020-11-04 14:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-23 14:46 [PATCH 00/16] rcu/nocb: De-offload and re-offload support v3 Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 01/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 02/16] rcu: Turn enabled/offload states into a common flag Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 03/16] rcu: Provide basic callback offloading state machine bits Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 04/16] rcu/nocb: Always init segcblist on CPU up Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 05/16] rcu: De-offloading CB kthread Frederic Weisbecker
2020-11-02 13:38   ` Boqun Feng
2020-11-04 14:31     ` Frederic Weisbecker [this message]
2020-11-04 14:42       ` Boqun Feng
2020-11-04 14:45         ` Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 06/16] rcu/nocb: Don't deoffload an offline CPU with pending work Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 07/16] rcu: De-offloading GP kthread Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 08/16] rcu: Re-offload support Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 09/16] rcu: Shutdown nocb timer on de-offloading Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 10/16] rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 11/16] rcu: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 12/16] rcu/nocb: Only cond_resched() from actual offloaded batch processing Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 13/16] rcu: Process batch locally as long as offloading isn't complete Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 14/16] rcu: Locally accelerate callbacks " Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 15/16] rcutorture: Test runtime toggling of CPUs' callback offloading Frederic Weisbecker
2020-10-23 14:46 ` [PATCH 16/16] tools/rcutorture: Support nocb toggle in TREE01 Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201104143135.GB467220@lothringen \
    --to=frederic@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=neeraju@codeaurora.org \
    --cc=paulmck@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).