Re: [PATCH 05/16] rcu: De-offloading CB kthread

From: Frederic Weisbecker <frederic@kernel.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>
Subject: Re: [PATCH 05/16] rcu: De-offloading CB kthread
Date: Wed, 4 Nov 2020 15:31:35 +0100	[thread overview]
Message-ID: <20201104143135.GB467220@lothringen> (raw)
In-Reply-To: <20201102133824.GA2661878@boqun-archlinux>

On Mon, Nov 02, 2020 at 09:38:24PM +0800, Boqun Feng wrote:
> Hi Frederic,
> 
> Could you copy the rcu@vger.kernel.org if you have another version, it
> will help RCU hobbyists like me to catch up news in RCU, thanks! ;-)

Sure! Will do!

> > +static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	struct rcu_segcblist *cblist = &rdp->cblist;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +	bool wake_cb = false;
> > +	unsigned long flags;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> > +
> > +	rcu_nocb_lock_irqsave(rdp, flags);
> > +	raw_spin_lock_rcu_node(rnp);
> 
> Could you explain why both nocb_lock and node lock are needed here?

So here is the common scheme with rcu nocb locking:

rcu_nocb_lock(rdp) {
   if (rcu_segcblist_is_offloaded(rdp->cblist))
       raw_spin_lock(rdp->nocb_lock)
}
....

rcu_nocb_unlock(rdp) {
   if (rcu_segcblist_is_offloaded(rdp->cblist))
       raw_spin_unlock(rdp->nocb_lock)
}

Here we first need to check rdp->cblist offloaded state locklessly.
This is dangerous because we now can switch that offloaded state remotely.
So we may run some code under rcu_nocb_lock() without actually locking while at
the same time we switch to offloaded mode and we must run things under the
lock.

Fortunately, rcu_nocb_lock(), and rcu_segcblist_is_offloaded() in general,
is only called under either of the following situations (for which I need to
add debug assertions, but that will be in a subsequent patchset):

   1) hotplug write lock held
   2) rcu_barrier mutex held
   3) nocb lock held
   4) rnp lock held
   5) rdp is local
   6) we run the nocb timer

1) and 2) are covered because we hold them while calling work_on_cpu().
3) and 4) are held when we change the flags of the rdp->cblist.
5) is covered by the fact we are modifying the rdp->cblist state only
   locally (using work_on_cpu()) with irqs disabled.
6) we cancel/rearm the timer before it can see half states.

But you make me realize one thing: we must protect against these situations
especially when we switch the flags:

   a) From SEGCBLIST_SOFTIRQ_ONLY to SEGCBLIST_OFFLOADED (switch from no locking to locking)
   b) From 0 to SEGCBLIST_SOFTIRQ_ONLY (switch from locking to no locking)

Those are the critical state changes that modify the locking behaviour
and we have to hold both rdp->nocb_lock and the rnp lock on these state changes.

Unfortunately in the case b) I omitted the rnp lock. So I need to fix that.

> Besides rcu_nocb_{un}lock_* can skip the lock part based on the offload
> state, which gets modified right here (in a rcu_nocb_{un}lock_* critical
> secion), and I think this is error-prone, because you may get a unpaired
> lock-unlock (not in this patch though). Maybe just use node lock?

So that's why case 1 to 6 must be covered.

> 
> > +	rcu_segcblist_offload(cblist, false);
> > +	raw_spin_unlock_rcu_node(rnp);
> > +
> > +	if (rdp->nocb_cb_sleep) {
> > +		rdp->nocb_cb_sleep = false;
> > +		wake_cb = true;
> > +	}
> > +	rcu_nocb_unlock_irqrestore(rdp, flags);
> > +
> > +	if (wake_cb)
> > +		swake_up_one(&rdp->nocb_cb_wq);
> > +
> > +	swait_event_exclusive(rdp->nocb_state_wq,
> > +			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
> > +
> > +	return 0;
> > +}
> > +
> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> 
> I think this warning can actually happen, if I understand how workqueue
> works correctly. Consider that the corresponding cpu gets offlined right
> after the rcu_nocb_cpu_deoffloaed(), and the workqueue of that cpu
> becomes unbound, and IIUC, workqueues don't do migration during
> cpu-offlining, which means the worker can be scheduled to other CPUs,
> and the work gets executed on another cpu. Am I missing something here?.

We are holding cpus_read_lock() in rcu_nocb_cpu_offload(), this should
prevent from that.

Thanks!