Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>
Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
Date: Thu, 14 May 2020 00:45:26 +0200	[thread overview]
Message-ID: <20200513224525.GA18303@lenoir> (raw)
In-Reply-To: <20200513183831.GV2869@paulmck-ThinkPad-P72>

On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> > +	kthread_park(rdp->nocb_cb_kthread);
> 
> I am a bit concerned about this, because from this point until the
> end of this function, no RCU callbacks can be invoked for this CPU.
> This could be a problem if there is a callback flood in progress, and
> such callback floods are in fact one reason that the sysadm might want
> to be switching from offloaded to non-offloaded.  Is it possible to
> move this kthread_park() to the end of this function?  Yes, this can
> result in concurrent invocation of different callbacks for this CPU,
> but because we have excluded rcu_barrier(), that should be OK.
> 
> Or is there some failure mode that I am failing to see?  (Wouldn't
> be the first time...)

Heh I actually worried about that. Ok the issue is that it leaves
a window where nocb_cb and local caller of rcu_do_batch() can
compete but the local caller doesn't lock the nocb_lock.

So there are two ways to solve that.:

1)  - set cblist->offloaded = 0 locally
    - From the kthread while calling rcu_do_batch():
      check the value of cblist->offloaded everytime after
      we call rcu_nocb_lock() and stop messsing with the
      cblist and return when we see cblist->offloaded == 0
    - Allow to handle cblist locally without taking the nocb_lock
    - Park kthread

But there I'm worried about races. Imagine we have:

      Kthread                     Local
      --------                   -------
      rcu_do_batch() {
          rcu_nocb_lock()
          do stuff with cblist
          rcu_nocb_unlock()
                                 rcu_nocb_lock()
                                 set cblist->offloaded = 0
                                 rcu_nocb_unlock()
                                 ===> INT or preemption
                                 rcu_do_batch() {
                                     do stuff with cblist

Are we guaranteed that the Local CPU will see the updates from
the kthread while calling rcu_do_batch()? I would tend to say
yes but I'm not entirely sure...

Oh wait, that solution also implies that we can't re-enqueue
extracted callbacks if we spent took much time in threaded
rcu_do_batch(), as the cblist may have been offloaded while
we executed the extracted callbacks.

That's a lot of corner cases to handle, which is why I prefer
the other solution:

2) enum cblist_offloaded {
        CBLIST_NOT_OFFLOADED,
        CBLIST_(DE)OFFLOADING,
        CBLIST_OFFLOADED
   }

 - Locally set cblist->offloaded =  CBLIST_DEOFFLOADING
 - From the kthread while calling rcu_do_batch(), do as
   usual.
 - Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
   rcu_nocb_lock() will take the lock.
 - Park kthread
 - Locally set cblist->offloaded =  CBLIST_NOT_OFFLOADED
 - Local calls to rcu_do_batch() won't take the lock anymore.

> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > +	__rcu_nocb_rdp_deoffload(rdp);
> > +
> > +	return 0;
> > +}
> 
> For example, is the problem caused by invocations of this
> rcu_nocb_rdp_deoffload() function?

How so?

> But if so, do we really need to acquire rcu_state.barrier_mutex?

Indeed it was probably not needed if we parked the kthread before
anything, as we would have kept the callbacks ordering.

But now if we allow concurrent callbacks execution during the small
window, we'll need it.

> 
> That aside, if it is possible to do the switch without interrupting
> callback invocation?  Or is your idea that because we are always executing
> on the CPU being deoffloaded, that CPU has been prevented from posting
> callbacks in any case?

No in the tiny window between kthread_park() and the irqs being disabled,
the workqueue can be preempted and thus call_rcu() can be called anytime.

> If the latter, does that mean that it is not
> possible to deoffload offlined CPUs?  (Not sure whether this restriction
> is a real problem, but figured that I should ask.)

Ah in the case of offlined CPUs I simply call the function directly from the CPU
that disables the nocb remotely. So we remotely park the kthread (that we
always do anyway) and set offlined.

Thanks.