All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>
Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU
Date: Thu, 14 May 2020 00:45:26 +0200	[thread overview]
Message-ID: <20200513224525.GA18303@lenoir> (raw)
In-Reply-To: <20200513183831.GV2869@paulmck-ThinkPad-P72>

On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_node *rnp = rdp->mynode;
> > +
> > +	printk("De-offloading %d\n", rdp->cpu);
> > +	kthread_park(rdp->nocb_cb_kthread);
> 
> I am a bit concerned about this, because from this point until the
> end of this function, no RCU callbacks can be invoked for this CPU.
> This could be a problem if there is a callback flood in progress, and
> such callback floods are in fact one reason that the sysadm might want
> to be switching from offloaded to non-offloaded.  Is it possible to
> move this kthread_park() to the end of this function?  Yes, this can
> result in concurrent invocation of different callbacks for this CPU,
> but because we have excluded rcu_barrier(), that should be OK.
> 
> Or is there some failure mode that I am failing to see?  (Wouldn't
> be the first time...)

Heh I actually worried about that. Ok the issue is that it leaves
a window where nocb_cb and local caller of rcu_do_batch() can
compete but the local caller doesn't lock the nocb_lock.

So there are two ways to solve that.:

1)  - set cblist->offloaded = 0 locally
    - From the kthread while calling rcu_do_batch():
      check the value of cblist->offloaded everytime after
      we call rcu_nocb_lock() and stop messsing with the
      cblist and return when we see cblist->offloaded == 0
    - Allow to handle cblist locally without taking the nocb_lock
    - Park kthread

But there I'm worried about races. Imagine we have:


      Kthread                     Local
      --------                   -------
      rcu_do_batch() {
          rcu_nocb_lock()
          do stuff with cblist
          rcu_nocb_unlock()
                                 rcu_nocb_lock()
                                 set cblist->offloaded = 0
                                 rcu_nocb_unlock()
                                 ===> INT or preemption
                                 rcu_do_batch() {
                                     do stuff with cblist

Are we guaranteed that the Local CPU will see the updates from
the kthread while calling rcu_do_batch()? I would tend to say
yes but I'm not entirely sure...

Oh wait, that solution also implies that we can't re-enqueue
extracted callbacks if we spent took much time in threaded
rcu_do_batch(), as the cblist may have been offloaded while
we executed the extracted callbacks.

That's a lot of corner cases to handle, which is why I prefer
the other solution:

2) enum cblist_offloaded {
        CBLIST_NOT_OFFLOADED,
        CBLIST_(DE)OFFLOADING,
        CBLIST_OFFLOADED
   }

 - Locally set cblist->offloaded =  CBLIST_DEOFFLOADING
 - From the kthread while calling rcu_do_batch(), do as
   usual.
 - Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
   rcu_nocb_lock() will take the lock.
 - Park kthread
 - Locally set cblist->offloaded =  CBLIST_NOT_OFFLOADED
 - Local calls to rcu_do_batch() won't take the lock anymore.


> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > +	struct rcu_data *rdp = arg;
> > +
> > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > +	__rcu_nocb_rdp_deoffload(rdp);
> > +
> > +	return 0;
> > +}
> 
> For example, is the problem caused by invocations of this
> rcu_nocb_rdp_deoffload() function?

How so?

> But if so, do we really need to acquire rcu_state.barrier_mutex?

Indeed it was probably not needed if we parked the kthread before
anything, as we would have kept the callbacks ordering.

But now if we allow concurrent callbacks execution during the small
window, we'll need it.

> 
> That aside, if it is possible to do the switch without interrupting
> callback invocation?  Or is your idea that because we are always executing
> on the CPU being deoffloaded, that CPU has been prevented from posting
> callbacks in any case?

No in the tiny window between kthread_park() and the irqs being disabled,
the workqueue can be preempted and thus call_rcu() can be called anytime.

> If the latter, does that mean that it is not
> possible to deoffload offlined CPUs?  (Not sure whether this restriction
> is a real problem, but figured that I should ask.)

Ah in the case of offlined CPUs I simply call the function directly from the CPU
that disables the nocb remotely. So we remotely park the kthread (that we
always do anyway) and set offlined.

Thanks.

  reply	other threads:[~2020-05-13 22:45 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-13 16:47 [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints Frederic Weisbecker
2020-05-20 12:29   ` Joel Fernandes
2020-05-22 17:57     ` Paul E. McKenney
2020-05-26 15:21       ` Joel Fernandes
2020-05-26 16:29         ` Paul E. McKenney
2020-05-26 20:18           ` Joel Fernandes
2020-05-26 21:09             ` Paul E. McKenney
2020-05-26 21:27               ` Joel Fernandes
2020-05-26 22:29                 ` Paul E. McKenney
2020-05-27  0:45                   ` Joel Fernandes
2020-05-27  0:58                     ` Paul E. McKenney
2020-06-04 11:41       ` Frederic Weisbecker
2020-06-04 16:36         ` Paul E. McKenney
2020-06-08 12:57           ` Frederic Weisbecker
2020-06-09 18:02             ` Paul E. McKenney
2020-06-10 13:12               ` Frederic Weisbecker
2020-06-10 14:02                 ` Paul E. McKenney
2020-06-10 22:12                   ` Frederic Weisbecker
2020-06-10 23:21                     ` Paul E. McKenney
2020-06-11  1:32                       ` Joel Fernandes
2020-05-13 16:47 ` [PATCH 02/10] rcu: Use direct rdp->nocb_lock operations on local calls Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 03/10] rcu: Make locking explicit in do_nocb_deferred_wakeup_common() Frederic Weisbecker
2020-05-26 19:54   ` Joel Fernandes
2020-05-26 19:59   ` Joel Fernandes
2020-05-13 16:47 ` [PATCH 04/10] rcu: Implement rcu_segcblist_is_offloaded() config dependent Frederic Weisbecker
2020-05-13 18:20   ` Paul E. McKenney
2020-05-13 23:03     ` Frederic Weisbecker
2020-05-14 15:47       ` Paul E. McKenney
2020-05-13 16:47 ` [PATCH 05/10] rcu: Remove useless conditional nocb unlock Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 06/10] rcu: Make nocb_cb kthread parkable Frederic Weisbecker
2020-06-11  1:34   ` Joel Fernandes
2020-05-13 16:47 ` [PATCH 07/10] rcu: Temporarily assume that nohz full CPUs might not be NOCB Frederic Weisbecker
2020-05-13 18:25   ` Paul E. McKenney
2020-05-13 23:08     ` Frederic Weisbecker
2020-05-14 15:50       ` Paul E. McKenney
2020-05-14 22:49         ` Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU Frederic Weisbecker
2020-05-13 18:38   ` Paul E. McKenney
2020-05-13 22:45     ` Frederic Weisbecker [this message]
2020-05-14 15:47       ` Paul E. McKenney
2020-05-14 22:30         ` Frederic Weisbecker
2020-05-14 22:47           ` Paul E. McKenney
2020-05-14 22:55             ` Frederic Weisbecker
2020-05-26 21:20   ` Joel Fernandes
2020-05-26 22:49     ` Joel Fernandes
2020-06-04 13:10       ` Frederic Weisbecker
2020-06-11  1:32         ` Joel Fernandes
2020-06-11 17:03           ` Paul E. McKenney
2020-06-04 13:14     ` Frederic Weisbecker
2020-05-13 16:47 ` [PATCH 09/10] rcu: Allow to re-offload a CPU that used to be nocb Frederic Weisbecker
2020-05-13 18:41   ` Paul E. McKenney
2020-05-13 16:47 ` [PATCH 10/10] rcu: Nocb (de)activate through sysfs Frederic Weisbecker
2020-05-13 18:42   ` Paul E. McKenney
2020-05-13 23:23     ` Frederic Weisbecker
2020-05-14 15:51       ` Paul E. McKenney
2020-05-13 18:15 ` [PATCH 00/10] rcu: Allow a CPU to leave and reenter NOCB state Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200513224525.GA18303@lenoir \
    --to=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.