linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	frederic@kernel.org, tglx@linutronix.de
Subject: Re: [PATCH] time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
Date: Thu, 20 Jun 2019 15:13:36 -0700	[thread overview]
Message-ID: <20190620221336.GZ26519@linux.ibm.com> (raw)
In-Reply-To: <20190620211019.GA3436@hirez.programming.kicks-ass.net>

On Thu, Jun 20, 2019 at 11:10:19PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 20, 2019 at 09:01:18AM -0700, Paul E. McKenney wrote:
> 
> > > > +#define TICK_SCHED_REMOTE_OFFLINE	0
> > > > +#define TICK_SCHED_REMOTE_RUNNING	1
> > > > +#define TICK_SCHED_REMOTE_OFFLINING	2
> > > 
> > > That seems a daft set of values; consider { RUNNING, OFFLINING, OFFLINE }
> > > and see below.
> > 
> > As in make it an enum?  I could do that.
> 
> Enum or define, I don't much care, but the 'natural' ordering of the
> states is either: running -> offlining -> offline, or the other way
> around, the given order in the patch just didn't make sense.
> 
> The one with running=0 just seems to work out nicer.

Aside from now needing to be initialized, but so it goes.  Which is
why TICK_SCHED_REMOTE_OFFLINE was zero in my version, in case you were
wondering.  ;-)

> > > > +
> > > > +// State diagram for ->state:
> > > > +//
> > > > +//
> > > > +//      +----->OFFLINE--------------------------+
> > > > +//      |                                       |
> > > > +//      |                                       |
> > > > +//      |                                       | sched_tick_start()
> > > > +//      | sched_tick_remote()                   |
> > > > +//      |                                       |
> > > > +//      |                                       V
> > > > +//      |                        +---------->RUNNING
> > > > +//      |                        |              |
> > > > +//      |                        |              |
> > > > +//      |                        |              |
> > > > +//      |     sched_tick_start() |              | sched_tick_stop()
> > > > +//      |                        |              |
> > > > +//      |                        |              |
> > > > +//      |                        |              |
> > > > +//      +--------------------OFFLINING<---------+
> > > > +//
> > > > +//
> > > > +// Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
> > > > +// and sched_tick_start() are happy to leave the state in RUNNING.
> 
> > > Also, I find it harder to read that needed, maybe a little something
> > > like so:
> > > 
> > > /*
> > >  *              OFFLINE
> > >  *               |   ^
> > >  *               |   | tick_remote()
> > >  *               |   |
> > >  *               +--OFFLINING
> > >  *               |   ^
> > >  *  tick_start() |   | tick_stop()
> > >  *               v   |
> > >  *              RUNNING
> > >  */
> > 
> > As in remove the leading "sched_" from the function names?  (The names
> > were already there, so I left them be.)
> 
> That was just me being lazy, the main part being getting the states in a
> linear order, instead of spread around a 2d grid.

OK, I will expand them.  Including the ones where I was being lazy.

> > > While not wrong, it seems overly complicated; can't we do something
> > > like:
> > > 
> > > tick:
> > 
> > As in sched_tick_remote(), right?
> > 
> > > 	state = atomic_read(->state);
> > > 	if (state) {
> > 
> > You sure you don't want "if (state != RUNNING)"?  But I guess you need
> > to understand that RUNNING==0 to understand the atomic_inc_not_zero().
> 
> Right..
> 
> > 
> > > 		WARN_ON_ONCE(state != OFFLINING);
> > > 		if (atomic_inc_not_zero(->state))
> > 
> > This assumes that there cannot be concurrent calls to sched_tick_remote(),
> > otherwise, you can end up with ->state==3.  Which is a situation that
> > my version does a WARN_ON_ONCE() for, so I guess the only difference is
> > that mine would be guaranteed to complain and yours would complain with
> > high probability.  So fair enough!
> 
> I was assuming there was only a single work per CPU and there'd not be
> concurrency on this path.
> 
> > > 			return;
> > > 	}
> > > 	queue_delayed_work();
> > > 
> > > 
> > > stop:
> > > 	/*
> > > 	 * This is hotplug; even without stop-machine, there cannot be
> > > 	 * concurrency on offlining specific CPUs.
> > > 	 */
> > > 	state = atomic_read(->state);
> > 
> > There cannot be a sched_tick_stop() or sched_tick_stop(), but there really
> > can be a sched_tick_remote() right here in the absence of stop-machine,
> > can't there?  Or am I missing something other than stop-machine that
> > prevents this?
> 
> There can be a remote tick, indeed.
> 
> > Now, you could argue that concurrency is safe: Either sched_tick_remote()
> > sees RUNNING and doesn't touch the value, or it sees offlining and
> > sched_tick_stop() makes no further changes,
> 
> That was indeed the thinking.
> 
> > but I am not sure that this qualifies as simpler...
> 
> There is that I suppose. I think my initial version was a little more
> complicated, but after a few passes this happened :-)
> 
> > > 	WARN_ON_ONCE(state != RUNNING);
> > > 	atomic_set(->state, OFFLINING);
> > 
> > Another option would be to use atomic_xchg() as below instead of the
> > atomic_read()/atomic_set() pair.  Would that work for you?
> 
> Yes, that works I suppose. Is more expensive, but I don't think we
> particularly care about that here.

Not on this code path, agreed.

> > > start:
> > > 	state = atomic_xchg(->state, RUNNING);
> > > 	WARN_ON_ONCE(state == RUNNING);
> > > 	if (state == OFFLINE) {
> > > 		// ...
> > > 		queue_delayed_work();
> > > 	}
> > 
> > This one looks to be an improvement on mine regardless of the other two.

So how about the following patch, which passes very light rcutorture
testing but should otherwise be regarded as being under suspicion?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 874c427742a9..36631d2eff05 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3051,8 +3051,36 @@ void scheduler_tick(void)
 
 struct tick_work {
 	int			cpu;
+	atomic_t		state;
 	struct delayed_work	work;
 };
+/* Values for ->state, see diagram below. */
+#define TICK_SCHED_REMOTE_RUNNING	0
+#define TICK_SCHED_REMOTE_OFFLINING	1
+#define TICK_SCHED_REMOTE_OFFLINE	2
+
+/*
+ * State diagram for ->state:
+ *
+ *
+ *          TICK_SCHED_REMOTE_OFFLINE
+ *                    |   ^
+ *                    |   |
+ *                    |   | sched_tick_remote()
+ *                    |   |
+ *                    |   |
+ *                    +--TICK_SCHED_REMOTE_OFFLINING
+ *                    |   ^
+ *                    |   |
+ * sched_tick_start() |   | sched_tick_stop()
+ *                    |   |
+ *                    V   |
+ *          TICK_SCHED_REMOTE_RUNNING
+ *
+ *
+ * Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
+ * and sched_tick_start() are happy to leave the state in RUNNING.
+ */
 
 static struct tick_work __percpu *tick_work_cpu;
 
@@ -3065,6 +3093,7 @@ static void sched_tick_remote(struct work_struct *work)
 	struct task_struct *curr;
 	struct rq_flags rf;
 	u64 delta;
+	int os;
 
 	/*
 	 * Handle the tick only if it appears the remote CPU is running in full
@@ -3078,7 +3107,7 @@ static void sched_tick_remote(struct work_struct *work)
 
 	rq_lock_irq(rq, &rf);
 	curr = rq->curr;
-	if (is_idle_task(curr))
+	if (is_idle_task(curr) || cpu_is_offline(cpu))
 		goto out_unlock;
 
 	update_rq_clock(rq);
@@ -3098,13 +3127,21 @@ static void sched_tick_remote(struct work_struct *work)
 	/*
 	 * Run the remote tick once per second (1Hz). This arbitrary
 	 * frequency is large enough to avoid overload but short enough
-	 * to keep scheduler internal stats reasonably up to date.
+	 * to keep scheduler internal stats reasonably up to date.  But
+	 * first update state to reflect hotplug activity if required.
 	 */
+	os = atomic_read(&twork->state);
+	if (os) {
+		WARN_ON_ONCE(os != TICK_SCHED_REMOTE_OFFLINING);
+		if (atomic_inc_not_zero(&twork->state))
+			return;
+	}
 	queue_delayed_work(system_unbound_wq, dwork, HZ);
 }
 
 static void sched_tick_start(int cpu)
 {
+	int os;
 	struct tick_work *twork;
 
 	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
@@ -3113,15 +3150,20 @@ static void sched_tick_start(int cpu)
 	WARN_ON_ONCE(!tick_work_cpu);
 
 	twork = per_cpu_ptr(tick_work_cpu, cpu);
-	twork->cpu = cpu;
-	INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
-	queue_delayed_work(system_unbound_wq, &twork->work, HZ);
+	os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_RUNNING);
+	WARN_ON_ONCE(os == TICK_SCHED_REMOTE_RUNNING);
+	if (os == TICK_SCHED_REMOTE_OFFLINE) {
+		twork->cpu = cpu;
+		INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
+		queue_delayed_work(system_unbound_wq, &twork->work, HZ);
+	}
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
 static void sched_tick_stop(int cpu)
 {
 	struct tick_work *twork;
+	int os;
 
 	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
 		return;
@@ -3129,14 +3171,21 @@ static void sched_tick_stop(int cpu)
 	WARN_ON_ONCE(!tick_work_cpu);
 
 	twork = per_cpu_ptr(tick_work_cpu, cpu);
-	cancel_delayed_work_sync(&twork->work);
+	/* There cannot be competing actions, but don't rely on stop-machine. */
+	os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_OFFLINING);
+	WARN_ON_ONCE(os != TICK_SCHED_REMOTE_RUNNING);
+	/* Don't cancel, as this would mess up the state machine. */
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
 int __init sched_tick_offload_init(void)
 {
+	int cpu;
+
 	tick_work_cpu = alloc_percpu(struct tick_work);
 	BUG_ON(!tick_work_cpu);
+	for_each_possible_cpu(cpu)
+		atomic_set(&per_cpu(tick_work_cpu, cpu)->state, TICK_SCHED_REMOTE_OFFLINE);
 
 	return 0;
 }


  reply	other threads:[~2019-06-20 22:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19 18:19 [PATCH] time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint Paul E. McKenney
2019-06-20 12:10 ` Peter Zijlstra
2019-06-20 16:01   ` Paul E. McKenney
2019-06-20 21:10     ` Peter Zijlstra
2019-06-20 22:13       ` Paul E. McKenney [this message]
2019-06-21 10:55         ` Peter Zijlstra
2019-06-21 12:16           ` Paul E. McKenney
2019-06-21 12:29             ` Peter Zijlstra
2019-06-21 13:34               ` Paul E. McKenney
2019-06-21 17:41                 ` Paul E. McKenney
2019-06-21 17:50                   ` Paul E. McKenney
2019-06-21 23:46                     ` Paul E. McKenney
2019-06-24 23:12                       ` Frederic Weisbecker
2019-06-24 23:44                         ` Paul E. McKenney
2019-06-25  0:43                           ` Frederic Weisbecker
2019-06-25  2:05                             ` Paul E. McKenney
2019-06-25  7:51                             ` Peter Zijlstra
2019-06-25 12:25                               ` Frederic Weisbecker
2019-06-25 13:54                                 ` Paul E. McKenney
2019-06-25 14:05                                   ` Peter Zijlstra
2019-06-25 14:16                                     ` Paul E. McKenney
2019-06-25 16:20                                       ` Frederic Weisbecker
2019-06-25 16:52                                         ` Paul E. McKenney
2019-06-28  7:37                                           ` Peter Zijlstra
2019-06-28 12:17                                             ` Paul E. McKenney
2019-07-25 16:14                                           ` [tip:sched/core] " tip-bot for Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2019-05-27 14:39 [PATCH] " Paul E. McKenney
2019-05-28 20:07 ` Thomas Gleixner
2019-05-29 18:19   ` Paul E. McKenney
2019-05-30 12:58     ` Paul E. McKenney
2019-05-31  1:36       ` Frederic Weisbecker
2019-05-31 13:43         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190620221336.GZ26519@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).