All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Schneider <valentin.schneider@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@codemonkey.org.uk>,
	Mel Gorman <mgorman@techsingularity.net>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	mingo@kernel.org, Linus Torvalds <torvalds@linux-foundation.org>,
	paul.gortmaker@windriver.com
Subject: Re: weird loadavg on idle machine post 5.7
Date: Tue, 07 Jul 2020 00:56:04 +0100	[thread overview]
Message-ID: <jhj5zb08agb.mognet@arm.com> (raw)
In-Reply-To: <20200706145952.GB597537@hirez.programming.kicks-ass.net>


On 06/07/20 15:59, Peter Zijlstra wrote:
> OK, lots of cursing later, I now have the below...
>
> The TL;DR is that while schedule() doesn't change p->state once it
> starts, it does read it quite a bit, and ttwu() will actually change it
> to TASK_WAKING. So if ttwu() changes it to WAKING before schedule()
> reads it to do loadavg accounting, things go sideways.
>
> The below is extra complicated by the fact that I've had to scrounge up
> a bunch of load-store ordering without actually adding barriers. It adds
> yet another control dependency to ttwu(), so take that C standard :-)
>
> I've booted it, and build a few kernels with it and checked loadavg
> drops to 0 after each build, so from that pov all is well, but since
> I'm not confident I can reproduce the issue, I can't tell this actually
> fixes anything, except maybe phantoms of my imagination.
>

As you said on IRC, the one apparent race would lead into "negative"
rq->nr_uninterruptible accounting, i.e. we'd skip some increments so would
end up with more decrements. As for the described issue, I think we were
both expecting "positive" accounting, i.e. more increments than decrements,
leading into an artificially inflated loadavg.

In any case, this should get rid of any existing race. I'll need some more
time (and more aperol spritz) to go through it all though.

> @@ -2605,8 +2596,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>        *
>        * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
>        * __schedule().  See the comment for smp_mb__after_spinlock().
> +	 *
> +	 * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
> +	 * schedule()'s deactivate_task() has 'happened' and p will no longer
> +	 * care about it's own p->state. See the comment in __schedule().
>        */
> -	smp_rmb();
> +	smp_acquire__after_ctrl_dep();

Apologies for asking again, but I'm foolishly hopeful I'll someday be able
to grok those things without half a dozen tabs open with documentation and
Paul McKenney papers.

Do I get it right that the 'acquire' part hints this is equivalent to
issuing a load-acquire on whatever was needed to figure out whether or not
the take the branch (in this case, p->on_rq, amongst other things); IOW
ensures any memory access appearing later in program order has to happen
after the load?

That at least explains to me the load->{load,store} wording in
smp_acquire__after_ctrl_dep().

> +
> +	/*
> +	 * We're doing the wakeup (@success == 1), they did a dequeue (p->on_rq
> +	 * == 0), which means we need to do an enqueue, change p->state to
> +	 * TASK_WAKING such that we can unlock p->pi_lock before doing the
> +	 * enqueue, such as ttwu_queue_wakelist().
> +	 */
> +	p->state = TASK_WAKING;
>
>       /*
>        * If the owning (remote) CPU is still in the middle of schedule() with

  parent reply	other threads:[~2020-07-06 23:56 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-02 17:15 weird loadavg on idle machine post 5.7 Dave Jones
2020-07-02 19:46 ` Dave Jones
2020-07-02 21:15 ` Paul Gortmaker
2020-07-03 13:23   ` Paul Gortmaker
2020-07-02 21:36 ` Mel Gorman
2020-07-02 23:11   ` Michal Kubecek
2020-07-02 23:24   ` Dave Jones
2020-07-03  9:02   ` Peter Zijlstra
2020-07-03 10:40     ` Peter Zijlstra
2020-07-03 20:51       ` Dave Jones
2020-07-06 14:59         ` Peter Zijlstra
2020-07-06 21:20           ` Dave Jones
2020-07-07  7:48             ` Peter Zijlstra
2020-07-06 23:56           ` Valentin Schneider [this message]
2020-07-07  8:17             ` Peter Zijlstra
2020-07-07 10:20               ` Valentin Schneider
2020-07-07 10:29               ` Peter Zijlstra
2020-07-08  9:46                 ` [tip: sched/urgent] sched: Fix loadavg accounting race tip-bot2 for Peter Zijlstra
2020-07-07  9:20           ` weird loadavg on idle machine post 5.7 Qais Yousef
2020-07-07  9:47             ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jhj5zb08agb.mognet@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=davej@codemonkey.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.