linux-kernel.vger.kernel.org archive mirror
* Loadavg accounting error on arm64
@ 2020-11-16  9:10 Mel Gorman
  2020-11-16 11:49 ` Mel Gorman
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Mel Gorman @ 2020-11-16  9:10 UTC
  To: Peter Zijlstra, Will Deacon
  Cc: Davidlohr Bueso, linux-arm-kernel, linux-kernel

Hi,

I got cc'd on an internal bug report filed against 5.8 and 5.9 kernels
saying that loadavg was "exploding" on arm64 on machines acting as build
servers. It happened on at least two different arm64 variants. That setup
is complex to replicate but fortunately the problem can be reproduced by
running hackbench-process-pipes while heavily overcommitting a machine
with 96 logical CPUs and then checking if loadavg drops afterwards. With
an MMTests clone, I reproduced it as follows:

./run-mmtests.sh --config configs/config-workload-hackbench-process-pipes --no-monitor testrun; \
    for i in `seq 1 60`; do cat /proc/loadavg; sleep 60; done

Load should drop to 10 after about 10 minutes, and it does on x86-64, but
it remained at 200 or more on arm64.
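
To make the pass/fail check on the sampled /proc/loadavg lines explicit,
here is a minimal sketch in C. It assumes the usual /proc/loadavg layout
(three load averages, then running/total tasks, then the last PID). The
helper names parse_loadavg and load_has_settled are hypothetical and not
part of MMTests; the threshold of 10 is simply the figure quoted above.

```c
#include <stdio.h>

/*
 * Parse the three load averages (1, 5 and 15 minutes) from one line of
 * /proc/loadavg, e.g. "212.34 180.02 96.11 3/1450 28731".
 * Returns 0 on success, -1 if the line does not start with three numbers.
 */
static int parse_loadavg(const char *line, double avg[3])
{
    if (sscanf(line, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]) != 3)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: returns 1 if the 1-minute average is at or below
 * `threshold`, 0 if it is still above it, and -1 on a parse error.
 */
static int load_has_settled(const char *line, double threshold)
{
    double avg[3];

    if (parse_loadavg(line, avg) != 0)
        return -1;
    return avg[0] <= threshold ? 1 : 0;
}
```

Feeding each sampled line to load_has_settled(line, 10.0) would flag the
arm64 behaviour described above as "not settled" for as long as the
1-minute figure stays in the hundreds.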

The reproduction case simply hammers the case where a task is
descheduling while also being woken by another task at the same time. It
takes a long time to run but it makes the problem very obvious. The
expectation is that, after hackbench has been running and saturating the
machine for a long time, loadavg drops back down once it finishes.

Commit dbfb089d360b ("sched: Fix loadavg accounting race") fixed a loadavg
accounting race in the generic case. Later it was documented why the
ordering of when p->sched_contributes_to_load is read/updated relative
to p->on_cpu matters. This is critical when a task is descheduling at the
same time it is being activated on another CPU. While the loads/stores
happen under the RQ lock, the RQ lock on its own does not give any
guarantees on the task state.
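
The kind of ordering being relied on here can be sketched as a userspace
analogy using C11 atomics. This is not the scheduler code: struct
task_model and both functions are hypothetical, and the assumption that
the descheduling side writes the flag while the waking side reads it is
a simplification made only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the two task fields above. */
struct task_model {
    bool contributes_to_load; /* plain field, published via on_cpu */
    atomic_int on_cpu;        /* non-zero while still on the old CPU */
};

/*
 * Descheduling side: update the flag first, then clear on_cpu with
 * release semantics so the flag update cannot be observed after it.
 */
static void model_deschedule(struct task_model *p, bool contributes)
{
    p->contributes_to_load = contributes;
    atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
}

/*
 * Waking side: only read the flag after an acquire load has observed
 * on_cpu == 0. Returns false if the task is still on its old CPU.
 */
static bool model_try_wake(struct task_model *p, bool *contributes)
{
    if (atomic_load_explicit(&p->on_cpu, memory_order_acquire) != 0)
        return false;
    *contributes = p->contributes_to_load;
    return true;
}
```

If either the release or the acquire were weakened to a relaxed access,
the waking side could in principle read a stale flag, which is the sort
of failure that would leave the load accounting permanently off.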

Over the weekend I convinced myself that it must be because the
implementations of smp_load_acquire and smp_store_release do not appear
to implement acquire/release semantics, because I didn't find anything
in arm64 that was playing with p->state behind the scheduler's back (I
could have missed it if it was in an assembly portion as I can't reliably
read arm assembler). Similarly, it's not clear why the arm64
implementation does not call smp_acquire__after_ctrl_dep in the
smp_load_acquire implementation. Even when it was introduced, the arm64
implementation differed significantly from the arm implementation in
terms of what barriers it used, for non-obvious reasons.
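
For comparison, two implementations of the same primitive can look very
different and still both provide acquire/release semantics. The sketch
below shows this in portable C11 rather than kernel code: one variant
pairs a relaxed access with an explicit fence, the other asks for the
ordering on the access itself, which a compiler is free to lower to
dedicated instructions where the architecture has them. All function
names are hypothetical and nothing here is taken from the arm or arm64
barrier headers.

```c
#include <stdatomic.h>

/* Acquire load built from a relaxed load followed by an acquire fence. */
static int load_acquire_fenced(atomic_int *p)
{
    int v = atomic_load_explicit(p, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    return v;
}

/* Acquire load requested directly on the access. */
static int load_acquire_native(atomic_int *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/* Release store built from a release fence followed by a relaxed store. */
static void store_release_fenced(atomic_int *p, int v)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Release store requested directly on the access. */
static void store_release_native(atomic_int *p, int v)
{
    atomic_store_explicit(p, v, memory_order_release);
}
```

Both load variants give acquire ordering and both store variants give
release ordering under the C11 model, so a difference in the barriers
used is not by itself evidence that one of them is broken.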

Unfortunately, making that work similarly to the arch-independent version
did not help, and it does not help that I know nothing about the arm64
memory model.

I'll be looking again today to see if I can find a mistake in the ordering for
how sched_contributes_to_load is handled but again, the lack of knowledge
on the arm64 memory model means I'm a bit stuck and a second set of eyes
would be nice :(

-- 
Mel Gorman
SUSE Labs


Thread overview: 36+ messages
2020-11-16  9:10 Loadavg accounting error on arm64 Mel Gorman
2020-11-16 11:49 ` Mel Gorman
2020-11-16 12:00   ` Mel Gorman
2020-11-16 12:53   ` Peter Zijlstra
2020-11-16 12:58     ` Peter Zijlstra
2020-11-16 15:29       ` Mel Gorman
2020-11-16 16:42         ` Mel Gorman
2020-11-16 16:49         ` Peter Zijlstra
2020-11-16 17:24           ` Mel Gorman
2020-11-16 17:41             ` Will Deacon
2020-11-16 12:46 ` Peter Zijlstra
2020-11-16 12:58   ` Mel Gorman
2020-11-16 13:11 ` Will Deacon
2020-11-16 13:37   ` Mel Gorman
2020-11-16 14:20     ` Peter Zijlstra
2020-11-16 15:52       ` Mel Gorman
2020-11-16 16:54         ` Peter Zijlstra
2020-11-16 17:16           ` Mel Gorman
2020-11-16 19:31       ` Mel Gorman
2020-11-17  8:30         ` [PATCH] sched: Fix data-race in wakeup Peter Zijlstra
2020-11-17  9:15           ` Will Deacon
2020-11-17  9:29             ` Peter Zijlstra
2020-11-17  9:46               ` Peter Zijlstra
2020-11-17 10:36                 ` Will Deacon
2020-11-17 12:52                 ` Valentin Schneider
2020-11-17 15:37                   ` Valentin Schneider
2020-11-17 16:13                     ` Peter Zijlstra
2020-11-17 19:32                       ` Valentin Schneider
2020-11-18  8:05                         ` Peter Zijlstra
2020-11-18  9:51                           ` Valentin Schneider
2020-11-18 13:33               ` Marco Elver
2020-11-17  9:38           ` [PATCH] sched: Fix rq->nr_iowait ordering Peter Zijlstra
2020-11-17 11:43             ` Mel Gorman
2020-11-19  9:55             ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2020-11-17 12:40           ` [PATCH] sched: Fix data-race in wakeup Mel Gorman
2020-11-19  9:55           ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
