linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups
@ 2019-01-09  3:49 Andrea Arcangeli
  2019-01-09  3:49 ` [PATCH 1/1] " Andrea Arcangeli
  2019-01-09  4:19 ` [PATCH 0/1] RFC: " Mike Galbraith
  0 siblings, 2 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2019-01-09  3:49 UTC (permalink / raw)
  To: Peter Zijlstra, Mel Gorman; +Cc: linux-kernel

Hello,

we noticed an unexpected scheduler performance regression when
switching the guest CPU topology from "-smp 2,sockets=2,cores=1" to
"-smp 2,sockets=1,cores=2".

With sockets=2,cores=1, localhost message passing (pipes, AF_UNIX, etc.)
runs serially at 100% CPU load of a single vcpu with optimal
performance. With sockets=1,cores=2 the load is spread across both
vcpus and performance is reduced.

With SCHED_MC=n on older kernels the problem goes away (but that's far
from ideal for heavily multithreaded workloads which then regress)
because that basically disables the last part of select_idle_sibling().

On bare metal with SCHED_MC=y on any recent multicore CPU the
scheduler (as expected) behaves like sockets=1,cores=2, so it won't
run the tasks serially.

The reason is that select_idle_sibling() can disregard the "sync" hint
and all decisions made up to that point, and at the last minute it can
decide to move the woken task to an arbitrary idle core.

To test the above theory I implemented this patch which seems to
confirm the reason the tasks won't run serially anymore with
sockets=1,cores=2 is select_idle_sibling() overriding the "sync" hint.

You worked on wake_affine() before, so you may want to review this
issue, if you agree these sync workloads should run serially even in
the presence of idle cores in the system. I don't know if the current
behavior is on purpose, but if it is, it'd be interesting to know
why. This is just an RFC.

To test, I used this trivial program:

/*
 *  pipe-loop.c
 *
 *  Copyright (C) 2019 Red Hat, Inc.
 *
 *  This work is licensed under the terms of the GNU GPL, version 2.
 */

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

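int main(void)
{
	char buf[1] = { 0 };	/* one-byte token bounced between the tasks */
	int n = 1000000;	/* number of round trips */
	int pipe1[2], pipe2[2];
	pid_t pid;

	if (pipe(pipe1) || pipe(pipe2)) {
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}

	if (pid) {
		/* parent: wait for the token on pipe1, echo it on pipe2 */
		while (n--) {
			if (read(pipe1[0], buf, 1) != 1 ||
			    write(pipe2[1], buf, 1) != 1)
				return 1;
		}
		wait(NULL);
	} else {
		/* child: send the token on pipe1, wait for the echo on pipe2 */
		while (n--) {
			if (write(pipe1[1], buf, 1) != 1 ||
			    read(pipe2[0], buf, 1) != 1)
				return 1;
		}
	}

	return 0;
}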

Andrea Arcangeli (1):
  sched/fair: skip select_idle_sibling() in presence of sync wakeups

 kernel/sched/fair.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)



Thread overview: 6+ messages
2019-01-09  3:49 [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups Andrea Arcangeli
2019-01-09  3:49 ` [PATCH 1/1] " Andrea Arcangeli
2019-01-09  4:19 ` [PATCH 0/1] RFC: " Mike Galbraith
2019-01-09 10:07   ` Mel Gorman
2019-01-09 18:24     ` Andrea Arcangeli
2019-01-09 18:02   ` Andrea Arcangeli
