linux-kernel.vger.kernel.org archive mirror
From: Barry Song <song.bao.hua@hisilicon.com>
To: <vincent.guittot@linaro.org>, <mingo@redhat.com>,
	<peterz@infradead.org>, <dietmar.eggemann@arm.com>,
	<rostedt@goodmis.org>, <bsegall@google.com>, <mgorman@suse.de>
Cc: <valentin.schneider@arm.com>, <juri.lelli@redhat.com>,
	<bristot@redhat.com>, <linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>, <xuwei5@huawei.com>,
	<prime.zeng@hisilicon.com>, <guodong.xu@linaro.org>,
	<yangyicong@huawei.com>, <liguozhu@hisilicon.com>,
	<linuxarm@openeuler.org>, <wanghuiqiang@huawei.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	"Yongjia Xie" <xieyongjia1@huawei.com>
Subject: [PATCH] sched/fair: don't use waker's cpu if the waker of sync wake-up is interrupt
Date: Tue, 27 Apr 2021 14:37:58 +1200
Message-ID: <20210427023758.4048-1-song.bao.hua@hisilicon.com>

A severe qperf performance decrease was reported in the following use case:
on hardware with 2 NUMA nodes, node0 has cpu0-31 and node1 has cpu32-63.
The Ethernet device is located in node1.

Run the below commands:
$ taskset -c 32-63 stress -c 32 &
$ qperf 192.168.50.166 tcp_lat
tcp_lat:
	latency = 2.95ms.
Normally the latency should be less than 20us. But in the above test,
latency increased dramatically to 2.95ms.

This is caused by qperf ping-ponging between node0 and node1. Since the
wake-up is sync and nr_running on the waker's CPU is 1, WAKE_AFFINE pulls
qperf to node1, but the load balancer (LB) soon migrates it back to node0.
Unlike a normal sync wake-up coming from a task, the waker in the above
test is an interrupt, and nr_running happens to be 1 because stress starts
32 threads on node1, which has 32 CPUs.

Testing also shows that the performance of qperf does not drop if the
number of threads is increased to 64, 96 or larger values:
$ taskset -c 32-63 stress -c 96 &
$ qperf 192.168.50.166 tcp_lat
tcp_lat:
	latency = 14.7us.

With "-c 96", each CPU on node1 has more than one stress thread to run,
which makes "cpu_rq(this_cpu)->nr_running == 1" false in
wake_affine_idle(), so WAKE_AFFINE won't pull qperf to node1.
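
The arithmetic behind the two test runs can be sketched in plain C. This
is only an illustration: tasks_per_cpu() and sync_pulls_to_waker_cpu()
are hypothetical helpers, not kernel functions, and the sketch assumes
the stress threads end up spread evenly over the 32 CPUs of node1.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: runnable tasks per CPU when `threads` CPU-bound
 * tasks are spread evenly over `cpus` CPUs (rounded up). The even spread
 * is an assumption; real load balancing only approximates it.
 */
static unsigned int tasks_per_cpu(unsigned int threads, unsigned int cpus)
{
	return (threads + cpus - 1) / cpus;
}

/*
 * Models only the pre-patch condition quoted above:
 *   sync && cpu_rq(this_cpu)->nr_running == 1
 */
static bool sync_pulls_to_waker_cpu(bool sync, unsigned int nr_running)
{
	return sync && nr_running == 1;
}
```

Under that assumption, "-c 32" gives one runnable task per CPU, so the
condition holds, while "-c 64" and "-c 96" give two and three, so it
does not.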

To fix this issue, this patch checks that the waker of a sync wake-up is
a task rather than an interrupt. Only when the waker is a task can it be
expected to schedule out soon and give its CPU to the wakee.
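
As a rough userspace model of the change, the old and new conditions can
be compared side by side. struct waker_ctx and both functions below are
hypothetical names made up for this sketch; the in_task field stands in
for the result of the kernel's in_task() at wake-up time.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the waker's CPU; not a kernel structure. */
struct waker_ctx {
	unsigned int nr_running; /* runnable tasks on the waker's CPU */
	bool in_task;            /* wake-up issued from task context? */
};

/* Condition before this patch. */
static bool sync_affine_old(bool sync, const struct waker_ctx *w)
{
	return sync && w->nr_running == 1;
}

/* Condition after this patch: additionally require task context. */
static bool sync_affine_new(bool sync, const struct waker_ctx *w)
{
	return sync && w->nr_running == 1 && w->in_task;
}
```

For the reported case (nr_running == 1, wake-up from an interrupt) the
old condition is true and the new one is false, so the sync shortcut no
longer selects the waker's CPU. For a sync wake-up issued by the only
running task, both conditions still hold.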

Reported-by: Yongjia Xie <xieyongjia1@huawei.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d73bdbb2d40..8ad2d732033d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5829,7 +5829,12 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
 		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
 
-	if (sync && cpu_rq(this_cpu)->nr_running == 1)
+	/*
+	 * If this is a sync wake-up and the waker is the only running task
+	 * on this CPU (so the waker is a task, not an interrupt), assume the
+	 * waker will sleep soon and the wakee can take over its CPU.
+	 */
+	if (sync && cpu_rq(this_cpu)->nr_running == 1 && in_task())
 		return this_cpu;
 
 	if (available_idle_cpu(prev_cpu))
-- 
2.25.1


Thread overview: 7+ messages
2021-04-27  2:37 Barry Song [this message]
2021-04-27  4:21 ` [PATCH] sched/fair: don't use waker's cpu if the waker of sync wake-up is interrupt Mike Galbraith
2021-04-27  4:44   ` Song Bao Hua (Barry Song)
2021-04-27  5:54     ` Mike Galbraith
2021-04-27  6:05       ` Song Bao Hua (Barry Song)
2021-04-27  6:15         ` Mike Galbraith
2021-05-05 13:19 ` [sched/fair] 5f94d1b650: stress-ng.sock.ops_per_sec -25.2% regression kernel test robot
