From: Tianchen Ding <dtcccc@linux.alibaba.com>
To: Valentin Schneider <vschneid@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] sched: Fix the check of nr_running at queue wakelist
Date: Wed, 8 Jun 2022 23:38:33 +0800
Message-ID: <e34de686-4e85-bde1-9f3c-9bbc86b38627@linux.alibaba.com>
In-Reply-To: <xhsmh4k0y84ah.mognet@vschneid.remote.csb>
On 2022/6/6 18:39, Valentin Schneider wrote:
> On 02/06/22 12:06, Tianchen Ding wrote:
>> The commit 2ebb17717550 ("sched/core: Offload wakee task activation if it
>> the wakee is descheduling") checked nr_running <= 1 to avoid task
>> stacking when WF_ON_CPU. Considering the order of p->on_rq and p->on_cpu:
>> if we have WF_ON_CPU here then we must have !p->on_rq, so the deactivate
>> has happened; thus the task being alone on the rq implies nr_running == 0.
>> Change the check to !cpu_rq(cpu)->nr_running to fix it.
>>
>
> I'd flesh this out a bit as in the below:
>
> """
> The commit 2ebb17717550 ("sched/core: Offload wakee task activation if it
> the wakee is descheduling") checked rq->nr_running <= 1 to avoid task
> stacking when WF_ON_CPU.
>
> Per the ordering of writes to p->on_rq and p->on_cpu, observing p->on_cpu
> (WF_ON_CPU) in ttwu_queue_cond() implies !p->on_rq, IOW p has gone through
> the deactivate_task() in __schedule(), thus p has been accounted out of
> rq->nr_running. As such, the task being the only runnable task on the rq
> implies reading rq->nr_running == 0 at that point.
>
> Change the check to !cpu_rq(cpu)->nr_running.
> """
>
> Also, this is lacking some mention of tests that have been run to verify
> this isn't causing a regression. This does however make sense to me, so as
> long as nothing gets hurt by the change:
>
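To make the resulting check concrete, the hunk in ttwu_queue_cond() would
look roughly like this (a sketch reconstructed from the description above;
the surrounding context lines may not match the actual patch verbatim):

	-	if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
	+	if ((wake_flags & WF_ON_CPU) && !cpu_rq(cpu)->nr_running)
	 		return true;

Since WF_ON_CPU implies the wakee has already been dequeued and accounted
out of rq->nr_running, any nonzero nr_running here means some *other* task
is runnable on that rq, which is exactly the stacking case this check is
meant to avoid.
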
I've run the complete set of unixbench test cases and there seems to be no
regression (which is expected). The Pipe-based Context Switching result also
seems stable when I simply type "./Run" to run all the cases. :-/
On x86 (Intel Xeon Platinum 8269CY):
schbench -m 2 -t 8
Latency percentiles (usec)      before    after
      50.0000th:                     8        8
      75.0000th:                    10       10
      90.0000th:                    11       11
      95.0000th:                    12       12
     *99.0000th:                    15       13
      99.5000th:                    16       15
      99.9000th:                    20       18
Unixbench with full threads (104)
                                             before          after      delta
Dhrystone 2 using register variables     3004715731     3011862938      0.24%
Double-Precision Whetstone                 616685.8       617119.3      0.07%
Execl Throughput                            27162.1        27667.3      1.86%
File Copy 1024 bufsize 2000 maxblocks      786221.4       785871.4     -0.04%
File Copy 256 bufsize 500 maxblocks        209420.6       210113.6      0.33%
File Copy 4096 bufsize 8000 maxblocks     2340458.8      2328862.2     -0.50%
Pipe Throughput                         145249195.6    145535622.8      0.20%
Pipe-based Context Switching              3195567.7      3221686.4      0.82%
Process Creation                           100597.6       101347.1      0.75%
Shell Scripts (1 concurrent)               120943.6       120193.5     -0.62%
Shell Scripts (8 concurrent)                17289.7        17233.4     -0.33%
System Call Overhead                      5286847.6      5300604.8      0.26%
On arm64 (Ampere Altra):
schbench -m 2 -t 8
Latency percentiles (usec)      before    after
      50.0000th:                    14       14
      75.0000th:                    19       19
      90.0000th:                    22       22
      95.0000th:                    23       23
     *99.0000th:                    23       24
      99.5000th:                    23       24
      99.9000th:                    28       28
Unixbench with full threads (80)
                                             before          after      delta
Dhrystone 2 using register variables     3536273441     3536194249      0.00%
Double-Precision Whetstone                 629406.9       629383.6      0.00%
Execl Throughput                            66419.3        65920.5     -0.75%
File Copy 1024 bufsize 2000 maxblocks     1060185.2      1063722.8      0.33%
File Copy 256 bufsize 500 maxblocks        317495.4       322684.5      1.63%
File Copy 4096 bufsize 8000 maxblocks     2350706.8      2348285.3     -0.10%
Pipe Throughput                         133516462.4    133542875.3      0.02%
Pipe-based Context Switching              3227430.6      3215356.1     -0.37%
Process Creation                           108958.3       108520.5     -0.40%
Shell Scripts (1 concurrent)               122821.4       122636.3     -0.15%
Shell Scripts (8 concurrent)                17456.5        17462.1      0.03%
System Call Overhead                      4430303.2      4429998.9     -0.01%
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
>
Thanks. Will update my patch and send v4 soon.