All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chen Yu <yu.c.chen@intel.com>
To: Tianchen Ding <dtcccc@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] sched: Clear ttwu_pending after enqueue_task
Date: Fri, 4 Nov 2022 16:00:35 +0800	[thread overview]
Message-ID: <Y2TGozI0YZQ7BCxc@chenyu5-mobl1> (raw)
In-Reply-To: <20221104023601.12844-1-dtcccc@linux.alibaba.com>

On 2022-11-04 at 10:36:01 +0800, Tianchen Ding wrote:
> We found a long tail latency in schbench whem m*t is close to nr_cpus.
> (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus.)
> 
> This is because when the wakee cpu is idle, rq->ttwu_pending is cleared
> too early, and idle_cpu() will return true until the wakee task enqueued.
> This will mislead the waker when selecting idle cpu, and wake multiple
> worker threads on the same wakee cpu. This situation is enlarged by
> commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on
> wakelist if wakee cpu is idle") because it tends to use wakelist.
> 
> Here is the result of "schbench -m 2 -t 16" on a VM with 32vcpu
> (Intel(R) Xeon(R) Platinum 8369B).
> 
> Latency percentiles (usec):
>                 base      base+revert_f3dd3f674555   base+this_patch
> 50.0000th:         9                            13                 9
> 75.0000th:        12                            19                12
> 90.0000th:        15                            22                15
> 95.0000th:        18                            24                17
> *99.0000th:       27                            31                24
> 99.5000th:      3364                            33                27
> 99.9000th:     12560                            36                30
> 
> We also tested on unixbench and hackbench, and saw no performance
> change.
> 
> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> ---
> v2:
> Update commit log about other benchmarks.
> Add comment in code.
> Move the code before rq_unlock. This can make ttwu_pending updated a bit
> earlier than v1 so that it can reflect the real condition more timely,
> maybe.
> 
> v1: https://lore.kernel.org/all/20221101073630.2797-1-dtcccc@linux.alibaba.com/
> ---
>  kernel/sched/core.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 87c9cdf37a26..7a04b5565389 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
>  	if (!llist)
>  		return;
>  
> -	/*
> -	 * rq::ttwu_pending racy indication of out-standing wakeups.
> -	 * Races such that false-negatives are possible, since they
> -	 * are shorter lived that false-positives would be.
> -	 */
> -	WRITE_ONCE(rq->ttwu_pending, 0);
> -
>  	rq_lock_irqsave(rq, &rf);
>  	update_rq_clock(rq);
>  
> @@ -3759,6 +3752,17 @@ void sched_ttwu_pending(void *arg)
>  		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
>  	}
>  
> +	/*
> +	 * Must be after enqueueing at least once task such that
> +	 * idle_cpu() does not observe a false-negative -- if it does,
> +	 * it is possible for select_idle_siblings() to stack a number
> +	 * of tasks on this CPU during that window.
> +	 *
> +	 * It is ok to clear ttwu_pending when another task pending.
> +	 * We will receive IPI after local irq enabled and then enqueue it.
> +	 * Since now nr_running > 0, idle_cpu() will always get correct result.
> +	 */
> +	WRITE_ONCE(rq->ttwu_pending, 0);
>  	rq_unlock_irqrestore(rq, &rf);
>  }
>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>

thanks,
Chenyu
  

  reply	other threads:[~2022-11-04  8:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-01  7:36 [PATCH] sched: Clear ttwu_pending after enqueue_task Tianchen Ding
2022-11-01 10:34 ` Peter Zijlstra
2022-11-01 13:51   ` Chen Yu
2022-11-01 14:59     ` Peter Zijlstra
2022-11-02  3:01       ` Chen Yu
2022-11-02  6:40       ` Tianchen Ding
2022-11-02  6:40   ` Tianchen Ding
2022-11-04  2:36 ` [PATCH v2] " Tianchen Ding
2022-11-04  8:00   ` Chen Yu [this message]
2022-11-14 15:27   ` Mel Gorman
2022-11-16  9:22   ` [tip: sched/core] sched: Clear ttwu_pending after enqueue_task() tip-bot2 for Tianchen Ding

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2TGozI0YZQ7BCxc@chenyu5-mobl1 \
    --to=yu.c.chen@intel.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dtcccc@linux.alibaba.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.