From: Peter Zijlstra <peterz@infradead.org>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: mingo@redhat.com, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, linux-kernel@vger.kernel.org,
	zhangqiao22@huawei.com
Subject: Re: [PATCH 4/4] sched/fair: limit sched slice duration
Date: Fri, 9 Sep 2022 13:14:07 +0200	[thread overview]
Message-ID: <Yxsf/5ErmVoKFucb@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20220825122726.20819-5-vincent.guittot@linaro.org>


Picked up the first three.

On Thu, Aug 25, 2022 at 02:27:26PM +0200, Vincent Guittot wrote:
> In presence of a lot of small weight tasks like sched_idle tasks, normal
> or high weight tasks can see their ideal runtime (sched_slice) increase
> to hundreds of ms, whereas it normally stays below sysctl_sched_latency.
> 
> 2 normal tasks running on a CPU will have a max sched_slice of 12ms
> (half of the sched_period). This means that they will make progress
> every sysctl_sched_latency period.
> 
> If we now add 1000 idle tasks on the CPU, the sched_period becomes

Surely people don't actually have that many runnable tasks, and this
is a device for the argument?

> 3006 ms and the ideal runtime of the normal tasks becomes 609 ms.
> It will even become 1500 ms if the idle tasks belong to an idle cgroup.
> This means that the scheduler will only look to pick another waiting task
> after 609 ms (respectively 1500 ms) of running time. The idle tasks
> significantly change the way the 2 normal tasks interleave their running
> time slots, whereas they should have only a small impact.
> 
> Such a long sched_slice can significantly delay the release of resources,
> as a task can wait hundreds of ms for its next running slot just because
> of idle tasks queued on the rq.
> 
> Cap the ideal_runtime to sysctl_sched_latency when comparing to the next
> waiting task to make sure that tasks will regularly make progress and will
> not be significantly impacted by idle/background tasks queued on the rq.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> 
> While studying the problem, I also considered subtracting
> cfs.idle_h_nr_running before computing the sched_slice, but we can have
> quite a similar problem with a low weight normal task/cgroup, so I decided
> to keep this solution.

That ^... my proposal below has the same problem.

This:

> Also, this solution doesn't completely remove the impact of idle tasks
> on the scheduling pattern, but it caps the running slice of a task to a
> max value of 2*sysctl_sched_latency.

I'm failing to see how.

>  kernel/sched/fair.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 260a55ac462f..96fedd0ab5fa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4599,6 +4599,8 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
>  	if (delta < 0)
>  		return;

(I'm thinking that early return is a bit pointless, a negative value
won't be larger than ideal_runtime anyway)

> +	ideal_runtime =  min_t(u64, ideal_runtime, sysctl_sched_latency);
> +

(superfluous whitespace -- twice: once after the =, and once this whole
extra line)

>  	if (delta > ideal_runtime)
>  		resched_curr(rq_of(cfs_rq));
>  }

Urgghhhh..

So delta is in vtime here, while sched_latency is not; the heavier
the queue, the larger this value becomes.

Given those 1000 idle tasks, rq-weight would be around 2048; however, due
to nr_running being insane, sched_slice() ends up being something like:

  1000 * min_gran * 2 / 2048

which is around min_gran and so won't come near latency.


Since we already have idle_min_gran, how about something like this?


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index efceb670e755..8dd18fc0affa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -706,12 +706,14 @@ static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
  *
  * p = (nr <= nl) ? l : l*nr/nl
  */
-static u64 __sched_period(unsigned long nr_running)
+static u64 __sched_period(unsigned long nr_running, unsigned long nr_idle)
 {
-	if (unlikely(nr_running > sched_nr_latency))
-		return nr_running * sysctl_sched_min_granularity;
-	else
-		return sysctl_sched_latency;
+	u64 period = 0;
+
+	period += nr_running * sysctl_sched_min_granularity;
+	period += nr_idle    * sysctl_sched_idle_min_granularity;
+
+	return max_t(u64, period, sysctl_sched_latency);
 }
 
 static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
@@ -724,15 +726,25 @@ static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	unsigned int nr_running = cfs_rq->nr_running;
+	unsigned int nr_idle = cfs_rq->idle_nr_running;
+	unsigned int nr_running = cfs_rq->nr_running - nr_idle;
 	struct sched_entity *init_se = se;
 	unsigned int min_gran;
 	u64 slice;
 
-	if (sched_feat(ALT_PERIOD))
-		nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
+	if (sched_feat(ALT_PERIOD)) {
+		nr_idle = rq_of(cfs_rq)->cfs.idle_h_nr_running;
+		nr_running = rq_of(cfs_rq)->cfs.h_nr_running - nr_idle;
+	}
+
+	if (!se->on_rq) {
+		if (se_is_idle(se))
+			nr_idle++;
+		else
+			nr_running++;
+	}
 
-	slice = __sched_period(nr_running + !se->on_rq);
+	slice = __sched_period(nr_running, nr_idle);
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;


This changes how we compute the period depending on the composition. It
suffers the exact same problem you had earlier, though, in that it doesn't
work for the other low-weight cases. But perhaps we can come up with a
better means of computing the period that *does* consider them?

As said before... urgh! Bit of a sticky problem, this.

Thread overview: 21+ messages
2022-08-25 12:27 [PATCH 0/4] sched/fair: fixes in presence of lot of sched_idle tasks Vincent Guittot
2022-08-25 12:27 ` [PATCH 1/4] sched/fair: make sure to try to detach at least one movable task Vincent Guittot
2022-09-12  8:44   ` Dietmar Eggemann
2022-09-13  8:54     ` Vincent Guittot
2022-09-15 14:24   ` [tip: sched/core] sched/fair: Make " tip-bot2 for Vincent Guittot
2024-02-12 20:28   ` [PATCH 1/4] sched/fair: make " Josh Don
2024-03-20 16:57     ` Vincent Guittot
2024-03-21 20:25       ` Josh Don
2024-03-22 17:16         ` Vincent Guittot
2024-03-22 19:49           ` Josh Don
2022-08-25 12:27 ` [PATCH 2/4] sched/fair: cleanup loop_max and loop_break Vincent Guittot
2022-09-12  8:45   ` Dietmar Eggemann
2022-09-13  8:37     ` Vincent Guittot
2022-09-15 14:24   ` [tip: sched/core] sched/fair: Cleanup " tip-bot2 for Vincent Guittot
2022-08-25 12:27 ` [PATCH 3/4] sched/fair: move call to list_last_entry() in detach_tasks Vincent Guittot
2022-09-12  8:46   ` Dietmar Eggemann
2022-09-15 14:24   ` [tip: sched/core] sched/fair: Move " tip-bot2 for Vincent Guittot
2022-08-25 12:27 ` [PATCH 4/4] sched/fair: limit sched slice duration Vincent Guittot
2022-09-09 11:14   ` Peter Zijlstra [this message]
2022-09-09 14:03     ` Vincent Guittot
2022-09-09 15:05       ` Vincent Guittot
