All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Cc: "Aaron Lu" <aaron.lwe@gmail.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Julien Desfossez" <jdesfossez@digitalocean.com>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Aaron Lu" <aaron.lu@linux.alibaba.com>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Aubrey Li" <aubrey.intel@gmail.com>,
	"Li, Aubrey" <aubrey.li@linux.intel.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Joel Fernandes" <joelaf@google.com>,
	"Joel Fernandes" <joel@joelfernandes.org>
Subject: Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison
Date: Fri, 15 May 2020 12:38:44 +0200	[thread overview]
Message-ID: <20200515103844.GG2978@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CANaguZD_ZknCrnUA8TYs4rc0TLJZ9J2_FcWmW5cxEMWDTdL6hg@mail.gmail.com>

On Thu, May 14, 2020 at 06:51:27PM -0400, Vineeth Remanan Pillai wrote:
> On Thu, May 14, 2020 at 9:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > A little something like so, this syncs min_vruntime when we switch to
> > single queue mode. This is very much SMT2 only, I got my head in twist
> > when thikning about more siblings, I'll have to try again later.
> >
> Thanks for the quick patch! :-)
> 
> For SMT-n, would it work if sync vruntime if atleast one sibling is
> forced idle? Since force_idle is for all the rqs, I think it would
> be correct to sync the vruntime if atleast one cpu is forced idle.

It's complicated ;-)

So this sync is basically a relative reset of S to 0.

So with 2 queues, when one goes idle, we drop them both to 0 and one
then increases due to not being idle, and the idle one builds up lag to
get re-elected. So far so simple, right?

When there's 3, we can have the situation where 2 run and one is idle,
we sync to 0 and let the idle one build up lag to get re-election. Now
suppose another one also drops idle. At this point dropping all to 0
again would destroy the built-up lag from the queue that was already
idle, not good.

So instead of syncing everything, we can:

  less := !((s64)(s_a - s_b) <= 0)

  (v_a - S_a) - (v_b - S_b) == v_a - v_b - S_a + S_b
                            == v_a - (v_b - S_a + S_b)

IOW, we can recast the (lag) comparison to a one-sided difference.
So if then, instead of syncing the whole queue, sync the idle queue
against the active queue with S_a + S_b at the point where we sync.

(XXX consider the implication of living in a cyclic group: N / 2^n N)

This gives us means of syncing single queues against the active queue,
and for already idle queues to preseve their build-up lag.

Of course, then we get the situation where there's 2 active and one
going idle, who do we pick to sync against? Theory would have us sync
against the combined S, but as we've already demonstated, there is no
such thing in infeasible weight scenarios.

One thing I've considered; and this is where that core_active rudiment
came from, is having active queues sync up between themselves after
every tick. This limits the observed divergence due to the work
conservance.

On top of that, we can improve upon things by moving away from our
horrible (10) hack and moving to (9) and employing (13) here.

Anyway, I got partway through that in the past days, but then my head
hurt. I'll consider it some more :-)

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > -               if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
> > -                       rq_i->core_forceidle = true;
> > +               if (is_idle_task(rq_i->core_pick)) {
> > +                       if (rq_i->nr_running)
> > +                               rq_i->core_forceidle = true;
> > +               } else {
> > +                       new_active++;
> I think we need to reset new_active on restarting the selection.

But this loop is after selection has been done; we don't modify
new_active during selection.

> > +               /*
> > +                * We just dropped into single-rq mode, increment the sequence
> > +                * count to trigger the vruntime sync.
> > +                */
> > +               rq->core->core_sync_seq++;
> > +       }
> > +       rq->core->core_active = new_active;
> core_active seems to be unused.

Correct; that's rudiments from an SMT-n attempt.

> > +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> > +{
> > +       struct sched_entity *se_a = &a->se, *se_b = &b->se;
> > +       struct cfs_rq *cfs_rq_a, *cfa_rq_b;
> > +       u64 vruntime_a, vruntime_b;
> > +
> > +       while (!is_same_tg(se_a, se_b)) {
> > +               int se_a_depth = se_a->depth;
> > +               int se_b_depth = se_b->depth;
> > +
> > +               if (se_a_depth <= se_b_depth)
> > +                       se_b = parent_entity(se_b);
> > +               if (se_a_depth >= se_b_depth)
> > +                       se_a = parent_entity(se_a);
> > +       }
> > +
> > +       cfs_rq_a = cfs_rq_of(se_a);
> > +       cfs_rq_b = cfs_rq_of(se_b);
> > +
> > +       vruntime_a = se_a->vruntime - cfs_rq_a->core_vruntime;
> > +       vruntime_b = se_b->vruntime - cfs_rq_b->core_vruntime;
> Should we be using core_vruntime conditionally? should it be min_vruntime for
> default comparisons and core_vruntime during force_idle?

At the very least it should be min_vruntime when cfs_rq_a == cfs_rq_b,
ie. when we're on the same CPU.

For the other case I was considering that tick based active sync, but
never got that finished and admittedly it all looks a bit weird. But I
figured I'd send it out so we can at least advance the discussion.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -469,7 +469,7 @@ bool cfs_prio_less(struct task_struct *a
 {
 	struct sched_entity *se_a = &a->se, *se_b = &b->se;
 	struct cfs_rq *cfs_rq_a, *cfa_rq_b;
-	u64 vruntime_a, vruntime_b;
+	u64 s_a, s_b, S_a, S_b;
 
 	while (!is_same_tg(se_a, se_b)) {
 		int se_a_depth = se_a->depth;
@@ -484,10 +484,16 @@ bool cfs_prio_less(struct task_struct *a
 	cfs_rq_a = cfs_rq_of(se_a);
 	cfs_rq_b = cfs_rq_of(se_b);
 
-	vruntime_a = se_a->vruntime - cfs_rq_a->core_vruntime;
-	vruntime_b = se_b->vruntime - cfs_rq_b->core_vruntime;
+	S_a = cfs_rq_a->core_vruntime;
+	S_b = cfs_rq_b->core_vruntime;
 
-	return !((s64)(vruntime_a - vruntime_b) <= 0);
+	if (cfs_rq_a == cfs_rq_b)
+		S_a = S_b = cfs_rq_a->min_vruntime;
+
+	s_a = se_a->vruntime - S_a;
+	s_b = se_b->vruntime - S_b;
+
+	return !((s64)(s_a - s_b) <= 0);
 }
 
 static __always_inline

  reply	other threads:[~2020-05-15 10:39 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04 16:59 [RFC PATCH 00/13] Core scheduling v5 vpillai
2020-03-04 16:59 ` [RFC PATCH 01/13] sched: Wrap rq::lock access vpillai
2020-03-04 16:59 ` [RFC PATCH 02/13] sched: Introduce sched_class::pick_task() vpillai
2020-03-04 16:59 ` [RFC PATCH 03/13] sched: Core-wide rq->lock vpillai
2020-04-01 11:42   ` [PATCH] sched/arm64: store cpu topology before notify_cpu_starting Cheng Jian
2020-04-01 13:23     ` Valentin Schneider
2020-04-01 13:23       ` Valentin Schneider
2020-04-06  8:00       ` chengjian (D)
2020-04-06  8:00         ` chengjian (D)
2020-04-09  9:59       ` Sudeep Holla
2020-04-09  9:59         ` Sudeep Holla
2020-04-09 10:32         ` Valentin Schneider
2020-04-09 10:32           ` Valentin Schneider
2020-04-09 11:08           ` Sudeep Holla
2020-04-09 11:08             ` Sudeep Holla
2020-04-09 17:54     ` Joel Fernandes
2020-04-10 13:49       ` chengjian (D)
2020-04-14 11:36   ` [RFC PATCH 03/13] sched: Core-wide rq->lock Peter Zijlstra
2020-04-14 21:35     ` Vineeth Remanan Pillai
2020-04-15 10:55       ` Peter Zijlstra
2020-04-14 14:32   ` Peter Zijlstra
2020-03-04 16:59 ` [RFC PATCH 04/13] sched/fair: Add a few assertions vpillai
2020-03-04 16:59 ` [RFC PATCH 05/13] sched: Basic tracking of matching tasks vpillai
2020-03-04 16:59 ` [RFC PATCH 06/13] sched: Update core scheduler queue when taking cpu online/offline vpillai
2020-03-04 16:59 ` [RFC PATCH 07/13] sched: Add core wide task selection and scheduling vpillai
2020-04-14 13:35   ` Peter Zijlstra
2020-04-16 23:32     ` Tim Chen
2020-04-17 10:57       ` Peter Zijlstra
2020-04-16  3:39   ` Chen Yu
2020-04-16 19:59     ` Vineeth Remanan Pillai
2020-04-17 11:18     ` Peter Zijlstra
2020-04-19 15:31       ` Chen Yu
2020-05-21 23:14   ` Joel Fernandes
2020-05-21 23:16     ` Joel Fernandes
2020-05-22  2:35     ` Joel Fernandes
2020-05-22  3:44       ` Aaron Lu
2020-05-22 20:13         ` Joel Fernandes
2020-03-04 16:59 ` [RFC PATCH 08/13] sched/fair: wrapper for cfs_rq->min_vruntime vpillai
2020-03-04 16:59 ` [RFC PATCH 09/13] sched/fair: core wide vruntime comparison vpillai
2020-04-14 13:56   ` Peter Zijlstra
2020-04-15  3:34     ` Aaron Lu
2020-04-15  4:07       ` Aaron Lu
2020-04-15 21:24         ` Vineeth Remanan Pillai
2020-04-17  9:40           ` Aaron Lu
2020-04-20  8:07             ` [PATCH updated] sched/fair: core wide cfs task priority comparison Aaron Lu
2020-04-20 22:26               ` Vineeth Remanan Pillai
2020-04-21  2:51                 ` Aaron Lu
2020-04-24 14:24                   ` [PATCH updated v2] " Aaron Lu
2020-05-06 14:35                     ` Peter Zijlstra
2020-05-08  8:44                       ` Aaron Lu
2020-05-08  9:09                         ` Peter Zijlstra
2020-05-08 12:34                           ` Aaron Lu
2020-05-14 13:02                             ` Peter Zijlstra
2020-05-14 22:51                               ` Vineeth Remanan Pillai
2020-05-15 10:38                                 ` Peter Zijlstra [this message]
2020-05-15 10:43                                   ` Peter Zijlstra
2020-05-15 14:24                                   ` Vineeth Remanan Pillai
2020-05-16  3:42                               ` Aaron Lu
2020-05-22  9:40                                 ` Aaron Lu
2020-06-08  1:41                               ` Ning, Hongyu
2020-03-04 17:00 ` [RFC PATCH 10/13] sched: Trivial forced-newidle balancer vpillai
2020-03-04 17:00 ` [RFC PATCH 11/13] sched: migration changes for core scheduling vpillai
2020-06-12 13:21   ` Joel Fernandes
2020-06-12 21:32     ` Vineeth Remanan Pillai
2020-06-13  2:25       ` Joel Fernandes
2020-06-13 18:59         ` Vineeth Remanan Pillai
2020-06-15  2:05           ` Li, Aubrey
2020-03-04 17:00 ` [RFC PATCH 12/13] sched: cgroup tagging interface " vpillai
2020-06-26 15:06   ` Vineeth Remanan Pillai
2020-03-04 17:00 ` [RFC PATCH 13/13] sched: Debug bits vpillai
2020-03-04 17:36 ` [RFC PATCH 00/13] Core scheduling v5 Tim Chen
2020-03-04 17:42   ` Vineeth Remanan Pillai
2020-04-14 14:21 ` Peter Zijlstra
2020-04-15 16:32   ` Joel Fernandes
2020-04-17 11:12     ` Peter Zijlstra
2020-04-17 12:35       ` Alexander Graf
2020-04-17 13:08         ` Peter Zijlstra
2020-04-18  2:25       ` Joel Fernandes
2020-05-09 14:35   ` Dario Faggioli
     [not found] ` <38805656-2e2f-222a-c083-692f4b113313@linux.intel.com>
2020-05-09  3:39   ` Ning, Hongyu
2020-05-14 20:51     ` FW: " Gruza, Agata
2020-05-10 23:46 ` [PATCH RFC] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-11 13:49   ` Peter Zijlstra
2020-05-11 14:54     ` Joel Fernandes
2020-05-20 22:26 ` [PATCH RFC] sched: Add a per-thread core scheduling interface Joel Fernandes (Google)
2020-05-21  4:09   ` [PATCH RFC] sched: Add a per-thread core scheduling interface(Internet mail) benbjiang(蒋彪)
2020-05-21 13:49     ` Joel Fernandes
2020-05-21  8:51   ` [PATCH RFC] sched: Add a per-thread core scheduling interface Peter Zijlstra
2020-05-21 13:47     ` Joel Fernandes
2020-05-21 20:20       ` Vineeth Remanan Pillai
2020-05-22 12:59       ` Peter Zijlstra
2020-05-22 21:35         ` Joel Fernandes
2020-05-24 14:00           ` Phil Auld
2020-05-28 14:51             ` Joel Fernandes
2020-05-28 17:01             ` Peter Zijlstra
2020-05-28 18:17               ` Phil Auld
2020-05-28 18:34                 ` Phil Auld
2020-05-28 18:23               ` Joel Fernandes
2020-05-21 18:31   ` Linus Torvalds
2020-05-21 20:40     ` Joel Fernandes
2020-05-21 21:58       ` Jesse Barnes
2020-05-22 16:33         ` Linus Torvalds
2020-05-20 22:37 ` [PATCH RFC v2] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-20 22:48 ` [PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic Joel Fernandes (Google)
2020-05-21 22:52   ` Paul E. McKenney
2020-05-22  1:26     ` Joel Fernandes
2020-06-25 20:12 ` [RFC PATCH 00/13] Core scheduling v5 Vineeth Remanan Pillai
2020-06-26  1:47   ` Joel Fernandes
2020-06-26 14:36     ` Vineeth Remanan Pillai
2020-06-26 15:10       ` Joel Fernandes
2020-06-26 15:12         ` Joel Fernandes
2020-06-27 16:21         ` Joel Fernandes
2020-06-30 14:11         ` Phil Auld
2020-06-29 12:33   ` Li, Aubrey
2020-06-29 19:41     ` Vineeth Remanan Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200515103844.GG2978@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=joel@joelfernandes.org \
    --cc=joelaf@google.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.