From: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: David Vernet <void@manifault.com>,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, joshdon@google.com,
	roman.gushchin@linux.dev, tj@kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
Date: Fri, 23 Jun 2023 15:20:15 +0530
Message-ID: <ZJVq1+dSkMAsOIKw@BLR-5CG11610CF.amd.com>
In-Reply-To: <20230622102935.GG4253@hirez.programming.kicks-ass.net>

On Thu, Jun 22, 2023 at 12:29:35PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 22, 2023 at 02:41:57PM +0530, Gautham R. Shenoy wrote:
> 
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -215,21 +215,17 @@ static void swqueue_enqueue(struct rq *rq, struct task_struct *p, int enq_flags)
> >  {
> >  	unsigned long flags;
> >  	struct swqueue *swqueue;
> > -	bool task_migrated = enq_flags & ENQUEUE_MIGRATED;
> > -	bool task_wakeup = enq_flags & ENQUEUE_WAKEUP;
> >  
> >  	/*
> >  	 * Only enqueue the task in the shared wakequeue if:
> >  	 *
> >  	 * - SWQUEUE is enabled
> > -	 * - The task is on the wakeup path
> > -	 * - The task wasn't purposefully migrated to the current rq by
> > -	 *   select_task_rq()
> > -	 * - The task isn't pinned to a specific CPU
> > +	 * - select_idle_sibling() didn't find an idle CPU to enqueue this wakee task.
> >  	 */
> > -	if (!task_wakeup || task_migrated || p->nr_cpus_allowed == 1)
> 
> You lost the nr_cpus_allowed == 1 case, that actually made sense, except
> you don't have a hook in set_cpus_allowed() to fix it back up.
>

I didn't retain the "nr_cpus_allowed == 1" check because
select_task_rq() calls select_task_rq_fair() only when
p->nr_cpus_allowed > 1.

So if p were affined to a single CPU, its p->sched_add_to_swq would
always remain 0, so it would never be placed on the 'queue'.
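
For reference, the gate is in select_task_rq() itself; a simplified
sketch of the current kernel/sched/core.c logic (not part of this
patch, and with the lockdep/fallback details elided):

static inline
int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
{
        /*
         * An affined task (nr_cpus_allowed == 1) never reaches
         * select_task_rq_fair(), so p->sched_add_to_swq stays 0.
         */
        if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
                cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
        else
                cpu = cpumask_any(p->cpus_ptr);

        return cpu;
}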


> > +	if (!READ_ONCE(p->sched_add_to_swq))
> >  		return;
> 
> So suppose we fill all CPUs with tasks T_{0..n-n1}, sis finds them a
> nice idle cpu and we don't get queued.

Yes. 

> 
> Then wake up another n tasks T_{n...2n-1}, none of them find an idle
> CPU, as per the above them all taken. These all do get queued.

Yes.

> 
> However, suppose T>=n does a wakeup-preemption, and we end up with T>=n
> running while T<n get queued on the rq, but not the 'queue' (I so hate
> the swqueue name).

Yes. But note that prior to running T>=n on the CPU, it will be
removed from the 'queue'. So if every T>=n preempts its corresponding
T<n, the queue becomes empty. And yes, in that case the queue
served no purpose and did not justify the additional overhead.
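
(For completeness, the removal is done by swqueue_remove_task();
roughly as in the RFC, quoting from memory so details are
approximate:)

static void swqueue_remove_task(struct task_struct *p)
{
        unsigned long flags;
        struct swqueue *swqueue;

        /*
         * Pull the task off the per-LLC 'queue' before it runs, so a
         * wakee that wins wakeup-preemption never lingers there.
         */
        if (!list_empty(&p->swqueue_node)) {
                swqueue = rq_swqueue(task_rq(p));
                spin_lock_irqsave(&swqueue->lock, flags);
                list_del_init(&p->swqueue_node);
                spin_unlock_irqrestore(&swqueue->lock, flags);
        }
}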

> 
> Then one CPU has both it's tasks go 'away' and becomes idle.

Yes, and this CPU prior to newidle_balance() will now search the
'queue'.

> 
> At this point the 'queue' contains only running tasks that are not
> migratable and all the tasks that could be migrated are not on the queue
> -- IOW it is completely useless.

At this point, the CPU will call swqueue_pick_next_task() and we can
hit one of three cases (a sketch of the pick path follows the list):

    * The 'queue' is empty: as noted above, the queue did nothing
      for these tasks and was completely useless.

    * The task at the head of the 'queue' is not runnable on the CPU
      going idle: adding this task to the queue was again wasted
      effort, because in the current implementation the task is
      removed from the 'queue' even when it cannot run on this CPU.

    * The 'queue' has some task which can run on the CPU going idle:
      the task is successfully migrated to this CPU, which no longer
      has to do a newidle_balance(), and the task has its waiting
      period reduced.
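
To make the three cases concrete, here is a sketch of the pick path
(names as in the RFC; the exact signature and migration step are
approximated):

static struct task_struct *swqueue_pick_next_task(struct rq *rq)
{
        struct swqueue *swqueue = rq_swqueue(rq);
        struct task_struct *p;
        unsigned long flags;

        spin_lock_irqsave(&swqueue->lock, flags);
        /* Case 1: the 'queue' is empty -- the enqueues bought us nothing. */
        p = list_first_entry_or_null(&swqueue->list, struct task_struct,
                                     swqueue_node);
        if (p)
                /*
                 * Case 2: the head task is removed unconditionally,
                 * even if it turns out it cannot run on this CPU.
                 */
                list_del_init(&p->swqueue_node);
        spin_unlock_irqrestore(&swqueue->lock, flags);

        /* Case 3: the caller migrates p here and skips newidle_balance(). */
        return p;
}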

> 
> Is this the intention?

IIUC, the intention is to reduce the avg post-wakeup wait time of the
task by running it on a newidle CPU, when the task's CPU is still busy
running something else. This is assuming that the task won't win
wakeup-preemption. But the flipside is the additional cost of swqueue
enqueue/dequeue.

I have queued a run across a set of benchmarks, but just to get an
idea of how the patch performs, I ran hackbench. These are the
results on a 2-socket Gen3 EPYC configured in NPS=2, with 64 cores /
128 threads per socket (hackbench completion time in seconds; lower
is better):

Test:            tip                  tip_swq_disabled        tip_swq_enabled
=============================================================================
1-groups:       3.82 (0.00 pct)       3.84 (-0.52 pct)        3.46 (+9.42 pct)
2-groups:       4.40 (0.00 pct)       4.42 (-0.45 pct)        3.66 (+16.81 pct)
4-groups:       4.84 (0.00 pct)       4.82 (+0.41 pct)        4.50 (+7.02 pct)
8-groups:       5.45 (0.00 pct)       5.43 (+0.36 pct)        5.85 (-7.33 pct)
16-groups:      6.94 (0.00 pct)       6.90 (+0.57 pct)        8.41 (-21.18 pct)
 
We can see that the patchset does quite well with 1, 2 and 4 groups,
but with 8 and 16 groups, as the system becomes overloaded, the
performance degrades. In the overloaded case, I could observe
native_queued_spin_lock_slowpath() looming large in the perf profiles
when the swqueue feature was enabled. So perhaps the cost of swqueue
enqueue/dequeue is not justifiable, especially if the tasks are going
to run on their target CPUs anyway.

I will post more results later.
--
Thanks and Regards
gautham.
