All of lore.kernel.org
 help / color / mirror / Atom feed
* RE:  Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC
@ 2021-04-29  2:38 Song Bao Hua (Barry Song)
  0 siblings, 0 replies; only message in thread
From: Song Bao Hua (Barry Song) @ 2021-04-29  2:38 UTC (permalink / raw)
  To: Dietmar Eggemann, Vincent Guittot
  Cc: tim.c.chen, catalin.marinas, will, rjw, bp, tglx, mingo, lenb,
	rostedt, bsegall, mgorman, msys.mizuma, valentin.schneider,
	gregkh, Jonathan Cameron, juri.lelli, mark.rutland, sudeep.holla,
	aubrey.li, linux-kernel, linux-acpi, x86, xuwei (O), Zengtao (B),
	guodong.xu, yangyicong, Liguozhu (Kenneth),
	hpa, linux-arm-kernel, peterz

> >>
> >>> -----Original Message-----
> >>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> 
> [...]
> 
> >>> On 20/04/2021 02:18, Barry Song wrote:
> 
> [...]
> 
> >> I am really confused. The whole code has only checked if wake_flags
> >> has WF_TTWU, it has never checked if sd_domain has SD_BALANCE_WAKE flag.
> >
> > look at :
> > #define WF_TTWU     0x08 /* Wakeup;            maps to SD_BALANCE_WAKE */
> >
> > so  when wake_wide return false, we use the wake_affine mecanism but
> > if it's false then we fllback to default mode which looks for:
> > if (tmp->flags & sd_flag)
> >
> > This means looking for SD_BALANCE_WAKE which is never set
> >
> > so sd will stay NULL and you will end up calling select_idle_sibling anyway
> >
> >>
> >> static int
> >> select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >> {
> >>         ...
> >>
> >>         if (wake_flags & WF_TTWU) {
> >>                 record_wakee(p);
> >>
> >>                 if (sched_energy_enabled()) {
> >>                         new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> >>                         if (new_cpu >= 0)
> >>                                 return new_cpu;
> >>                         new_cpu = prev_cpu;
> >>                 }
> >>
> >>                 want_affine = !wake_wide(p) && cpumask_test_cpu(cpu,
> p->cpus_ptr);
> >>         }
> >> }
> >>
> >> And try_to_wake_up() has always set WF_TTWU:
> >> static int
> >> try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >> {
> >>         cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
> >>         ...
> >> }
> >>
> >> So the change in wake_wide will actually affect the value of want_affine.
> >> And I did also see code entered slow path during my benchmark.
> 
> Yes, this is happening but IMHO not for wakeups. Check wake_flags for
> the tasks which go through `slow path` on your machine. They should have
> WF_EXEC or WF_FORK, not WF_TTWU (& WF_SYNC).

Yes. Both of you are right. The slow path I reported yesterday came from
WF_FORK actually.

> 
> >> One issue I mentioned during linaro open discussion is that
> >> since I have moved to use cluster size to decide the value
> >> of wake_wide, relatively less tasks will make wake_wide()
> >> decide to go to slow path, thus, tasks begin to spread to
> >> other NUMA,  but actually llc_size might be able to contain
> >> those tasks. So a possible model might be:
> >> static int wake_wide(struct task_struct *p)
> >> {
> >>         tasksize < cluster : scan cluster
> >>         tasksize > llc      : slow path
> >>         tasksize > cluster && tasksize < llc: scan llc
> >> }
> >>
> >> thoughts?
> 
> Like Vincent explained, the return value of wake_wide() doesn't matter.
> For wakeups you always end up in sis().

Though we will never go to slow path, wake_wide() will affect want_affine,
so eventually affect the "new_cpu"?

	for_each_domain(cpu, tmp) {
		/*
		 * If both 'cpu' and 'prev_cpu' are part of this domain,
		 * cpu is a valid SD_WAKE_AFFINE target.
		 */
		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			if (cpu != prev_cpu)
				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);

			sd = NULL; /* Prefer wake_affine over balance flags */
			break;
		}

		if (tmp->flags & sd_flag)
			sd = tmp;
		else if (!want_affine)
			break;
	}

If wake_affine is false, the above won't execute, new_cpu(target) will
always be "prev_cpu"? so when "task size > cluster size" in wake_wide(),
this means we won't pull the wakee to the cluster of waker since target
is always prev_cpu? It seems sensible.

Thanks
Barry


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-04-29  2:38 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-29  2:38 Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC Song Bao Hua (Barry Song)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.