* [PATCH v2] sched/fair: bring back select_idle_smt, but differently
@ 2021-03-21 19:03 Rik van Riel
  2021-03-22 11:03 ` Mel Gorman
  0 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2021-03-21 19:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider

    Mel Gorman did some nice work in 9fe1f127b913
    ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
    being more efficient at finding an idle CPU, and in tasks spending less
    time waiting to be run, both according to the schedstats run_delay
    numbers, and according to measured application latencies. Yay.
    
    The flip side of this is that we see more task migrations (about
    30% more), higher cache misses, higher memory bandwidth utilization,
    and higher CPU use, for the same number of requests/second.
    
    This is most pronounced on a memcache type workload, which saw
    a consistent 1-3% increase in total CPU use on the system, due
    to those increased task migrations leading to higher L2 cache
    miss numbers, and higher memory utilization. The exclusive L3
    cache on Skylake does us no favors there.
    
    On our web serving workload, that effect is usually negligible,
    but occasionally as large as a 9% regression in the number of
    requests served, due to some interaction between scheduler latency
    and the web load balancer.
    
    It appears that the increased number of CPU migrations is generally
    a good thing, since it leads to lower cpu_delay numbers, reflecting
    the fact that tasks get to run faster. However, the reduced locality
    and the corresponding increase in L2 cache misses hurts a little.
    
    The patch below appears to fix the regression, while keeping the
    benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
    with a twist: when a socket has no idle cores, check to see if the
    sibling of "prev" is idle, before searching all the other CPUs.
    
    This fixes both the occasional 9% regression on the web serving
    workload, and the continuous 2% CPU use regression on the memcache
    type workload.
    
    With Mel's patches and this patch together, the p95 and p99 response
    times for the memcache type application improve by about 20% over what
    they were before Mel's patches got merged.
    
    Signed-off-by: Rik van Riel <riel@surriel.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb945f8..0c986972f4cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6098,6 +6098,28 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int
+target)
+{
+	int cpu;
+
+	if (!static_branch_likely(&sched_smt_present))
+		return -1;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6114,6 +6136,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6121,7 +6148,7 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
@@ -6155,6 +6182,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		time = cpu_clock(this);
 	}
 
+	if (!smt && cpus_share_cache(prev, target)) {
+		/* No idle core. Check if prev has an idle sibling. */
+		i = select_idle_smt(p, sd, prev);
+		if ((unsigned int)i < nr_cpumask_bits)
+			return i;
+	}
+
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (smt) {
 			i = select_idle_core(p, cpu, cpus, &idle_cpu);
@@ -6307,7 +6341,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	i = select_idle_cpu(p, sd, prev, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 



* Re: [PATCH v2] sched/fair: bring back select_idle_smt, but differently
  2021-03-21 19:03 [PATCH v2] sched/fair: bring back select_idle_smt, but differently Rik van Riel
@ 2021-03-22 11:03 ` Mel Gorman
  2021-03-22 15:07   ` Rik van Riel
  2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
  0 siblings, 2 replies; 26+ messages in thread
From: Mel Gorman @ 2021-03-22 11:03 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider

On Sun, Mar 21, 2021 at 03:03:58PM -0400, Rik van Riel wrote:
>     Mel Gorman did some nice work in 9fe1f127b913
>     ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
>     being more efficient at finding an idle CPU, and in tasks spending less
>     time waiting to be run, both according to the schedstats run_delay
>     numbers, and according to measured application latencies. Yay.
>     

Other than unusual indentation of the changelog, yey for part 1.

>     The flip side of this is that we see more task migrations (about
>     30% more), higher cache misses, higher memory bandwidth utilization,
>     and higher CPU use, for the same number of requests/second.
>     

I am having difficulty with this part and whether this patch affects task
migrations in particular.

>     This is most pronounced on a memcache type workload, which saw
>     a consistent 1-3% increase in total CPU use on the system, due
>     to those increased task migrations leading to higher L2 cache
>     miss numbers, and higher memory utilization. The exclusive L3
>     cache on Skylake does us no favors there.
>     

Out of curiosity, what is the load generator for memcache, or is this
based on analysis of a production workload? I ask because mutilate
(https://github.com/leverich/mutilate) is allegedly a load generator
that can simulate FaceBook patterns but it is old. I would be interested
in hearing if mutilate is used and if so, what parameters the load
generator is given.

>     On our web serving workload, that effect is usually negligible,
>     but occasionally as large as a 9% regression in the number of
>     requests served, due to some interaction between scheduler latency
>     and the web load balancer.
>     
>     It appears that the increased number of CPU migrations is generally
>     a good thing, since it leads to lower cpu_delay numbers, reflecting
>     the fact that tasks get to run faster. However, the reduced locality
>     and the corresponding increase in L2 cache misses hurts a little.
>     

This is understandable because finding an idle CPU and incurring the
migration cost can be better than stacking a task on a busy CPU for
workloads that are sensitive to wakeup latency.

>     The patch below appears to fix the regression, while keeping the
>     benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
>     with a twist: when a socket has no idle cores, check to see if the
>     sibling of "prev" is idle, before searching all the other CPUs.
>     

But this is the part I'm having a problem with. If the sibling of prev is
idle then a task migration cost is still incurred. The main difference is
that it's more likely to share a L1 or L2 cache and with no idle cores,
some sharing of resources between HT siblings is inevitable.

Can you confirm that task migrations are still higher with this patch?

>     This fixes both the occasional 9% regression on the web serving
>     workload, and the continuous 2% CPU use regression on the memcache
>     type workload.
>     
>     With Mel's patches and this patch together, the p95 and p99 response
>     times for the memcache type application improve by about 20% over what
>     they were before Mel's patches got merged.
>     

Again, I would be very interested in hearing how this conclusion was
reached because I can update mmtests accordingly and wire it into the
"standard battery of scheduler workloads".

>     Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..0c986972f4cd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6098,6 +6098,28 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
>  	return -1;
>  }
>  
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int
> +target)
> +{
> +	int cpu;
> +
> +	if (!static_branch_likely(&sched_smt_present))
> +		return -1;
> +
> +	for_each_cpu(cpu, cpu_smt_mask(target)) {
> +		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +			continue;
> +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +			return cpu;
> +	}
> +
> +	return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>  

Take a look at Barry's patch
"!static_branch_likely(&sched_smt_present)". If that is applied first then
you can remove the static_branch_likely check here as it'll be covered
by an earlier test_idle_cores() call.

As the ordering is not known, just note that it's a potential follow-up
if Barry's patch was merged after yours.
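
For clarity, with that check gone the helper reduces to something like
this (untested, the same body as the hunk above minus the static branch):

static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
{
	int cpu;

	/* Caller (via an earlier test_idle_cores()) implies SMT is present. */
	for_each_cpu(cpu, cpu_smt_mask(target)) {
		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
			continue;
		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
			return cpu;
	}

	return -1;
}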

>  static inline void set_idle_cores(int cpu, int val)
> @@ -6114,6 +6136,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
>  	return __select_idle_cpu(core);
>  }
>  
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +	return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>  
>  /*
> @@ -6121,7 +6148,7 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
>  {
>  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>  	int i, cpu, idle_cpu = -1, nr = INT_MAX;
> @@ -6155,6 +6182,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>  		time = cpu_clock(this);
>  	}
>  
> +	if (!smt && cpus_share_cache(prev, target)) {
> +		/* No idle core. Check if prev has an idle sibling. */
> +		i = select_idle_smt(p, sd, prev);
> +		if ((unsigned int)i < nr_cpumask_bits)
> +			return i;
> +	}
> +
>  	for_each_cpu_wrap(cpu, cpus, target) {
>  		if (smt) {
>  			i = select_idle_core(p, cpu, cpus, &idle_cpu);

Please consider moving this block within the SIS_PROP && !smt check
above. It could become something like

if (!smt) {
	/* No idle core. Check if prev has an idle sibling. */
	i = select_idle_smt(p, sd, prev);
	if ((unsigned int)i < nr_cpumask_bits)
		return i;

	if (sched_feat(SIS_PROP)) {
		SIS_PROP STUFF
	}
}

That way you avoid some calculations and a clock read if the HT sibling
is idle.

Second, select_idle_smt() does not use the cpus mask so consider moving
the cpus initialisation after select_idle_smt() has been called.
Specifically this initialisation

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

Alternatively, clear the bits in the SMT sibling scan to avoid checking
the siblings twice. It's a tradeoff because initialising and clearing
bits is not free and the cost is wasted if a sibling is free.
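
To make the bit-clearing alternative concrete, an untested sketch would be
to pass the scratch mask into the SMT scan (this assumes the cpumask_and()
initialisation stays where it is, so it conflicts with the first
alternative):

static int select_idle_smt(struct task_struct *p, struct cpumask *cpus, int target)
{
	int cpu;

	for_each_cpu(cpu, cpu_smt_mask(target)) {
		/* cpus is sched_domain_span(sd) & p->cpus_ptr, one test covers both. */
		if (!cpumask_test_cpu(cpu, cpus))
			continue;
		/* Don't rescan this sibling in the main loop. */
		__cpumask_clear_cpu(cpu, cpus);
		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
			return cpu;
	}

	return -1;
}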

A third concern, although it is mild, is that the SMT scan ignores the
SIS_PROP limits on the depth search. This patch may increase the scan
depth as a result. It's only a mild concern as limiting the depth of a
search is a magic number anyway. 

Otherwise, the biggest difference from historical behaviour is that we are
explicitly favouring SMT siblings when there are no idle cores, and the
!idle_cores information can be inaccurate. That is going to be a "win some,
lose some" scenario and might show up in overloaded workloads that rapidly
idle (e.g. hackbench). I don't have a strong opinion on this one because
*some* HT sibling is going to be used and it's workload and hardware
dependent what the impact is.

I think I'll run this through a short run of some scheduler loads but
I'll wait to see if you decide to create another version based on this
review.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v2] sched/fair: bring back select_idle_smt, but differently
  2021-03-22 11:03 ` Mel Gorman
@ 2021-03-22 15:07   ` Rik van Riel
  2021-03-22 15:33     ` Mel Gorman
  2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
  1 sibling, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2021-03-22 15:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider


On Mon, 2021-03-22 at 11:03 +0000, Mel Gorman wrote:
> On Sun, Mar 21, 2021 at 03:03:58PM -0400, Rik van Riel wrote:
> >     Mel Gorman did some nice work in 9fe1f127b913
> >     ("sched/fair: Merge select_idle_core/cpu()"), resulting in the
> > kernel
> >     being more efficient at finding an idle CPU, and in tasks
> > spending less
> >     time waiting to be run, both according to the schedstats
> > run_delay
> >     numbers, and according to measured application latencies. Yay.
> >     
> 
> Other than unusual indentation of the changelog, yey for part 1.
> 
> >     The flip side of this is that we see more task migrations
> > (about
> >     30% more), higher cache misses, higher memory bandwidth
> > utilization,
> >     and higher CPU use, for the same number of requests/second.
> >     
> 
> I am having difficulty with this part and whether this patch affects
> task
> migrations in particular.

Sorry, I should be more clear in the changelog for the
next version. Task migrations continue to be high with
this patch applied, but memory bandwidth and L2 cache
misses go back down, due to better cache locality.

> >     This is most pronounced on a memcache type workload, which saw
> >     a consistent 1-3% increase in total CPU use on the system, due
> >     to those increased task migrations leading to higher L2 cache
> >     miss numbers, and higher memory utilization. The exclusive L3
> >     cache on Skylake does us no favors there.
> >     
> 
> Out of curiosity, what is the load generator for memcache, or is this
> based on analysis of a production workload? I ask because mutilate
> (https://github.com/leverich/mutilate) is allegedly a load generator
> that can simulate FaceBook patterns but it is old. I would be
> interested
> in hearing if mutilate is used and if so, what parameters the load
> generator is given.

I had never heard of mutilate, I'll take a look at that.

I am running systems that get real production queries, but
at a higher average load than regular production systems.
Also, the same queries get replicated out to 3 systems on
the A and B side each, which seems to be enough to factor
out random noise for this workload.

> This is understandable because finding an idle CPU and incurring the
> migration cost can be better than stacking a task on a busy CPU for
> workloads that are sensitive to wakeup latency.

It absolutely is, most of the time. Our web servers have
"only" about 70k context switches/second on a 36 CPU system,
while the memcache style workload has about 170k context
switches/second on the same hardware.

> But this is the part I'm having a problem with. If the sibling of
> prev is
> idle then a task migration cost is still incurred. The main
> difference is
> that it's more likely to share a L1 or L2 cache and with no idle
> cores,
> some sharing of resources between HT siblings is inevitable.
> 
> Can you confirm that task migrations are still higher with this
> patch?

They are higher. In fact, with my patch the number of task
migrations increases ever so slightly over what it is with
just your patches.  However, memory bandwidth used, and the
number of L2 cache misses go down.

> >     This fixes both the occasional 9% regression on the web serving
> >     workload, and the continuous 2% CPU use regression on the
> > memcache
> >     type workload.
> >     
> >     With Mel's patches and this patch together, the p95 and p99
> > response
> >     times for the memcache type application improve by about 20%
> > over what
> >     they were before Mel's patches got merged.
> 
> Again, I would be very interested in hearing how this conclusion was
> reached because I can update mmtests accordingly and wire it into the
> "standard battery of scheduler workloads".

I monitor a number of different metrics during this test:
- number of task migrations
- number of context switches
- p95 & p99 application response latency
- memory bandwidth used
- L2 cache misses per 1000 instructions
- CPU utilization (total, user, system)
- the cpu_delay schedstat number

There are other metrics too, but these seem the most useful
ones.
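
(In case it helps wire this up in mmtests: the cpu_delay number I quote
is essentially the per-task schedstats run_delay, which can be sampled
from /proc/<pid>/schedstat. A throwaway reader, purely for illustration
and not our actual tooling, is below. The three fields are time on CPU,
time spent waiting on a runqueue, and the number of timeslices, with the
times in nanoseconds.)

#include <stdio.h>

int main(int argc, char **argv)
{
	unsigned long long on_cpu, run_delay, slices;
	char path[64];
	FILE *f;

	/* Default to the current process if no pid is given. */
	snprintf(path, sizeof(path), "/proc/%s/schedstat",
		 argc > 1 ? argv[1] : "self");

	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%llu %llu %llu", &on_cpu, &run_delay, &slices) != 3) {
		fprintf(stderr, "unexpected format in %s\n", path);
		fclose(f);
		return 1;
	}
	fclose(f);

	printf("on_cpu %llu ns, run_delay %llu ns, timeslices %llu\n",
	       on_cpu, run_delay, slices);
	return 0;
}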

With just your patches applied, I see an increase in:
- number of task migrations
- amount of CPU time used
- memory bandwidth & L2 cache misses
and a reduction in:
- cpu_delay
- p95 & p99 application response latency

With my patch on top, I continue to see the benefits in
the cpu_delay and application response latency metrics,
while the CPU time, memory bandwidth, and L2 cache miss
metrics all go back down.

I suspect this is an artifact of the L3 CPU cache on
Skylake being an eviction cache, rather than an
inclusive cache. When a task is moved from one CPU
core to another, it cannot load stuff from the L3
cache that hasn't yet been evicted from another CPU
core's L2 cache.

> >     Signed-off-by: Rik van Riel <riel@surriel.com>
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 794c2cb945f8..0c986972f4cd 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6098,6 +6098,28 @@ static int select_idle_core(struct
> > task_struct *p, int core, struct cpumask *cpu
> >  	return -1;
> >  }
> >  
> > +/*
> > + * Scan the local SMT mask for idle CPUs.
> > + */
> > +static int select_idle_smt(struct task_struct *p, struct
> > sched_domain *sd, int
> > +target)
> > +{
> > +	int cpu;
> > +
> > +	if (!static_branch_likely(&sched_smt_present))
> > +		return -1;
> > +
> > +	for_each_cpu(cpu, cpu_smt_mask(target)) {
> > +		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> > +		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> > +			continue;
> > +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> > +			return cpu;
> > +	}
> > +
> > +	return -1;
> > +}
> > +
> >  #else /* CONFIG_SCHED_SMT */
> >  
> 
> Take a look at Barry's patch
> "!static_branch_likely(&sched_smt_present)". If that is applied first
> then
> you can remove the static_branch_likely check here as it'll be
> covered
> by an earlier test_idle_cores() call.
> 
> As the ordering is not known, just note that it's a potential follow-
> up
> if Barry's patch was merged after yours.

I'll keep an eye out for those.

> > +	if (!smt && cpus_share_cache(prev, target)) {
> > +		/* No idle core. Check if prev has an idle sibling. */
> > +		i = select_idle_smt(p, sd, prev);
> > +		if ((unsigned int)i < nr_cpumask_bits)
> > +			return i;
> > +	}
> > +
> >  	for_each_cpu_wrap(cpu, cpus, target) {
> >  		if (smt) {
> >  			i = select_idle_core(p, cpu, cpus, &idle_cpu);
> 
> Please consider moving this block within the SIS_PROP && !smt check
> above. It could become something like

I'll try that right now. That is a nice cleanup, and
potential optimization.

> Second, select_idle_smt() does not use the cpus mask so consider
> moving
> the cpus initialisation after select_idle_smt() has been called.
> Specifically this initialisation
> 
> 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> 
> Alternatively, clear the bits in the SMT sibling scan to avoid
> checking
> the siblings twice. It's a tradeoff because initialising and clearing
> bits is not free and the cost is wasted if a sibling is free.

If we're doing that, should we also clear "target" and "prev"
from the mask?  After all, we might scan those twice with
the current code...

> A third concern, although it is mild, is that the SMT scan ignores
> the
> SIS_PROP limits on the depth search. This patch may increase the scan
> depth as a result. It's only a mild concern as limiting the depth of
> a
> search is a magic number anyway. 

Agreed, placing the search inside the SIS_PROP block is
going to clip the search differently than placing it
outside, too.

Probably no big deal, but I'll push a kernel with
that change into the tests, anyway :)

> Otherwise, the biggest difference from historical behaviour is that
> we are
> explicitly favouring SMT siblings when !idle_cores and !idle_cores
> can
> be inaccurate. That is going to be a "win some, lose some" scenario
> and
> might show up in overloaded workloads that rapidly idle (e.g.
> hackbench).
> I don't have a strong opinion on this one because *some* HT sibling
> is going to be used and it's workload and hardware dependent what the
> impact is.
> 
> I think I'll run this through a short run of some scheduler loads but
> I'll wait to see if you decide to create another version based on
> this
> review.

I'll make the changes you suggested right now, and will
kick off a test. If things go as expected, I'll send out
a v3 in a few hours :)

-- 
All Rights Reversed.



* Re: [PATCH v2] sched/fair: bring back select_idle_smt, but differently
  2021-03-22 15:07   ` Rik van Riel
@ 2021-03-22 15:33     ` Mel Gorman
  2021-03-23  2:08       ` Rik van Riel
  0 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2021-03-22 15:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider

On Mon, Mar 22, 2021 at 11:07:47AM -0400, Rik van Riel wrote:
> > >     The flip side of this is that we see more task migrations
> > > (about
> > >     30% more), higher cache misses, higher memory bandwidth
> > > utilization,
> > >     and higher CPU use, for the same number of requests/second.
> > >     
> > 
> > I am having difficulty with this part and whether this patch affects
> > task
> > migrations in particular.
> 
> Sorry, I should be more clear in the changelog for the
> next version. Task migrations continue to be high with
> this patch applied, but memory bandwidth and L2 cache
> misses go back down, due to better cache locality.
> 

That's completely fine and matches what I expected.

> > >     This is most pronounced on a memcache type workload, which saw

> > >     a consistent 1-3% increase in total CPU use on the system, due
> > >     to those increased task migrations leading to higher L2 cache
> > >     miss numbers, and higher memory utilization. The exclusive L3
> > >     cache on Skylake does us no favors there.
> > >     
> > 
> > Out of curiosity, what is the load generator for memcache, or is this
> > based on analysis of a production workload? I ask because mutilate
> > (https://github.com/leverich/mutilate) is allegedly a load generator
> > that can simulate FaceBook patterns but it is old. I would be
> > interested
> > in hearing if mutilate is used and if so, what parameters the load
> > generator is given.
> 
> I had never heard of mutilate, I'll take a look at that.
> 
> I am running systems that get real production queries, but
> at a higher average load than regular production systems.
> Also, the same queries get replicated out to 3 systems on
> the A and B side each, which seems to be enough to factor
> out random noise for this workload.
> 

If you do look into mutilate and can give it a distribution that
approximates the production test then I'd love to hear the
configuration details so I can put it into mmtests. If that is not feasible
or it's excessively time consuming, don't worry about it.

> > > <SNIP>
> > > +	if (!smt && cpus_share_cache(prev, target)) {
> > > +		/* No idle core. Check if prev has an idle sibling. */
> > > +		i = select_idle_smt(p, sd, prev);
> > > +		if ((unsigned int)i < nr_cpumask_bits)
> > > +			return i;
> > > +	}
> > > +
> > >  	for_each_cpu_wrap(cpu, cpus, target) {
> > >  		if (smt) {
> > >  			i = select_idle_core(p, cpu, cpus, &idle_cpu);
> > 
> > Please consider moving this block within the SIS_PROP && !smt check
> > above. It could become something like
> 
> I'll try that right now. That is a nice cleanup, and
> potential optimization.
> 

Great.

> > Second, select_idle_smt() does not use the cpus mask so consider
> > moving
> > the cpus initialisation after select_idle_smt() has been called.
> > Specifically this initialisation
> > 
> > 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > 
> > Alternatively, clear the bits in the SMT sibling scan to avoid
> > checking
> > the siblings twice. It's a tradeoff because initialising and clearing
> > bits is not free and the cost is wasted if a sibling is free.
> 
> If we're doing that, should we also clear "target" and "prev"
> from the mask?  After all, we might scan those twice with
> the current code...
> 

If trying that, I would put that in a separate patch. At one point
I did play with clearing prev, target and recent but hit problems.
Initialising the mask and clearing them in select_idle_sibling() hurt
the fast path and doing it later was not much better. IIRC, the problem
I hit was that the cost of clearing multiple CPUs before the search was
not offset by gains from a more efficient search.

If I had to guess, simply initialising cpumask after calling
select_idle_smt() will be faster for your particular case because you
have a reasonable expectation that prev's SMT sibling is idle when there
are no idle cores. Checking if prev's sibling is free when there are no
idle cores is fairly cheap in comparison to a cpumask initialisation and
partial clearing.

If you have the testing capacity and time, test both.
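
To be concrete, the first option is only a reordering along these lines
(untested); nothing before the for_each_cpu_wrap() loop uses the cpus
mask, so the cpumask_and() can simply move below the new block:

	if (!smt) {
		if (cpus_share_cache(prev, target)) {
			/* No idle core. Check if prev has an idle sibling. */
			i = select_idle_smt(p, sd, prev);
			if ((unsigned int)i < nr_cpumask_bits)
				return i;
		}

		if (sched_feat(SIS_PROP)) {
			/* existing avg_cost/avg_idle calculation */
		}
	}

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);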

> > A third concern, although it is mild, is that the SMT scan ignores
> > the
> > SIS_PROP limits on the depth search. This patch may increase the scan
> > depth as a result. It's only a mild concern as limiting the depth of
> > a
> > search is a magic number anyway. 
> 
> Agreed, placing the search inside the SIS_PROP block is
> going to clip the search differently than placing it
> outside, too.
> 
> Probably no big deal, but I'll push a kernel with
> that change into the tests, anyway :)
> 

Best plan because select_idle_sibling is always surprising :)

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v2] sched/fair: bring back select_idle_smt, but differently
  2021-03-22 15:33     ` Mel Gorman
@ 2021-03-23  2:08       ` Rik van Riel
  0 siblings, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2021-03-23  2:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider


On Mon, 2021-03-22 at 15:33 +0000, Mel Gorman wrote:

> If trying that, I would put that in a separate patch. At one point
> I did play with clearing prev, target and recent but hit problems.
> Initialising the mask and clearing them in select_idle_sibling() hurt
> the fast path and doing it later was not much better. IIRC, the
> problem
> I hit was that the cost of clearing multiple CPUs before the search
> was
> not offset by gains from a more efficient search.

I'm definitely avoiding the more expensive operations,
and am only using __cpumask_clear_cpu now :)

> If I had to guess, simply initialising cpumask after calling
> select_idle_smt() will be faster for your particular case because you
> have a reasonable expectation that prev's SMT sibling is idle when
> there
> are no idle cores. Checking if prev's sibling is free when there are
> no
> idle cores is fairly cheap in comparison to a cpumask initialisation
> and
> partial clearing.
> 
> If you have the testing capacity and time, test both.

Kicking off more tests soon. I'll get back with a v3 patch
on Wednesday.

-- 
All Rights Reversed.



* [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-03-22 11:03 ` Mel Gorman
  2021-03-22 15:07   ` Rik van Riel
@ 2021-03-26 19:19   ` Rik van Riel
  2021-03-28 15:36     ` Mel Gorman
                       ` (3 more replies)
  1 sibling, 4 replies; 26+ messages in thread
From: Rik van Riel @ 2021-03-26 19:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider

On Mon, 22 Mar 2021 11:03:06 +0000
Mel Gorman <mgorman@techsingularity.net> wrote:


> Second, select_idle_smt() does not use the cpus mask so consider moving
> the cpus initialisation after select_idle_smt() has been called.
> Specifically this initialisation
> 
> 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> 
> Alternatively, clear the bits in the SMT sibling scan to avoid checking
> the siblings twice. It's a tradeoff because initialising and clearing
> bits is not free and the cost is wasted if a sibling is free.

I tried a number of different variations on moving the CPU mask
initialization, and clearing CPUs from the mask, and failed to
get any clear results from those in testing, even in workloads
with lots of context switches.

Below is a simple version that seems to perform identically to
more complicated versions :)

---8<---
sched,fair: bring back select_idle_smt, but differently

Mel Gorman did some nice work in 9fe1f127b913
("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
being more efficient at finding an idle CPU, and in tasks spending less
time waiting to be run, both according to the schedstats run_delay
numbers, and according to measured application latencies. Yay.

The flip side of this is that we see more task migrations (about
30% more), higher cache misses, higher memory bandwidth utilization,
and higher CPU use, for the same number of requests/second.

This is most pronounced on a memcache type workload, which saw
a consistent 1-3% increase in total CPU use on the system, due
to those increased task migrations leading to higher L2 cache
miss numbers, and higher memory utilization. The exclusive L3
cache on Skylake does us no favors there.

On our web serving workload, that effect is usually negligible.

It appears that the increased number of CPU migrations is generally
a good thing, since it leads to lower cpu_delay numbers, reflecting
the fact that tasks get to run faster. However, the reduced locality
and the corresponding increase in L2 cache misses hurts a little.

The patch below appears to fix the regression, while keeping the
benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
with a twist: when a socket has no idle cores, check to see if the
sibling of "prev" is idle, before searching all the other CPUs.

This fixes both the occasional 9% regression on the web serving
workload, and the continuous 2% CPU use regression on the memcache
type workload.

With Mel's patches and this patch together, task migrations are still
high, but L2 cache misses, memory bandwidth, and CPU time used are back
down to what they were before. The p95 and p99 response times for the
memcache type application improve by about 10% over what they were
before Mel's patches got merged.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 kernel/sched/fair.c | 68 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb945f8..69680158963f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6098,6 +6098,28 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int
+target)
+{
+	int cpu;
+
+	if (!static_branch_likely(&sched_smt_present))
+		return -1;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6114,6 +6136,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6121,7 +6148,7 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
@@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP) && !smt) {
-		u64 avg_cost, avg_idle, span_avg;
+	if (!smt) {
+		if (cpus_share_cache(prev, target)) {
+			/* No idle core. Check if prev has an idle sibling. */
+			i = select_idle_smt(p, sd, prev);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+		}
 
-		/*
-		 * Due to large variance we need a large fuzz factor;
-		 * hackbench in particularly is sensitive here.
-		 */
-		avg_idle = this_rq()->avg_idle / 512;
-		avg_cost = this_sd->avg_scan_cost + 1;
+		if (sched_feat(SIS_PROP)) {
+			u64 avg_cost, avg_idle, span_avg;
 
-		span_avg = sd->span_weight * avg_idle;
-		if (span_avg > 4*avg_cost)
-			nr = div_u64(span_avg, avg_cost);
-		else
-			nr = 4;
+			/*
+			 * Due to large variance we need a large fuzz factor;
+			 * hackbench in particularly is sensitive here.
+			 */
+			avg_idle = this_rq()->avg_idle / 512;
+			avg_cost = this_sd->avg_scan_cost + 1;
 
-		time = cpu_clock(this);
+			span_avg = sd->span_weight * avg_idle;
+			if (span_avg > 4*avg_cost)
+				nr = div_u64(span_avg, avg_cost);
+			else
+				nr = 4;
+
+			time = cpu_clock(this);
+		}
 	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
@@ -6307,7 +6343,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	i = select_idle_cpu(p, sd, prev, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-- 
2.25.4




* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
@ 2021-03-28 15:36     ` Mel Gorman
  2021-04-06 15:10     ` Vincent Guittot
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2021-03-28 15:36 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Valentin Schneider

On Fri, Mar 26, 2021 at 03:19:32PM -0400, Rik van Riel wrote:
> ---8<---
> sched,fair: bring back select_idle_smt, but differently
> 
> Mel Gorman did some nice work in 9fe1f127b913
> ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
> being more efficient at finding an idle CPU, and in tasks spending less
> time waiting to be run, both according to the schedstats run_delay
> numbers, and according to measured application latencies. Yay.
> 
> The flip side of this is that we see more task migrations (about
> 30% more), higher cache misses, higher memory bandwidth utilization,
> and higher CPU use, for the same number of requests/second.
> 
> This is most pronounced on a memcache type workload, which saw
> a consistent 1-3% increase in total CPU use on the system, due
> to those increased task migrations leading to higher L2 cache
> miss numbers, and higher memory utilization. The exclusive L3
> cache on Skylake does us no favors there.
> 
> On our web serving workload, that effect is usually negligible.
> 
> It appears that the increased number of CPU migrations is generally
> a good thing, since it leads to lower cpu_delay numbers, reflecting
> the fact that tasks get to run faster. However, the reduced locality
> and the corresponding increase in L2 cache misses hurts a little.
> 
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
> with a twist: when a socket has no idle cores, check to see if the
> sibling of "prev" is idle, before searching all the other CPUs.
> 
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
> 
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are back
> down to what they were before. The p95 and p99 response times for the
> memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>

FWIW, v3 appears to have performed faster than v2 on the few tests I ran
and the patch looks fine.

Reviewed-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
  2021-03-28 15:36     ` Mel Gorman
@ 2021-04-06 15:10     ` Vincent Guittot
  2021-04-06 15:26       ` Rik van Riel
  2021-04-09 11:24     ` [tip: sched/core] sched/fair: Bring back select_idle_smt(), " tip-bot2 for Rik van Riel
  2021-04-09 16:14     ` tip-bot2 for Rik van Riel
  3 siblings, 1 reply; 26+ messages in thread
From: Vincent Guittot @ 2021-04-06 15:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, linux-kernel, Kernel Team, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider

On Fri, 26 Mar 2021 at 20:19, Rik van Riel <riel@surriel.com> wrote:
>
> On Mon, 22 Mar 2021 11:03:06 +0000
> Mel Gorman <mgorman@techsingularity.net> wrote:
>
>
> > Second, select_idle_smt() does not use the cpus mask so consider moving
> > the cpus initialisation after select_idle_smt() has been called.
> > Specifically this initialisation
> >
> >       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >
> > Alternatively, clear the bits in the SMT sibling scan to avoid checking
> > the siblings twice. It's a tradeoff because initialising and clearing
> > bits is not free and the cost is wasted if a sibling is free.
>
> I tried a number of different variations on moving the CPU mask
> initialization, and clearing CPUs from the mask, and failed to
> get any clear results from those in testing, even in workloads
> with lots of context switches.
>
> Below is a simple version that seems to perform identically to
> more complicated versions :)
>
> ---8<---
> sched,fair: bring back select_idle_smt, but differently
>
> Mel Gorman did some nice work in 9fe1f127b913
> ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
> being more efficient at finding an idle CPU, and in tasks spending less
> time waiting to be run, both according to the schedstats run_delay
> numbers, and according to measured application latencies. Yay.
>
> The flip side of this is that we see more task migrations (about
> 30% more), higher cache misses, higher memory bandwidth utilization,
> and higher CPU use, for the same number of requests/second.
>
> This is most pronounced on a memcache type workload, which saw
> a consistent 1-3% increase in total CPU use on the system, due
> to those increased task migrations leading to higher L2 cache
> miss numbers, and higher memory utilization. The exclusive L3
> cache on Skylake does us no favors there.
>
> On our web serving workload, that effect is usually negligible.
>
> It appears that the increased number of CPU migrations is generally
> a good thing, since it leads to lower cpu_delay numbers, reflecting
> the fact that tasks get to run faster. However, the reduced locality
> and the corresponding increase in L2 cache misses hurts a little.
>
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
> with a twist: when a socket has no idle cores, check to see if the
> sibling of "prev" is idle, before searching all the other CPUs.
>
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
>
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are back
> down to what they were before. The p95 and p99 response times for the
> memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  kernel/sched/fair.c | 68 ++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..69680158963f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6098,6 +6098,28 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
>         return -1;
>  }
>
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int
> +target)
> +{
> +       int cpu;
> +
> +       if (!static_branch_likely(&sched_smt_present))
> +               return -1;
> +
> +       for_each_cpu(cpu, cpu_smt_mask(target)) {
> +               if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +                   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +                       continue;
> +               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +                       return cpu;
> +       }
> +
> +       return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>
>  static inline void set_idle_cores(int cpu, int val)
> @@ -6114,6 +6136,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
>         return __select_idle_cpu(core);
>  }
>
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +       return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>
>  /*
> @@ -6121,7 +6148,7 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
>  {
>         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>
>         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> -       if (sched_feat(SIS_PROP) && !smt) {
> -               u64 avg_cost, avg_idle, span_avg;
> +       if (!smt) {
> +               if (cpus_share_cache(prev, target)) {

Have you checked the impact on a non-SMT system? It would be worth a static branch.

Also, this doesn't need to be in select_idle_cpu(), which aims to loop
over the sched_domain, because you only compare target and prev. So you
can move this call to select_idle_smt() into select_idle_sibling() as
part of all the other tests done on prev/target/recent_used_cpu before
calling select_idle_cpu(), and even save a useless sd =
rcu_dereference(per_cpu(sd_llc, target));
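
Only to illustrate the idea (untested, and assuming select_idle_smt() is
reworked to not need sd, e.g. by checking p->cpus_ptr only), something
like this in select_idle_sibling():

	/*
	 * No idle core in the LLC? Then check whether prev's SMT sibling
	 * is idle before falling back to the full select_idle_cpu() scan.
	 */
	if (prev != target && cpus_share_cache(prev, target) &&
	    !test_idle_cores(target, false)) {
		i = select_idle_smt(p, prev);
		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}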


> +                       /* No idle core. Check if prev has an idle sibling. */
> +                       i = select_idle_smt(p, sd, prev);
> +                       if ((unsigned int)i < nr_cpumask_bits)
> +                               return i;
> +               }
>
> -               /*
> -                * Due to large variance we need a large fuzz factor;
> -                * hackbench in particularly is sensitive here.
> -                */
> -               avg_idle = this_rq()->avg_idle / 512;
> -               avg_cost = this_sd->avg_scan_cost + 1;
> +               if (sched_feat(SIS_PROP)) {
> +                       u64 avg_cost, avg_idle, span_avg;
>
> -               span_avg = sd->span_weight * avg_idle;
> -               if (span_avg > 4*avg_cost)
> -                       nr = div_u64(span_avg, avg_cost);
> -               else
> -                       nr = 4;
> +                       /*
> +                        * Due to large variance we need a large fuzz factor;
> +                        * hackbench in particularly is sensitive here.
> +                        */
> +                       avg_idle = this_rq()->avg_idle / 512;
> +                       avg_cost = this_sd->avg_scan_cost + 1;
>
> -               time = cpu_clock(this);
> +                       span_avg = sd->span_weight * avg_idle;
> +                       if (span_avg > 4*avg_cost)
> +                               nr = div_u64(span_avg, avg_cost);
> +                       else
> +                               nr = 4;
> +
> +                       time = cpu_clock(this);
> +               }
>         }
>
>         for_each_cpu_wrap(cpu, cpus, target) {
> @@ -6307,7 +6343,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>         if (!sd)
>                 return target;
>
> -       i = select_idle_cpu(p, sd, target);
> +       i = select_idle_cpu(p, sd, prev, target);
>         if ((unsigned)i < nr_cpumask_bits)
>                 return i;
>
> --
> 2.25.4
>
>


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-06 15:10     ` Vincent Guittot
@ 2021-04-06 15:26       ` Rik van Riel
  2021-04-06 15:31         ` Vincent Guittot
  2021-04-07  7:17         ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Rik van Riel @ 2021-04-06 15:26 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Mel Gorman, linux-kernel, Kernel Team, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider


On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> On Fri, 26 Mar 2021 at 20:19, Rik van Riel <riel@surriel.com> wrote:
> 
> > -static int select_idle_cpu(struct task_struct *p, struct
> > sched_domain *sd, int target)
> > +static int select_idle_cpu(struct task_struct *p, struct
> > sched_domain *sd, int prev, int target)
> >  {
> >         struct cpumask *cpus =
> > this_cpu_cpumask_var_ptr(select_idle_mask);
> >         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct
> > task_struct *p, struct sched_domain *sd, int t
> > 
> >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > 
> > -       if (sched_feat(SIS_PROP) && !smt) {
> > -               u64 avg_cost, avg_idle, span_avg;
> > +       if (!smt) {
> > +               if (cpus_share_cache(prev, target)) {
> 
> Have you checked the impact on no smt system ? would worth a static
> branch.
> 
> Also, this doesn't need to be in select_idle_cpu() which aims to loop
> the sched_domain becaus you only compare  target and prev. So you can
> move this call to select_idle_smt() in select_idle_sibling()

After Mel's rewrite, there no longer are calls to
select_idle_core() or select_idle_smt() in select_idle_sibling().

Everything got folded into one single loop in select_idle_cpu()

I would be happy to pull the static branch out of select_idle_smt()
and place it into this if condition, though. You are right that
would save some overhead on non-smt systems.
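
Something like this inside the !smt branch of select_idle_cpu()
(untested):

	if (static_branch_likely(&sched_smt_present) &&
	    cpus_share_cache(prev, target)) {
		/* No idle core. Check if prev has an idle sibling. */
		i = select_idle_smt(p, sd, prev);
		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}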

Peter, would you prefer a follow-up patch for that or a version 4
of the patch?

-- 
All Rights Reversed.



* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-06 15:26       ` Rik van Riel
@ 2021-04-06 15:31         ` Vincent Guittot
  2021-04-06 15:33           ` Vincent Guittot
  2021-04-06 15:55           ` Rik van Riel
  2021-04-07  7:17         ` Peter Zijlstra
  1 sibling, 2 replies; 26+ messages in thread
From: Vincent Guittot @ 2021-04-06 15:31 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, linux-kernel, Kernel Team, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider

On Tue, 6 Apr 2021 at 17:26, Rik van Riel <riel@surriel.com> wrote:
>
> On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > On Fri, 26 Mar 2021 at 20:19, Rik van Riel <riel@surriel.com> wrote:
> >
> > > -static int select_idle_cpu(struct task_struct *p, struct
> > > sched_domain *sd, int target)
> > > +static int select_idle_cpu(struct task_struct *p, struct
> > > sched_domain *sd, int prev, int target)
> > >  {
> > >         struct cpumask *cpus =
> > > this_cpu_cpumask_var_ptr(select_idle_mask);
> > >         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct
> > > task_struct *p, struct sched_domain *sd, int t
> > >
> > >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > >
> > > -       if (sched_feat(SIS_PROP) && !smt) {
> > > -               u64 avg_cost, avg_idle, span_avg;
> > > +       if (!smt) {
> > > +               if (cpus_share_cache(prev, target)) {
> >
> > Have you checked the impact on no smt system ? would worth a static
> > branch.
> >
> > Also, this doesn't need to be in select_idle_cpu() which aims to loop
> > the sched_domain becaus you only compare  target and prev. So you can
> > move this call to select_idle_smt() in select_idle_sibling()
>
> After Mel's rewrite, there no longer are calls to
> select_idle_core() or select_idle_smt() in select_idle_sibling().

select_idle_smt() had even disappeared; that's why it was not in
select_idle_sibling()

>
> Everything got folded into one single loop in select_idle_cpu()

but this is done completely out of the loop so we don't need to
complify the function with unrelated stuff

>
> I would be happy to pull the static branch out of select_idle_smt()
> and place it into this if condition, though. You are right that
> would save some overhead on non-smt systems.
>
> Peter, would you prefer a follow-up patch for that or a version 4
> of the patch?
>
> --
> All Rights Reversed.


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-06 15:31         ` Vincent Guittot
@ 2021-04-06 15:33           ` Vincent Guittot
  2021-04-06 15:55           ` Rik van Riel
  1 sibling, 0 replies; 26+ messages in thread
From: Vincent Guittot @ 2021-04-06 15:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, linux-kernel, Kernel Team, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider

On Tue, 6 Apr 2021 at 17:31, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Tue, 6 Apr 2021 at 17:26, Rik van Riel <riel@surriel.com> wrote:
> >
> > On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > > On Fri, 26 Mar 2021 at 20:19, Rik van Riel <riel@surriel.com> wrote:
> > >
> > > > -static int select_idle_cpu(struct task_struct *p, struct
> > > > sched_domain *sd, int target)
> > > > +static int select_idle_cpu(struct task_struct *p, struct
> > > > sched_domain *sd, int prev, int target)
> > > >  {
> > > >         struct cpumask *cpus =
> > > > this_cpu_cpumask_var_ptr(select_idle_mask);
> > > >         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct
> > > > task_struct *p, struct sched_domain *sd, int t
> > > >
> > > >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > >
> > > > -       if (sched_feat(SIS_PROP) && !smt) {
> > > > -               u64 avg_cost, avg_idle, span_avg;
> > > > +       if (!smt) {
> > > > +               if (cpus_share_cache(prev, target)) {
> > >
> > > Have you checked the impact on no smt system ? would worth a static
> > > branch.
> > >
> > > Also, this doesn't need to be in select_idle_cpu() which aims to loop
> > > the sched_domain becaus you only compare  target and prev. So you can
> > > move this call to select_idle_smt() in select_idle_sibling()
> >
> > After Mel's rewrite, there no longer are calls to
> > select_idle_core() or select_idle_smt() in select_idle_sibling().
>
> select_idle_smt() had even disappeared that why it was not in
> select_idle_sibling
>
> >
> > Everything got folded into one single loop in select_idle_cpu()
>
> but this is done completely out of the loop so we don't need to
> complify the function with unrelated stuff


s/complify/complexify/

>
> >
> > I would be happy to pull the static branch out of select_idle_smt()
> > and place it into this if condition, though. You are right that
> > would save some overhead on non-smt systems.
> >
> > Peter, would you prefer a follow-up patch for that or a version 4
> > of the patch?
> >
> > --
> > All Rights Reversed.


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-06 15:31         ` Vincent Guittot
  2021-04-06 15:33           ` Vincent Guittot
@ 2021-04-06 15:55           ` Rik van Riel
  2021-04-06 16:13             ` Vincent Guittot
  1 sibling, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2021-04-06 15:55 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Mel Gorman, linux-kernel, Kernel Team, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider


On Tue, 2021-04-06 at 17:31 +0200, Vincent Guittot wrote:
> On Tue, 6 Apr 2021 at 17:26, Rik van Riel <riel@surriel.com> wrote:
> > On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > > On Fri, 26 Mar 2021 at 20:19, Rik van Riel <riel@surriel.com>
> > > wrote:
> > > 
> > > > -static int select_idle_cpu(struct task_struct *p, struct
> > > > sched_domain *sd, int target)
> > > > +static int select_idle_cpu(struct task_struct *p, struct
> > > > sched_domain *sd, int prev, int target)
> > > >  {
> > > >         struct cpumask *cpus =
> > > > this_cpu_cpumask_var_ptr(select_idle_mask);
> > > >         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct
> > > > task_struct *p, struct sched_domain *sd, int t
> > > > 
> > > >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > > 
> > > > -       if (sched_feat(SIS_PROP) && !smt) {
> > > > -               u64 avg_cost, avg_idle, span_avg;
> > > > +       if (!smt) {
> > > > +               if (cpus_share_cache(prev, target)) {
> > > 
> > > Have you checked the impact on a non-SMT system? It would be worth a
> > > static branch.
> > >
> > > Also, this doesn't need to be in select_idle_cpu(), which aims to loop
> > > over the sched_domain, because you only compare target and prev. So you
> > > can move this call to select_idle_smt() into select_idle_sibling()
> > 
> > After Mel's rewrite, there no longer are calls to
> > select_idle_core() or select_idle_smt() in select_idle_sibling().
> 
> > select_idle_smt() had even disappeared; that's why it was not in
> > select_idle_sibling()
> 
> > Everything got folded into one single loop in select_idle_cpu()
> 
> but this is done completely out of the loop so we don't need to
> complify the function with unrelated stuff

Not entirely. The call to select_idle_smt() is still
conditional on test_idle_cores() returning false.

We only look for the other sibling if there is no idle
core in the LLC. If there is an idle core, we prefer that.

Pulling the select_idle_smt() call out of select_idle_cpu()
would mean having to test_idle_cores() twice.
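
Roughly, the two shapes under discussion look like the sketch below. It is an
illustrative sketch only (simplified signatures, bodies reduced to the gating
logic), not the actual patch:

/* v3 shape: select_idle_cpu() computes the flag once and reuses it. */
static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
			   int prev, int target)
{
	bool smt = test_idle_cores(target, false);	/* single lookup */

	if (!smt && cpus_share_cache(prev, target)) {
		int i = select_idle_smt(p, sd, prev);	/* try prev's sibling first */
		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}

	/* ... the full LLC scan below also branches on 'smt' ... */
	return -1;
}

/*
 * Hoisting the sibling check into select_idle_sibling() without changing
 * the signature of select_idle_cpu() would mean one test_idle_cores() call
 * in the caller and a second one inside select_idle_cpu(), which is the
 * duplication referred to above.
 */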

-- 
All Rights Reversed.



* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-06 15:55           ` Rik van Riel
@ 2021-04-06 16:13             ` Vincent Guittot
  0 siblings, 0 replies; 26+ messages in thread
From: Vincent Guittot @ 2021-04-06 16:13 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, linux-kernel, Kernel Team, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider

On Tue, 6 Apr 2021 at 17:55, Rik van Riel <riel@surriel.com> wrote:
>
> On Tue, 2021-04-06 at 17:31 +0200, Vincent Guittot wrote:
> > On Tue, 6 Apr 2021 at 17:26, Rik van Riel <riel@surriel.com> wrote:
> > > On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > > > On Fri, 26 Mar 2021 at 20:19, Rik van Riel <riel@surriel.com>
> > > > wrote:
> > > >
> > > > > -static int select_idle_cpu(struct task_struct *p, struct
> > > > > sched_domain *sd, int target)
> > > > > +static int select_idle_cpu(struct task_struct *p, struct
> > > > > sched_domain *sd, int prev, int target)
> > > > >  {
> > > > >         struct cpumask *cpus =
> > > > > this_cpu_cpumask_var_ptr(select_idle_mask);
> > > > >         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct
> > > > > task_struct *p, struct sched_domain *sd, int t
> > > > >
> > > > >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > > >
> > > > > -       if (sched_feat(SIS_PROP) && !smt) {
> > > > > -               u64 avg_cost, avg_idle, span_avg;
> > > > > +       if (!smt) {
> > > > > +               if (cpus_share_cache(prev, target)) {
> > > >
> > > > Have you checked the impact on a non-SMT system? It would be worth a
> > > > static branch.
> > > >
> > > > Also, this doesn't need to be in select_idle_cpu(), which aims to loop
> > > > over the sched_domain, because you only compare target and prev. So you
> > > > can move this call to select_idle_smt() into select_idle_sibling()
> > >
> > > After Mel's rewrite, there no longer are calls to
> > > select_idle_core() or select_idle_smt() in select_idle_sibling().
> >
> > select_idle_smt() had even disappeared; that's why it was not in
> > select_idle_sibling()
> >
> > > Everything got folded into one single loop in select_idle_cpu()
> >
> > but this is done completely out of the loop so we don't need to
> > complify the function with unrelated stuff
>
> Not entirely. The call to select_idle_smt() is still
> conditional on test_idle_cores() returning false.
>
> We only look for the other sibling if there is no idle
> core in the LLC. If there is an idle core, we prefer that.
>
> Pulling the select_idle_smt() call out of select_idle_cpu()
> would mean having to test_idle_cores() twice.

In this case, pass the result of test_idle_cores() as a parameter instead of prev.
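
In outline, that suggestion looks like the sketch below (illustrative only;
the patch further down in the thread uses has_idle_core as the flag name and
takes care of where sd comes from):

	/* in select_idle_sibling(): compute the flag once ... */
	bool smt = test_idle_cores(target, false);

	if (!smt && cpus_share_cache(prev, target)) {
		i = select_idle_smt(p, sd, prev);
		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}

	/* ... and hand the result down instead of prev */
	i = select_idle_cpu(p, sd, smt, target);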

>
> --
> All Rights Reversed.


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-06 15:26       ` Rik van Riel
  2021-04-06 15:31         ` Vincent Guittot
@ 2021-04-07  7:17         ` Peter Zijlstra
  2021-04-07  9:41           ` Mel Gorman
  2021-04-07  9:42           ` Vincent Guittot
  1 sibling, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-04-07  7:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Vincent Guittot, Mel Gorman, linux-kernel, Kernel Team,
	Ingo Molnar, Valentin Schneider

On Tue, Apr 06, 2021 at 11:26:37AM -0400, Rik van Riel wrote:
> I would be happy to pull the static branch out of select_idle_smt()
> and place it into this if condition, though. You are right that
> would save some overhead on non-smt systems.
> 
> Peter, would you prefer a follow-up patch for that or a version 4
> of the patch?

Sorry, I was side-tracked with that core scheduling crap.. Something
like the below then?

(Also fixed that stray line-wrap)

---
Subject: sched/fair: Bring back select_idle_smt(), but differently
From: Rik van Riel <riel@surriel.com>
Date: Fri, 26 Mar 2021 15:19:32 -0400

From: Rik van Riel <riel@surriel.com>

Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
select_idle_core/cpu()"), resulting in the kernel being more efficient
at finding an idle CPU, and in tasks spending less time waiting to be
run, both according to the schedstats run_delay numbers, and according
to measured application latencies. Yay.

The flip side of this is that we see more task migrations (about 30%
more), higher cache misses, higher memory bandwidth utilization, and
higher CPU use, for the same number of requests/second.

This is most pronounced on a memcache type workload, which saw a
consistent 1-3% increase in total CPU use on the system, due to those
increased task migrations leading to higher L2 cache miss numbers, and
higher memory utilization. The exclusive L3 cache on Skylake does us
no favors there.

On our web serving workload, that effect is usually negligible.

It appears that the increased number of CPU migrations is generally a
good thing, since it leads to lower cpu_delay numbers, reflecting the
fact that tasks get to run faster. However, the reduced locality and
the corresponding increase in L2 cache misses hurts a little.

The patch below appears to fix the regression, while keeping the
benefit of the lower cpu_delay numbers, by reintroducing
select_idle_smt with a twist: when a socket has no idle cores, check
to see if the sibling of "prev" is idle, before searching all the
other CPUs.

This fixes both the occasional 9% regression on the web serving
workload, and the continuous 2% CPU use regression on the memcache
type workload.

With Mel's patches and this patch together, task migrations are still
high, but L2 cache misses, memory bandwidth, and CPU time used are
back down to what they were before. The p95 and p99 response times for
the memcache type application improve by about 10% over what they were
before Mel's patches got merged.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210326151932.2c187840@imladris.surriel.com
---
 kernel/sched/fair.c |   39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	int cpu;
+
+	if (!static_branch_likely(&sched_smt_present))
+		return -1;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6149,11 @@ static inline int select_idle_core(struc
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,7 +6161,7 @@ static inline int select_idle_core(struc
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
@@ -6148,6 +6174,15 @@ static int select_idle_cpu(struct task_s
 	if (!this_sd)
 		return -1;
 
+	/* If we have SMT but there are no idle cores */
+	if (static_branch_likely(&sched_smt_present) && !smt) {
+		if (cpus_share_cache(prev, target)) {
+			i = select_idle_smt(p, sd, prev);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+		}
+	}
+
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
 	if (sched_feat(SIS_PROP) && !smt) {
@@ -6321,7 +6356,7 @@ static int select_idle_sibling(struct ta
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	i = select_idle_cpu(p, sd, prev, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07  7:17         ` Peter Zijlstra
@ 2021-04-07  9:41           ` Mel Gorman
  2021-04-07 10:15             ` Peter Zijlstra
  2021-04-07  9:42           ` Vincent Guittot
  1 sibling, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2021-04-07  9:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Vincent Guittot, linux-kernel, Kernel Team,
	Ingo Molnar, Valentin Schneider

On Wed, Apr 07, 2021 at 09:17:18AM +0200, Peter Zijlstra wrote:
> Subject: sched/fair: Bring back select_idle_smt(), but differently
> From: Rik van Riel <riel@surriel.com>
> Date: Fri, 26 Mar 2021 15:19:32 -0400
> 
> From: Rik van Riel <riel@surriel.com>
> 
> Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
> select_idle_core/cpu()"), resulting in the kernel being more efficient
> at finding an idle CPU, and in tasks spending less time waiting to be
> run, both according to the schedstats run_delay numbers, and according
> to measured application latencies. Yay.
> 
> The flip side of this is that we see more task migrations (about 30%
> more), higher cache misses, higher memory bandwidth utilization, and
> higher CPU use, for the same number of requests/second.
> 
> This is most pronounced on a memcache type workload, which saw a
> consistent 1-3% increase in total CPU use on the system, due to those
> increased task migrations leading to higher L2 cache miss numbers, and
> higher memory utilization. The exclusive L3 cache on Skylake does us
> no favors there.
> 
> On our web serving workload, that effect is usually negligible.
> 
> It appears that the increased number of CPU migrations is generally a
> good thing, since it leads to lower cpu_delay numbers, reflecting the
> fact that tasks get to run faster. However, the reduced locality and
> the corresponding increase in L2 cache misses hurts a little.
> 
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing
> select_idle_smt with a twist: when a socket has no idle cores, check
> to see if the sibling of "prev" is idle, before searching all the
> other CPUs.
> 
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
> 
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are
> back down to what they were before. The p95 and p99 response times for
> the memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20210326151932.2c187840@imladris.surriel.com

I think this is still ok and should not invalidate the previous tests on
v3. While test_idle_cores() was checked on target, as long as target/prev
share cache, the test should be equivalent other than a minor race, so

Reviewed-by: Mel Gorman <mgorman@techsingularity.net>

One minor question below though, which previously confused me.

> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
>  	return -1;
>  }
>  
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +	int cpu;
> +
> +	if (!static_branch_likely(&sched_smt_present))
> +		return -1;
> +
> +	for_each_cpu(cpu, cpu_smt_mask(target)) {
> +		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +			continue;

While I know that !cpumask_test_cpu(cpu, sched_domain_span(sd)) was
done previously, I found it hard to believe that the test matters. If
target/prev share the LLC domain, why would the SMT siblings *not*
share the LLC?

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07  7:17         ` Peter Zijlstra
  2021-04-07  9:41           ` Mel Gorman
@ 2021-04-07  9:42           ` Vincent Guittot
  2021-04-07  9:54             ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Vincent Guittot @ 2021-04-07  9:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Mel Gorman, linux-kernel, Kernel Team, Ingo Molnar,
	Valentin Schneider

Le mercredi 07 avril 2021 à 09:17:18 (+0200), Peter Zijlstra a écrit :
> On Tue, Apr 06, 2021 at 11:26:37AM -0400, Rik van Riel wrote:
> > I would be happy to pull the static branch out of select_idle_smt()
> > and place it into this if condition, though. You are right that
> > would save some overhead on non-smt systems.
> > 
> > Peter, would you prefer a follow-up patch for that or a version 4
> > of the patch?
> 
> Sorry, I was side-tracked with that core scheduling crap.. Something
> like the below then?
> 
> (Also fixed that stray line-wrap)
> 
> ---
> Subject: sched/fair: Bring back select_idle_smt(), but differently
> From: Rik van Riel <riel@surriel.com>
> Date: Fri, 26 Mar 2021 15:19:32 -0400
> 
> From: Rik van Riel <riel@surriel.com>
> 
> Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
> select_idle_core/cpu()"), resulting in the kernel being more efficient
> at finding an idle CPU, and in tasks spending less time waiting to be
> run, both according to the schedstats run_delay numbers, and according
> to measured application latencies. Yay.
> 
> The flip side of this is that we see more task migrations (about 30%
> more), higher cache misses, higher memory bandwidth utilization, and
> higher CPU use, for the same number of requests/second.
> 
> This is most pronounced on a memcache type workload, which saw a
> consistent 1-3% increase in total CPU use on the system, due to those
> increased task migrations leading to higher L2 cache miss numbers, and
> higher memory utilization. The exclusive L3 cache on Skylake does us
> no favors there.
> 
> On our web serving workload, that effect is usually negligible.
> 
> It appears that the increased number of CPU migrations is generally a
> good thing, since it leads to lower cpu_delay numbers, reflecting the
> fact that tasks get to run faster. However, the reduced locality and
> the corresponding increase in L2 cache misses hurts a little.
> 
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing
> select_idle_smt with a twist: when a socket has no idle cores, check
> to see if the sibling of "prev" is idle, before searching all the
> other CPUs.
> 
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
> 
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are
> back down to what they were before. The p95 and p99 response times for
> the memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20210326151932.2c187840@imladris.surriel.com
> ---
>  kernel/sched/fair.c |   39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
>  	return -1;
>  }
>  
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +	int cpu;
> +
> +	if (!static_branch_likely(&sched_smt_present))
> +		return -1;
> +
> +	for_each_cpu(cpu, cpu_smt_mask(target)) {
> +		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +			continue;
> +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +			return cpu;
> +	}
> +
> +	return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>  
>  static inline void set_idle_cores(int cpu, int val)
> @@ -6128,6 +6149,11 @@ static inline int select_idle_core(struc
>  	return __select_idle_cpu(core);
>  }
>  
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +	return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>  
>  /*
> @@ -6135,7 +6161,7 @@ static inline int select_idle_core(struc
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
>  {
>  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>  	int i, cpu, idle_cpu = -1, nr = INT_MAX;
> @@ -6148,6 +6174,15 @@ static int select_idle_cpu(struct task_s
>  	if (!this_sd)
>  		return -1;
>  
> +	/* If we have SMT but there are no idle cores */
> +	if (static_branch_likely(&sched_smt_present) && !smt) {
> +		if (cpus_share_cache(prev, target)) {
> +			i = select_idle_smt(p, sd, prev);
> +			if ((unsigned int)i < nr_cpumask_bits)
> +				return i;
> +		}
> +	}
> +
>  	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>  
>  	if (sched_feat(SIS_PROP) && !smt) {
> @@ -6321,7 +6356,7 @@ static int select_idle_sibling(struct ta
>  	if (!sd)
>  		return target;
>  
> -	i = select_idle_cpu(p, sd, target);
> +	i = select_idle_cpu(p, sd, prev, target);
>  	if ((unsigned)i < nr_cpumask_bits)
>  		return i;

I would really prefer to keep that out of select_idle_cpu(), which aims to merge
the walk through sd_llc into one single loop. In the case of select_idle_smt(),
this is done outside the loop.

I would prefer the below, which also removes a number of useless and duplicated
static_branch_likely(&sched_smt_present) checks in test_idle_cores() and
select_idle_smt():

---
 kernel/sched/fair.c | 48 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d73bdbb2d40..01d0bacedc8d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int cpu, bool def)
 {
 	struct sched_domain_shared *sds;
 
-	if (static_branch_likely(&sched_smt_present)) {
-		sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-		if (sds)
-			return READ_ONCE(sds->has_idle_cores);
-	}
+	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+	if (sds)
+		return READ_ONCE(sds->has_idle_cores);
 
 	return def;
 }
@@ -6112,6 +6110,25 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int
+target)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6145,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,11 +6157,10 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool smt, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
-	bool smt = test_idle_cores(target, false);
 	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
@@ -6242,6 +6263,7 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
  */
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
+	bool smt = false;
 	struct sched_domain *sd;
 	unsigned long task_util;
 	int i, recent_used_cpu;
@@ -6317,11 +6339,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 		}
 	}
 
+	if (static_branch_likely(&sched_smt_present)) {
+		smt = test_idle_cores(target, false);
+		if (!smt && cpus_share_cache(prev, target)) {
+			/* No idle core. Check if prev has an idle sibling. */
+			i = select_idle_smt(p, sd, prev);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+		}
+	}
+
 	sd = rcu_dereference(per_cpu(sd_llc, target));
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	i = select_idle_cpu(p, sd, smt, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-- 
2.17.1

>  


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07  9:42           ` Vincent Guittot
@ 2021-04-07  9:54             ` Peter Zijlstra
  2021-04-07  9:57               ` Vincent Guittot
  2021-04-07 10:19               ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-04-07  9:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Rik van Riel, Mel Gorman, linux-kernel, Kernel Team, Ingo Molnar,
	Valentin Schneider

On Wed, Apr 07, 2021 at 11:42:17AM +0200, Vincent Guittot wrote:
> I would really prefer to keep that out of select_idle_cpu(), which aims to merge
> the walk through sd_llc into one single loop. In the case of select_idle_smt(),
> this is done outside the loop:

Fair enough.

> @@ -6317,11 +6339,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  		}
>  	}
>  
> +	if (static_branch_likely(&sched_smt_present)) {
> +		smt = test_idle_cores(target, false);
> +		if (!smt && cpus_share_cache(prev, target)) {
> +			/* No idle core. Check if prev has an idle sibling. */
> +			i = select_idle_smt(p, sd, prev);
> +			if ((unsigned int)i < nr_cpumask_bits)
> +				return i;
> +		}
> +	}
> +
>  	sd = rcu_dereference(per_cpu(sd_llc, target));
>  	if (!sd)
>  		return target;

It needs to be here, otherwise you're using @sd uninitialized.

> -	i = select_idle_cpu(p, sd, target);
> +	i = select_idle_cpu(p, sd, smt, target);
>  	if ((unsigned)i < nr_cpumask_bits)
>  		return i;

Let me have another poke at it.
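
In outline, the ordering problem is (a trimmed-down sketch of the hunk quoted
above, for illustration only):

	if (static_branch_likely(&sched_smt_present)) {
		smt = test_idle_cores(target, false);
		if (!smt && cpus_share_cache(prev, target)) {
			i = select_idle_smt(p, sd, prev);	/* 'sd' is used here ... */
			if ((unsigned int)i < nr_cpumask_bits)
				return i;
		}
	}

	sd = rcu_dereference(per_cpu(sd_llc, target));		/* ... but only assigned here */
	if (!sd)
		return target;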


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07  9:54             ` Peter Zijlstra
@ 2021-04-07  9:57               ` Vincent Guittot
  2021-04-07 10:19               ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Vincent Guittot @ 2021-04-07  9:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Mel Gorman, linux-kernel, Kernel Team, Ingo Molnar,
	Valentin Schneider

On Wed, 7 Apr 2021 at 11:55, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 07, 2021 at 11:42:17AM +0200, Vincent Guittot wrote:
> > I would really prefer to keep that out of select_idle_cpu(), which aims to merge
> > the walk through sd_llc into one single loop. In the case of select_idle_smt(),
> > this is done outside the loop:
>
> Fair enough.
>
> > @@ -6317,11 +6339,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >               }
> >       }
> >
> > +     if (static_branch_likely(&sched_smt_present)) {
> > +             smt = test_idle_cores(target, false);
> > +             if (!smt && cpus_share_cache(prev, target)) {
> > +                     /* No idle core. Check if prev has an idle sibling. */
> > +                     i = select_idle_smt(p, sd, prev);
> > +                     if ((unsigned int)i < nr_cpumask_bits)
> > +                             return i;
> > +             }
> > +     }
> > +
> >       sd = rcu_dereference(per_cpu(sd_llc, target));
> >       if (!sd)
> >               return target;
>
> It needs to be here, otherwise you're using @sd uninitialized.

argh yes...

>
> > -     i = select_idle_cpu(p, sd, target);
> > +     i = select_idle_cpu(p, sd, smt, target);
> >       if ((unsigned)i < nr_cpumask_bits)
> >               return i;
>
> Let me have another poke at it.


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07  9:41           ` Mel Gorman
@ 2021-04-07 10:15             ` Peter Zijlstra
  2021-04-07 10:47               ` Mel Gorman
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-04-07 10:15 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Vincent Guittot, linux-kernel, Kernel Team,
	Ingo Molnar, Valentin Schneider

On Wed, Apr 07, 2021 at 10:41:06AM +0100, Mel Gorman wrote:

> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
> >  	return -1;
> >  }
> >  
> > +/*
> > + * Scan the local SMT mask for idle CPUs.
> > + */
> > +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> > +{
> > +	int cpu;
> > +
> > +	if (!static_branch_likely(&sched_smt_present))
> > +		return -1;
> > +
> > +	for_each_cpu(cpu, cpu_smt_mask(target)) {
> > +		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> > +		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> > +			continue;
> 
> While I know that !cpumask_test_cpu(cpu, sched_domain_span(sd)) was
> done previously, I found it hard to believe that the test matters. If
> target/prev share a the LLC domain, why would the SMT siblings *not*
> share a LLC?

I think the reason for it is that a cpuset might have split the siblings
apart and disabled load-balancing across them or something.

Then the affinity mask can still cross the partition, but we shouldn't
ever move into it through balancing.


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07  9:54             ` Peter Zijlstra
  2021-04-07  9:57               ` Vincent Guittot
@ 2021-04-07 10:19               ` Peter Zijlstra
  2021-04-07 10:24                 ` Vincent Guittot
  2021-04-08 15:52                 ` Rik van Riel
  1 sibling, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-04-07 10:19 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Rik van Riel, Mel Gorman, linux-kernel, Kernel Team, Ingo Molnar,
	Valentin Schneider

On Wed, Apr 07, 2021 at 11:54:37AM +0200, Peter Zijlstra wrote:

> Let me have another poke at it.

Pretty much what you did, except I also did s/smt/has_idle_core/ and
fixed that @sd thing.

Like so then?

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int c
 {
 	struct sched_domain_shared *sds;
 
-	if (static_branch_likely(&sched_smt_present)) {
-		sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-		if (sds)
-			return READ_ONCE(sds->has_idle_cores);
-	}
+	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+	if (sds)
+		return READ_ONCE(sds->has_idle_cores);
 
 	return def;
 }
@@ -6112,6 +6110,24 @@ static int select_idle_core(struct task_
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6144,11 @@ static inline int select_idle_core(struc
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,11 +6156,10 @@ static inline int select_idle_core(struc
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
-	bool smt = test_idle_cores(target, false);
 	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
@@ -6150,7 +6170,7 @@ static int select_idle_cpu(struct task_s
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP) && !smt) {
+	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		u64 avg_cost, avg_idle, span_avg;
 
 		/*
@@ -6170,7 +6190,7 @@ static int select_idle_cpu(struct task_s
 	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
-		if (smt) {
+		if (has_idle_core) {
 			i = select_idle_core(p, cpu, cpus, &idle_cpu);
 			if ((unsigned int)i < nr_cpumask_bits)
 				return i;
@@ -6184,10 +6204,10 @@ static int select_idle_cpu(struct task_s
 		}
 	}
 
-	if (smt)
+	if (has_idle_core)
 		set_idle_cores(this, false);
 
-	if (sched_feat(SIS_PROP) && !smt) {
+	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		time = cpu_clock(this) - time;
 		update_avg(&this_sd->avg_scan_cost, time);
 	}
@@ -6242,6 +6262,7 @@ static inline bool asym_fits_capacity(in
  */
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
+	bool has_idle_core = false;
 	struct sched_domain *sd;
 	unsigned long task_util;
 	int i, recent_used_cpu;
@@ -6321,7 +6342,17 @@ static int select_idle_sibling(struct ta
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	if (static_branch_likely(&sched_smt_present)) {
+		has_idle_core = test_idle_cores(target, false);
+
+		if (!has_idle_core && cpus_share_cache(prev, target)) {
+			i = select_idle_smt(p, sd, prev);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+		}
+	}
+
+	i = select_idle_cpu(p, sd, has_idle_core, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07 10:19               ` Peter Zijlstra
@ 2021-04-07 10:24                 ` Vincent Guittot
  2021-04-08 15:52                 ` Rik van Riel
  1 sibling, 0 replies; 26+ messages in thread
From: Vincent Guittot @ 2021-04-07 10:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Mel Gorman, linux-kernel, Kernel Team, Ingo Molnar,
	Valentin Schneider

On Wed, 7 Apr 2021 at 12:19, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 07, 2021 at 11:54:37AM +0200, Peter Zijlstra wrote:
>
> > Let me have another poke at it.
>
> Pretty much what you did, except I also did s/smt/has_idle_core/ and
> fixed that @sd thing.
>
> Like so then?

Yes. Looks good to me

>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int c
>  {
>         struct sched_domain_shared *sds;
>
> -       if (static_branch_likely(&sched_smt_present)) {
> -               sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> -               if (sds)
> -                       return READ_ONCE(sds->has_idle_cores);
> -       }
> +       sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> +       if (sds)
> +               return READ_ONCE(sds->has_idle_cores);
>
>         return def;
>  }
> @@ -6112,6 +6110,24 @@ static int select_idle_core(struct task_
>         return -1;
>  }
>
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +       int cpu;
> +
> +       for_each_cpu(cpu, cpu_smt_mask(target)) {
> +               if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +                   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +                       continue;
> +               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +                       return cpu;
> +       }
> +
> +       return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>
>  static inline void set_idle_cores(int cpu, int val)
> @@ -6128,6 +6144,11 @@ static inline int select_idle_core(struc
>         return __select_idle_cpu(core);
>  }
>
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +       return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>
>  /*
> @@ -6135,11 +6156,10 @@ static inline int select_idle_core(struc
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
>  {
>         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>         int i, cpu, idle_cpu = -1, nr = INT_MAX;
> -       bool smt = test_idle_cores(target, false);
>         int this = smp_processor_id();
>         struct sched_domain *this_sd;
>         u64 time;
> @@ -6150,7 +6170,7 @@ static int select_idle_cpu(struct task_s
>
>         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> -       if (sched_feat(SIS_PROP) && !smt) {
> +       if (sched_feat(SIS_PROP) && !has_idle_core) {
>                 u64 avg_cost, avg_idle, span_avg;
>
>                 /*
> @@ -6170,7 +6190,7 @@ static int select_idle_cpu(struct task_s
>         }
>
>         for_each_cpu_wrap(cpu, cpus, target) {
> -               if (smt) {
> +               if (has_idle_core) {
>                         i = select_idle_core(p, cpu, cpus, &idle_cpu);
>                         if ((unsigned int)i < nr_cpumask_bits)
>                                 return i;
> @@ -6184,10 +6204,10 @@ static int select_idle_cpu(struct task_s
>                 }
>         }
>
> -       if (smt)
> +       if (has_idle_core)
>                 set_idle_cores(this, false);
>
> -       if (sched_feat(SIS_PROP) && !smt) {
> +       if (sched_feat(SIS_PROP) && !has_idle_core) {
>                 time = cpu_clock(this) - time;
>                 update_avg(&this_sd->avg_scan_cost, time);
>         }
> @@ -6242,6 +6262,7 @@ static inline bool asym_fits_capacity(in
>   */
>  static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  {
> +       bool has_idle_core = false;
>         struct sched_domain *sd;
>         unsigned long task_util;
>         int i, recent_used_cpu;
> @@ -6321,7 +6342,17 @@ static int select_idle_sibling(struct ta
>         if (!sd)
>                 return target;
>
> -       i = select_idle_cpu(p, sd, target);
> +       if (static_branch_likely(&sched_smt_present)) {
> +               has_idle_core = test_idle_cores(target, false);
> +
> +               if (!has_idle_core && cpus_share_cache(prev, target)) {
> +                       i = select_idle_smt(p, sd, prev);
> +                       if ((unsigned int)i < nr_cpumask_bits)
> +                               return i;
> +               }
> +       }
> +
> +       i = select_idle_cpu(p, sd, has_idle_core, target);
>         if ((unsigned)i < nr_cpumask_bits)
>                 return i;
>


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07 10:15             ` Peter Zijlstra
@ 2021-04-07 10:47               ` Mel Gorman
  2021-04-07 12:10                 ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2021-04-07 10:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Vincent Guittot, linux-kernel, Kernel Team,
	Ingo Molnar, Valentin Schneider

On Wed, Apr 07, 2021 at 12:15:13PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 10:41:06AM +0100, Mel Gorman wrote:
> 
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
> > >  	return -1;
> > >  }
> > >  
> > > +/*
> > > + * Scan the local SMT mask for idle CPUs.
> > > + */
> > > +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> > > +{
> > > +	int cpu;
> > > +
> > > +	if (!static_branch_likely(&sched_smt_present))
> > > +		return -1;
> > > +
> > > +	for_each_cpu(cpu, cpu_smt_mask(target)) {
> > > +		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> > > +		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> > > +			continue;
> > 
> > While I know that !cpumask_test_cpu(cpu, sched_domain_span(sd)) was
> > done previously, I found it hard to believe that the test matters. If
> > target/prev share the LLC domain, why would the SMT siblings *not*
> > share the LLC?
> 
> I think the reason for it is that a cpuset might have split the siblings
> apart and disabled load-balancing across them or something.
> 
> Then the affinity mask can still cross the partition, but we shouldn't
> ever move into it through balancing.

Ok, cpusets do split domains. I can't imagine the logic of splitting SMT
siblings across cpusets, but if it's possible, it has to be checked, and
protecting that with cpusets_enabled() would be a little overkill and
possibly miss some other corner case :(

Thanks.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07 10:47               ` Mel Gorman
@ 2021-04-07 12:10                 ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-04-07 12:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Vincent Guittot, linux-kernel, Kernel Team,
	Ingo Molnar, Valentin Schneider

On Wed, Apr 07, 2021 at 11:47:17AM +0100, Mel Gorman wrote:

> Ok, cpusets do split domains. I can't imagine the logic of splitting SMT
> siblings across cpusets, but if it's possible, it has to be checked, and
> protecting that with cpusets_enabled() would be a little overkill and
> possibly miss some other corner case :(

Quite. The logic is that some people can't be arsed to look at how the
topology is enumerated and simply split logical CPUs by a random number.

Imagine we have 4 cores, with 2 threads each, for 8 logical CPUs.

Suppose you want a partition of 6 'cpus' and 2 spares. Hoping for a 3:1
core split.

If things are enumerated like: core0{smt0, smt1}, core1{smt0, smt1} ...
then 0-5 will get you 3 whole cores.

If OTOH things are enumerated like: smt0{core0, core1, core2, core3},
smt1{...} then 0-5 will get you the loonie side of the moon.

Funny thing is that x86 can iterate either way depending on how the BIOS
monkey filled out the tables.
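
A rough userspace illustration of those two enumerations (purely a sketch of
the 4-core / 2-thread example above, nothing kernel-specific):

#include <stdio.h>

int main(void)
{
	/* cpu -> core for the two enumeration orders of 4 cores x 2 threads */
	int core_major[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };	/* core0{smt0,smt1}, core1{...}, ... */
	int smt_major[8]  = { 0, 1, 2, 3, 0, 1, 2, 3 };	/* smt0{core0..core3}, smt1{...} */
	int cpu;

	printf("core-major, cpus 0-5 -> cores:");
	for (cpu = 0; cpu <= 5; cpu++)
		printf(" %d", core_major[cpu]);	/* 0 0 1 1 2 2: three whole cores */

	printf("\nsmt-major,  cpus 0-5 -> cores:");
	for (cpu = 0; cpu <= 5; cpu++)
		printf(" %d", smt_major[cpu]);	/* 0 1 2 3 0 1: cores 2 and 3 lose a sibling */

	printf("\n");
	return 0;
}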


* Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently
  2021-04-07 10:19               ` Peter Zijlstra
  2021-04-07 10:24                 ` Vincent Guittot
@ 2021-04-08 15:52                 ` Rik van Riel
  1 sibling, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2021-04-08 15:52 UTC (permalink / raw)
  To: Peter Zijlstra, Vincent Guittot
  Cc: Mel Gorman, linux-kernel, Kernel Team, Ingo Molnar, Valentin Schneider


On Wed, 2021-04-07 at 12:19 +0200, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 11:54:37AM +0200, Peter Zijlstra wrote:
> 
> > Let me have another poke at it.
> 
> Pretty much what you did, except I also did s/smt/has_idle_core/ and
> fixed that @sd thing.
> 
> Like so then?

Looks good to me. Thank you.

-- 
All Rights Reversed.



* [tip: sched/core] sched/fair: Bring back select_idle_smt(), but differently
  2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
  2021-03-28 15:36     ` Mel Gorman
  2021-04-06 15:10     ` Vincent Guittot
@ 2021-04-09 11:24     ` tip-bot2 for Rik van Riel
  2021-04-09 16:14     ` tip-bot2 for Rik van Riel
  3 siblings, 0 replies; 26+ messages in thread
From: tip-bot2 for Rik van Riel @ 2021-04-09 11:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Rik van Riel, Peter Zijlstra (Intel),
	Mel Gorman, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     6bcd3e21ba278098920d26d4888f5e6f4087c61d
Gitweb:        https://git.kernel.org/tip/6bcd3e21ba278098920d26d4888f5e6f4087c61d
Author:        Rik van Riel <riel@surriel.com>
AuthorDate:    Fri, 26 Mar 2021 15:19:32 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 08 Apr 2021 23:09:44 +02:00

sched/fair: Bring back select_idle_smt(), but differently

Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
select_idle_core/cpu()"), resulting in the kernel being more efficient
at finding an idle CPU, and in tasks spending less time waiting to be
run, both according to the schedstats run_delay numbers, and according
to measured application latencies. Yay.

The flip side of this is that we see more task migrations (about 30%
more), higher cache misses, higher memory bandwidth utilization, and
higher CPU use, for the same number of requests/second.

This is most pronounced on a memcache type workload, which saw a
consistent 1-3% increase in total CPU use on the system, due to those
increased task migrations leading to higher L2 cache miss numbers, and
higher memory utilization. The exclusive L3 cache on Skylake does us
no favors there.

On our web serving workload, that effect is usually negligible.

It appears that the increased number of CPU migrations is generally a
good thing, since it leads to lower cpu_delay numbers, reflecting the
fact that tasks get to run faster. However, the reduced locality and
the corresponding increase in L2 cache misses hurts a little.

The patch below appears to fix the regression, while keeping the
benefit of the lower cpu_delay numbers, by reintroducing
select_idle_smt with a twist: when a socket has no idle cores, check
to see if the sibling of "prev" is idle, before searching all the
other CPUs.

This fixes both the occasional 9% regression on the web serving
workload, and the continuous 2% CPU use regression on the memcache
type workload.

With Mel's patches and this patch together, task migrations are still
high, but L2 cache misses, memory bandwidth, and CPU time used are
back down to what they were before. The p95 and p99 response times for
the memcache type application improve by about 10% over what they were
before Mel's patches got merged.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210326151932.2c187840@imladris.surriel.com
---
 kernel/sched/fair.c | 55 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d73bdb..d0bd861 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int cpu, bool def)
 {
 	struct sched_domain_shared *sds;
 
-	if (static_branch_likely(&sched_smt_present)) {
-		sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-		if (sds)
-			return READ_ONCE(sds->has_idle_cores);
-	}
+	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+	if (sds)
+		return READ_ONCE(sds->has_idle_cores);
 
 	return def;
 }
@@ -6112,6 +6110,24 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6144,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,11 +6156,10 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
-	bool smt = test_idle_cores(target, false);
 	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
@@ -6150,7 +6170,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP) && !smt) {
+	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		u64 avg_cost, avg_idle, span_avg;
 
 		/*
@@ -6170,7 +6190,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
-		if (smt) {
+		if (has_idle_core) {
 			i = select_idle_core(p, cpu, cpus, &idle_cpu);
 			if ((unsigned int)i < nr_cpumask_bits)
 				return i;
@@ -6184,10 +6204,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		}
 	}
 
-	if (smt)
+	if (has_idle_core)
 		set_idle_cores(this, false);
 
-	if (sched_feat(SIS_PROP) && !smt) {
+	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		time = cpu_clock(this) - time;
 		update_avg(&this_sd->avg_scan_cost, time);
 	}
@@ -6242,6 +6262,7 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
  */
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
+	bool has_idle_core = false;
 	struct sched_domain *sd;
 	unsigned long task_util;
 	int i, recent_used_cpu;
@@ -6321,7 +6342,17 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	if (static_branch_likely(&sched_smt_present)) {
+		has_idle_core = test_idle_cores(target, false);
+
+		if (!has_idle_core && cpus_share_cache(prev, target)) {
+			i = select_idle_smt(p, sd, prev);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+		}
+	}
+
+	i = select_idle_cpu(p, sd, has_idle_core, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 


* [tip: sched/core] sched/fair: Bring back select_idle_smt(), but differently
  2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
                       ` (2 preceding siblings ...)
  2021-04-09 11:24     ` [tip: sched/core] sched/fair: Bring back select_idle_smt(), " tip-bot2 for Rik van Riel
@ 2021-04-09 16:14     ` tip-bot2 for Rik van Riel
  3 siblings, 0 replies; 26+ messages in thread
From: tip-bot2 for Rik van Riel @ 2021-04-09 16:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Rik van Riel, Peter Zijlstra (Intel),
	Mel Gorman, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c722f35b513f807629603bbf24640b1a48be21b5
Gitweb:        https://git.kernel.org/tip/c722f35b513f807629603bbf24640b1a48be21b5
Author:        Rik van Riel <riel@surriel.com>
AuthorDate:    Fri, 26 Mar 2021 15:19:32 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Apr 2021 18:01:39 +02:00

sched/fair: Bring back select_idle_smt(), but differently

Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
select_idle_core/cpu()"), resulting in the kernel being more efficient
at finding an idle CPU, and in tasks spending less time waiting to be
run, both according to the schedstats run_delay numbers, and according
to measured application latencies. Yay.

The flip side of this is that we see more task migrations (about 30%
more), higher cache misses, higher memory bandwidth utilization, and
higher CPU use, for the same number of requests/second.

This is most pronounced on a memcache type workload, which saw a
consistent 1-3% increase in total CPU use on the system, due to those
increased task migrations leading to higher L2 cache miss numbers, and
higher memory utilization. The exclusive L3 cache on Skylake does us
no favors there.

On our web serving workload, that effect is usually negligible.

It appears that the increased number of CPU migrations is generally a
good thing, since it leads to lower cpu_delay numbers, reflecting the
fact that tasks get to run faster. However, the reduced locality and
the corresponding increase in L2 cache misses hurts a little.

The patch below appears to fix the regression, while keeping the
benefit of the lower cpu_delay numbers, by reintroducing
select_idle_smt with a twist: when a socket has no idle cores, check
to see if the sibling of "prev" is idle, before searching all the
other CPUs.

This fixes both the occasional 9% regression on the web serving
workload, and the continuous 2% CPU use regression on the memcache
type workload.

With Mel's patches and this patch together, task migrations are still
high, but L2 cache misses, memory bandwidth, and CPU time used are
back down to what they were before. The p95 and p99 response times for
the memcache type application improve by about 10% over what they were
before Mel's patches got merged.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210326151932.2c187840@imladris.surriel.com
---
 kernel/sched/fair.c | 55 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d73bdb..bc34e35 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int cpu, bool def)
 {
 	struct sched_domain_shared *sds;
 
-	if (static_branch_likely(&sched_smt_present)) {
-		sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-		if (sds)
-			return READ_ONCE(sds->has_idle_cores);
-	}
+	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+	if (sds)
+		return READ_ONCE(sds->has_idle_cores);
 
 	return def;
 }
@@ -6112,6 +6110,24 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6144,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,11 +6156,10 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
-	bool smt = test_idle_cores(target, false);
 	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
@@ -6150,7 +6170,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP) && !smt) {
+	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		u64 avg_cost, avg_idle, span_avg;
 
 		/*
@@ -6170,7 +6190,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
-		if (smt) {
+		if (has_idle_core) {
 			i = select_idle_core(p, cpu, cpus, &idle_cpu);
 			if ((unsigned int)i < nr_cpumask_bits)
 				return i;
@@ -6184,10 +6204,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		}
 	}
 
-	if (smt)
+	if (has_idle_core)
 		set_idle_cores(this, false);
 
-	if (sched_feat(SIS_PROP) && !smt) {
+	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		time = cpu_clock(this) - time;
 		update_avg(&this_sd->avg_scan_cost, time);
 	}
@@ -6242,6 +6262,7 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
  */
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
+	bool has_idle_core = false;
 	struct sched_domain *sd;
 	unsigned long task_util;
 	int i, recent_used_cpu;
@@ -6321,7 +6342,17 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	if (sched_smt_active()) {
+		has_idle_core = test_idle_cores(target, false);
+
+		if (!has_idle_core && cpus_share_cache(prev, target)) {
+			i = select_idle_smt(p, sd, prev);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+		}
+	}
+
+	i = select_idle_cpu(p, sd, has_idle_core, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 


end of thread

Thread overview: 26+ messages
2021-03-21 19:03 [PATCH v2] sched/fair: bring back select_idle_smt, but differently Rik van Riel
2021-03-22 11:03 ` Mel Gorman
2021-03-22 15:07   ` Rik van Riel
2021-03-22 15:33     ` Mel Gorman
2021-03-23  2:08       ` Rik van Riel
2021-03-26 19:19   ` [PATCH v3] " Rik van Riel
2021-03-28 15:36     ` Mel Gorman
2021-04-06 15:10     ` Vincent Guittot
2021-04-06 15:26       ` Rik van Riel
2021-04-06 15:31         ` Vincent Guittot
2021-04-06 15:33           ` Vincent Guittot
2021-04-06 15:55           ` Rik van Riel
2021-04-06 16:13             ` Vincent Guittot
2021-04-07  7:17         ` Peter Zijlstra
2021-04-07  9:41           ` Mel Gorman
2021-04-07 10:15             ` Peter Zijlstra
2021-04-07 10:47               ` Mel Gorman
2021-04-07 12:10                 ` Peter Zijlstra
2021-04-07  9:42           ` Vincent Guittot
2021-04-07  9:54             ` Peter Zijlstra
2021-04-07  9:57               ` Vincent Guittot
2021-04-07 10:19               ` Peter Zijlstra
2021-04-07 10:24                 ` Vincent Guittot
2021-04-08 15:52                 ` Rik van Riel
2021-04-09 11:24     ` [tip: sched/core] sched/fair: Bring back select_idle_smt(), " tip-bot2 for Rik van Riel
2021-04-09 16:14     ` tip-bot2 for Rik van Riel
