LKML Archive on lore.kernel.org
 help / color / Atom feed
* Abysmal scheduler performance in Linus' tree?
@ 2017-09-06  5:13 Andy Lutomirski
  2017-09-06  8:25 ` Peter Zijlstra
  2017-09-06  9:15 ` Abysmal scheduler performance in Linus' tree? Chris Wilson
  0 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2017-09-06  5:13 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, linux-kernel

I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
today, and I'm seeing abysmal scheduler performance.  Running make -j4
ends up with all the tasks on CPU 3 most of the time (on my
4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
0, but plain while true; do true; done puts the infinite loop on CPU 3
right along with the make -j4 tasks.

This is on Fedora 26, and I don't think I'm doing anything weird.
systemd has enabled the cpu controller, but it doesn't seem to have
configured anything or created any non-root cgroups.

Just a heads up.  I haven't tried to diagnose it at all.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06  5:13 Abysmal scheduler performance in Linus' tree? Andy Lutomirski
@ 2017-09-06  8:25 ` Peter Zijlstra
  2017-09-06  8:59   ` Andy Lutomirski
  2017-09-06  9:15 ` Abysmal scheduler performance in Linus' tree? Chris Wilson
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-06  8:25 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Ingo Molnar, linux-kernel

On Tue, Sep 05, 2017 at 10:13:39PM -0700, Andy Lutomirski wrote:
> I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
> today, and I'm seeing abysmal scheduler performance.  Running make -j4
> ends up with all the tasks on CPU 3 most of the time (on my
> 4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
> 0, but plain while true; do true; done puts the infinite loop on CPU 3
> right along with the make -j4 tasks.
> 
> This is on Fedora 26, and I don't think I'm doing anything weird.
> systemd has enabled the cpu controller, but it doesn't seem to have
> configured anything or created any non-root cgroups.
> 
> Just a heads up.  I haven't tried to diagnose it at all.

"make O=defconfig-build -j80" results in:

%Cpu0  : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 88.7 us, 11.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 86.8 us, 13.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 89.7 us, 10.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 96.3 us,  3.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 95.3 us,  4.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 94.4 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  : 91.7 us,  8.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  : 94.3 us,  5.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 : 96.2 us,  3.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 : 91.5 us,  8.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 : 97.2 us,  2.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 : 89.7 us, 10.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 : 93.4 us,  6.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 : 94.4 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 : 91.6 us,  8.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 : 93.4 us,  5.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st
%Cpu27 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 : 88.8 us, 11.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 : 93.4 us,  6.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 : 97.2 us,  2.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 : 92.6 us,  7.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Do you have a .config somewhere? Are you running with the systemd? Is it
creating cpu cgroups?

Any specifics on your setup?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06  8:25 ` Peter Zijlstra
@ 2017-09-06  8:59   ` Andy Lutomirski
  2017-09-06  9:03     ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2017-09-06  8:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andy Lutomirski, Ingo Molnar, linux-kernel



> On Sep 6, 2017, at 1:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Tue, Sep 05, 2017 at 10:13:39PM -0700, Andy Lutomirski wrote:
>> I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
>> today, and I'm seeing abysmal scheduler performance.  Running make -j4
>> ends up with all the tasks on CPU 3 most of the time (on my
>> 4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
>> 0, but plain while true; do true; done puts the infinite loop on CPU 3
>> right along with the make -j4 tasks.
>> 
>> This is on Fedora 26, and I don't think I'm doing anything weird.
>> systemd has enabled the cpu controller, but it doesn't seem to have
>> configured anything or created any non-root cgroups.
>> 
>> Just a heads up.  I haven't tried to diagnose it at all.
> 
> "make O=defconfig-build -j80" results in:
> 
> %Cpu0  : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu1  : 88.7 us, 11.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu2  : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu3  : 86.8 us, 13.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu4  : 89.7 us, 10.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu5  : 96.3 us,  3.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu6  : 95.3 us,  4.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu7  : 94.4 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu8  : 91.7 us,  8.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu9  : 94.3 us,  5.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu10 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu11 : 96.2 us,  3.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu12 : 91.5 us,  8.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu13 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu14 : 97.2 us,  2.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu15 : 89.7 us, 10.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu16 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu17 : 93.4 us,  6.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu18 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu19 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu20 : 94.4 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu21 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu22 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu23 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu24 : 91.6 us,  8.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu25 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu26 : 93.4 us,  5.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st
> %Cpu27 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu28 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu29 : 88.8 us, 11.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu30 : 90.6 us,  9.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu31 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu32 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu33 : 93.4 us,  6.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu34 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu35 : 93.5 us,  6.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu36 : 90.7 us,  9.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu37 : 97.2 us,  2.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu38 : 92.5 us,  7.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu39 : 92.6 us,  7.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> 
> Do you have a .config somewhere?

I'll attach tomorrow.  I'll also test in a VM.

> Are you running with the systemd? Is it
> creating cpu cgroups?

Yes systemd, no cgroups.

> 
> Any specifics on your setup?

On further fiddling, I only see this after a suspend and resume cycle.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06  8:59   ` Andy Lutomirski
@ 2017-09-06  9:03     ` Peter Zijlstra
  2017-09-06 16:14       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-06  9:03 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Andy Lutomirski, Ingo Molnar, linux-kernel

On Wed, Sep 06, 2017 at 01:59:14AM -0700, Andy Lutomirski wrote:
> > Any specifics on your setup?
> 
> On further fiddling, I only see this after a suspend and resume cycle.

Ah, ok. That's not something I otherwise test. Lets see if I can force
this brick of mine through a suspend-resume cycle :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06  5:13 Abysmal scheduler performance in Linus' tree? Andy Lutomirski
  2017-09-06  8:25 ` Peter Zijlstra
@ 2017-09-06  9:15 ` Chris Wilson
  2017-09-06  9:24   ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-09-06  9:15 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Peter Zijlstra, linux-kernel

Quoting Andy Lutomirski (2017-09-06 06:13:39)
> I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
> today, and I'm seeing abysmal scheduler performance.  Running make -j4
> ends up with all the tasks on CPU 3 most of the time (on my
> 4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
> 0, but plain while true; do true; done puts the infinite loop on CPU 3
> right along with the make -j4 tasks.
> 
> This is on Fedora 26, and I don't think I'm doing anything weird.
> systemd has enabled the cpu controller, but it doesn't seem to have
> configured anything or created any non-root cgroups.
> 
> Just a heads up.  I haven't tried to diagnose it at all.

There is an issue on !llc machines arising from numa_wake_wide() where
processes are not spread widely across the low-power cores:
https://patchwork.kernel.org/patch/9875581/

The patch we are using to fix the regression is
https://cgit.freedesktop.org/drm-intel/commit/?h=topic/core-for-CI&id=6c362d9657293d700a8299acbeb2f1e24378f488
-Chris

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06  9:15 ` Abysmal scheduler performance in Linus' tree? Chris Wilson
@ 2017-09-06  9:24   ` Peter Zijlstra
       [not found]     ` <150469312649.28581.17626550155735691534@mail.alporthouse.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-06  9:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Andy Lutomirski, Ingo Molnar, linux-kernel

On Wed, Sep 06, 2017 at 10:15:33AM +0100, Chris Wilson wrote:
> Quoting Andy Lutomirski (2017-09-06 06:13:39)
> > I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
> > today, and I'm seeing abysmal scheduler performance.  Running make -j4
> > ends up with all the tasks on CPU 3 most of the time (on my
> > 4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
> > 0, but plain while true; do true; done puts the infinite loop on CPU 3
> > right along with the make -j4 tasks.
> > 
> > This is on Fedora 26, and I don't think I'm doing anything weird.
> > systemd has enabled the cpu controller, but it doesn't seem to have
> > configured anything or created any non-root cgroups.
> > 
> > Just a heads up.  I haven't tried to diagnose it at all.
> 
> There is an issue on !llc machines arising from numa_wake_wide() where
> processes are not spread widely across the low-power cores:
> https://patchwork.kernel.org/patch/9875581/
> 
> The patch we are using to fix the regression is
> https://cgit.freedesktop.org/drm-intel/commit/?h=topic/core-for-CI&id=6c362d9657293d700a8299acbeb2f1e24378f488


Urgh. that's the broken one, note how it has:

+static void get_llc_stats(struct llc_stats *stats, int cpu)
+{
+	struct sched_domain_shared *sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+
+	if (!sds) {
+		memset(&stats, 0, sizeof(*stats));
                       ^

0-day reported that that exploded with gcc-4.8 but worked with newer
compilers. But either way its fairly broken code.

The one that we committed and hit Linus' tree already (if my git-foo is
any good today) is:


commit 90001d67be2fa2acbe3510d1f64fa6533efa30ef
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Jul 31 17:50:05 2017 +0200

    sched/fair: Fix wake_affine() for !NUMA_BALANCING
    
    In commit:
    
      3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()")
    
    Rik changed wake_affine to consider NUMA information when balancing
    between LLC domains.
    
    There are a number of problems here which this patch tries to address:
    
     - LLC < NODE; in this case we'd use the wrong information to balance
     - !NUMA_BALANCING: in this case, the new code doesn't do any
       balancing at all
     - re-computes the NUMA data for every wakeup, this can mean iterating
       up to 64 CPUs for every wakeup.
     - default affine wakeups inside a cache
    
    We address these by saving the load/capacity values for each
    sched_domain during regular load-balance and using these values in
    wake_affine_llc(). The obvious down-side to using cached values is
    that they can be too old and poorly reflect reality.
    
    But this way we can use LLC wide information and thus not rely on
    assuming LLC matches NODE. We also don't rely on NUMA_BALANCING nor do
    we have to aggegate two nodes (or even cache domains) worth of CPUs
    for each wakeup.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()")
    [ Minor readability improvements. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7d065abc7a47..d7b6dab956ec 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -71,6 +71,14 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
+
+	/*
+	 * Some variables from the most recent sd_lb_stats for this domain,
+	 * used by wake_affine().
+	 */
+	unsigned long	nr_running;
+	unsigned long	load;
+	unsigned long	capacity;
 };
 
 struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a7f1c3b797f8..8d5868771cb3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2658,59 +2658,6 @@ void task_tick_numa(struct rq *rq, struct task_struct *curr)
 	}
 }
 
-/*
- * Can a task be moved from prev_cpu to this_cpu without causing a load
- * imbalance that would trigger the load balancer?
- */
-static inline bool numa_wake_affine(struct sched_domain *sd,
-				    struct task_struct *p, int this_cpu,
-				    int prev_cpu, int sync)
-{
-	struct numa_stats prev_load, this_load;
-	s64 this_eff_load, prev_eff_load;
-
-	update_numa_stats(&prev_load, cpu_to_node(prev_cpu));
-	update_numa_stats(&this_load, cpu_to_node(this_cpu));
-
-	/*
-	 * If sync wakeup then subtract the (maximum possible)
-	 * effect of the currently running task from the load
-	 * of the current CPU:
-	 */
-	if (sync) {
-		unsigned long current_load = task_h_load(current);
-
-		if (this_load.load > current_load)
-			this_load.load -= current_load;
-		else
-			this_load.load = 0;
-	}
-
-	/*
-	 * In low-load situations, where this_cpu's node is idle due to the
-	 * sync cause above having dropped this_load.load to 0, move the task.
-	 * Moving to an idle socket will not create a bad imbalance.
-	 *
-	 * Otherwise check if the nodes are near enough in load to allow this
-	 * task to be woken on this_cpu's node.
-	 */
-	if (this_load.load > 0) {
-		unsigned long task_load = task_h_load(p);
-
-		this_eff_load = 100;
-		this_eff_load *= prev_load.compute_capacity;
-
-		prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
-		prev_eff_load *= this_load.compute_capacity;
-
-		this_eff_load *= this_load.load + task_load;
-		prev_eff_load *= prev_load.load - task_load;
-
-		return this_eff_load <= prev_eff_load;
-	}
-
-	return true;
-}
 #else
 static void task_tick_numa(struct rq *rq, struct task_struct *curr)
 {
@@ -2724,14 +2671,6 @@ static inline void account_numa_dequeue(struct rq *rq, struct task_struct *p)
 {
 }
 
-#ifdef CONFIG_SMP
-static inline bool numa_wake_affine(struct sched_domain *sd,
-				    struct task_struct *p, int this_cpu,
-				    int prev_cpu, int sync)
-{
-	return true;
-}
-#endif /* !SMP */
 #endif /* CONFIG_NUMA_BALANCING */
 
 static void
@@ -5428,20 +5367,115 @@ static int wake_wide(struct task_struct *p)
 	return 1;
 }
 
+struct llc_stats {
+	unsigned long	nr_running;
+	unsigned long	load;
+	unsigned long	capacity;
+	int		has_capacity;
+};
+
+static bool get_llc_stats(struct llc_stats *stats, int cpu)
+{
+	struct sched_domain_shared *sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+
+	if (!sds)
+		return false;
+
+	stats->nr_running	= READ_ONCE(sds->nr_running);
+	stats->load		= READ_ONCE(sds->load);
+	stats->capacity		= READ_ONCE(sds->capacity);
+	stats->has_capacity	= stats->nr_running < per_cpu(sd_llc_size, cpu);
+
+	return true;
+}
+
+/*
+ * Can a task be moved from prev_cpu to this_cpu without causing a load
+ * imbalance that would trigger the load balancer?
+ *
+ * Since we're running on 'stale' values, we might in fact create an imbalance
+ * but recomputing these values is expensive, as that'd mean iteration 2 cache
+ * domains worth of CPUs.
+ */
+static bool
+wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
+		int this_cpu, int prev_cpu, int sync)
+{
+	struct llc_stats prev_stats, this_stats;
+	s64 this_eff_load, prev_eff_load;
+	unsigned long task_load;
+
+	if (!get_llc_stats(&prev_stats, prev_cpu) ||
+	    !get_llc_stats(&this_stats, this_cpu))
+		return false;
+
+	/*
+	 * If sync wakeup then subtract the (maximum possible)
+	 * effect of the currently running task from the load
+	 * of the current LLC.
+	 */
+	if (sync) {
+		unsigned long current_load = task_h_load(current);
+
+		/* in this case load hits 0 and this LLC is considered 'idle' */
+		if (current_load > this_stats.load)
+			return true;
+
+		this_stats.load -= current_load;
+	}
+
+	/*
+	 * The has_capacity stuff is not SMT aware, but by trying to balance
+	 * the nr_running on both ends we try and fill the domain at equal
+	 * rates, thereby first consuming cores before siblings.
+	 */
+
+	/* if the old cache has capacity, stay there */
+	if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
+		return false;
+
+	/* if this cache has capacity, come here */
+	if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)
+		return true;
+
+	/*
+	 * Check to see if we can move the load without causing too much
+	 * imbalance.
+	 */
+	task_load = task_h_load(p);
+
+	this_eff_load = 100;
+	this_eff_load *= prev_stats.capacity;
+
+	prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
+	prev_eff_load *= this_stats.capacity;
+
+	this_eff_load *= this_stats.load + task_load;
+	prev_eff_load *= prev_stats.load - task_load;
+
+	return this_eff_load <= prev_eff_load;
+}
+
 static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		       int prev_cpu, int sync)
 {
 	int this_cpu = smp_processor_id();
-	bool affine = false;
+	bool affine;
 
 	/*
-	 * Common case: CPUs are in the same socket, and select_idle_sibling()
-	 * will do its thing regardless of what we return:
+	 * Default to no affine wakeups; wake_affine() should not effect a task
+	 * placement the load-balancer feels inclined to undo. The conservative
+	 * option is therefore to not move tasks when they wake up.
 	 */
-	if (cpus_share_cache(prev_cpu, this_cpu))
-		affine = true;
-	else
-		affine = numa_wake_affine(sd, p, this_cpu, prev_cpu, sync);
+	affine = false;
+
+	/*
+	 * If the wakeup is across cache domains, try to evaluate if movement
+	 * makes sense, otherwise rely on select_idle_siblings() to do
+	 * placement inside the cache domain.
+	 */
+	if (!cpus_share_cache(prev_cpu, this_cpu))
+		affine = wake_affine_llc(sd, p, this_cpu, prev_cpu, sync);
 
 	schedstat_inc(p->se.statistics.nr_wakeups_affine_attempts);
 	if (affine) {
@@ -7121,6 +7155,7 @@ struct sg_lb_stats {
 struct sd_lb_stats {
 	struct sched_group *busiest;	/* Busiest group in this sd */
 	struct sched_group *local;	/* Local group in this sd */
+	unsigned long total_running;
 	unsigned long total_load;	/* Total load of all groups in sd */
 	unsigned long total_capacity;	/* Total capacity of all groups in sd */
 	unsigned long avg_load;	/* Average load across all groups in sd */
@@ -7140,6 +7175,7 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
 	*sds = (struct sd_lb_stats){
 		.busiest = NULL,
 		.local = NULL,
+		.total_running = 0UL,
 		.total_load = 0UL,
 		.total_capacity = 0UL,
 		.busiest_stat = {
@@ -7575,6 +7611,7 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq)
  */
 static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
 {
+	struct sched_domain_shared *shared = env->sd->shared;
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats *local = &sds->local_stat;
@@ -7631,6 +7668,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 
 next_group:
 		/* Now, start updating sd_lb_stats */
+		sds->total_running += sgs->sum_nr_running;
 		sds->total_load += sgs->group_load;
 		sds->total_capacity += sgs->group_capacity;
 
@@ -7646,6 +7684,21 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 			env->dst_rq->rd->overload = overload;
 	}
 
+	if (!shared)
+		return;
+
+	/*
+	 * Since these are sums over groups they can contain some CPUs
+	 * multiple times for the NUMA domains.
+	 *
+	 * Currently only wake_affine_llc() and find_busiest_group()
+	 * uses these numbers, only the last is affected by this problem.
+	 *
+	 * XXX fix that.
+	 */
+	WRITE_ONCE(shared->nr_running,	sds->total_running);
+	WRITE_ONCE(shared->load,	sds->total_load);
+	WRITE_ONCE(shared->capacity,	sds->total_capacity);
 }
 
 /**
@@ -7875,6 +7928,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (!sds.busiest || busiest->sum_nr_running == 0)
 		goto out_balanced;
 
+	/* XXX broken for overlapping NUMA groups */
 	sds.avg_load = (SCHED_CAPACITY_SCALE * sds.total_load)
 						/ sds.total_capacity;
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
       [not found]     ` <150469312649.28581.17626550155735691534@mail.alporthouse.com>
@ 2017-09-06 10:44       ` Peter Zijlstra
  2017-09-06 10:51         ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-06 10:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Andy Lutomirski, Ingo Molnar, linux-kernel

On Wed, Sep 06, 2017 at 11:18:46AM +0100, Chris Wilson wrote:

> > +static void get_llc_stats(struct llc_stats *stats, int cpu)
> > +{
> > +       struct sched_domain_shared *sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > +
> > +       if (!sds) {
> > +               memset(&stats, 0, sizeof(*stats));
> 
> Yes, I even sent you a mail about it ;)

Bah, too much email, sorry :-(

> > +       /*
> > +        * The has_capacity stuff is not SMT aware, but by trying to balance
> > +        * the nr_running on both ends we try and fill the domain at equal
> > +        * rates, thereby first consuming cores before siblings.
> > +        */
> > +
> > +       /* if the old cache has capacity, stay there */
> > +       if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
> > +               return false;
> > +
> > +       /* if this cache has capacity, come here */
> > +       if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)
> > +               return true;
> 
> This is still not working as intended, it should be 
> 
> 	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
> 		return true;
> 
> to fix the regression.

Argh, you're quite right. Let me do a patch for that.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06 10:44       ` Peter Zijlstra
@ 2017-09-06 10:51         ` Peter Zijlstra
  2017-09-07  8:16           ` [tip:sched/urgent] sched/fair: Fix wake_affine_llc() balancing rules tip-bot for Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-06 10:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Andy Lutomirski, Ingo Molnar, linux-kernel

On Wed, Sep 06, 2017 at 12:44:20PM +0200, Peter Zijlstra wrote:
> > > +       /* if this cache has capacity, come here */
> > > +       if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)
> > > +               return true;
> > 
> > This is still not working as intended, it should be 
> > 
> > 	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
> > 		return true;
> > 
> > to fix the regression.
> 
> Argh, you're quite right. Let me do a patch for that.

---
Subject: sched/fair: Fix wake_affine_llc() balance rules
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Sep  6 12:45:45 CEST 2017

Chris reported that the SMT balance rules got the +1 on the wrong
side, resulting in a bias towards the current LLC; which the
load-balancer would then try and undo.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5435,7 +5435,7 @@ wake_affine_llc(struct sched_domain *sd,
 		return false;
 
 	/* if this cache has capacity, come here */
-	if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)
+	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
 		return true;
 
 	/*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06  9:03     ` Peter Zijlstra
@ 2017-09-06 16:14       ` Peter Zijlstra
  2017-09-07  6:15         ` Mike Galbraith
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-06 16:14 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Andy Lutomirski, Ingo Molnar, linux-kernel

On Wed, Sep 06, 2017 at 11:03:33AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 06, 2017 at 01:59:14AM -0700, Andy Lutomirski wrote:
> > > Any specifics on your setup?
> > 
> > On further fiddling, I only see this after a suspend and resume cycle.
> 
> Ah, ok. That's not something I otherwise test. Lets see if I can force
> this brick of mine through a suspend-resume cycle :-)

I'd be suspicious of these here commits:

  77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
  09e0dd8e0f2e ("sched/topology: Avoid pointless rebuild")
  bbdacdfed2f5 ("sched/debug: Optimize sched_domain sysctl generation")

I tested them with regular hotplug, but suspend resume always is a tad
funny.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-06 16:14       ` Peter Zijlstra
@ 2017-09-07  6:15         ` Mike Galbraith
  2017-09-07  7:31           ` Ingo Molnar
  2017-09-07  9:13           ` [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Mike Galbraith @ 2017-09-07  6:15 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Andy Lutomirski, Ingo Molnar, linux-kernel

On Wed, 2017-09-06 at 18:14 +0200, Peter Zijlstra wrote:
> On Wed, Sep 06, 2017 at 11:03:33AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 06, 2017 at 01:59:14AM -0700, Andy Lutomirski wrote:
> > > > Any specifics on your setup?
> > > 
> > > On further fiddling, I only see this after a suspend and resume cycle.
> > 
> > Ah, ok. That's not something I otherwise test. Lets see if I can force
> > this brick of mine through a suspend-resume cycle :-)
> 
> I'd be suspicious of these here commits:
> 
>   77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")

Seems to be that one.  After suspend/resume, cpu7 (i4790+SMT) is online
with NULL domain.  offline/online it, it grows proper domains.

	-Mike

(on vacation, about to go explore that unfamiliar 'outside' place)  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Abysmal scheduler performance in Linus' tree?
  2017-09-07  6:15         ` Mike Galbraith
@ 2017-09-07  7:31           ` Ingo Molnar
  2017-09-07  9:13           ` [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2017-09-07  7:31 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Andy Lutomirski, Andy Lutomirski, linux-kernel


* Mike Galbraith <efault@gmx.de> wrote:

> On Wed, 2017-09-06 at 18:14 +0200, Peter Zijlstra wrote:
> > On Wed, Sep 06, 2017 at 11:03:33AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 06, 2017 at 01:59:14AM -0700, Andy Lutomirski wrote:
> > > > > Any specifics on your setup?
> > > > 
> > > > On further fiddling, I only see this after a suspend and resume cycle.
> > > 
> > > Ah, ok. That's not something I otherwise test. Lets see if I can force
> > > this brick of mine through a suspend-resume cycle :-)
> > 
> > I'd be suspicious of these here commits:
> > 
> >   77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
> 
> Seems to be that one.  After suspend/resume, cpu7 (i4790+SMT) is online
> with NULL domain.  offline/online it, it grows proper domains.

Thanks, that's really useful, should narrow things down!

> (on vacation, about to go explore that unfamiliar 'outside' place)  

Be careful, it's a dangerous world for kernel hackers! ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip:sched/urgent] sched/fair: Fix wake_affine_llc() balancing rules
  2017-09-06 10:51         ` Peter Zijlstra
@ 2017-09-07  8:16           ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-09-07  8:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, efault, mingo, chris, linux-kernel, hpa, torvalds, peterz, tglx

Commit-ID:  a731ebe6f17bff9e7ca12ef227f9da4d5bdf8425
Gitweb:     http://git.kernel.org/tip/a731ebe6f17bff9e7ca12ef227f9da4d5bdf8425
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 6 Sep 2017 12:51:31 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 7 Sep 2017 09:29:31 +0200

sched/fair: Fix wake_affine_llc() balancing rules

Chris Wilson reported that the SMT balance rules got the +1 on the
wrong side, resulting in a bias towards the current LLC; which the
load-balancer would then try and undo.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING")
Link: http://lkml.kernel.org/r/20170906105131.gqjmaextmn3u6tj2@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8d58687..9dd2ce1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5435,7 +5435,7 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
 		return false;
 
 	/* if this cache has capacity, come here */
-	if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)
+	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
 		return true;
 
 	/*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume
  2017-09-07  6:15         ` Mike Galbraith
  2017-09-07  7:31           ` Ingo Molnar
@ 2017-09-07  9:13           ` Peter Zijlstra
  2017-09-07  9:26             ` Peter Zijlstra
  2017-09-07 10:33             ` [tip:sched/urgent] sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-07  9:13 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Andy Lutomirski, Andy Lutomirski, Ingo Molnar, linux-kernel,
	Tejun Heo, Rafael J. Wysocki, Thomas Gleixner

On Thu, Sep 07, 2017 at 08:15:29AM +0200, Mike Galbraith wrote:

> > I'd be suspicious of these here commits:
> > 
> >   77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
> 
> Seems to be that one.  After suspend/resume, cpu7 (i4790+SMT) is online
> with NULL domain.  offline/online it, it grows proper domains.

Shiny.. and *urgh*.

So if I offline cpus 4-39 (otherwise too much output) and then do:

# echo processors > /sys/power/pm_test
# echo disk > /sys/power/state

I get:

[  568.315315] Disabling non-boot CPUs ...
[  568.329059] CPU0 attaching NULL sched-domain.
[  568.333942] CPU1 attaching NULL sched-domain.
[  568.338805] CPU2 attaching NULL sched-domain.
[  568.343675] CPU3 attaching NULL sched-domain.
[  568.348645] CPU0 attaching sched-domain(s):
[  568.353322]  domain-0: span=0,2-3 level=MC
[  568.357924]   groups: 0:{ span=0 }, 2:{ span=2 }, 3:{ span=3 }
[  568.364454] CPU2 attaching sched-domain(s):
[  568.369125]  domain-0: span=0,2-3 level=MC
[  568.373699]   groups: 2:{ span=2 }, 3:{ span=3 }, 0:{ span=0 }
[  568.380224] CPU3 attaching sched-domain(s):
[  568.384896]  domain-0: span=0,2-3 level=MC
[  568.389471]   groups: 3:{ span=3 }, 0:{ span=0 }, 2:{ span=2 }
[  568.448127] smpboot: CPU 1 is now offline

[  568.462083] CPU0 attaching NULL sched-domain.
[  568.466972] CPU2 attaching NULL sched-domain.
[  568.471850] CPU3 attaching NULL sched-domain.
[  568.476855] CPU0 attaching sched-domain(s):
[  568.481551]  domain-0: span=0,3 level=MC
[  568.485986]   groups: 0:{ span=0 }, 3:{ span=3 }
[  568.491178] CPU3 attaching sched-domain(s):
[  568.495864]  domain-0: span=0,3 level=MC
[  568.500252]   groups: 3:{ span=3 }, 0:{ span=0 }
[  568.522159] smpboot: CPU 2 is now offline

[  568.537075] CPU0 attaching NULL sched-domain.
[  568.541969] CPU3 attaching NULL sched-domain.
[  568.546947] CPU0 attaching NULL sched-domain.
[  568.558865] smpboot: CPU 3 is now offline

[  568.563955] PM: hibernation debug: Waiting for 5 seconds.



[  573.570008] Enabling non-boot CPUs ...
[  573.594298] CPU0 attaching NULL sched-domain.
[  573.599216] CPU0 attaching sched-domain(s):
[  573.603891]  domain-0: span=0-1 level=MC
[  573.608281]   groups: 0:{ span=0 }, 1:{ span=1 }
[  573.613455] CPU1 attaching sched-domain(s):
[  573.618127]  domain-0: span=0-1 level=MC
[  573.622509]   groups: 1:{ span=1 }, 0:{ span=0 }
[  573.627668] span: 0-1 (max cpu_capacity = 1024)
[  573.632799] CPU1 is up

[  573.650551] CPU0 attaching NULL sched-domain.
[  573.655421] CPU1 attaching NULL sched-domain.
[  573.660340] CPU0 attaching sched-domain(s):
[  573.665012]  domain-0: span=0-2 level=MC
[  573.669402]   groups: 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }
[  573.675930] CPU1 attaching sched-domain(s):
[  573.680601]  domain-0: span=0-2 level=MC
[  573.684982]   groups: 1:{ span=1 }, 2:{ span=2 }, 0:{ span=0 }
[  573.691496] CPU2 attaching sched-domain(s):
[  573.696166]  domain-0: span=0-2 level=MC
[  573.700546]   groups: 2:{ span=2 }, 0:{ span=0 }, 1:{ span=1 }
[  573.707064] span: 0-2 (max cpu_capacity = 1024)
[  573.712198] CPU2 is up

[  573.730022] CPU3 is up

[  575.913686] Restarting tasks ... done.

Its failing to rebuild domains for CPU3

And I can see how that happens, note how cpuset_cpu_active() doesn't
build an identity domain for the last CPU but relies on
cpuset_update_active_cpus() to create the proper full cpuset
configuration.

_However_, since cpuset_cpu_inactive() did _NOT_ call
cpuset_update_active_cpus(), top_cpuset.effective_cpus wasn't updated,
and now cpuset_hotplug_workfn() will find the active mask matching and
not in fact update anything.

So this was already broken if you'd had an actual cpuset configuration,
it would never restore the proper cpuset domains and retain the identity
domain setup created during resume.

What's worse, we don't seem to wait for completion of the
cpuset_hotplug_workfn() before unfreezing userspace.

---
Subject: sched/cpuset/pm: Fix cpuset vs suspend-resume

Cpusets vs suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.

On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.

But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and this it will not in fact do _anything_.

So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers what do use
cpusets tend to not suspend-resume much.

An addition problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.

Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().

Cc: tj@kernel.org
Cc: rjw@rjwysocki.net
Cc: efault@gmx.de
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpuset.h |  6 ++++++
 kernel/cgroup/cpuset.c | 16 +++++++++++++++-
 kernel/power/process.c |  5 ++++-
 kernel/sched/core.c    |  7 +++----
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index e74655d941b7..a1e6a33a4b03 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -51,7 +51,9 @@ static inline void cpuset_dec(void)
 
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
+extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
+extern void cpuset_wait_for_hotplug(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -164,11 +166,15 @@ static inline bool cpusets_enabled(void) { return false; }
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
 
+static inline void cpuset_force_rebuild(void) { }
+
 static inline void cpuset_update_active_cpus(void)
 {
 	partition_sched_domains(1, NULL, NULL);
 }
 
+static inline void cpuset_wait_for_hotplug(void) { }
+
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2f4039bafebb..0513ee39698b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2267,6 +2267,13 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs)
 	mutex_unlock(&cpuset_mutex);
 }
 
+static bool force_rebuild;
+
+void cpuset_force_rebuild(void)
+{
+	force_rebuild = true;
+}
+
 /**
  * cpuset_hotplug_workfn - handle CPU/memory hotunplug for a cpuset
  *
@@ -2341,8 +2348,10 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	}
 
 	/* rebuild sched domains if cpus_allowed has changed */
-	if (cpus_updated)
+	if (cpus_updated || force_rebuild) {
+		force_rebuild = false;
 		rebuild_sched_domains();
+	}
 }
 
 void cpuset_update_active_cpus(void)
@@ -2355,6 +2364,11 @@ void cpuset_update_active_cpus(void)
 	schedule_work(&cpuset_hotplug_work);
 }
 
+void cpuset_wait_for_hotplug(void)
+{
+	flush_work(&cpuset_hotplug_work);
+}
+
 /*
  * Keep top_cpuset.mems_allowed tracking node_states[N_MEMORY].
  * Call this routine anytime after node_states[N_MEMORY] changes.
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 78672d324a6e..50f25cb370c6 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -20,8 +20,9 @@
 #include <linux/workqueue.h>
 #include <linux/kmod.h>
 #include <trace/events/power.h>
+#include <linux/cpuset.h>
 
-/* 
+/*
  * Timeout for stopping processes
  */
 unsigned int __read_mostly freeze_timeout_msecs = 20 * MSEC_PER_SEC;
@@ -202,6 +203,8 @@ void thaw_processes(void)
 	__usermodehelper_set_disable_depth(UMH_FREEZING);
 	thaw_workqueues();
 
+	cpuset_wait_for_hotplug();
+
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
 		/* No other threads should have PF_SUSPEND_TASK set */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 703f5831738e..7ba7084ffc0a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5558,16 +5558,15 @@ static void cpuset_cpu_active(void)
 		 * operation in the resume sequence, just build a single sched
 		 * domain, ignoring cpusets.
 		 */
-		num_cpus_frozen--;
-		if (likely(num_cpus_frozen)) {
-			partition_sched_domains(1, NULL, NULL);
+		partition_sched_domains(1, NULL, NULL);
+		if (--num_cpus_frozen)
 			return;
-		}
 		/*
 		 * This is the last CPU online operation. So fall through and
 		 * restore the original sched domains by considering the
 		 * cpuset configurations.
 		 */
+		cpuset_force_rebuild();
 	}
 	cpuset_update_active_cpus();
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume
  2017-09-07  9:13           ` [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume Peter Zijlstra
@ 2017-09-07  9:26             ` Peter Zijlstra
  2017-09-07 10:54               ` Rafael J. Wysocki
  2017-09-07 20:38               ` Tejun Heo
  2017-09-07 10:33             ` [tip:sched/urgent] sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2017-09-07  9:26 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Andy Lutomirski, Andy Lutomirski, Ingo Molnar, linux-kernel,
	Tejun Heo, Rafael J. Wysocki, Thomas Gleixner

On Thu, Sep 07, 2017 at 11:13:38AM +0200, Peter Zijlstra wrote:
> Subject: sched/cpuset/pm: Fix cpuset vs suspend-resume
> 
> Cpusets vs suspend-resume is _completely_ broken. And it got noticed
> because it now resulted in non-cpuset usage breaking too.
> 
> On suspend cpuset_cpu_inactive() doesn't call into
> cpuset_update_active_cpus() because it doesn't want to move tasks about,
> there is no need, all tasks are frozen and won't run again until after
> we've resumed everything.
> 
> But this means that when we finally do call into
> cpuset_update_active_cpus() after resuming the last frozen cpu in
> cpuset_cpu_active(), the top_cpuset will not have any difference with
> the cpu_active_mask and this it will not in fact do _anything_.
> 
> So the cpuset configuration will not be restored. This was largely
> hidden because we would unconditionally create identity domains and
> mobile users would not in fact use cpusets much. And servers what do use
> cpusets tend to not suspend-resume much.
> 
> An addition problem is that we'd not in fact wait for the cpuset work to
> finish before resuming the tasks, allowing spurious migrations outside
> of the specified domains.
> 
> Fix the rebuild by introducing cpuset_force_rebuild() and fix the
> ordering with cpuset_wait_for_hotplug().
> 
> Cc: tj@kernel.org
> Cc: rjw@rjwysocki.net
> Cc: efault@gmx.de
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

TJ, I _think_ it was commit:

  deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling")

That wrecked things, but there's been so much changes in this area it is
really hard to tell. Note how before that commit it would
unconditionally rebuild the domains, and you 'optimized' that ;-)

That commit also introduced the work to do the async rebuild and failed
to do that flush on resume.

In any case, I think we should put a fixes tag on this commit such that
it gets picked up into stable kernels. Not sure anybody will try and
backport it into 4 year old kernels, but who knows.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip:sched/urgent] sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
  2017-09-07  9:13           ` [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume Peter Zijlstra
  2017-09-07  9:26             ` Peter Zijlstra
@ 2017-09-07 10:33             ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-09-07 10:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, luto, linux-kernel, tj, stable, torvalds, peterz, tglx,
	hpa, rjw, luto, efault

Commit-ID:  50e76632339d4655859523a39249dd95ee5e93e7
Gitweb:     http://git.kernel.org/tip/50e76632339d4655859523a39249dd95ee5e93e7
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 7 Sep 2017 11:13:38 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 7 Sep 2017 11:45:21 +0200

sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs

Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.

On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.

But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and this it will not in fact do _anything_.

So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers what do use
cpusets tend to not suspend-resume much.

An addition problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.

Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().

Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling")
Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/cpuset.h |  6 ++++++
 kernel/cgroup/cpuset.c | 16 +++++++++++++++-
 kernel/power/process.c |  5 ++++-
 kernel/sched/core.c    |  7 +++----
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index e74655d..a1e6a33 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -51,7 +51,9 @@ static inline void cpuset_dec(void)
 
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
+extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
+extern void cpuset_wait_for_hotplug(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -164,11 +166,15 @@ static inline bool cpusets_enabled(void) { return false; }
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
 
+static inline void cpuset_force_rebuild(void) { }
+
 static inline void cpuset_update_active_cpus(void)
 {
 	partition_sched_domains(1, NULL, NULL);
 }
 
+static inline void cpuset_wait_for_hotplug(void) { }
+
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2f4039b..0513ee3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2267,6 +2267,13 @@ retry:
 	mutex_unlock(&cpuset_mutex);
 }
 
+static bool force_rebuild;
+
+void cpuset_force_rebuild(void)
+{
+	force_rebuild = true;
+}
+
 /**
  * cpuset_hotplug_workfn - handle CPU/memory hotunplug for a cpuset
  *
@@ -2341,8 +2348,10 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	}
 
 	/* rebuild sched domains if cpus_allowed has changed */
-	if (cpus_updated)
+	if (cpus_updated || force_rebuild) {
+		force_rebuild = false;
 		rebuild_sched_domains();
+	}
 }
 
 void cpuset_update_active_cpus(void)
@@ -2355,6 +2364,11 @@ void cpuset_update_active_cpus(void)
 	schedule_work(&cpuset_hotplug_work);
 }
 
+void cpuset_wait_for_hotplug(void)
+{
+	flush_work(&cpuset_hotplug_work);
+}
+
 /*
  * Keep top_cpuset.mems_allowed tracking node_states[N_MEMORY].
  * Call this routine anytime after node_states[N_MEMORY] changes.
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 78672d3..50f25cb 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -20,8 +20,9 @@
 #include <linux/workqueue.h>
 #include <linux/kmod.h>
 #include <trace/events/power.h>
+#include <linux/cpuset.h>
 
-/* 
+/*
  * Timeout for stopping processes
  */
 unsigned int __read_mostly freeze_timeout_msecs = 20 * MSEC_PER_SEC;
@@ -202,6 +203,8 @@ void thaw_processes(void)
 	__usermodehelper_set_disable_depth(UMH_FREEZING);
 	thaw_workqueues();
 
+	cpuset_wait_for_hotplug();
+
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
 		/* No other threads should have PF_SUSPEND_TASK set */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6d2c7ff..136a76d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5556,16 +5556,15 @@ static void cpuset_cpu_active(void)
 		 * operation in the resume sequence, just build a single sched
 		 * domain, ignoring cpusets.
 		 */
-		num_cpus_frozen--;
-		if (likely(num_cpus_frozen)) {
-			partition_sched_domains(1, NULL, NULL);
+		partition_sched_domains(1, NULL, NULL);
+		if (--num_cpus_frozen)
 			return;
-		}
 		/*
 		 * This is the last CPU online operation. So fall through and
 		 * restore the original sched domains by considering the
 		 * cpuset configurations.
 		 */
+		cpuset_force_rebuild();
 	}
 	cpuset_update_active_cpus();
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume
  2017-09-07  9:26             ` Peter Zijlstra
@ 2017-09-07 10:54               ` Rafael J. Wysocki
  2017-09-07 20:38               ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2017-09-07 10:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Andy Lutomirski, Andy Lutomirski, Ingo Molnar,
	linux-kernel, Tejun Heo, Thomas Gleixner

On Thursday, September 7, 2017 11:26:16 AM CEST Peter Zijlstra wrote:
> On Thu, Sep 07, 2017 at 11:13:38AM +0200, Peter Zijlstra wrote:
> > Subject: sched/cpuset/pm: Fix cpuset vs suspend-resume
> > 
> > Cpusets vs suspend-resume is _completely_ broken. And it got noticed
> > because it now resulted in non-cpuset usage breaking too.
> > 
> > On suspend cpuset_cpu_inactive() doesn't call into
> > cpuset_update_active_cpus() because it doesn't want to move tasks about,
> > there is no need, all tasks are frozen and won't run again until after
> > we've resumed everything.
> > 
> > But this means that when we finally do call into
> > cpuset_update_active_cpus() after resuming the last frozen cpu in
> > cpuset_cpu_active(), the top_cpuset will not have any difference with
> > the cpu_active_mask and this it will not in fact do _anything_.
> > 
> > So the cpuset configuration will not be restored. This was largely
> > hidden because we would unconditionally create identity domains and
> > mobile users would not in fact use cpusets much. And servers what do use
> > cpusets tend to not suspend-resume much.
> > 
> > An addition problem is that we'd not in fact wait for the cpuset work to
> > finish before resuming the tasks, allowing spurious migrations outside
> > of the specified domains.
> > 
> > Fix the rebuild by introducing cpuset_force_rebuild() and fix the
> > ordering with cpuset_wait_for_hotplug().
> > 
> > Cc: tj@kernel.org
> > Cc: rjw@rjwysocki.net
> > Cc: efault@gmx.de
> > Reported-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> TJ, I _think_ it was commit:
> 
>   deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling")
> 
> That wrecked things, but there's been so much changes in this area it is
> really hard to tell. Note how before that commit it would
> unconditionally rebuild the domains, and you 'optimized' that ;-)
> 
> That commit also introduced the work to do the async rebuild and failed
> to do that flush on resume.
> 
> In any case, I think we should put a fixes tag on this commit such that
> it gets picked up into stable kernels. Not sure anybody will try and
> backport it into 4 year old kernels, but who knows.
> 

Many thanks for fixing this!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume
  2017-09-07  9:26             ` Peter Zijlstra
  2017-09-07 10:54               ` Rafael J. Wysocki
@ 2017-09-07 20:38               ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2017-09-07 20:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Andy Lutomirski, Andy Lutomirski, Ingo Molnar,
	linux-kernel, Rafael J. Wysocki, Thomas Gleixner

Hello,

On Thu, Sep 07, 2017 at 11:26:16AM +0200, Peter Zijlstra wrote:
> TJ, I _think_ it was commit:
> 
>   deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling")

Heh, that's a while ago.

> That wrecked things, but there's been so much changes in this area it is
> really hard to tell. Note how before that commit it would
> unconditionally rebuild the domains, and you 'optimized' that ;-)
> 
> That commit also introduced the work to do the async rebuild and failed
> to do that flush on resume.
>
> In any case, I think we should put a fixes tag on this commit such that
> it gets picked up into stable kernels. Not sure anybody will try and
> backport it into 4 year old kernels, but who knows.

I see & sounds good to me.  I see that the patch already got applied
but FWIW,

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-06  5:13 Abysmal scheduler performance in Linus' tree? Andy Lutomirski
2017-09-06  8:25 ` Peter Zijlstra
2017-09-06  8:59   ` Andy Lutomirski
2017-09-06  9:03     ` Peter Zijlstra
2017-09-06 16:14       ` Peter Zijlstra
2017-09-07  6:15         ` Mike Galbraith
2017-09-07  7:31           ` Ingo Molnar
2017-09-07  9:13           ` [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume Peter Zijlstra
2017-09-07  9:26             ` Peter Zijlstra
2017-09-07 10:54               ` Rafael J. Wysocki
2017-09-07 20:38               ` Tejun Heo
2017-09-07 10:33             ` [tip:sched/urgent] sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs tip-bot for Peter Zijlstra
2017-09-06  9:15 ` Abysmal scheduler performance in Linus' tree? Chris Wilson
2017-09-06  9:24   ` Peter Zijlstra
     [not found]     ` <150469312649.28581.17626550155735691534@mail.alporthouse.com>
2017-09-06 10:44       ` Peter Zijlstra
2017-09-06 10:51         ` Peter Zijlstra
2017-09-07  8:16           ` [tip:sched/urgent] sched/fair: Fix wake_affine_llc() balancing rules tip-bot for Peter Zijlstra

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git