linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Bruno Wolff III <bruno@wolff.to>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Josh Boyer <jwboyer@redhat.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Scheduler regression from caffcdd8d27ba78730d5540396ce72ad022aff2c
Date: Fri, 18 Jul 2014 12:16:33 +0200
Message-ID: <20140718101633.GP9918@twins.programming.kicks-ass.net>
In-Reply-To: <20140718053449.GA2039@wolff.to>

On Fri, Jul 18, 2014 at 12:34:49AM -0500, Bruno Wolff III wrote:
> On Thu, Jul 17, 2014 at 14:35:02 +0200,
>  Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >In any case, can someone who can trigger this run with the below; its
> >'clean' for me, but supposedly you'll trigger a FAIL somewhere.
> 
> I got a couple of fail messages.
> 
> dmesg output is available in the bug as the following attachment:
> https://bugzilla.kernel.org/attachment.cgi?id=143361

Thanks!

[    0.252059] __sdt_alloc: allocated f255b020 with cpus: 
[    0.252147] __sdt_alloc: allocated f255b0e0 with cpus: 
[    0.252229] __sdt_alloc: allocated f255b120 with cpus: 
[    0.252311] __sdt_alloc: allocated f255b160 with cpus: 

[    0.252395] __sdt_alloc: allocated f255b1a0 with cpus: 
[    0.252477] __sdt_alloc: allocated f255b1e0 with cpus: 
[    0.252559] __sdt_alloc: allocated f255b220 with cpus: 
[    0.252641] __sdt_alloc: allocated f255b260 with cpus: 

[    0.253013] __sdt_alloc: allocated f255b2a0 with cpus: 
[    0.253097] __sdt_alloc: allocated f255b2e0 with cpus: 
[    0.253184] __sdt_alloc: allocated f255b320 with cpus: 
[    0.253265] __sdt_alloc: allocated f255b360 with cpus: 

[    0.253354] build_sched_groups: got group f255b020 with cpus: 
[    0.253436] build_sched_groups: got group f255b120 with cpus: 
[    0.253519] build_sched_groups: got group f255b1a0 with cpus: 
[    0.253600] build_sched_groups: got group f255b2a0 with cpus: 
[    0.253681] build_sched_groups: got group f255b2e0 with cpus: 

[    0.253762] build_sched_groups: got group f255b320 with cpus: 
[    0.253843] build_sched_groups: got group f255b360 with cpus: 
[    0.254004] build_sched_groups: got group f255b0e0 with cpus: 
[    0.254087] build_sched_groups: got group f255b160 with cpus: 
[    0.254170] build_sched_groups: got group f255b1e0 with cpus: 
[    0.254252] build_sched_groups: FAIL
[    0.254331] build_sched_groups: got group f255b1a0 with cpus: 0
[    0.255004] build_sched_groups: FAIL
[    0.255084] build_sched_groups: got group f255b1e0 with cpus: 1

So from previous msgs we know:

	CPU0	CPU1	CPU2	CPU3

D0	*		*		SMT
		*		*

D2	*	*	*	*	DIE


This gives us (from __sdt_alloc):

	020	0e0	120	160	SMT
	1a0	1e0	220	260	MC
	2a0	2e0	320	360	DIE

Given that you have a DIE domain, and MC is found degenerate, I'll
conclude that you do not have the shared L3 possible for your machine
and only have the dual socket, with 2 threads per socket.

So the domains _should_ look like:

D0	0,2	1,3	0,2	1,3
D1	0,2	1,3	0,2	1,3
D2	0,1,2,3	0,1,2,3	0,1,2,3	0,1,2,3

Assuming that, build_sched_groups(), which gets called for each CPU and
for each domain, gives us:

D0g	020(0)		120(2)
D1g	1a0(0,2)
D2g	2a0(0,2)

So far so good. At this point we're in build_sched_groups() with
@cpu=0, @span=0-3, @covered=0,2 and @i=0, and we're just about to start
the loop iteration for @i=1.

	1 is not set in covered

	get_group(i=1, sdd, &sg)
	  @sd = *per_cpu_ptr(sdd->sd, 1); /* should be D2 for CPU1 */
	  @child = sd->child; /* should be D1 for CPU1: 1,3 */
	  @cpu = 1
	  @sg = *per_cpu_ptr(sdd->sg, 1); /* should be: 2e0 */

But instead we get 320 !?

The 2e0 group would cover 1,3, thereby increasing @covered to 0-3, and
we'd be done for CPU0. Instead things go on to return 360, more WTF!

So it looks like the actual domain tree is broken, and not what we
assumed it was.

Could I bother you to run with the below instead? It should also print
out the sched domain masks so we don't need to guess about them.

(make sure you have CONFIG_SCHED_DEBUG=y otherwise it will not build)

> I also booted with early printk=keepsched_debug as requested by Dietmar.

Can you make that: sched_debug?

---
 kernel/sched/core.c | 22 ++++++++++++++++++++++
 lib/vsprintf.c      |  5 +++++
 2 files changed, 27 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7bc599dc4aa4..4babcbbc11b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5857,6 +5857,17 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;
 
 		group = get_group(i, sdd, &sg);
+
+		if (!cpumask_empty(sched_group_cpus(sg)))
+			printk("%s: FAIL\n", __func__);
+
+		printk("%s: got group %p with cpus: %pc\n",
+				__func__,
+				sg,
+				sched_group_cpus(sg));
+
+		cpumask_clear(sched_group_cpus(sg));
+
 		cpumask_setall(sched_group_mask(sg));
 
 		for_each_cpu(j, span) {
@@ -6418,6 +6429,11 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
 			if (!sg)
 				return -ENOMEM;
 
+			printk("%s: allocated %p with cpus: %pc\n",
+					__func__,
+					sg,
+					sched_group_cpus(sg));
+
 			sg->next = sg;
 
 			*per_cpu_ptr(sdd->sg, j) = sg;
@@ -6474,6 +6490,12 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 	if (!sd)
 		return child;
 
+	printk("%s: cpu: %d level: %s cpu_map: %pc tl->mask: %pc\n",
+			__func__,
+			cpu, tl->name,
+			cpu_map,
+			tl->mask(cpu));
+
 	cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
 	if (child) {
 		sd->level = child->level + 1;
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 6fe2c84eb055..ac22c46fd6d0 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -28,6 +28,7 @@
 #include <linux/ioport.h>
 #include <linux/dcache.h>
 #include <linux/cred.h>
+#include <linux/cpumask.h>
 #include <net/addrconf.h>
 
 #include <asm/page.h>		/* for PAGE_SIZE */
@@ -1250,6 +1251,7 @@ int kptr_restrict __read_mostly;
  *           (default assumed to be phys_addr_t, passed by reference)
  * - 'd[234]' For a dentry name (optionally 2-4 last components)
  * - 'D[234]' Same as 'd' but for a struct file
+ * - 'c' For a cpumask list
  *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
@@ -1389,6 +1391,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 		return dentry_name(buf, end,
 				   ((const struct file *)ptr)->f_path.dentry,
 				   spec, fmt);
+	case 'c':
+		return buf + cpulist_scnprintf(buf, end - buf, ptr);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {
@@ -1635,6 +1639,7 @@ int format_decode(const char *fmt, struct printf_spec *spec)
  *   case.
  * %*ph[CDN] a variable-length hex string with a separator (supports up to 64
  *           bytes of the input)
+ * %pc print a cpumask as comma-separated list
  * %n is ignored
  *
  * ** Please update Documentation/printk-formats.txt when making changes **
