All of lore.kernel.org
 help / color / mirror / Atom feed
* WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
@ 2012-05-29 13:54 Borislav Petkov
  2012-05-29 14:51 ` Peter Zijlstra
  2012-06-14  8:39 ` [tip:x86/urgent] x86/smp: Fix topology checks on AMD MCM CPUs tip-bot for Borislav Petkov
  0 siblings, 2 replies; 16+ messages in thread
From: Borislav Petkov @ 2012-05-29 13:54 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: Andreas Herrmann, LKML

Dudes,

I'm getting the warning below on current linus. AFAICT, it is caused by

static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
        if (c->phys_proc_id == o->phys_proc_id)
                return topology_sane(c, o, "mc");

        return false;
}

and the reason is, IMHO, that because this is a MCM box which has two
nodes in one physical package, i.e., phys_proc_id is 0 on both CPU6 and
CPU0 but it has two internal nodes, 0 and 1 and CPUs 0-5 are on node 0
and CPUs 6-11 are on node 1, the warning fires.

Maybe we could do something like this untested hunk:

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 433529e29be4..e52538cd48bb 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -348,7 +348,8 @@ static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
        if (c->phys_proc_id == o->phys_proc_id)
-               return topology_sane(c, o, "mc");
+               if (!cpu_has(c, X86_FEATURE_AMD_DCM))
+                       return topology_sane(c, o, "mc");
 
        return false;
 }

or you have a better idea...?

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
  2012-05-29 13:54 WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() Borislav Petkov
@ 2012-05-29 14:51 ` Peter Zijlstra
  2012-05-29 15:29   ` Andreas Herrmann
  2012-06-14  8:39 ` [tip:x86/urgent] x86/smp: Fix topology checks on AMD MCM CPUs tip-bot for Borislav Petkov
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-05-29 14:51 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Ingo Molnar, Andreas Herrmann, LKML, hpa, Thomas Gleixner

On Tue, 2012-05-29 at 15:54 +0200, Borislav Petkov wrote:
> Dudes,
> 
> I'm getting the warning below on current linus. AFAICT, it is caused by
> 
> static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
>         if (c->phys_proc_id == o->phys_proc_id)
>                 return topology_sane(c, o, "mc");
> 
>         return false;
> }
> 
> and the reason is, IMHO, that because this is a MCM box which has two
> nodes in one physical package, i.e., phys_proc_id is 0 on both CPU6 and
> CPU0 but it has two internal nodes, 0 and 1 and CPUs 0-5 are on node 0
> and CPUs 6-11 are on node 1, the warning fires.
> 
> Maybe we could do something like this untested hunk:
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 433529e29be4..e52538cd48bb 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -348,7 +348,8 @@ static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
>         if (c->phys_proc_id == o->phys_proc_id)
> -               return topology_sane(c, o, "mc");
> +               if (!cpu_has(c, X86_FEATURE_AMD_DCM))
> +                       return topology_sane(c, o, "mc");
>  
>         return false;
>  }
> 
> or you have a better idea...?

Ah,.. uhm.. unfortunate this... we only seem to use cpu_core_mask for
topology_core_cpumask() and its purpose is to enumerate cores in a
package for some very limited generic functions.

Its a bit sad we defined it thus, the multi-core concept only really
make sense if you share caches, otherwise its just smp.

Also, our generic topology as defined doesn't match nodes. Which is
weird to say the least.

I'd almost be tempted to say you should fake phys_id, but I can only
imagine what all would explode if we'd do that :-)

Yeah, I guess we should do the thing you propose, unless someone else
has a sane idea?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
  2012-05-29 14:51 ` Peter Zijlstra
@ 2012-05-29 15:29   ` Andreas Herrmann
  2012-05-29 16:59     ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Herrmann @ 2012-05-29 15:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Borislav Petkov, Ingo Molnar, LKML, hpa, Thomas Gleixner

On Tue, May 29, 2012 at 04:51:46PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-05-29 at 15:54 +0200, Borislav Petkov wrote:
> > Dudes,
> > 
> > I'm getting the warning below on current linus. AFAICT, it is caused by
> > 
> > static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> > {
> >         if (c->phys_proc_id == o->phys_proc_id)
> >                 return topology_sane(c, o, "mc");
> > 
> >         return false;
> > }
> > 
> > and the reason is, IMHO, that because this is a MCM box which has two
> > nodes in one physical package, i.e., phys_proc_id is 0 on both CPU6 and
> > CPU0 but it has two internal nodes, 0 and 1 and CPUs 0-5 are on node 0
> > and CPUs 6-11 are on node 1, the warning fires.
> > 
> > Maybe we could do something like this untested hunk:
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 433529e29be4..e52538cd48bb 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -348,7 +348,8 @@ static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> >  static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> >  {
> >         if (c->phys_proc_id == o->phys_proc_id)
> > -               return topology_sane(c, o, "mc");
> > +               if (!cpu_has(c, X86_FEATURE_AMD_DCM))
> > +                       return topology_sane(c, o, "mc");
> >  
> >         return false;
> >  }
> > 
> > or you have a better idea...?
> 
> Ah,.. uhm.. unfortunate this... we only seem to use cpu_core_mask for
> topology_core_cpumask() and its purpose is to enumerate cores in a
> package for some very limited generic functions.
> 
> Its a bit sad we defined it thus, the multi-core concept only really
> make sense if you share caches, otherwise its just smp.
> 
> Also, our generic topology as defined doesn't match nodes. Which is
> weird to say the least.
> 
> I'd almost be tempted to say you should fake phys_id, but I can only
> imagine what all would explode if we'd do that :-)
> 
> Yeah, I guess we should do the thing you propose, unless someone else
> has a sane idea?

I've also looked at this. core_siblings mask is broken with this patch.
And there is this new irritating warning ...

I second Boris' suggestion for a fix. But I think the check for
X86_FEATURE_AMD_DCM should go into topology_sane() which in theory
could check other things as well.


Andreas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
  2012-05-29 15:29   ` Andreas Herrmann
@ 2012-05-29 16:59     ` Peter Zijlstra
  2012-05-29 17:13       ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-05-29 16:59 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: Borislav Petkov, Ingo Molnar, LKML, hpa, Thomas Gleixner

On Tue, 2012-05-29 at 17:29 +0200, Andreas Herrmann wrote:

> I've also looked at this. core_siblings mask is broken with this patch.
> And there is this new irritating warning ...

Hehe, you made this irritating hardware ;-) But fair enough.

> I second Boris' suggestion for a fix. But I think the check for
> X86_FEATURE_AMD_DCM should go into topology_sane() which in theory
> could check other things as well.

Unless you plan to go span cache (or even SMT siblings) over physical
IDs I'd strongly argue against putting it in topology_sane().

As it stands I think we should discuss the definition for the generic
topology bits (drivers/base/topology.c), because I think your
Magny-Cours thing does the wrong thing here.

The core span in a phys_id is all nice and such, but what does it mean?
IOW what would you do with it?

I would think the LLC range and the node-span are much more useful
things to have. Once you have nodes the sysfs node topology takes over.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
  2012-05-29 16:59     ` Peter Zijlstra
@ 2012-05-29 17:13       ` Borislav Petkov
  2012-05-29 17:25         ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2012-05-29 17:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andreas Herrmann, Ingo Molnar, LKML, hpa, Thomas Gleixner

On Tue, May 29, 2012 at 06:59:03PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-05-29 at 17:29 +0200, Andreas Herrmann wrote:
> 
> > I've also looked at this. core_siblings mask is broken with this patch.
> > And there is this new irritating warning ...
> 
> Hehe, you made this irritating hardware ;-) But fair enough.
> 
> > I second Boris' suggestion for a fix. But I think the check for
> > X86_FEATURE_AMD_DCM should go into topology_sane() which in theory
> > could check other things as well.
> 
> Unless you plan to go span cache (or even SMT siblings) over physical
> IDs I'd strongly argue against putting it in topology_sane().

Nah, the check goes in match_mc - we just talked it over with Andreas.

> As it stands I think we should discuss the definition for the generic
> topology bits (drivers/base/topology.c), because I think your
> Magny-Cours thing does the wrong thing here.

"wrong" is such a strong word :-) Please elaborate and I'll have a look.

> The core span in a phys_id is all nice and such, but what does it mean?

AFAICT, this is the physical package id to which the cores belong, i.e.
physical socket.

> IOW what would you do with it?

Shoot empty cans with it... :-)

Andreas?

> I would think the LLC range and the node-span are much more useful
> things to have. Once you have nodes the sysfs node topology takes
> over.

Yes, the LLC range should be the cores belonging to an internal node and
the node-span is the cores belonging to a physical socket. I think you
can compute anything else from those topo-wise.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
  2012-05-29 17:13       ` Borislav Petkov
@ 2012-05-29 17:25         ` Peter Zijlstra
  2012-05-29 17:48           ` Andreas Herrmann
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-05-29 17:25 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andreas Herrmann, Ingo Molnar, LKML, hpa, Thomas Gleixner

On Tue, 2012-05-29 at 19:13 +0200, Borislav Petkov wrote:
> 
> > As it stands I think we should discuss the definition for the generic
> > topology bits (drivers/base/topology.c), because I think your
> > Magny-Cours thing does the wrong thing here.
> 
> "wrong" is such a strong word :-) Please elaborate and I'll have a look.

Right, so I meant LLC is the useful mask, and in my mind LLC is what
makes a multi-core, without shared cache its just SMP. So core_siblings
to me would mean LLC sharing cores.

But its all very subjective I guess, but using strong words gets the
discussion going better ;-)

> > The core span in a phys_id is all nice and such, but what does it mean?
> 
> AFAICT, this is the physical package id to which the cores belong, i.e.
> physical socket.
> 
> > IOW what would you do with it?
> 
> Shoot empty cans with it... :-) 

Right, I actually came up with proper use-case, physical hotplug :-)

Its not immediately obvious the sysfs topo bits have the llc mask, which
is the more 'useful' one IMO.

Another funny case I don't see represented well is where there's
multiple sockets to a node -- I know this is like ancient tech and
unlikely in these days of multi-node sockets, but still ;-)

I guess what I'm asking is what is the purpose of the sys topo bits?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
  2012-05-29 17:25         ` Peter Zijlstra
@ 2012-05-29 17:48           ` Andreas Herrmann
  2012-06-04 12:41             ` [PATCH] x86, smp: Fix topology checks on AMD MCM Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Herrmann @ 2012-05-29 17:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Borislav Petkov, Ingo Molnar, LKML, hpa, Thomas Gleixner

On Tue, May 29, 2012 at 07:25:19PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-05-29 at 19:13 +0200, Borislav Petkov wrote:
> > 
> > > As it stands I think we should discuss the definition for the generic
> > > topology bits (drivers/base/topology.c), because I think your
> > > Magny-Cours thing does the wrong thing here.
> > 
> > "wrong" is such a strong word :-) Please elaborate and I'll have a look.
> 
> Right, so I meant LLC is the useful mask, and in my mind LLC is what
> makes a multi-core, without shared cache its just SMP. So core_siblings
> to me would mean LLC sharing cores.
> 
> But its all very subjective I guess, but using strong words gets the
> discussion going better ;-)
> 
> > > The core span in a phys_id is all nice and such, but what does it mean?
> > 
> > AFAICT, this is the physical package id to which the cores belong, i.e.
> > physical socket.
> > 
> > > IOW what would you do with it?
> > 
> > Shoot empty cans with it... :-) 
> 
> Right, I actually came up with proper use-case, physical hotplug :-)
> 
> Its not immediately obvious the sysfs topo bits have the llc mask, which
> is the more 'useful' one IMO.
> 
> Another funny case I don't see represented well is where there's
> multiple sockets to a node -- I know this is like ancient tech and
> unlikely in these days of multi-node sockets, but still ;-)
> 
> I guess what I'm asking is what is the purpose of the sys topo bits?

You mean the topology directory for each CPU in sysfs?

Where else do you find reliable/complete CPU topology information for
you system? (And yes I know that the info provided in this directory
is already incomplete for AMD multi-node CPUs.)

AFAIK hwloc relies on this information. (Together with cache topology
information in the cache sub-directory for each CPU.)

And if the question was why does it matter to know which cores belong
to which physical socket. What if you want to pin all your tasks to
both nodes of the same socket on AMD mult-node CPUs? Use
core_siblings/core_siblings_list provided in sysfs, use it as
parameter for a tool like taskset and you are done with it.


Andreas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-05-29 17:48           ` Andreas Herrmann
@ 2012-06-04 12:41             ` Borislav Petkov
  2012-06-04 12:43               ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2012-06-04 12:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: X86-ML, LKML, Borislav Petkov, Peter Zijlstra, Andreas Herrmann

From: Borislav Petkov <borislav.petkov@amd.com>

The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/smpboot.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f56f96da77f5..912e5cac61d8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -350,7 +350,8 @@ static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	if (c->phys_proc_id == o->phys_proc_id)
-		return topology_sane(c, o, "mc");
+		if (!cpu_has(c, X86_FEATURE_AMD_DCM))
+			return topology_sane(c, o, "mc");
 
 	return false;
 }
-- 
1.7.9.3.362.g71319


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 12:41             ` [PATCH] x86, smp: Fix topology checks on AMD MCM Borislav Petkov
@ 2012-06-04 12:43               ` Peter Zijlstra
  2012-06-04 13:37                 ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-06-04 12:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, X86-ML, LKML, Borislav Petkov, Andreas Herrmann

On Mon, 2012-06-04 at 14:41 +0200, Borislav Petkov wrote:
> The warning below triggers on AMD MCM packages because physical
> package
> IDs on the cores of a _physical_ socket are the same. I.e., this field
> says which CPUs belong to the same physical package.
> 
Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 12:43               ` Peter Zijlstra
@ 2012-06-04 13:37                 ` Borislav Petkov
  2012-06-04 13:38                   ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2012-06-04 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, X86-ML, LKML, Andreas Herrmann

On Mon, Jun 04, 2012 at 02:43:50PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-06-04 at 14:41 +0200, Borislav Petkov wrote:
> > The warning below triggers on AMD MCM packages because physical
> > package
> > IDs on the cores of a _physical_ socket are the same. I.e., this field
> > says which CPUs belong to the same physical package.
> > 
> Thanks!

This is still crap.

It breaks tests like those in powernow-k8:

        if (cpumask_first(cpu_core_mask(data->cpu)) == data->cpu)
                print_basics(data);

and it calls print_basics() on each core filling up dmesg with:

[    5.216925] powernow-k8:    0 : pstate 0 (1700 MHz)
[    5.222139] powernow-k8:    1 : pstate 1 (1300 MHz)
[    5.227354] powernow-k8:    2 : pstate 2 (1100 MHz)
[    5.232568] powernow-k8:    3 : pstate 3 (1000 MHz)
[    5.237783] powernow-k8:    4 : pstate 4 (800 MHz)
[    5.242988] powernow-k8:    0 : pstate 0 (1700 MHz)
[    5.248242] powernow-k8:    1 : pstate 1 (1300 MHz)
[    5.253462] powernow-k8:    2 : pstate 2 (1100 MHz)
[    5.258677] powernow-k8:    3 : pstate 3 (1000 MHz)
[    5.263892] powernow-k8:    4 : pstate 4 (800 MHz)
[    5.269094] powernow-k8:    0 : pstate 0 (1700 MHz)
[    5.274348] powernow-k8:    1 : pstate 1 (1300 MHz)
[    5.279566] powernow-k8:    2 : pstate 2 (1100 MHz)
[    5.284787] powernow-k8:    3 : pstate 3 (1000 MHz)
[    5.298387] powernow-k8:    4 : pstate 4 (800 MHz)
...

while it should dump the Pstates only once per physical socket.

The proper fix should be:

static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
        if (c->phys_proc_id == o->phys_proc_id) {
                if (cpu_has(c, X86_FEATURE_AMD_DCM))
                        return true;

                return topology_sane(c, o, "mc");
        }
        return false;
}

which basically circumvents the topology check on MCM modules. I'll run
this a bit longer to verify it doesn't break anything else.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 13:37                 ` Borislav Petkov
@ 2012-06-04 13:38                   ` Peter Zijlstra
  2012-06-04 14:48                     ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-06-04 13:38 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Ingo Molnar, X86-ML, LKML, Andreas Herrmann

On Mon, 2012-06-04 at 15:37 +0200, Borislav Petkov wrote:
> This is still crap.

*poof* and the patch is gone.. I'll wait for an update ;-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 13:38                   ` Peter Zijlstra
@ 2012-06-04 14:48                     ` Borislav Petkov
  2012-06-04 14:56                       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2012-06-04 14:48 UTC (permalink / raw)
  To: Peter Zijlstra, H. Peter Anvin, Ingo Molnar
  Cc: Borislav Petkov, X86-ML, LKML, Andreas Herrmann

On Mon, Jun 04, 2012 at 03:38:37PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-06-04 at 15:37 +0200, Borislav Petkov wrote:
> > This is still crap.
> 
> *poof* and the patch is gone.. I'll wait for an update ;-)

Thanks.

So I'm looking more into this and there's another issue IMHO:

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 9
stepping        : 1
microcode       : 0x10000d9
cpu MHz         : 800.000
cache size      : 512 KB
physical id     : 0
siblings        : 12
core id         : 2
cpu cores       : 2
^^^^^^^^^^^^^^^^^^^

This is cpuinfo_x86.booted_cores and here's how it progresses on this
MCM box (below).

On core 2 it becomes 2 and it doesn't change again. Until the next
physical socket comes (cpu 12) where it starts again from 1.

And, it is normally the number of booted cores on the socket, i.e. 12 in
this case.

Now, before we go babbling about what the right fix is, maybe we should
remove this thing completely - I mean, it looks like no one uses it, it
is only in the /proc/cpuinfo thing. Ingo, hpa?

$ grep -E "(cpu cores|processor)" /proc/cpuinfo
processor       : 0
cpu cores       : 1
processor       : 1
cpu cores       : 2
processor       : 2
cpu cores       : 2
processor       : 3
cpu cores       : 2
processor       : 4
cpu cores       : 2
processor       : 5
cpu cores       : 2
processor       : 6
cpu cores       : 2
processor       : 7
cpu cores       : 2
processor       : 8
cpu cores       : 2
processor       : 9
cpu cores       : 2
processor       : 10
cpu cores       : 2
processor       : 11
cpu cores       : 2
processor       : 12
cpu cores       : 1
processor       : 13
cpu cores       : 2
processor       : 14
cpu cores       : 2
processor       : 15
cpu cores       : 2
processor       : 16
cpu cores       : 2
processor       : 17
cpu cores       : 2
processor       : 18
cpu cores       : 2
processor       : 19
cpu cores       : 2
processor       : 20
cpu cores       : 2
processor       : 21
cpu cores       : 2
processor       : 22
cpu cores       : 2
processor       : 23
cpu cores       : 2


-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 14:48                     ` Borislav Petkov
@ 2012-06-04 14:56                       ` Peter Zijlstra
  2012-06-04 16:01                         ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-06-04 14:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, Ingo Molnar, Borislav Petkov, X86-ML, LKML,
	Andreas Herrmann

On Mon, 2012-06-04 at 16:48 +0200, Borislav Petkov wrote:
> This is cpuinfo_x86.booted_cores and here's how it progresses on this
> MCM box (below).


 http://marc.info/?l=linux-kernel&m=133849647709656

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 14:56                       ` Peter Zijlstra
@ 2012-06-04 16:01                         ` Borislav Petkov
  2012-06-06 15:31                           ` [PATCH -v2] " Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2012-06-04 16:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, Borislav Petkov, X86-ML, LKML,
	Andreas Herrmann

On Mon, Jun 04, 2012 at 04:56:01PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-06-04 at 16:48 +0200, Borislav Petkov wrote:
> > This is cpuinfo_x86.booted_cores and here's how it progresses on this
> > MCM box (below).
> 
>  http://marc.info/?l=linux-kernel&m=133849647709656

Ho-hum, it workie.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH -v2] x86, smp: Fix topology checks on AMD MCM
  2012-06-04 16:01                         ` Borislav Petkov
@ 2012-06-06 15:31                           ` Borislav Petkov
  0 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2012-06-06 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, X86-ML, LKML, Andreas Herrmann,
	Frank Arnold

From: Borislav Petkov <borislav.petkov@amd.com>

The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---

This is -v2, we ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

 arch/x86/kernel/smpboot.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f56f96da77f5..9432dcd86340 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -349,9 +349,12 @@ static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 
 static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id)
-		return topology_sane(c, o, "mc");
+	if (c->phys_proc_id == o->phys_proc_id) {
+		if (cpu_has(c, X86_FEATURE_AMD_DCM))
+			return true;
 
+		return topology_sane(c, o, "mc");
+	}
 	return false;
 }
 
-- 
1.7.11.rc1


-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip:x86/urgent] x86/smp: Fix topology checks on AMD MCM CPUs
  2012-05-29 13:54 WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() Borislav Petkov
  2012-05-29 14:51 ` Peter Zijlstra
@ 2012-06-14  8:39 ` tip-bot for Borislav Petkov
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot for Borislav Petkov @ 2012-06-14  8:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, andreas.herrmann3, tglx,
	borislav.petkov

Commit-ID:  161270fc1f9ddfc17154e0d49291472a9cdef7db
Gitweb:     http://git.kernel.org/tip/161270fc1f9ddfc17154e0d49291472a9cdef7db
Author:     Borislav Petkov <borislav.petkov@amd.com>
AuthorDate: Wed, 6 Jun 2012 17:31:26 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 13 Jun 2012 14:56:12 +0200

x86/smp: Fix topology checks on AMD MCM CPUs

The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fd019d7..c3a6bac 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -349,9 +349,12 @@ static bool __cpuinit match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 
 static bool __cpuinit match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id)
-		return topology_sane(c, o, "mc");
+	if (c->phys_proc_id == o->phys_proc_id) {
+		if (cpu_has(c, X86_FEATURE_AMD_DCM))
+			return true;
 
+		return topology_sane(c, o, "mc");
+	}
 	return false;
 }
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-06-14  8:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-29 13:54 WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() Borislav Petkov
2012-05-29 14:51 ` Peter Zijlstra
2012-05-29 15:29   ` Andreas Herrmann
2012-05-29 16:59     ` Peter Zijlstra
2012-05-29 17:13       ` Borislav Petkov
2012-05-29 17:25         ` Peter Zijlstra
2012-05-29 17:48           ` Andreas Herrmann
2012-06-04 12:41             ` [PATCH] x86, smp: Fix topology checks on AMD MCM Borislav Petkov
2012-06-04 12:43               ` Peter Zijlstra
2012-06-04 13:37                 ` Borislav Petkov
2012-06-04 13:38                   ` Peter Zijlstra
2012-06-04 14:48                     ` Borislav Petkov
2012-06-04 14:56                       ` Peter Zijlstra
2012-06-04 16:01                         ` Borislav Petkov
2012-06-06 15:31                           ` [PATCH -v2] " Borislav Petkov
2012-06-14  8:39 ` [tip:x86/urgent] x86/smp: Fix topology checks on AMD MCM CPUs tip-bot for Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.