* [PATCH 5/7] perf: Optimise topology iteration
From: Lin Ming @ 2010-12-27 15:38 UTC
  To: Peter Zijlstra, Ingo Molnar, Andi Kleen, Stephane Eranian,
	robert.richter
  Cc: lkml

Currently we iterate over the full machine looking for a matching core_id/nb
for the per-core and the AMD northbridge data; using a smaller topology
mask makes sense.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c   |    2 +-
 arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 67e2202..5a3b7b8 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	nb_id = amd_get_nb_id(cpu);
 	WARN_ON_ONCE(nb_id == BAD_APICID);
 
-	for_each_online_cpu(i) {
+	for_each_cpu(i, topology_core_cpumask(cpu)) {
 		nb = per_cpu(cpu_hw_events, i).amd_nb;
 		if (WARN_ON_ONCE(!nb))
 			continue;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 354d1de..ad70c2c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1111,7 +1111,7 @@ static void intel_pmu_cpu_starting(int cpu)
 	if (!ht_enabled(cpu))
 		return;
 
-	for_each_online_cpu(i) {
+	for_each_cpu(i, topology_thread_cpumask(cpu)) {
 		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
 
 		if (pc && pc->core_id == core_id) {
-- 
1.7.3
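
For illustration, the pattern both hunks rely on, as a minimal sketch
(find_matching_percore is a hypothetical helper, not part of the patch):
only CPUs in the relevant topology mask can share the structure, so the
scan shrinks from every online CPU to the sibling set.

static struct intel_percore *find_matching_percore(int cpu, int core_id)
{
	int i;

	/* Thread siblings of @cpu are the only candidates for sharing. */
	for_each_cpu(i, topology_thread_cpumask(cpu)) {
		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;

		if (pc && pc->core_id == core_id)
			return pc;	/* reuse the sibling's allocation */
	}

	return NULL;			/* first thread up keeps its own */
}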







* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Peter Zijlstra @ 2011-01-03 11:02 UTC
  To: Lin Ming; +Cc: Ingo Molnar, Andi Kleen, Stephane Eranian, robert.richter, lkml

On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> Currently we iterate over the full machine looking for a matching core_id/nb
> for the per-core and the AMD northbridge data; using a smaller topology
> mask makes sense.

Does topology_thread_cpumask() include offline cpus? I tried looking at
it, but I cannot find any code clearing bits in that mask on offline.

> Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> ---
>  arch/x86/kernel/cpu/perf_event_amd.c   |    2 +-
>  arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 67e2202..5a3b7b8 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
>  	nb_id = amd_get_nb_id(cpu);
>  	WARN_ON_ONCE(nb_id == BAD_APICID);
>  
> -	for_each_online_cpu(i) {
> +	for_each_cpu(i, topology_core_cpumask(cpu)) {
>  		nb = per_cpu(cpu_hw_events, i).amd_nb;
>  		if (WARN_ON_ONCE(!nb))
>  			continue;
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 354d1de..ad70c2c 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1111,7 +1111,7 @@ static void intel_pmu_cpu_starting(int cpu)
>  	if (!ht_enabled(cpu))
>  		return;
>  
> -	for_each_online_cpu(i) {
> +	for_each_cpu(i, topology_thread_cpumask(cpu)) {
>  		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
>  
>  		if (pc && pc->core_id == core_id) {




* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Andi Kleen @ 2011-01-03 15:20 UTC
  To: Peter Zijlstra
  Cc: Lin Ming, Ingo Molnar, Andi Kleen, Stephane Eranian,
	robert.richter, lkml

On Mon, Jan 03, 2011 at 12:02:10PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> > Currently we iterate over the full machine looking for a matching core_id/nb
> > for the per-core and the AMD northbridge data; using a smaller topology
> > mask makes sense.
> 
> Does topology_thread_cpumask() include offline cpus? I tried looking at
> it, but I cannot find any code clearing bits in that mask on offline.

The problem is not only at offline, but also at online, in the window
while CPUs are still coming up.  I don't think the patch is a good idea,
and it doesn't have any advantage either, since this is an
initialization-only slow path.

-Andi


* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Lin Ming @ 2011-01-04  6:18 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Stephane Eranian, robert.richter, lkml

On Mon, 2011-01-03 at 19:02 +0800, Peter Zijlstra wrote:
> On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> > Currently we iterate over the full machine looking for a matching core_id/nb
> > for the per-core and the AMD northbridge data; using a smaller topology
> > mask makes sense.
> 
> Does topology_thread_cpumask() include offline cpus? I tried looking at
> it, but I cannot find any code clearing bits in that mask on offline.

No, it does not include offline cpus.
For x86 code, remove_siblinginfo() clears the bits.

take_cpu_down ->
  __cpu_disable ->
    native_cpu_disable ->
      cpu_disable_common ->
        remove_siblinginfo

static void remove_siblinginfo(int cpu)
{
        int sibling;
        struct cpuinfo_x86 *c = &cpu_data(cpu);

        for_each_cpu(sibling, cpu_core_mask(cpu)) {
                cpumask_clear_cpu(cpu, cpu_core_mask(sibling));
                /*
                 * last thread sibling in this cpu core going down
                 */
                if (cpumask_weight(cpu_sibling_mask(cpu)) == 1)
                        cpu_data(sibling).booted_cores--;
        }

        for_each_cpu(sibling, cpu_sibling_mask(cpu))
                cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
        cpumask_clear(cpu_sibling_mask(cpu));
        cpumask_clear(cpu_core_mask(cpu));
        c->phys_proc_id = 0;
        c->cpu_core_id = 0;
        cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
}
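
The bits are set again on the way up by set_cpu_sibling_map(); a
condensed sketch of its thread-sibling part (abridged and illustrative
only; the real function also fills the core and llc maps):

static void set_cpu_sibling_map_sketch(int cpu)
{
        int i;
        struct cpuinfo_x86 *c = &cpu_data(cpu);

        cpumask_set_cpu(cpu, cpu_sibling_setup_mask);

        for_each_cpu(i, cpu_sibling_setup_mask) {
                struct cpuinfo_x86 *o = &cpu_data(i);

                /* Same package and same core id => thread siblings. */
                if (c->phys_proc_id == o->phys_proc_id &&
                    c->cpu_core_id == o->cpu_core_id) {
                        cpumask_set_cpu(i, cpu_sibling_mask(cpu));
                        cpumask_set_cpu(cpu, cpu_sibling_mask(i));
                }
        }
}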

Lin Ming

> 
> > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> > ---
> >  arch/x86/kernel/cpu/perf_event_amd.c   |    2 +-
> >  arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > index 67e2202..5a3b7b8 100644
> > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > @@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
> >  	nb_id = amd_get_nb_id(cpu);
> >  	WARN_ON_ONCE(nb_id == BAD_APICID);
> >  
> > -	for_each_online_cpu(i) {
> > +	for_each_cpu(i, topology_core_cpumask(cpu)) {
> >  		nb = per_cpu(cpu_hw_events, i).amd_nb;
> >  		if (WARN_ON_ONCE(!nb))
> >  			continue;
> > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> > index 354d1de..ad70c2c 100644
> > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > @@ -1111,7 +1111,7 @@ static void intel_pmu_cpu_starting(int cpu)
> >  	if (!ht_enabled(cpu))
> >  		return;
> >  
> > -	for_each_online_cpu(i) {
> > +	for_each_cpu(i, topology_thread_cpumask(cpu)) {
> >  		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
> >  
> >  		if (pc && pc->core_id == core_id) {
> 
> 




* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Lin Ming @ 2011-01-04  7:13 UTC
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, robert.richter, lkml

On Mon, 2011-01-03 at 23:20 +0800, Andi Kleen wrote:
> On Mon, Jan 03, 2011 at 12:02:10PM +0100, Peter Zijlstra wrote:
> > On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> > > Currently we iterate over the full machine looking for a matching core_id/nb
> > > for the per-core and the AMD northbridge data; using a smaller topology
> > > mask makes sense.
> > 
> > Does topology_thread_cpumask() include offline cpus? I tried looking at
> > it, but I cannot find any code clearing bits in that mask on offline.
> 
> The problem is not only at offline, but also at online, in the window
> while CPUs are still coming up.  I don't think the patch is a good idea,

I don't see the problem.

Assume logical CPUs 3 and 7 are the two threads in a core, and they are
plugged/unplugged in the sequence below:
CPU3 offline, CPU7 offline, CPU3 online, CPU7 online

1. After CPU3 goes offline

topology_thread_cpumask(3) returns empty
topology_thread_cpumask(7) returns {7}

2. After CPU7 goes offline

topology_thread_cpumask(3) returns empty
topology_thread_cpumask(7) returns empty

3. When CPU3 comes online, intel_pmu_cpu_starting() is called

topology_thread_cpumask(3) returns {3}
topology_thread_cpumask(7) returns empty

        for_each_cpu(i, topology_thread_cpumask(cpu)) {
                struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;

                if (pc && pc->core_id == core_id) {
                        kfree(cpuc->per_core);
                        cpuc->per_core = pc;
                        break;
                }
        }

Above "if" statement will not be executed, because pc->core_id was
initialized to -1 in intel_pmu_cpu_prepare.

4. When CPU7 comes online, intel_pmu_cpu_starting() is called

topology_thread_cpumask(3) returns {3, 7}
topology_thread_cpumask(7) returns {3, 7}


        for_each_cpu(i, topology_thread_cpumask(cpu)) {
                struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;

                if (pc && pc->core_id == core_id) {
                        kfree(cpuc->per_core);
                        cpuc->per_core = pc;
                        break;
                }
        }

        cpuc->per_core->core_id = core_id;
        cpuc->per_core->refcnt++;

Above "if" statement will be executed and the per_core data allocated
for cpu7 will be freed.
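
(For completeness, the matching teardown; a sketch of the 2.6.37-era
intel_pmu_cpu_dying() from memory, so treat the details as illustrative:
the last sibling down frees the shared structure.)

static void intel_pmu_cpu_dying(int cpu)
{
        struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
        struct intel_percore *pc = cpuc->per_core;

        if (pc) {
                /* Last user (or a never-shared structure) frees it. */
                if (pc->core_id == -1 || --pc->refcnt == 0)
                        kfree(pc);
                cpuc->per_core = NULL;
        }
}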

Is all of the above right, or could you explain more about the problem
with CPUs going offline and online?

Thanks,
Lin Ming

> and it doesn't have any advantage either, since this is an
> initialization-only slow path.
> 
> -Andi




* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Peter Zijlstra @ 2011-01-04 12:06 UTC
  To: Lin Ming
  Cc: Ingo Molnar, Andi Kleen, Stephane Eranian, robert.richter, lkml,
	Borislav Petkov

On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> Currently we iterate over the full machine looking for a matching core_id/nb
> for the per-core and the AMD northbridge data; using a smaller topology
> mask makes sense.
> 
> Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> ---
>  arch/x86/kernel/cpu/perf_event_amd.c   |    2 +-
>  arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 67e2202..5a3b7b8 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
>  	nb_id = amd_get_nb_id(cpu);
>  	WARN_ON_ONCE(nb_id == BAD_APICID);
>  
> -	for_each_online_cpu(i) {
> +	for_each_cpu(i, topology_core_cpumask(cpu)) {
>  		nb = per_cpu(cpu_hw_events, i).amd_nb;
>  		if (WARN_ON_ONCE(!nb))
>  			continue;

Borislav, is topology_core_cpumask() the right mask for northbridge_id
span? I could imagine Magny-Cours would have all 12 cores in the
core_cpumask() and have the node_mask() be half that.





* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Borislav Petkov @ 2011-01-04 14:22 UTC
  To: Peter Zijlstra
  Cc: Lin Ming, Ingo Molnar, Andi Kleen, Stephane Eranian, Richter,
	Robert, lkml, Andreas Herrmann

Adding Andreas since this is his code.

On Tue, Jan 04, 2011 at 07:06:16AM -0500, Peter Zijlstra wrote:
> On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> > Currently we iterate over the full machine looking for a matching core_id/nb
> > for the per-core and the AMD northbridge data; using a smaller topology
> > mask makes sense.
> > 
> > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> > ---
> >  arch/x86/kernel/cpu/perf_event_amd.c   |    2 +-
> >  arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > index 67e2202..5a3b7b8 100644
> > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > @@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
> >  	nb_id = amd_get_nb_id(cpu);
> >  	WARN_ON_ONCE(nb_id == BAD_APICID);
> >  
> > -	for_each_online_cpu(i) {
> > +	for_each_cpu(i, topology_core_cpumask(cpu)) {
> >  		nb = per_cpu(cpu_hw_events, i).amd_nb;
> >  		if (WARN_ON_ONCE(!nb))
> >  			continue;
> 
> Borislav, is topology_core_cpumask() the right mask for northbridge_id
> span? I could imagine Magny-Cours would have all 12 cores in the
> core_cpumask() and have the node_mask() be half that.

So, topology_core_cpumask() and cpu_core_mask() are both cpu_core_map,
which represents the socket mask; i.e., on a multi-socket system it
contains all the cores on one socket. A 12-core Magny-Cours contains two
internal northbridges, and this mask will have 12 bits set.

AFAICT, you want to iterate over the cores on a single node here
(an internal node in the Magny-Cours case) so for this we have the
llc_shared_map. See near the top of cache_shared_cpu_map_setup() in
<arch/x86/kernel/cpu/intel_cacheinfo.c> for an example.

node_mask() is roughly the same, but contains correct info only with
CONFIG_NUMA enabled and a correct SRAT table on the system.

HTH.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Lin Ming @ 2011-01-05  5:24 UTC
  To: Borislav Petkov
  Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Stephane Eranian,
	Richter, Robert, lkml, Andreas Herrmann

On Tue, 2011-01-04 at 22:22 +0800, Borislav Petkov wrote:
> Adding Andreas since this is his code.
> 
> On Tue, Jan 04, 2011 at 07:06:16AM -0500, Peter Zijlstra wrote:
> > On Mon, 2010-12-27 at 23:38 +0800, Lin Ming wrote:
> > > Currently we iterate over the full machine looking for a matching core_id/nb
> > > for the per-core and the AMD northbridge data; using a smaller topology
> > > mask makes sense.
> > > 
> > > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> > > ---
> > >  arch/x86/kernel/cpu/perf_event_amd.c   |    2 +-
> > >  arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > > index 67e2202..5a3b7b8 100644
> > > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > > @@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
> > >  	nb_id = amd_get_nb_id(cpu);
> > >  	WARN_ON_ONCE(nb_id == BAD_APICID);
> > >  
> > > -	for_each_online_cpu(i) {
> > > +	for_each_cpu(i, topology_core_cpumask(cpu)) {
> > >  		nb = per_cpu(cpu_hw_events, i).amd_nb;
> > >  		if (WARN_ON_ONCE(!nb))
> > >  			continue;
> > 
> > Borislav, is topology_core_cpumask() the right mask for northbridge_id
> > span? I could imagine Magny-Cours would have all 12 cores in the
> > core_cpumask() and have the node_mask() be half that.
> 
> So, topology_core_cpumask() and cpu_core_mask() are both cpu_core_map,
> which represents the socket mask; i.e., on a multi-socket system it
> contains all the cores on one socket. A 12-core Magny-Cours contains two
> internal northbridges, and this mask will have 12 bits set.
> 
> AFAICT, you want to iterate over the cores on a single node here
> (an internal node in the Magny-Cours case) so for this we have the
> llc_shared_map. See near the top of cache_shared_cpu_map_setup() in
> <arch/x86/kernel/cpu/intel_cacheinfo.c> for an example.

cpu_coregroup_mask() seems to be the right mask for the northbridge_id span.

arch/x86/kernel/smpboot.c:

/* maps the cpu to the sched domain representing multi-core */
const struct cpumask *cpu_coregroup_mask(int cpu)
{
        struct cpuinfo_x86 *c = &cpu_data(cpu);
        /*
         * For perf, we return last level cache shared map.
         * And for power savings, we return cpu_core_map
         */
        if ((sched_mc_power_savings || sched_smt_power_savings) &&
            !(cpu_has(c, X86_FEATURE_AMD_DCM)))
                return cpu_core_mask(cpu);
        else
                return c->llc_shared_map;
}
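
So the AMD loop could become (sketch, untested; the loop body is elided
as in the hunk above):

	for_each_cpu(i, cpu_coregroup_mask(cpu)) {
		nb = per_cpu(cpu_hw_events, i).amd_nb;
		if (WARN_ON_ONCE(!nb))
			continue;
		/* ... match nb->nb_id against nb_id as before ... */
	}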


> 
> node_mask() is roughly the same, but contains correct info only with
> CONFIG_NUMA enabled and a correct SRAT table on the system.
> 
> HTH.
> 




* Re: [PATCH 5/7] perf: Optimise topology iteration
From: Peter Zijlstra @ 2011-01-05  9:51 UTC
  To: Lin Ming
  Cc: Borislav Petkov, Ingo Molnar, Andi Kleen, Stephane Eranian,
	Richter, Robert, lkml, Andreas Herrmann

On Wed, 2011-01-05 at 13:24 +0800, Lin Ming wrote:
> > > > + for_each_cpu(i, topology_core_cpumask(cpu)) {
> > > >           nb = per_cpu(cpu_hw_events, i).amd_nb;
> > > >           if (WARN_ON_ONCE(!nb))
> > > >                   continue;
> > > 
> > > Borislav, is topology_core_cpumask() the right mask for northbridge_id
> > > span? I could imagine Magny-Cours would have all 12 cores in the
> > > core_cpumask() and have the node_mask() be half that.
> > 
> > So, topology_core_cpumask() and cpu_core_mask() are both cpu_core_map,
> > which represents the socket mask; i.e., on a multi-socket system it
> > contains all the cores on one socket. A 12-core Magny-Cours contains two
> > internal northbridges, and this mask will have 12 bits set.
> > 
> > AFAICT, you want to iterate over the cores on a single node here
> > (an internal node in the Magny-Cours case) so for this we have the
> > llc_shared_map. See near the top of cache_shared_cpu_map_setup() in
> > <arch/x86/kernel/cpu/intel_cacheinfo.c> for an example.
> 
> cpu_coregroup_mask() seems to be the right mask for the northbridge_id span.
> 
> arch/x86/kernel/smpboot.c:
> 
> /* maps the cpu to the sched domain representing multi-core */
> const struct cpumask *cpu_coregroup_mask(int cpu)
> {
>         struct cpuinfo_x86 *c = &cpu_data(cpu);
>         /*
>          * For perf, we return last level cache shared map.
>          * And for power savings, we return cpu_core_map
>          */
>         if ((sched_mc_power_savings || sched_smt_power_savings) &&
>             !(cpu_has(c, X86_FEATURE_AMD_DCM)))
>                 return cpu_core_mask(cpu);
>         else
>                 return c->llc_shared_map;
> } 

Argh, that function really must die, it's the most horrible brain damage
around. Andreas promised he'd clean that up after making it worse for
Magny-Cours.

But yes, assuming all Magny-Cours have this AMD_DCM thing set, it seems
to return the right map.

