* [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
@ 2016-02-17 15:47 Thomas Gleixner
  2016-02-17 16:08 ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2016-02-17 15:47 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, x86, Stephane Eranian, Matt Fleming, Vikas Shivappa

CQM is a strict per package facility. Use the proper cpumasks to lookup the
readers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c |   34 ++++++++++-------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -1244,15 +1244,12 @@ static struct pmu intel_cqm_pmu = {
 
 static inline void cqm_pick_event_reader(int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int reader, pkg = topology_physical_package_id(cpu);
 
-	for_each_cpu(i, &cqm_cpumask) {
-		if (phys_id == topology_physical_package_id(i))
-			return;	/* already got reader for this socket */
-	}
-
-	cpumask_set_cpu(cpu, &cqm_cpumask);
+	/* First online cpu in package becomes the reader */
+	reader = cpumask_any_and(topology_core_cpumask(cpu), &uncore_cpu_mask);
+	if (reader >= nr_cpu_ids)
+		cpumask_set_cpu(cpu, &cqm_cpumask);
 }
 
 static void intel_cqm_cpu_starting(unsigned int cpu)
@@ -1270,24 +1267,17 @@ static void intel_cqm_cpu_starting(unsig
 
 static void intel_cqm_cpu_exit(unsigned int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int target, pkg = topology_physical_package_id(cpu);
 
-	/*
-	 * Is @cpu a designated cqm reader?
-	 */
+	/* Is @cpu the current cqm reader for this package ? */
 	if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
 		return;
 
-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-
-		if (phys_id == topology_physical_package_id(i)) {
-			cpumask_set_cpu(i, &cqm_cpumask);
-			break;
-		}
-	}
+	/* Find another online reader in this package */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (target < nr_cpu_ids)
+		cpumask_set_cpu(target, &uncore_cpu_mask);
 }
 
 static int intel_cqm_cpu_notifier(struct notifier_block *nb,


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 15:47 [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups Thomas Gleixner
@ 2016-02-17 16:08 ` Thomas Gleixner
  2016-02-17 16:28   ` Matt Fleming
  2016-02-17 18:27   ` Vikas Shivappa
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-02-17 16:08 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, x86, Stephane Eranian, Matt Fleming, Vikas Shivappa

On Wed, 17 Feb 2016, Thomas Gleixner wrote:

> CQM is a strict per package facility. Use the proper cpumasks to lookup the
> readers.

Sorry for the noise. PEBKAC: quilt refresh missing. Correct version below.

Thanks,

	tglx

8<----------

Subject: x86/perf/cqm: Get rid of the silly for_each_cpu lookups
From: Thomas Gleixner <tglx@linutronix.de>
Date: Sun, 14 Feb 2016 23:09:06 +0100

CQM is a strict per package facility. Use the proper cpumasks to lookup the
readers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c |   34 ++++++++++-------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -1244,15 +1244,12 @@ static struct pmu intel_cqm_pmu = {
 
 static inline void cqm_pick_event_reader(int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int reader;
 
-	for_each_cpu(i, &cqm_cpumask) {
-		if (phys_id == topology_physical_package_id(i))
-			return;	/* already got reader for this socket */
-	}
-
-	cpumask_set_cpu(cpu, &cqm_cpumask);
+	/* First online cpu in package becomes the reader */
+	reader = cpumask_any_and(topology_core_cpumask(cpu), &cqm_cpumask);
+	if (reader >= nr_cpu_ids)
+		cpumask_set_cpu(cpu, &cqm_cpumask);
 }
 
 static void intel_cqm_cpu_starting(unsigned int cpu)
@@ -1270,24 +1267,17 @@ static void intel_cqm_cpu_starting(unsig
 
 static void intel_cqm_cpu_exit(unsigned int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int target;
 
-	/*
-	 * Is @cpu a designated cqm reader?
-	 */
+	/* Is @cpu the current cqm reader for this package ? */
 	if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
 		return;
 
-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-
-		if (phys_id == topology_physical_package_id(i)) {
-			cpumask_set_cpu(i, &cqm_cpumask);
-			break;
-		}
-	}
+	/* Find another online reader in this package */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (target < nr_cpu_ids)
+		cpumask_set_cpu(target, &cqm_cpumask);
 }
 
 static int intel_cqm_cpu_notifier(struct notifier_block *nb,


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 16:08 ` Thomas Gleixner
@ 2016-02-17 16:28   ` Matt Fleming
  2016-02-17 18:27   ` Vikas Shivappa
  1 sibling, 0 replies; 10+ messages in thread
From: Matt Fleming @ 2016-02-17 16:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Vikas Shivappa

On Wed, 17 Feb, at 05:08:57PM, Thomas Gleixner wrote:
> On Wed, 17 Feb 2016, Thomas Gleixner wrote:
> 
> > CQM is a strict per package facility. Use the proper cpumasks to lookup the
> > readers.
> 
> Sorry for the noise. PEBKAC: quilt refresh missing. Correct version below.
> 
> Thanks,
> 
> 	tglx
> 
> 8<----------
> 
> Subject: x86/perf/cqm: Get rid of the silly for_each_cpu lookups
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Sun, 14 Feb 2016 23:09:06 +0100
> 
> CQM is a strict per package facility. Use the proper cpumasks to lookup the
> readers.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_cqm.c |   34 ++++++++++-------------------
>  1 file changed, 12 insertions(+), 22 deletions(-)

Right, I just could not make the connection between
topology_core_cpumask() and topology_physical_package_id() when I
wrote this code.
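
For reference, the invariant being relied on here is roughly the following
(illustrative sketch only, not part of the patch):

	/*
	 * Every cpu in topology_core_cpumask(cpu) shares the same package id,
	 * so checking that mask is equivalent to comparing
	 * topology_physical_package_id() for every candidate cpu.
	 */
	int i, pkg = topology_physical_package_id(cpu);

	for_each_cpu(i, topology_core_cpumask(cpu))
		WARN_ON(topology_physical_package_id(i) != pkg);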

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 16:08 ` Thomas Gleixner
  2016-02-17 16:28   ` Matt Fleming
@ 2016-02-17 18:27   ` Vikas Shivappa
  2016-02-17 18:31     ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Vikas Shivappa @ 2016-02-17 18:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Matt Fleming,
	Vikas Shivappa



On Wed, 17 Feb 2016, Thomas Gleixner wrote:

> On Wed, 17 Feb 2016, Thomas Gleixner wrote:
>
>> CQM is a strict per package facility. Use the proper cpumasks to lookup the
>> readers.
>
> Sorry for the noise. PEBKAC: quilt refresh missing. Correct version below.
>
> Thanks,
>
> 	tglx
>
> 8<----------
>
> Subject: x86/perf/cqm: Get rid of the silly for_each_cpu lookups
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Sun, 14 Feb 2016 23:09:06 +0100
>
> CQM is a strict per package facility. Use the proper cpumasks to lookup the
> readers.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> arch/x86/kernel/cpu/perf_event_intel_cqm.c |   34 ++++++++++-------------------
> 1 file changed, 12 insertions(+), 22 deletions(-)
>
> --- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
> @@ -1244,15 +1244,12 @@ static struct pmu intel_cqm_pmu = {
>
> static inline void cqm_pick_event_reader(int cpu)
> {
> -	int phys_id = topology_physical_package_id(cpu);
> -	int i;
> +	int reader;
>
> -	for_each_cpu(i, &cqm_cpumask) {
> -		if (phys_id == topology_physical_package_id(i))
> -			return;	/* already got reader for this socket */
> -	}
> -
> -	cpumask_set_cpu(cpu, &cqm_cpumask);
> +	/* First online cpu in package becomes the reader */
> +	reader = cpumask_any_and(topology_core_cpumask(cpu), &cqm_cpumask);
> +	if (reader >= nr_cpu_ids)
> +		cpumask_set_cpu(cpu, &cqm_cpumask);

I was confused by the topology_core_cpumask naming as it's really the mask of
all cpus in the same package. I sent a patch for cqm and rapl to fix this
looping when you gave the same feedback during the CAT patches review, but
they never made it in due to the CAT patch issues.

Would it be better to resend the rapl fixes (it looks like perf_uncore has the
same loop) separately?

Thanks,
Vikas

> }
>
> static void intel_cqm_cpu_starting(unsigned int cpu)
> @@ -1270,24 +1267,17 @@ static void intel_cqm_cpu_starting(unsig
>
> static void intel_cqm_cpu_exit(unsigned int cpu)
> {
> -	int phys_id = topology_physical_package_id(cpu);
> -	int i;
> +	int target;
>
> -	/*
> -	 * Is @cpu a designated cqm reader?
> -	 */
> +	/* Is @cpu the current cqm reader for this package ? */
> 	if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
> 		return;
>
> -	for_each_online_cpu(i) {
> -		if (i == cpu)
> -			continue;
> -
> -		if (phys_id == topology_physical_package_id(i)) {
> -			cpumask_set_cpu(i, &cqm_cpumask);
> -			break;
> -		}
> -	}
> +	/* Find another online reader in this package */
> +	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
> +
> +	if (target < nr_cpu_ids)
> +		cpumask_set_cpu(target, &cqm_cpumask);
> }
>
> static int intel_cqm_cpu_notifier(struct notifier_block *nb,
>
>


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 18:27   ` Vikas Shivappa
@ 2016-02-17 18:31     ` Thomas Gleixner
  2016-02-17 18:35       ` Thomas Gleixner
  2016-02-17 18:48       ` Vikas Shivappa
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-02-17 18:31 UTC (permalink / raw)
  To: Vikas Shivappa; +Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Matt Fleming

On Wed, 17 Feb 2016, Vikas Shivappa wrote:
> > -	cpumask_set_cpu(cpu, &cqm_cpumask);
> > +	/* First online cpu in package becomes the reader */
> > +	reader = cpumask_any_and(topology_core_cpumask(cpu), &cqm_cpumask);
> > +	if (reader >= nr_cpu_ids)
> > +		cpumask_set_cpu(cpu, &cqm_cpumask);
> 
> I was confused by the topology_core_cpumask naming as it's really the mask of
> all cpus in the same package. I sent a patch for cqm and rapl to fix this
> looping when you gave the same feedback during the CAT patches review, but
> they never made it in due to the CAT patch issues.

Duh, totally forgot about them.
 
> Would it be better to resend the rapl fixes (it looks like perf_uncore has the
> same loop) separately?

Yes, please resend the rapl one. perf_uncore is a different trainwreck which I
fixed already:

      lkml.kernel.org/r/20160217132903.767990400@linutronix.de

Thanks,

	tglx


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 18:31     ` Thomas Gleixner
@ 2016-02-17 18:35       ` Thomas Gleixner
  2016-02-17 18:48       ` Vikas Shivappa
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-02-17 18:35 UTC (permalink / raw)
  To: Vikas Shivappa; +Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Matt Fleming

On Wed, 17 Feb 2016, Thomas Gleixner wrote:

> On Wed, 17 Feb 2016, Vikas Shivappa wrote:
> > > -	cpumask_set_cpu(cpu, &cqm_cpumask);
> > > +	/* First online cpu in package becomes the reader */
> > > +	reader = cpumask_any_and(topology_core_cpumask(cpu), &cqm_cpumask);
> > > +	if (reader >= nr_cpu_ids)
> > > +		cpumask_set_cpu(cpu, &cqm_cpumask);
> > 
> > I was confused by the topology_core_cpumask naming as it's really the mask of
> > all cpus in the same package. I sent a patch for cqm and rapl to fix this
> > looping when you gave the same feedback during the CAT patches review, but
> > they never made it in due to the CAT patch issues.
> 
> Duh, totally forgot about them.
>  
> > Would it be better to resend the rapl fixes (it looks like perf_uncore has the
> > same loop) separately?
> 
> Yes, please resend the rapl one. perf_uncore is a different trainwreck which I
> fixed already:
> 
>       lkml.kernel.org/r/20160217132903.767990400@linutronix.de

And looking at rapl, it's the same trainwreck as uncore when it comes to error
handling. Sigh....

Thanks,

	tglx


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 18:31     ` Thomas Gleixner
  2016-02-17 18:35       ` Thomas Gleixner
@ 2016-02-17 18:48       ` Vikas Shivappa
  2016-02-17 19:09         ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Vikas Shivappa @ 2016-02-17 18:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, LKML, Peter Zijlstra, x86, Stephane Eranian,
	Matt Fleming



On Wed, 17 Feb 2016, Thomas Gleixner wrote:

> On Wed, 17 Feb 2016, Vikas Shivappa wrote:
>
> Yes, please resend the rapl one. perf_uncore is a different trainwreck which I
> fixed already:
>
>      lkml.kernel.org/r/20160217132903.767990400@linutronix.de

Ok, will resend the rapl one separately.

The fix I sent was a little different though, in that it uses a static
tmp_cpumask to avoid the loop inside cpumask_any_and() below -

         while ((n = cpumask_next(n, src1p)) < nr_cpu_ids)
                 if (cpumask_test_cpu(n, src2p))

But we have an extra static cpumask there - static to avoid having it on the
stack.

Copy below -

+/*
+ * Temporary cpumask used during hot cpu notification handling. The usage
+ * is serialized by hot cpu locks.
+ */
+static cpumask_t tmp_cpumask;
+

  static void rapl_cpu_init(int cpu)
  {
-       int i, phys_id = topology_physical_package_id(cpu);
-
-       /* check if phys_is is already covered */
-       for_each_cpu(i, &rapl_cpu_mask) {
-               if (phys_id == topology_physical_package_id(i))
-                       return;
-       }
-       /* was not found, so add it */
-       cpumask_set_cpu(cpu, &rapl_cpu_mask);
+       /* check if cpu's package is already covered. If not, add it. */
+       cpumask_and(&tmp_cpumask, &rapl_cpu_mask, topology_core_cpumask(cpu));
+       if (cpumask_empty(&tmp_cpumask))
+               cpumask_set_cpu(cpu, &rapl_cpu_mask);
  }

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 18:48       ` Vikas Shivappa
@ 2016-02-17 19:09         ` Thomas Gleixner
  2016-02-18 20:15           ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2016-02-17 19:09 UTC (permalink / raw)
  To: Vikas Shivappa; +Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Matt Fleming

On Wed, 17 Feb 2016, Vikas Shivappa wrote:

Please stop top posting, finally!

> But we have an extra static cpumask there - static to avoid having it on the stack.

It's not about the cpu mask on the stack. The reason was that with cpumask off
stack cpumask_and_mask() requires an allocation, which then can't be used in
the starting/dying callbacks.
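
Roughly, the problem with a temporary mask in those paths is (sketch only, not
from any of the posted patches):

	cpumask_var_t tmp;

	/*
	 * With CONFIG_CPUMASK_OFFSTACK a cpumask_var_t is just a pointer, so
	 * it must be allocated first - not an option in the atomic
	 * starting/dying hotplug callbacks.
	 */
	if (!zalloc_cpumask_var(&tmp, GFP_KERNEL))	/* may sleep */
		return;

	/* Does the package already have a reader? */
	cpumask_and(tmp, topology_core_cpumask(cpu), &cqm_cpumask);
	if (cpumask_empty(tmp))
		cpumask_set_cpu(cpu, &cqm_cpumask);
	free_cpumask_var(tmp);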
 
Darn, you are right to remind me.

Now, the proper solution for this stuff is to provide a library function as we
need that for several drivers. No point to duplicate that functionality. I'll
cook something up and repost the uncore/cqm set tomorrow.
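
Something along those lines (sketch only, the naming is made up and nothing
like this is in the posted patches):

	/* Is there already a designated reader in @cpu's package? */
	static bool pkg_has_reader(int cpu, const struct cpumask *readers)
	{
		return cpumask_any_and(topology_core_cpumask(cpu),
				       readers) < nr_cpu_ids;
	}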

Thanks,

	tglx


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-17 19:09         ` Thomas Gleixner
@ 2016-02-18 20:15           ` Thomas Gleixner
  2016-02-18 23:12             ` Vikas Shivappa
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2016-02-18 20:15 UTC (permalink / raw)
  To: Vikas Shivappa; +Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Matt Fleming

On Wed, 17 Feb 2016, Thomas Gleixner wrote:
> On Wed, 17 Feb 2016, Vikas Shivappa wrote:
> 
> Please stop top posting, finally!
> 
> > But we have an extra static cpumask there - static to avoid having it on the stack.
> 
> It's not about the cpu mask on the stack. The reason was that with cpumask off
> stack cpumask_and_mask() requires an allocation, which then can't be used in
> the starting/dying callbacks.
>  
> Darn, you are right to remind me.
> 
> Now, the proper solution for this stuff is to provide a library function as we
> need that for several drivers. No point to duplicate that functionality. I'll
> cook something up and repost the uncore/cqm set tomorrow.

Second thoughts on that.

cpumask_any_but() is fine as is, if we feed it topology_core_cpumask(cpu). The
worst case search is two bitmap_find_next() calls, if the first search returned cpu.

Now cpumask_any_and() does a search as well, but the number of
bitmap_find_next() invocations is limited to the number of sockets if we feed
cqm_cpumask as the first argument. So for 4 or 8 sockets that's still a
reasonable limit. If the people with insanely large machines care, we can
revisit that topic. It's still faster than for_each_online_cpu() :)
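
I.e., annotated (sketch only, with cqm_cpumask fed as the first argument as
described above):

	/*
	 * cpu exit: at most two bitmap_find_next() calls, the second one
	 * only when the first bit found is @cpu itself.
	 */
	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);

	/*
	 * cpu init: iterating the reader mask first bounds the number of
	 * bitmap_find_next() calls by the number of packages which already
	 * have a reader.
	 */
	reader = cpumask_any_and(&cqm_cpumask, topology_core_cpumask(cpu));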

Thanks,

	tglx


* Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-18 20:15           ` Thomas Gleixner
@ 2016-02-18 23:12             ` Vikas Shivappa
  0 siblings, 0 replies; 10+ messages in thread
From: Vikas Shivappa @ 2016-02-18 23:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, LKML, Peter Zijlstra, x86, Stephane Eranian,
	Matt Fleming



On Thu, 18 Feb 2016, Thomas Gleixner wrote:

> On Wed, 17 Feb 2016, Thomas Gleixner wrote:
>> On Wed, 17 Feb 2016, Vikas Shivappa wrote:
>>
>> Please stop top posting, finally!
>>
>>> But we have an extra static cpumask there - static to avoid having it on the stack.
>>
>> It's not about the cpu mask on the stack. The reason was that with cpumask off
>> stack cpumask_and_mask() requires an allocation, which then can't be used in
>> the starting/dying callbacks.
>>
>> Darn, you are right to remind me.
>>
>> Now, the proper solution for this stuff is to provide a library function as we
>> need that for several drivers. No point to duplicate that functionality. I'll
>> cook something up and repost the uncore/cqm set tomorrow.
>
> Second thoughts on that.
>
> cpumask_any_but() is fine as is, if we feed it topology_core_cpumask(cpu). The
> worst case search is two bitmap_find_next() calls, if the first search returned cpu.
>
> Now cpumask_any_and() does a search as well, but the number of
> bitmap_find_next() invocations is limited to the number of sockets if we feed
> cqm_cpumask as the first argument. So for 4 or 8 sockets that's still a
> reasonable limit. If the people with insanely large machines care, we can
> revisit that topic. It's still faster than for_each_online_cpu() :)

Agree. If we don't care about a large number of sockets this is still far
better than scanning every cpu. There are some branches we could avoid by
being more aggressive and removing 'all' the loops via cpumask_and (in
cpumask_any_but the 2nd search is always a success if the 1st one fails), but
that should not matter much in this case.

Will send the rapl patch separately.

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

