* [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
@ 2019-12-05  8:32 Srikar Dronamraju
  2019-12-05  8:32 ` [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor Srikar Dronamraju
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2019-12-05  8:32 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Juri Lelli, Parth Shah, Phil Auld, Srikar Dronamraju,
	Gautham R . Shenoy, Ihor Pasichnyk, Waiman Long, linuxppc-dev

With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted
vCPUs"), the scheduler avoids waking up tasks on preempted vCPUs. Combined
with the misdetection described below, this leads to a wrong choice of CPU,
which in turn leads to larger wakeup latencies and, eventually, to
performance regressions in latency-sensitive benchmarks such as soltp and
schbench.

On PowerPC, vcpu_is_preempted() only looks at yield_count. If the
yield_count is odd, the vCPU is assumed to be preempted. However,
yield_count is incremented whenever the LPAR enters CEDE state, so any
vCPU that has merely ceded (i.e. gone idle) is also assumed to be
preempted.

Even if the vCPU of a dedicated LPAR is preempted/donated, it should have
the right of first use, since the LPAR is supposed to own the vCPU.

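To illustrate the heuristic, here is a minimal userspace sketch of the
parity check (the yield counts are hypothetical; in the kernel the counter
lives in the per-vCPU lppaca and is maintained by the hypervisor):

  #include <stdbool.h>
  #include <stdio.h>

  /* An odd yield_count means the vCPU is currently out in the
   * hypervisor (preempted, or merely ceded, which is the bug). */
  static bool vcpu_is_preempted_sketch(unsigned int yield_count)
  {
          return (yield_count & 1) != 0;
  }

  int main(void)
  {
          unsigned int counts[] = { 42, 43 };     /* hypothetical values */

          for (int i = 0; i < 2; i++)
                  printf("yield_count=%u -> preempted=%d\n",
                         counts[i], vcpu_is_preempted_sketch(counts[i]));
          return 0;
  }

A dedicated LPAR's vCPU also cedes when idle, so this check alone cannot
tell idle from preempted; hence the static key introduced below.
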
On a Power9 system with 32 cores:
 # lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  8
Core(s) per socket:  1
Socket(s):           16
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127

  # perf stat -a -r 5 ./schbench
v5.4                                     v5.4 + patch
Latency percentiles (usec)               Latency percentiles (usec)
	50.0000th: 45                    	50.0000th: 39
	75.0000th: 62                    	75.0000th: 53
	90.0000th: 71                    	90.0000th: 67
	95.0000th: 77                    	95.0000th: 76
	*99.0000th: 91                   	*99.0000th: 89
	99.5000th: 707                   	99.5000th: 93
	99.9000th: 6920                  	99.9000th: 118
	min=0, max=10048                 	min=0, max=211
Latency percentiles (usec)               Latency percentiles (usec)
	50.0000th: 45                    	50.0000th: 34
	75.0000th: 61                    	75.0000th: 45
	90.0000th: 72                    	90.0000th: 53
	95.0000th: 79                    	95.0000th: 56
	*99.0000th: 691                  	*99.0000th: 61
	99.5000th: 3972                  	99.5000th: 63
	99.9000th: 8368                  	99.9000th: 78
	min=0, max=16606                 	min=0, max=228
Latency percentiles (usec)               Latency percentiles (usec)
	50.0000th: 45                    	50.0000th: 34
	75.0000th: 61                    	75.0000th: 45
	90.0000th: 71                    	90.0000th: 53
	95.0000th: 77                    	95.0000th: 57
	*99.0000th: 106                  	*99.0000th: 63
	99.5000th: 2364                  	99.5000th: 68
	99.9000th: 7480                  	99.9000th: 100
	min=0, max=10001                 	min=0, max=134
Latency percentiles (usec)               Latency percentiles (usec)
	50.0000th: 45                    	50.0000th: 34
	75.0000th: 62                    	75.0000th: 46
	90.0000th: 72                    	90.0000th: 53
	95.0000th: 78                    	95.0000th: 56
	*99.0000th: 93                   	*99.0000th: 61
	99.5000th: 108                   	99.5000th: 64
	99.9000th: 6792                  	99.9000th: 85
	min=0, max=17681                 	min=0, max=121
Latency percentiles (usec)               Latency percentiles (usec)
	50.0000th: 46                    	50.0000th: 33
	75.0000th: 62                    	75.0000th: 44
	90.0000th: 73                    	90.0000th: 51
	95.0000th: 79                    	95.0000th: 54
	*99.0000th: 113                  	*99.0000th: 61
	99.5000th: 2724                  	99.5000th: 64
	99.9000th: 6184                  	99.9000th: 82
	min=0, max=9887                  	min=0, max=121

 Performance counter stats for 'system wide' (5 runs):

                      v5.4                    v5.4 + patch
context-switches    43,373  ( +-  0.40% )   44,597 ( +-  0.55% )
cpu-migrations       1,211  ( +-  5.04% )      220 ( +-  6.23% )
page-faults         15,983  ( +-  5.21% )   15,360 ( +-  3.38% )

Waiman Long suggested using static_keys.

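For readers unfamiliar with the jump-label API, the pattern used below is,
in condensed form, the following (a sketch, not a standalone program; the
running_on_shared_lpar() and slow_check() helpers are hypothetical
placeholders):

  #include <linux/jump_label.h>
  #include <linux/types.h>

  /* Hypothetical helpers, for illustration only. */
  extern bool running_on_shared_lpar(void);
  extern bool slow_check(void);

  DEFINE_STATIC_KEY_FALSE(shared_processor); /* branch compiled out by default */

  void boot_time_setup(void)                 /* hypothetical init hook */
  {
          if (running_on_shared_lpar())
                  static_branch_enable(&shared_processor);
  }

  bool hot_path(void)
  {
          /* Emits a nop until the key is enabled, then a patched jump;
           * no memory load on the dedicated-LPAR fast path. */
          if (!static_branch_unlikely(&shared_processor))
                  return false;
          return slow_check();
  }
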
Reported-by: Parth Shah <parth@linux.ibm.com>
Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com>
Cc: Parth Shah <parth@linux.ibm.com>
Cc: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1 (https://patchwork.ozlabs.org/patch/1204190/) -> v3:
The code is now under CONFIG_PPC_SPLPAR, which depends on
CONFIG_PPC_PSERIES, as suggested by Waiman Long.

 arch/powerpc/include/asm/spinlock.h | 5 +++--
 arch/powerpc/mm/numa.c              | 4 ++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index e9a960e28f3c..de817c25deff 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -35,11 +35,12 @@
 #define LOCK_TOKEN	1
 #endif
 
-#ifdef CONFIG_PPC_PSERIES
+#ifdef CONFIG_PPC_SPLPAR
+DECLARE_STATIC_KEY_FALSE(shared_processor);
 #define vcpu_is_preempted vcpu_is_preempted
 static inline bool vcpu_is_preempted(int cpu)
 {
-	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+	if (!static_branch_unlikely(&shared_processor))
 		return false;
 	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
 }
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 50d68d21ddcc..ffb971f3a63c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1568,9 +1568,13 @@ int prrn_is_enabled(void)
 	return prrn_enabled;
 }
 
+DEFINE_STATIC_KEY_FALSE(shared_processor);
+EXPORT_SYMBOL_GPL(shared_processor);
+
 void __init shared_proc_topology_init(void)
 {
 	if (lppaca_shared_proc(get_lppaca())) {
+		static_branch_enable(&shared_processor);
 		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
 			    nr_cpumask_bits);
 		numa_update_cpu_topology(false);
-- 
2.18.1



* [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor
  2019-12-05  8:32 [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Srikar Dronamraju
@ 2019-12-05  8:32 ` Srikar Dronamraju
  2019-12-05 13:51   ` Phil Auld
  2019-12-05 14:57   ` Waiman Long
  2019-12-05 13:48 ` [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Phil Auld
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2019-12-05  8:32 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Juri Lelli, Parth Shah, Phil Auld, Srikar Dronamraju,
	Ihor Pasichnyk, Waiman Long, linuxppc-dev

With the shared_processor static key available, is_shared_processor()
can return without having to query the lppaca structure.

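For context, is_shared_processor() gates the lock-yielding slow paths in
this header; a caller looks roughly like the following sketch (paraphrased,
not quoted verbatim from the tree):

  static inline void spin_yield(arch_spinlock_t *lock)
  {
          if (is_shared_processor())
                  splpar_spin_yield(lock); /* cede the vCPU to the hypervisor */
          else
                  barrier();
  }

With the static key, that test becomes a patched branch instead of a load
from the lppaca.
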
Cc: Parth Shah <parth@linux.ibm.com>
Cc: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1 (https://patchwork.ozlabs.org/patch/1204192/) ->v2:
Now that we no longer refer to the lppaca, remove the comment.

Changelog v2->v3:
The code is now under CONFIG_PPC_SPLPAR, which depends on
CONFIG_PPC_PSERIES, as suggested by Waiman Long.

 arch/powerpc/include/asm/spinlock.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index de817c25deff..e83d57f27566 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -111,13 +111,8 @@ static inline void splpar_rw_yield(arch_rwlock_t *lock) {};
 
 static inline bool is_shared_processor(void)
 {
-/*
- * LPPACA is only available on Pseries so guard anything LPPACA related to
- * allow other platforms (which include this common header) to compile.
- */
-#ifdef CONFIG_PPC_PSERIES
-	return (IS_ENABLED(CONFIG_PPC_SPLPAR) &&
-		lppaca_shared_proc(local_paca->lppaca_ptr));
+#ifdef CONFIG_PPC_SPLPAR
+	return static_branch_unlikely(&shared_processor);
 #else
 	return false;
 #endif
-- 
2.18.1



* Re: [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
  2019-12-05  8:32 [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Srikar Dronamraju
  2019-12-05  8:32 ` [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor Srikar Dronamraju
@ 2019-12-05 13:48 ` Phil Auld
  2019-12-06  9:34 ` Vaidyanathan Srinivasan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Phil Auld @ 2019-12-05 13:48 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Juri Lelli, Parth Shah, Gautham R . Shenoy, Ihor Pasichnyk,
	Waiman Long, linuxppc-dev

On Thu, Dec 05, 2019 at 02:02:17PM +0530 Srikar Dronamraju wrote:
> [ full patch quoted; snipped ]

This looks good to me, thanks Srikar.

Acked-by: Phil Auld <pauld@redhat.com>
-- 



* Re: [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor
  2019-12-05  8:32 ` [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor Srikar Dronamraju
@ 2019-12-05 13:51   ` Phil Auld
  2019-12-05 14:57   ` Waiman Long
  1 sibling, 0 replies; 8+ messages in thread
From: Phil Auld @ 2019-12-05 13:51 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Juri Lelli, Parth Shah, Ihor Pasichnyk, Waiman Long, linuxppc-dev

On Thu, Dec 05, 2019 at 02:02:18PM +0530 Srikar Dronamraju wrote:
> [ full patch quoted; snipped ]

Fwiw,

Acked-by: Phil Auld <pauld@redhat.com>
-- 



* Re: [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor
  2019-12-05  8:32 ` [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor Srikar Dronamraju
  2019-12-05 13:51   ` Phil Auld
@ 2019-12-05 14:57   ` Waiman Long
  1 sibling, 0 replies; 8+ messages in thread
From: Waiman Long @ 2019-12-05 14:57 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Juri Lelli, Parth Shah, Phil Auld, linuxppc-dev, Ihor Pasichnyk

On 12/5/19 3:32 AM, Srikar Dronamraju wrote:
> [ full patch quoted; snipped ]

Acked-by: Waiman Long <longman@redhat.com>



* Re: [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
  2019-12-05  8:32 [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Srikar Dronamraju
  2019-12-05  8:32 ` [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor Srikar Dronamraju
  2019-12-05 13:48 ` [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Phil Auld
@ 2019-12-06  9:34 ` Vaidyanathan Srinivasan
  2019-12-09  8:26 ` Parth Shah
  2019-12-11 14:52 ` Waiman Long
  4 siblings, 0 replies; 8+ messages in thread
From: Vaidyanathan Srinivasan @ 2019-12-06  9:34 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Juri Lelli, Gautham R . Shenoy, Phil Auld, Parth Shah,
	Ihor Pasichnyk, Waiman Long, linuxppc-dev

* Srikar Dronamraju <srikar@linux.vnet.ibm.com> [2019-12-05 14:02:17]:

> [ commit message and earlier hunks snipped; relevant hunk below ]
>  
> -#ifdef CONFIG_PPC_PSERIES
> +#ifdef CONFIG_PPC_SPLPAR
> +DECLARE_STATIC_KEY_FALSE(shared_processor);
>  #define vcpu_is_preempted vcpu_is_preempted
>  static inline bool vcpu_is_preempted(int cpu)
>  {
> -	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
> +	if (!static_branch_unlikely(&shared_processor))
>  		return false;
>  	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);

This condition check resolves the scheduler task wakeup latency
regression. Making this a static key is better in the fast path.

> [ remainder of patch snipped ]


Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>

Thanks Srikar for the fix.

--Vaidy



* Re: [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
  2019-12-05  8:32 [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Srikar Dronamraju
                   ` (2 preceding siblings ...)
  2019-12-06  9:34 ` Vaidyanathan Srinivasan
@ 2019-12-09  8:26 ` Parth Shah
  2019-12-11 14:52 ` Waiman Long
  4 siblings, 0 replies; 8+ messages in thread
From: Parth Shah @ 2019-12-09  8:26 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Juri Lelli, Gautham R . Shenoy, Phil Auld, Ihor Pasichnyk,
	Waiman Long, linuxppc-dev

Hi,

On 12/5/19 2:02 PM, Srikar Dronamraju wrote:
> [ full patch quoted; snipped ]

I have tested this patch on Guest KVM with following configuration:
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              88
On-line CPU(s) list: 0-87
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           88
NUMA node(s):        1
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   0-87

- Baseline kernel: v5.4

Setup:
=======
- 2 KVM guests, both sharing the same set of CPUs.
- First guest is idle
- Second guest executes 'schbench -r 30'
- Hypervisor details:
  - Architecture: POWER9
  - CPU(s): 88
  - Socket(s): 1
  - kernel: v5.4


Results:
===========
- schbench -r 30
+----------------+--------------+-----------------+
|   Latency %ile | v5.4         | v5.4 + patch    |
+================+==============+=================+
|          50    | 28 (+- 1)    | 22.625 (+- 0.7) |
+----------------+--------------+-----------------+
|          75    | 330 (+- 107) | 67 (+- 36)      |
+----------------+--------------+-----------------+
|          90    | 463 (+- 14)  | 447 (+- 5)      |
+----------------+--------------+-----------------+
|          95    | 486 (+- 27)  | 472 (+- 7.3)    |
+----------------+--------------+-----------------+
|          99    | 851 (+- 83)  | 709 (+- 77)     |
+----------------+--------------+-----------------+
|          99.5  | 884 (+- 51)  | 865 (+- 57)     |
+----------------+--------------+-----------------+
|          99.99 | 1038 (+- 72) | 961 (+- 36)     |
+----------------+--------------+-----------------+


Tested-by: Parth Shah <parth@linux.ibm.com>


Best,
Parth



* Re: [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt
  2019-12-05  8:32 [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt Srikar Dronamraju
                   ` (3 preceding siblings ...)
  2019-12-09  8:26 ` Parth Shah
@ 2019-12-11 14:52 ` Waiman Long
  4 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2019-12-11 14:52 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Juri Lelli, Parth Shah, Phil Auld, Gautham R . Shenoy,
	Ihor Pasichnyk, linuxppc-dev

On 12/5/19 3:32 AM, Srikar Dronamraju wrote:
> [ full patch quoted; snipped ]

Since this patch is fixing a performance regression, maybe we should add:

Fixes: 41946c86876e ("locking/core, powerpc: Implement vcpu_is_preempted(cpu)")

Cheers,
Longman



