* [PATCH] powerpc/kernel: Avoid redundancies on giveup_all
@ 2017-06-22 20:27 Breno Leitao
  2017-06-23  6:03 ` Cyril Bur
  0 siblings, 1 reply; 3+ messages in thread
From: Breno Leitao @ 2017-06-22 20:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Breno Leitao, Gustavo Romero

Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
__giveup_vsx(). But __giveup_vsx() calls __giveup_fpu() and
__giveup_altivec() again, which is redundant.

Besides giving up FP and Altivec, __giveup_vsx() also clears MSR_VSX
in the MSR, but this is already done by __giveup_{fpu,altivec}(). As
VSX cannot be enabled alone in the MSR (without FP and/or VEC
enabled), this is also redundant. VSX is never enabled alone (without
FP and VEC) because whenever VSX is enabled, e.g. through
load_up_vsx() or restore_math(), FP is enabled as well.
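
The overlap looks roughly like this (a simplified sketch based on the
description above, not the verbatim kernel source):

	static void __giveup_vsx(struct task_struct *tsk)
	{
		/*
		 * giveup_all() has already called these for any bits
		 * set in the user MSR, and each of them also clears
		 * MSR_VSX ...
		 */
		__giveup_fpu(tsk);
		__giveup_altivec(tsk);
		/* ... so this clear is redundant as well. */
		tsk->thread.regs->msr &= ~MSR_VSX;
	}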

This change improves giveup_all() by just 3% on average, but since
giveup_all() is called very frequently, around 8 times per CPU per
second on an idle machine, it might yield a noticeable improvement.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 2ad725ef4368..5d6af58270e6 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -494,10 +494,6 @@ void giveup_all(struct task_struct *tsk)
 	if (usermsr & MSR_VEC)
 		__giveup_altivec(tsk);
 #endif
-#ifdef CONFIG_VSX
-	if (usermsr & MSR_VSX)
-		__giveup_vsx(tsk);
-#endif
 #ifdef CONFIG_SPE
 	if (usermsr & MSR_SPE)
 		__giveup_spe(tsk);
-- 
2.11.0

* Re: [PATCH] powerpc/kernel: Avoid redundancies on giveup_all
  2017-06-22 20:27 [PATCH] powerpc/kernel: Avoid redundancies on giveup_all Breno Leitao
@ 2017-06-23  6:03 ` Cyril Bur
  2017-06-23 17:51   ` Breno Leitao
  0 siblings, 1 reply; 3+ messages in thread
From: Cyril Bur @ 2017-06-23  6:03 UTC (permalink / raw)
  To: Breno Leitao, linuxppc-dev; +Cc: Gustavo Romero

On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> __giveup_vsx(). But __giveup_vsx() calls __giveup_fpu() and
> __giveup_altivec() again, which is redundant.
> 
> Besides giving up FP and Altivec, __giveup_vsx() also clears MSR_VSX
> in the MSR, but this is already done by __giveup_{fpu,altivec}(). As
> VSX cannot be enabled alone in the MSR (without FP and/or VEC
> enabled), this is also redundant. VSX is never enabled alone (without
> FP and VEC) because whenever VSX is enabled, e.g. through
> load_up_vsx() or restore_math(), FP is enabled as well.
> 
> This change improves giveup_all() by just 3% on average, but since
> giveup_all() is called very frequently, around 8 times per CPU per
> second on an idle machine, it might yield a noticeable improvement.
> 

So I totally agree except this makes me quite nervous. I know we're
quite good at always disabling VSX when we disable FPU and ALTIVEC and
we do always turn VSX on when we enable FPU AND ALTIVEC. But still, if
we ever get that wrong...

I'm more interested in how this improves giveup_all() performance by so
much, but then hardware often surprises - I guess that's the cost of a
function call.

Perhaps caching the thread.regs->msr isn't a good idea. Could we
branch over it in the common case but still keep the call to the
function in case something goes horribly wrong?
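
Something like the following untested sketch is what I have in mind
(the exact condition and the WARN_ON_ONCE() are only illustrative):

#ifdef CONFIG_VSX
	/*
	 * VSX should never be enabled without FP and VEC, so the calls
	 * above have already flushed the state and cleared MSR_VSX.
	 * Warn and fall back to __giveup_vsx() if that invariant is
	 * ever broken.
	 */
	if (WARN_ON_ONCE((usermsr & MSR_VSX) &&
			 !(usermsr & (MSR_FP | MSR_VEC))))
		__giveup_vsx(tsk);
#endif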

> Signed-off-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/process.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 2ad725ef4368..5d6af58270e6 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -494,10 +494,6 @@ void giveup_all(struct task_struct *tsk)
>  	if (usermsr & MSR_VEC)
>  		__giveup_altivec(tsk);
>  #endif
> -#ifdef CONFIG_VSX
> -	if (usermsr & MSR_VSX)
> -		__giveup_vsx(tsk);
> -#endif
>  #ifdef CONFIG_SPE
>  	if (usermsr & MSR_SPE)
>  		__giveup_spe(tsk);

* Re: [PATCH] powerpc/kernel: Avoid redundancies on giveup_all
  2017-06-23  6:03 ` Cyril Bur
@ 2017-06-23 17:51   ` Breno Leitao
  0 siblings, 0 replies; 3+ messages in thread
From: Breno Leitao @ 2017-06-23 17:51 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, Gustavo Romero

Hi Cyril,

On Fri, Jun 23, 2017 at 04:03:12PM +1000, Cyril Bur wrote:
> On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> > Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> > __giveup_vsx(). But __giveup_vsx() calls __giveup_fpu() and
> > __giveup_altivec() again, which is redundant.
> > 
> > Besides giving up FP and Altivec, __giveup_vsx() also clears MSR_VSX
> > in the MSR, but this is already done by __giveup_{fpu,altivec}(). As
> > VSX cannot be enabled alone in the MSR (without FP and/or VEC
> > enabled), this is also redundant. VSX is never enabled alone (without
> > FP and VEC) because whenever VSX is enabled, e.g. through
> > load_up_vsx() or restore_math(), FP is enabled as well.
> > 
> > This change improves giveup_all() by just 3% on average, but since
> > giveup_all() is called very frequently, around 8 times per CPU per
> > second on an idle machine, it might yield a noticeable improvement.
> > 
> 
> So I totally agree except this makes me quite nervous. I know we're
> quite good at always disabling VSX when we disable FPU and ALTIVEC and
> we do always turn VSX on when we enable FPU AND ALTIVEC. But still, if
> we ever get that wrong...

Right, I understand your point; we can consider this code a
'fallback' in case we somehow forget to disable VSX when disabling
FPU/ALTIVEC. Good point.

> I'm more interested in how this improves giveup_all() performance by so
> much, but then hardware often surprises - I guess that's the cost of a
> function call.

I got this number using ftrace. I used the 'function_graph' tracer
with the 'funcgraph-duration' trace option, and set set_ftrace_filter
to giveup_all().

There is also a tool that helps with this, if you are interested. It
uses exactly the same mechanism I used, just in a more automated way:
Brendan Gregg's funcgraph.

https://github.com/brendangregg/perf-tools/blob/master/kernel/funcgraph

> Perhaps caching the thread.regs->msr isn't a good idea.

Yes, I looked at it, but it seems the compiler is optimizing it,
keeping it in r30 rather than spilling it to memory/the stack. This is
the code generated here, where r9 holds the task pointer:

 usermsr = tsk->thread.regs->msr;
	c0000000000199c4:       08 01 c9 eb     ld r30,264(r9)

 if ((usermsr & msr_all_available) == 0)
	c0000000000199c8:       60 5f 2a e9     ld r9,24416(r10)
	c0000000000199cc:       39 48 ca 7f     and.  r10,r30,r9
	c0000000000199d0:       20 00 82 40     bne c0000000000199f0 <giveup_all+0x60>

> Could we branch over it in the common case but still keep the call
> to the function in case something goes horribly wrong?

Yes, we can revisit it in the future. Thanks for sharing your opinion.
