* [PATCH] powerpc/64s: exception optimise MSR handling
@ 2016-09-15 9:04 Nicholas Piggin
From: Nicholas Piggin @ 2016-09-15 9:04 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: Nicholas Piggin
mtmsrd with L=1 only affects the MSR_EE and MSR_RI bits, and we always
know what state those bits are in, so the kernel MSR does not need to be
loaded when modifying them.
mtmsrd is often in the critical execution path, so avoiding a dependency
on even an L1 load is noticeable. On a POWER8 this saves about 3 cycles
from the syscall path, and possibly a few from other exception returns
(not measured).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/entry_64.S | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6b8bc0d..585b9ca 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -139,7 +139,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
#ifdef CONFIG_PPC_BOOK3E
wrteei 1
#else
- ld r11,PACAKMSR(r13)
+ li r11,MSR_RI
ori r11,r11,MSR_EE
mtmsrd r11,1
#endif /* CONFIG_PPC_BOOK3E */
@@ -195,7 +195,6 @@ system_call: /* label this so stack traces look sane */
#ifdef CONFIG_PPC_BOOK3E
wrteei 0
#else
- ld r10,PACAKMSR(r13)
/*
* For performance reasons we clear RI the same time that we
* clear EE. We only need to clear RI just before we restore r13
@@ -203,8 +202,7 @@ system_call: /* label this so stack traces look sane */
* below, but batching it with EE saves us one expensive mtmsrd call.
* We have to be careful to restore RI if we branch anywhere from
* here (eg syscall_exit_work).
*/
- li r9,MSR_RI
- andc r11,r10,r9
+ li r11,0
mtmsrd r11,1
#endif /* CONFIG_PPC_BOOK3E */
@@ -221,13 +219,12 @@ system_call: /* label this so stack traces look sane */
#endif
2: addi r3,r1,STACK_FRAME_OVERHEAD
#ifdef CONFIG_PPC_BOOK3S
+ li r10,MSR_RI
mtmsrd r10,1 /* Restore RI */
#endif
bl restore_math
#ifdef CONFIG_PPC_BOOK3S
- ld r10,PACAKMSR(r13)
- li r9,MSR_RI
- andc r11,r10,r9 /* Re-clear RI */
+ li r11,0
mtmsrd r11,1
#endif
ld r8,_MSR(r1)
@@ -308,6 +305,7 @@ syscall_enosys:
syscall_exit_work:
#ifdef CONFIG_PPC_BOOK3S
+ li r10,MSR_RI
mtmsrd r10,1 /* Restore RI */
#endif
/* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr.
@@ -354,7 +352,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
#ifdef CONFIG_PPC_BOOK3E
wrteei 1
#else
- ld r10,PACAKMSR(r13)
+ li r10,MSR_RI
ori r10,r10,MSR_EE
mtmsrd r10,1
#endif /* CONFIG_PPC_BOOK3E */
@@ -619,7 +617,7 @@ _GLOBAL(ret_from_except_lite)
#ifdef CONFIG_PPC_BOOK3E
wrteei 0
#else
- ld r10,PACAKMSR(r13) /* Get kernel MSR without EE */
+ li r10,MSR_RI
mtmsrd r10,1 /* Update machine state */
#endif /* CONFIG_PPC_BOOK3E */
@@ -751,7 +749,7 @@ resume_kernel:
#ifdef CONFIG_PPC_BOOK3E
wrteei 0
#else
- ld r10,PACAKMSR(r13) /* Get kernel MSR without EE */
+ li r10,MSR_RI
mtmsrd r10,1 /* Update machine state */
#endif /* CONFIG_PPC_BOOK3E */
#endif /* CONFIG_PREEMPT */
@@ -841,8 +839,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
* userspace and we take an exception after restoring r13,
* we end up corrupting the userspace r13 value.
*/
- ld r4,PACAKMSR(r13) /* Get kernel MSR without EE */
- andc r4,r4,r0 /* r0 contains MSR_RI here */
+ li r4,0
mtmsrd r4,1
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
--
2.9.3
* Re: [PATCH] powerpc/64s: exception optimise MSR handling
From: Michael Ellerman @ 2016-09-20 4:25 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
Nicholas Piggin <npiggin@gmail.com> writes:
> mtmsrd with L=1 only affects the MSR_EE and MSR_RI bits, and we always
> know what state those bits are in, so the kernel MSR does not need to be
> loaded when modifying them.
>
> mtmsrd is often in the critical execution path, so avoiding a dependency
> on even an L1 load is noticeable. On a POWER8 this saves about 3 cycles
> from the syscall path, and possibly a few from other exception returns
> (not measured).
This looks good in principle.
My worry is that these code paths have lots of assumptions about what's
left in registers, so we may have a path somewhere which expects rX to
contain PACAKMSR. Hence the review below ...
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 6b8bc0d..585b9ca 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -139,7 +139,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> #ifdef CONFIG_PPC_BOOK3E
> wrteei 1
> #else
> - ld r11,PACAKMSR(r13)
> + li r11,MSR_RI
> ori r11,r11,MSR_EE
> mtmsrd r11,1
> #endif /* CONFIG_PPC_BOOK3E */
/* We do need to set SOFTE in the stack frame or the return
* from interrupt will be painful
*/
li r10,1
std r10,SOFTE(r1)
CURRENT_THREAD_INFO(r11, r1)
So that's one OK. r11 isn't used again until it's clobbered here.
> @@ -195,7 +195,6 @@ system_call: /* label this so stack traces look sane */
#ifdef CONFIG_PPC_BOOK3S
/* No MSR:RI on BookE */
andi. r10,r8,MSR_RI
beq- unrecov_restore
#endif
So at this point r10 == MSR_RI, otherwise we would have taken the branch.
/*
* Disable interrupts so current_thread_info()->flags can't change,
* and so that we don't get interrupted after loading SRR0/1.
*/
> #ifdef CONFIG_PPC_BOOK3E
> wrteei 0
> #else
> - ld r10,PACAKMSR(r13)
> /*
> * For performance reasons we clear RI the same time that we
> * clear EE. We only need to clear RI just before we restore r13
> * below, but batching it with EE saves us one expensive mtmsrd call.
> * We have to be careful to restore RI if we branch anywhere from
> * here (eg syscall_exit_work).
> */
> - li r9,MSR_RI
> - andc r11,r10,r9
> + li r11,0
> mtmsrd r11,1
> #endif /* CONFIG_PPC_BOOK3E */
ld r9,TI_FLAGS(r12)
li r11,-MAX_ERRNO
andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne- syscall_exit_work
Which is:
syscall_exit_work:
#ifdef CONFIG_PPC_BOOK3S
mtmsrd r10,1 /* Restore RI */
#endif
So that appears to still work. But it's super fragile.
What I'd like to do is drop that optimisation of clearing RI early with
EE. That would avoid us needing to restore RI in syscall_exit_work and
before restore_math (and reclearing it after).
But I guess that will hurt performance too much :/
cheers
* Re: [PATCH] powerpc/64s: exception optimise MSR handling
From: Nicholas Piggin @ 2016-09-20 5:32 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
On Tue, 20 Sep 2016 14:25:48 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > mtmsrd with L=1 only affects the MSR_EE and MSR_RI bits, and we always
> > know what state those bits are in, so the kernel MSR does not need to be
> > loaded when modifying them.
> >
> > mtmsrd is often in the critical execution path, so avoiding a dependency
> > on even an L1 load is noticeable. On a POWER8 this saves about 3 cycles
> > from the syscall path, and possibly a few from other exception returns
> > (not measured).
>
> This looks good in principle.
>
> My worry is that these code paths have lots of assumptions about what's
> left in registers, so we may have a path somewhere which expects rX to
> contain PACAKMSR. Hence the review below ...
>
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 6b8bc0d..585b9ca 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -139,7 +139,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> > #ifdef CONFIG_PPC_BOOK3E
> > wrteei 1
> > #else
> > - ld r11,PACAKMSR(r13)
> > + li r11,MSR_RI
> > ori r11,r11,MSR_EE
> > mtmsrd r11,1
> > #endif /* CONFIG_PPC_BOOK3E */
>
> /* We do need to set SOFTE in the stack frame or the return
> * from interrupt will be painful
> */
> li r10,1
> std r10,SOFTE(r1)
>
> CURRENT_THREAD_INFO(r11, r1)
>
> So that's one OK. r11 isn't used again until it's clobbered here.
>
>
> > @@ -195,7 +195,6 @@ system_call: /* label this so stack traces look sane */
>
> #ifdef CONFIG_PPC_BOOK3S
> /* No MSR:RI on BookE */
> andi. r10,r8,MSR_RI
> beq- unrecov_restore
> #endif
>
> So at this point r10 == MSR_RI, otherwise we would have taken the branch.
>
> /*
> * Disable interrupts so current_thread_info()->flags can't change,
> * and so that we don't get interrupted after loading SRR0/1.
> */
> > #ifdef CONFIG_PPC_BOOK3E
> > wrteei 0
> > #else
> > - ld r10,PACAKMSR(r13)
> > /*
> > * For performance reasons we clear RI the same time that we
> > * clear EE. We only need to clear RI just before we restore r13
> > * below, but batching it with EE saves us one expensive mtmsrd call.
> > * We have to be careful to restore RI if we branch anywhere from
> > * here (eg syscall_exit_work).
> > */
> > - li r9,MSR_RI
> > - andc r11,r10,r9
> > + li r11,0
> > mtmsrd r11,1
> > #endif /* CONFIG_PPC_BOOK3E */
>
> ld r9,TI_FLAGS(r12)
> li r11,-MAX_ERRNO
> andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> bne- syscall_exit_work
>
> Which is:
>
> syscall_exit_work:
> #ifdef CONFIG_PPC_BOOK3S
> mtmsrd r10,1 /* Restore RI */
> #endif
>
> So that appears to still work. But it's super fragile.
Agreed. We'll go with the idea you mentioned offline to just load r10 again
here to avoid the long dependency -- it's not going to be a measurable cost.
> What I'd like to do is drop that optimisation of clearing RI early with
> EE. That would avoid us needing to restore RI in syscall_exit_work and
> before restore_math (and reclearing it after).
>
> But I guess that will hurt performance too much :/
Yes, that's something to look into: newer cores, more kernel code using fp
registers. I'll investigate.
Thanks,
Nick
* Re: powerpc/64s: exception optimise MSR handling
From: Michael Ellerman @ 2016-09-20 13:07 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
On Thu, 2016-15-09 at 09:04:46 UTC, Nicholas Piggin wrote:
> mtmsrd with L=1 only affects the MSR_EE and MSR_RI bits, and we always
> know what state those bits are in, so the kernel MSR does not need to be
> loaded when modifying them.
>
> mtmsrd is often in the critical execution path, so avoiding a dependency
> on even an L1 load is noticeable. On a POWER8 this saves about 3 cycles
> from the syscall path, and possibly a few from other exception returns
> (not measured).
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/49d09bf2a66f4b5a6eabefb0d4
cheers