* [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec()
@ 2016-05-29 12:03 Anton Blanchard
  2016-05-29 12:03 ` [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext() Anton Blanchard
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Anton Blanchard @ 2016-05-29 12:03 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

From: Anton Blanchard <anton@samba.org>

In both __giveup_fpu() and __giveup_altivec() we make two modifications
to tsk->thread.regs->msr. gcc decides to do a read/modify/write of
each change, so we end up with a load hit store:

        ld      r9,264(r10)
        rldicl  r9,r9,50,1
        rotldi  r9,r9,14
        std     r9,264(r10)
...
        ld      r9,264(r10)
        rldicl  r9,r9,40,1
        rotldi  r9,r9,24
        std     r9,264(r10)

Fix this by using a temporary.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/kernel/process.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e2f12cb..7da1f92 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -139,12 +139,16 @@ EXPORT_SYMBOL(__msr_check_and_clear);
 #ifdef CONFIG_PPC_FPU
 void __giveup_fpu(struct task_struct *tsk)
 {
+	u64 msr;
+
 	save_fpu(tsk);
-	tsk->thread.regs->msr &= ~MSR_FP;
+	msr = tsk->thread.regs->msr;
+	msr &= ~MSR_FP;
 #ifdef CONFIG_VSX
 	if (cpu_has_feature(CPU_FTR_VSX))
-		tsk->thread.regs->msr &= ~MSR_VSX;
+		msr &= ~MSR_VSX;
 #endif
+	tsk->thread.regs->msr = msr;
 }
 
 void giveup_fpu(struct task_struct *tsk)
@@ -219,12 +223,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; }
 
 static void __giveup_altivec(struct task_struct *tsk)
 {
+	u64 msr;
+
 	save_altivec(tsk);
-	tsk->thread.regs->msr &= ~MSR_VEC;
+	msr = tsk->thread.regs->msr;
+	msr &= ~MSR_VEC;
 #ifdef CONFIG_VSX
 	if (cpu_has_feature(CPU_FTR_VSX))
-		tsk->thread.regs->msr &= ~MSR_VSX;
+		msr &= ~MSR_VSX;
 #endif
+	tsk->thread.regs->msr = msr;
 }
 
 void giveup_altivec(struct task_struct *tsk)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext()
  2016-05-29 12:03 [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Anton Blanchard
@ 2016-05-29 12:03 ` Anton Blanchard
  2016-05-29 23:14   ` Michael Neuling
  2016-06-15 12:39   ` [2/3] " Michael Ellerman
  2016-05-29 12:03 ` [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte() Anton Blanchard
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Anton Blanchard @ 2016-05-29 12:03 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

From: Anton Blanchard <anton@samba.org>

In setup_sigcontext(), we set current->thread.vrsave then use it
straight after. Since current is hidden from the compiler via inline
assembly, it cannot optimise this and we end up with a load hit store.

Fix this by using a temporary.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/kernel/signal_64.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 2552079..7e49984 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -104,6 +104,7 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
 	 */
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc);
+	unsigned long vrsave;
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
@@ -125,9 +126,13 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
 	/* We always copy to/from vrsave, it's 0 if we don't have or don't
 	 * use altivec.
 	 */
-	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		current->thread.vrsave = mfspr(SPRN_VRSAVE);
-	err |= __put_user(current->thread.vrsave, (u32 __user *)&v_regs[33]);
+	vrsave = 0;
+	if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+		vrsave = mfspr(SPRN_VRSAVE);
+		current->thread.vrsave = vrsave;
+	}
+
+	err |= __put_user(vrsave, (u32 __user *)&v_regs[33]);
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-29 12:03 [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Anton Blanchard
  2016-05-29 12:03 ` [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext() Anton Blanchard
@ 2016-05-29 12:03 ` Anton Blanchard
  2016-05-30  5:17   ` Aneesh Kumar K.V
                     ` (2 more replies)
  2016-05-30  8:15 ` [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Gabriel Paubert
  2016-06-15 12:39 ` [1/3] " Michael Ellerman
  3 siblings, 3 replies; 17+ messages in thread
From: Anton Blanchard @ 2016-05-29 12:03 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

From: Anton Blanchard <anton@samba.org>

In many cases we disable interrupts right before calling
find_linux_pte_or_hugepte().

find_linux_pte_or_hugepte() first checks interrupts are disabled
before calling __find_linux_pte_or_hugepte():

        if (!arch_irqs_disabled()) {
                pr_info("%s called with irq enabled\n", __func__);
                dump_stack();
        }
        return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);

We know interrupts are disabled, but since the arch_irqs_*() macros
are hidden from the compiler with inline assembly, gcc does not. We
end up with a pretty awful load hit store:

	li      r9,0
	lbz     r24,570(r13)
	stb     r9,570(r13)	<----
	lbz     r9,570(r13)	<---- ouch
	cmpdi   cr7,r9,0
	bne     cr7,c000000000049d30

Find these cases, and call __find_linux_pte_or_hugepte() directly.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 arch/powerpc/kvm/e500_mmu_host.c    | 2 +-
 arch/powerpc/mm/hash_utils_64.c     | 2 +-
 arch/powerpc/mm/hugetlbpage.c       | 2 +-
 arch/powerpc/mm/tlb_hash64.c        | 5 +++--
 arch/powerpc/perf/callchain.c       | 2 +-
 7 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 05f09ae..ff53db5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -543,7 +543,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			 * hugepage split and collapse.
 			 */
 			local_irq_save(flags);
-			ptep = find_linux_pte_or_hugepte(current->mm->pgd,
+			ptep = __find_linux_pte_or_hugepte(current->mm->pgd,
 							 hva, NULL, NULL);
 			if (ptep) {
 				pte = kvmppc_read_update_linux_pte(ptep, 1);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 99b4e9d..8ee1f49 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -225,7 +225,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 						   &hpage_shift);
 	else {
 		local_irq_save(irq_flags);
-		ptep = find_linux_pte_or_hugepte(pgdir, hva, NULL,
+		ptep = __find_linux_pte_or_hugepte(pgdir, hva, NULL,
 						 &hpage_shift);
 	}
 	if (ptep) {
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index b0333cc..b6487f5 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -476,7 +476,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	 * can't run hence pfn won't change.
 	 */
 	local_irq_save(flags);
-	ptep = find_linux_pte_or_hugepte(pgdir, hva, NULL, NULL);
+	ptep = __find_linux_pte_or_hugepte(pgdir, hva, NULL, NULL);
 	if (ptep) {
 		pte_t pte = READ_ONCE(*ptep);
 
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5926896..5e47caa 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1384,7 +1384,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
 	 * THP pages use update_mmu_cache_pmd. We don't do
 	 * hash preload there. Hence can ignore THP here
 	 */
-	ptep = find_linux_pte_or_hugepte(pgdir, ea, NULL, &hugepage_shift);
+	ptep = __find_linux_pte_or_hugepte(pgdir, ea, NULL, &hugepage_shift);
 	if (!ptep)
 		goto out_exit;
 
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 5aac1a3..ee8bc5b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -637,7 +637,7 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 	struct page *page = ERR_PTR(-EINVAL);
 
 	local_irq_save(flags);
-	ptep = find_linux_pte_or_hugepte(mm->pgd, address, &is_thp, &shift);
+	ptep = __find_linux_pte_or_hugepte(mm->pgd, address, &is_thp, &shift);
 	if (!ptep)
 		goto no_page;
 	pte = READ_ONCE(*ptep);
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index 4517aa4..f41bf3d 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -209,8 +209,9 @@ void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
 	local_irq_save(flags);
 	arch_enter_lazy_mmu_mode();
 	for (; start < end; start += PAGE_SIZE) {
-		pte_t *ptep = find_linux_pte_or_hugepte(mm->pgd, start, &is_thp,
-							&hugepage_shift);
+		pte_t *ptep = __find_linux_pte_or_hugepte(mm->pgd, start,
+							  &is_thp,
+							  &hugepage_shift);
 		unsigned long pte;
 
 		if (ptep == NULL)
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 0fc2671..f4d1d88 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -127,7 +127,7 @@ static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 		return -EFAULT;
 
 	local_irq_save(flags);
-	ptep = find_linux_pte_or_hugepte(pgdir, addr, NULL, &shift);
+	ptep = __find_linux_pte_or_hugepte(pgdir, addr, NULL, &shift);
 	if (!ptep)
 		goto err_out;
 	if (!shift)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext()
  2016-05-29 12:03 ` [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext() Anton Blanchard
@ 2016-05-29 23:14   ` Michael Neuling
  2016-05-29 23:23     ` Anton Blanchard
  2016-06-15 12:39   ` [2/3] " Michael Ellerman
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Neuling @ 2016-05-29 23:14 UTC (permalink / raw)
  To: Anton Blanchard, mpe, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> In setup_sigcontext(), we set current->thread.vrsave then use it
> straight after. Since current is hidden from the compiler via inline
> assembly, it cannot optimise this and we end up with a load hit store.

Is setup_sigcontext() really a fast path we need to optimise?

Mikey

> Fix this by using a temporary.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/kernel/signal_64.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 2552079..7e49984 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -104,6 +104,7 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
>  	 */
>  #ifdef CONFIG_ALTIVEC
>  	elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc);
> +	unsigned long vrsave;
>  #endif
>  	unsigned long msr = regs->msr;
>  	long err = 0;
> @@ -125,9 +126,13 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
>  	/* We always copy to/from vrsave, it's 0 if we don't have or don't
>  	 * use altivec.
>  	 */
> -	if (cpu_has_feature(CPU_FTR_ALTIVEC))
> -		current->thread.vrsave = mfspr(SPRN_VRSAVE);
> -	err |= __put_user(current->thread.vrsave, (u32 __user *)&v_regs[33]);
> +	vrsave = 0;
> +	if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
> +		vrsave = mfspr(SPRN_VRSAVE);
> +		current->thread.vrsave = vrsave;
> +	}
> +
> +	err |= __put_user(vrsave, (u32 __user *)&v_regs[33]);
>  #else /* CONFIG_ALTIVEC */
>  	err |= __put_user(0, &sc->v_regs);
>  #endif /* CONFIG_ALTIVEC */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext()
  2016-05-29 23:14   ` Michael Neuling
@ 2016-05-29 23:23     ` Anton Blanchard
  0 siblings, 0 replies; 17+ messages in thread
From: Anton Blanchard @ 2016-05-29 23:23 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Anton Blanchard, mpe, benh, paulus, aneesh.kumar, acsawdey, linuxppc-dev

Hi,

> On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
> > From: Anton Blanchard <anton@samba.org>
> > 
> > In setup_sigcontext(), we set current->thread.vrsave then use it
> > straight after. Since current is hidden from the compiler via inline
> > assembly, it cannot optimise this and we end up with a load hit
> > store.  
> 
> Is setup_sigcontext() really a fast path we need to optimise?

setup_sigcontext() is called for every signal we deliver. I'd like
userspace to use fewer signals, but we still stumble across workloads
with quite heavy use of them.

Anton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-29 12:03 ` [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte() Anton Blanchard
@ 2016-05-30  5:17   ` Aneesh Kumar K.V
  2016-05-30 23:29   ` Michael Ellerman
  2016-06-01  3:43   ` [3/3] " Michael Ellerman
  2 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2016-05-30  5:17 UTC (permalink / raw)
  To: Anton Blanchard, mpe, benh, paulus, acsawdey; +Cc: linuxppc-dev

Anton Blanchard <anton@ozlabs.org> writes:

> From: Anton Blanchard <anton@samba.org>
>
> In many cases we disable interrupts right before calling
> find_linux_pte_or_hugepte().
>
> find_linux_pte_or_hugepte() first checks interrupts are disabled
> before calling __find_linux_pte_or_hugepte():
>
>         if (!arch_irqs_disabled()) {
>                 pr_info("%s called with irq enabled\n", __func__);
>                 dump_stack();
>         }
>         return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);
>
> We know interrupts are disabled, but since the arch_irqs_*() macros
> are hidden from the compiler with inline assembly, gcc does not. We
> end up with a pretty awful load hit store:
>
> 	li      r9,0
> 	lbz     r24,570(r13)
> 	stb     r9,570(r13)	<----
> 	lbz     r9,570(r13)	<---- ouch
> 	cmpdi   cr7,r9,0
> 	bne     cr7,c000000000049d30
>
> Find these cases, and call __find_linux_pte_or_hugepte() directly.
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
>  arch/powerpc/kvm/e500_mmu_host.c    | 2 +-
>  arch/powerpc/mm/hash_utils_64.c     | 2 +-
>  arch/powerpc/mm/hugetlbpage.c       | 2 +-
>  arch/powerpc/mm/tlb_hash64.c        | 5 +++--
>  arch/powerpc/perf/callchain.c       | 2 +-
>  7 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 05f09ae..ff53db5 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -543,7 +543,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  			 * hugepage split and collapse.
>  			 */
>  			local_irq_save(flags);
> -			ptep = find_linux_pte_or_hugepte(current->mm->pgd,
> +			ptep = __find_linux_pte_or_hugepte(current->mm->pgd,
>  							 hva, NULL, NULL);
>  			if (ptep) {
>  				pte = kvmppc_read_update_linux_pte(ptep, 1);
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 99b4e9d..8ee1f49 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -225,7 +225,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
>  						   &hpage_shift);
>  	else {
>  		local_irq_save(irq_flags);
> -		ptep = find_linux_pte_or_hugepte(pgdir, hva, NULL,
> +		ptep = __find_linux_pte_or_hugepte(pgdir, hva, NULL,
>  						 &hpage_shift);
>  	}
>  	if (ptep) {
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index b0333cc..b6487f5 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -476,7 +476,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  	 * can't run hence pfn won't change.
>  	 */
>  	local_irq_save(flags);
> -	ptep = find_linux_pte_or_hugepte(pgdir, hva, NULL, NULL);
> +	ptep = __find_linux_pte_or_hugepte(pgdir, hva, NULL, NULL);
>  	if (ptep) {
>  		pte_t pte = READ_ONCE(*ptep);
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 5926896..5e47caa 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -1384,7 +1384,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
>  	 * THP pages use update_mmu_cache_pmd. We don't do
>  	 * hash preload there. Hence can ignore THP here
>  	 */
> -	ptep = find_linux_pte_or_hugepte(pgdir, ea, NULL, &hugepage_shift);
> +	ptep = __find_linux_pte_or_hugepte(pgdir, ea, NULL, &hugepage_shift);
>  	if (!ptep)
>  		goto out_exit;
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 5aac1a3..ee8bc5b 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -637,7 +637,7 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
>  	struct page *page = ERR_PTR(-EINVAL);
>
>  	local_irq_save(flags);
> -	ptep = find_linux_pte_or_hugepte(mm->pgd, address, &is_thp, &shift);
> +	ptep = __find_linux_pte_or_hugepte(mm->pgd, address, &is_thp, &shift);
>  	if (!ptep)
>  		goto no_page;
>  	pte = READ_ONCE(*ptep);
> diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
> index 4517aa4..f41bf3d 100644
> --- a/arch/powerpc/mm/tlb_hash64.c
> +++ b/arch/powerpc/mm/tlb_hash64.c
> @@ -209,8 +209,9 @@ void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
>  	local_irq_save(flags);
>  	arch_enter_lazy_mmu_mode();
>  	for (; start < end; start += PAGE_SIZE) {
> -		pte_t *ptep = find_linux_pte_or_hugepte(mm->pgd, start, &is_thp,
> -							&hugepage_shift);
> +		pte_t *ptep = __find_linux_pte_or_hugepte(mm->pgd, start,
> +							  &is_thp,
> +							  &hugepage_shift);
>  		unsigned long pte;
>
>  		if (ptep == NULL)
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index 0fc2671..f4d1d88 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -127,7 +127,7 @@ static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>  		return -EFAULT;
>
>  	local_irq_save(flags);
> -	ptep = find_linux_pte_or_hugepte(pgdir, addr, NULL, &shift);
> +	ptep = __find_linux_pte_or_hugepte(pgdir, addr, NULL, &shift);
>  	if (!ptep)
>  		goto err_out;
>  	if (!shift)
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec()
  2016-05-29 12:03 [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Anton Blanchard
  2016-05-29 12:03 ` [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext() Anton Blanchard
  2016-05-29 12:03 ` [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte() Anton Blanchard
@ 2016-05-30  8:15 ` Gabriel Paubert
  2016-05-31 10:09   ` Anton Blanchard
  2016-06-15 12:39 ` [1/3] " Michael Ellerman
  3 siblings, 1 reply; 17+ messages in thread
From: Gabriel Paubert @ 2016-05-30  8:15 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: mpe, benh, paulus, aneesh.kumar, acsawdey, linuxppc-dev

On Sun, May 29, 2016 at 10:03:50PM +1000, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> In both __giveup_fpu() and __giveup_altivec() we make two modifications
> to tsk->thread.regs->msr. gcc decides to do a read/modify/write of
> each change, so we end up with a load hit store:
> 
>         ld      r9,264(r10)
>         rldicl  r9,r9,50,1
>         rotldi  r9,r9,14
>         std     r9,264(r10)
> ...
>         ld      r9,264(r10)
>         rldicl  r9,r9,40,1
>         rotldi  r9,r9,24
>         std     r9,264(r10)
> 
> Fix this by using a temporary.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/kernel/process.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index e2f12cb..7da1f92 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -139,12 +139,16 @@ EXPORT_SYMBOL(__msr_check_and_clear);
>  #ifdef CONFIG_PPC_FPU
>  void __giveup_fpu(struct task_struct *tsk)
>  {
> +	u64 msr;

Huh? Make it an unsigned long please, which is the type of the msr
field in struct pt_regs, so it works on both 32 and 64 bit processors.

> +
>  	save_fpu(tsk);
> -	tsk->thread.regs->msr &= ~MSR_FP;
> +	msr = tsk->thread.regs->msr;
> +	msr &= ~MSR_FP;
>  #ifdef CONFIG_VSX
>  	if (cpu_has_feature(CPU_FTR_VSX))
> -		tsk->thread.regs->msr &= ~MSR_VSX;
> +		msr &= ~MSR_VSX;
>  #endif
> +	tsk->thread.regs->msr = msr;
>  }
>  
>  void giveup_fpu(struct task_struct *tsk)
> @@ -219,12 +223,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; }
>  
>  static void __giveup_altivec(struct task_struct *tsk)
>  {
> +	u64 msr;
> +

Ditto.

>  	save_altivec(tsk);
> -	tsk->thread.regs->msr &= ~MSR_VEC;
> +	msr = tsk->thread.regs->msr;
> +	msr &= ~MSR_VEC;
>  #ifdef CONFIG_VSX
>  	if (cpu_has_feature(CPU_FTR_VSX))
> -		tsk->thread.regs->msr &= ~MSR_VSX;
> +		msr &= ~MSR_VSX;
>  #endif
> +	tsk->thread.regs->msr = msr;
>  }
>  
>  void giveup_altivec(struct task_struct *tsk)
> -- 
> 2.7.4
> 

    Gabriel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-29 12:03 ` [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte() Anton Blanchard
  2016-05-30  5:17   ` Aneesh Kumar K.V
@ 2016-05-30 23:29   ` Michael Ellerman
  2016-05-31  3:29     ` Aneesh Kumar K.V
  2016-06-01  3:43   ` [3/3] " Michael Ellerman
  2 siblings, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2016-05-30 23:29 UTC (permalink / raw)
  To: Anton Blanchard, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:

> From: Anton Blanchard <anton@samba.org>
> 
> In many cases we disable interrupts right before calling
> find_linux_pte_or_hugepte().
> 
> find_linux_pte_or_hugepte() first checks interrupts are disabled
> before calling __find_linux_pte_or_hugepte():
> 
>         if (!arch_irqs_disabled()) {
>                 pr_info("%s called with irq enabled\n", __func__);
>                 dump_stack();
>         }

Should that be a VM_WARN_ON()?

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-30 23:29   ` Michael Ellerman
@ 2016-05-31  3:29     ` Aneesh Kumar K.V
  2016-05-31  4:18       ` Michael Ellerman
  0 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2016-05-31  3:29 UTC (permalink / raw)
  To: Michael Ellerman, Anton Blanchard, benh, paulus, acsawdey; +Cc: linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:

> On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
>
>> From: Anton Blanchard <anton@samba.org>
>> 
>> In many cases we disable interrupts right before calling
>> find_linux_pte_or_hugepte().
>> 
>> find_linux_pte_or_hugepte() first checks interrupts are disabled
>> before calling __find_linux_pte_or_hugepte():
>> 
>>         if (!arch_irqs_disabled()) {
>>                 pr_info("%s called with irq enabled\n", __func__);
>>                 dump_stack();
>>         }
>
> Should that be a VM_WARN_ON()?
>

VM_WARN_ON() would put it inside CONFIG_DEBUG_VM. We should be able to use
WARN(!arch_irqs_disabled(), "%s called with irq enabled\n", __func__);

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-31  3:29     ` Aneesh Kumar K.V
@ 2016-05-31  4:18       ` Michael Ellerman
  2016-05-31  6:52         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2016-05-31  4:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Anton Blanchard, benh, paulus, acsawdey; +Cc: linuxppc-dev

On Tue, 2016-05-31 at 08:59 +0530, Aneesh Kumar K.V wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
> > > From: Anton Blanchard <anton@samba.org>
> > > 
> > > In many cases we disable interrupts right before calling
> > > find_linux_pte_or_hugepte().
> > > 
> > > find_linux_pte_or_hugepte() first checks interrupts are disabled
> > > before calling __find_linux_pte_or_hugepte():
> > > 
> > >         if (!arch_irqs_disabled()) {
> > >                 pr_info("%s called with irq enabled\n", __func__);
> > >                 dump_stack();
> > >         }
> > 
> > Should that be a VM_WARN_ON()?
> > 
> 
> VM_WARN_ON() put them inside CONFIG_DEBUG_VM. We should be able to use

Yeah, that was my point. Does this still need to be an always-on check, or can
we hide it behind CONFIG_DEBUG_VM?

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-31  4:18       ` Michael Ellerman
@ 2016-05-31  6:52         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2016-05-31  6:52 UTC (permalink / raw)
  To: Michael Ellerman, Anton Blanchard, benh, paulus, acsawdey; +Cc: linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:

> On Tue, 2016-05-31 at 08:59 +0530, Aneesh Kumar K.V wrote:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>> > On Sun, 2016-05-29 at 22:03 +1000, Anton Blanchard wrote:
>> > > From: Anton Blanchard <anton@samba.org>
>> > > 
>> > > In many cases we disable interrupts right before calling
>> > > find_linux_pte_or_hugepte().
>> > > 
>> > > find_linux_pte_or_hugepte() first checks interrupts are disabled
>> > > before calling __find_linux_pte_or_hugepte():
>> > > 
>> > >         if (!arch_irqs_disabled()) {
>> > >                 pr_info("%s called with irq enabled\n", __func__);
>> > >                 dump_stack();
>> > >         }
>> > 
>> > Should that be a VM_WARN_ON()?
>> > 
>> 
>> VM_WARN_ON() put them inside CONFIG_DEBUG_VM. We should be able to use
>
> Yeah that was my point. Does this still need to be an always-on check, or can
> we hide it behind CONFIG_DEBUG_VM ?

Ok, I guess we can move that inside a CONFIG_DEBUG_VM #ifdef, considering
we will run with DEBUG_VM enabled once in a while to catch wrong usage. I will get
a patch to do that.

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec()
  2016-05-30  8:15 ` [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Gabriel Paubert
@ 2016-05-31 10:09   ` Anton Blanchard
  2016-06-01  3:06     ` Michael Ellerman
  0 siblings, 1 reply; 17+ messages in thread
From: Anton Blanchard @ 2016-05-31 10:09 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: mpe, benh, paulus, aneesh.kumar, acsawdey, linuxppc-dev


> Huh? Make it an unsigned long please, which is the type of the msr
> field in struct pt_regs to work on both 32 and 64 bit processors.

Thanks, not sure what I was thinking there. Will respin.

Anton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec()
  2016-05-31 10:09   ` Anton Blanchard
@ 2016-06-01  3:06     ` Michael Ellerman
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2016-06-01  3:06 UTC (permalink / raw)
  To: Anton Blanchard, Gabriel Paubert
  Cc: benh, paulus, aneesh.kumar, acsawdey, linuxppc-dev

On Tue, 2016-05-31 at 20:09 +1000, Anton Blanchard wrote:
> > Huh? Make it an unsigned long please, which is the type of the msr
> > field in struct pt_regs to work on both 32 and 64 bit processors.
> 
> Thanks, not sure what I was thinking there. Will respin.

I'll fix it up here.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-05-29 12:03 ` [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte() Anton Blanchard
  2016-05-30  5:17   ` Aneesh Kumar K.V
  2016-05-30 23:29   ` Michael Ellerman
@ 2016-06-01  3:43   ` Michael Ellerman
  2016-06-01  5:35     ` Anton Blanchard
  2 siblings, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2016-06-01  3:43 UTC (permalink / raw)
  To: Anton Blanchard, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

On Sun, 2016-29-05 at 12:03:52 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> In many cases we disable interrupts right before calling
> find_linux_pte_or_hugepte().
> 
> find_linux_pte_or_hugepte() first checks interrupts are disabled
> before calling __find_linux_pte_or_hugepte():
> 
>         if (!arch_irqs_disabled()) {
>                 pr_info("%s called with irq enabled\n", __func__);
>                 dump_stack();
>         }
>         return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);
> 
> We know interrupts are disabled, but since the arch_irqs_*() macros
> are hidden from the compiler with inline assembly, gcc does not. We
> end up with a pretty awful load hit store:
> 
> 	li      r9,0
> 	lbz     r24,570(r13)
> 	stb     r9,570(r13)	<----
> 	lbz     r9,570(r13)	<---- ouch
> 	cmpdi   cr7,r9,0
> 	bne     cr7,c000000000049d30
> 
> Find these cases, and call __find_linux_pte_or_hugepte() directly.

I'd really rather __find_linux_pte_or_hugepte() was an internal detail, rather
than the standard API.

We do already have quite a few uses, but adding more just further spreads the
details about how the implementation works.

So I'm going to drop this in favor of Aneesh's patch to make the irqs-disabled
check a VM_WARN().

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte()
  2016-06-01  3:43   ` [3/3] " Michael Ellerman
@ 2016-06-01  5:35     ` Anton Blanchard
  0 siblings, 0 replies; 17+ messages in thread
From: Anton Blanchard @ 2016-06-01  5:35 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: benh, paulus, aneesh.kumar, acsawdey, linuxppc-dev

Hi Michael,

> I'd really rather __find_linux_pte_or_hugepte() was an internal
> detail, rather than the standard API.
> 
> We do already have quite a few uses, but adding more just further
> spreads the details about how the implementation works.
> 
> So I'm going to drop this in favor of Aneesh's patch to make irqs
> disabled check a VM_WARN().

Makes sense, I agree.

Anton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec()
  2016-05-29 12:03 [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Anton Blanchard
                   ` (2 preceding siblings ...)
  2016-05-30  8:15 ` [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Gabriel Paubert
@ 2016-06-15 12:39 ` Michael Ellerman
  3 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2016-06-15 12:39 UTC (permalink / raw)
  To: Anton Blanchard, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

On Sun, 2016-29-05 at 12:03:50 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> In both __giveup_fpu() and __giveup_altivec() we make two modifications
> to tsk->thread.regs->msr. gcc decides to do a read/modify/write of
> each change, so we end up with a load hit store:
> 
>         ld      r9,264(r10)
>         rldicl  r9,r9,50,1
>         rotldi  r9,r9,14
>         std     r9,264(r10)
> ...
>         ld      r9,264(r10)
>         rldicl  r9,r9,40,1
>         rotldi  r9,r9,24
>         std     r9,264(r10)
> 
> Fix this by using a temporary.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8eb9803723a14fd12675641b95

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [2/3] powerpc: Avoid load hit store in setup_sigcontext()
  2016-05-29 12:03 ` [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext() Anton Blanchard
  2016-05-29 23:14   ` Michael Neuling
@ 2016-06-15 12:39   ` Michael Ellerman
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2016-06-15 12:39 UTC (permalink / raw)
  To: Anton Blanchard, benh, paulus, aneesh.kumar, acsawdey; +Cc: linuxppc-dev

On Sun, 2016-29-05 at 12:03:51 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> In setup_sigcontext(), we set current->thread.vrsave then use it
> straight after. Since current is hidden from the compiler via inline
> assembly, it cannot optimise this and we end up with a load hit store.
> 
> Fix this by using a temporary.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d96f234f47aff593538f9e3d67

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2016-05-29 12:03 [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Anton Blanchard
2016-05-29 12:03 ` [PATCH 2/3] powerpc: Avoid load hit store in setup_sigcontext() Anton Blanchard
2016-05-29 23:14   ` Michael Neuling
2016-05-29 23:23     ` Anton Blanchard
2016-06-15 12:39   ` [2/3] " Michael Ellerman
2016-05-29 12:03 ` [PATCH 3/3] powerpc: Avoid load hit store when using find_linux_pte_or_hugepte() Anton Blanchard
2016-05-30  5:17   ` Aneesh Kumar K.V
2016-05-30 23:29   ` Michael Ellerman
2016-05-31  3:29     ` Aneesh Kumar K.V
2016-05-31  4:18       ` Michael Ellerman
2016-05-31  6:52         ` Aneesh Kumar K.V
2016-06-01  3:43   ` [3/3] " Michael Ellerman
2016-06-01  5:35     ` Anton Blanchard
2016-05-30  8:15 ` [PATCH 1/3] powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() Gabriel Paubert
2016-05-31 10:09   ` Anton Blanchard
2016-06-01  3:06     ` Michael Ellerman
2016-06-15 12:39 ` [1/3] " Michael Ellerman
