All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Michael Ellerman <mpe@ellerman.id.au>,
	msuchanek@suse.de, Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
Date: Tue, 09 Feb 2021 11:55:17 +1000	[thread overview]
Message-ID: <1612834634.qle1lc7n6y.astroid@bobo.none> (raw)
In-Reply-To: <981edfd50d4c980634b74c4bb76b765c499a87ec.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> When r3 is not modified, reload it from regs->orig_r3 to free
> volatile registers. This avoids a stack frame for the likely part
> of system_call_exception()

This doesn't on my 64s build, but it does reduce one non volatile
register save/restore. With quite a bit more register pressure
reduction 64s can avoid the stack frame as well.

It's a cool trick but quite code and compiler specific so I don't know 
how worthwhile it is to keep considering we're calling out into random
kernel C code after this.

Maybe just keep it PPC32 specific for the moment, will have to do more
tuning for 64 and we have other stuff to do there first.

If you are happy to make it 32-bit only then

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Before the patch:
> 
> c000b4d4 <system_call_exception>:
> c000b4d4:	7c 08 02 a6 	mflr    r0
> c000b4d8:	94 21 ff e0 	stwu    r1,-32(r1)
> c000b4dc:	93 e1 00 1c 	stw     r31,28(r1)
> c000b4e0:	90 01 00 24 	stw     r0,36(r1)
> c000b4e4:	90 6a 00 88 	stw     r3,136(r10)
> c000b4e8:	81 6a 00 84 	lwz     r11,132(r10)
> c000b4ec:	69 6b 00 02 	xori    r11,r11,2
> c000b4f0:	55 6b ff fe 	rlwinm  r11,r11,31,31,31
> c000b4f4:	0f 0b 00 00 	twnei   r11,0
> c000b4f8:	81 6a 00 a0 	lwz     r11,160(r10)
> c000b4fc:	55 6b 07 fe 	clrlwi  r11,r11,31
> c000b500:	0f 0b 00 00 	twnei   r11,0
> c000b504:	7c 0c 42 e6 	mftb    r0
> c000b508:	83 e2 00 08 	lwz     r31,8(r2)
> c000b50c:	81 82 00 28 	lwz     r12,40(r2)
> c000b510:	90 02 00 24 	stw     r0,36(r2)
> c000b514:	7d 8c f8 50 	subf    r12,r12,r31
> c000b518:	7c 0c 02 14 	add     r0,r12,r0
> c000b51c:	90 02 00 08 	stw     r0,8(r2)
> c000b520:	7c 10 13 a6 	mtspr   80,r0
> c000b524:	81 62 00 70 	lwz     r11,112(r2)
> c000b528:	71 60 86 91 	andi.   r0,r11,34449
> c000b52c:	40 82 00 34 	bne     c000b560 <system_call_exception+0x8c>
> c000b530:	2b 89 01 b6 	cmplwi  cr7,r9,438
> c000b534:	41 9d 00 64 	bgt     cr7,c000b598 <system_call_exception+0xc4>
> c000b538:	3d 40 c0 5c 	lis     r10,-16292
> c000b53c:	55 29 10 3a 	rlwinm  r9,r9,2,0,29
> c000b540:	39 4a 41 e8 	addi    r10,r10,16872
> c000b544:	80 01 00 24 	lwz     r0,36(r1)
> c000b548:	7d 2a 48 2e 	lwzx    r9,r10,r9
> c000b54c:	7c 08 03 a6 	mtlr    r0
> c000b550:	7d 29 03 a6 	mtctr   r9
> c000b554:	83 e1 00 1c 	lwz     r31,28(r1)
> c000b558:	38 21 00 20 	addi    r1,r1,32
> c000b55c:	4e 80 04 20 	bctr
> 
> After the patch:
> 
> c000b4d4 <system_call_exception>:
> c000b4d4:	81 6a 00 84 	lwz     r11,132(r10)
> c000b4d8:	90 6a 00 88 	stw     r3,136(r10)
> c000b4dc:	69 6b 00 02 	xori    r11,r11,2
> c000b4e0:	55 6b ff fe 	rlwinm  r11,r11,31,31,31
> c000b4e4:	0f 0b 00 00 	twnei   r11,0
> c000b4e8:	80 6a 00 a0 	lwz     r3,160(r10)
> c000b4ec:	54 63 07 fe 	clrlwi  r3,r3,31
> c000b4f0:	0f 03 00 00 	twnei   r3,0
> c000b4f4:	7d 6c 42 e6 	mftb    r11
> c000b4f8:	81 82 00 08 	lwz     r12,8(r2)
> c000b4fc:	80 02 00 28 	lwz     r0,40(r2)
> c000b500:	91 62 00 24 	stw     r11,36(r2)
> c000b504:	7c 00 60 50 	subf    r0,r0,r12
> c000b508:	7d 60 5a 14 	add     r11,r0,r11
> c000b50c:	91 62 00 08 	stw     r11,8(r2)
> c000b510:	7c 10 13 a6 	mtspr   80,r0
> c000b514:	80 62 00 70 	lwz     r3,112(r2)
> c000b518:	70 6b 86 91 	andi.   r11,r3,34449
> c000b51c:	40 82 00 28 	bne     c000b544 <system_call_exception+0x70>
> c000b520:	2b 89 01 b6 	cmplwi  cr7,r9,438
> c000b524:	41 9d 00 84 	bgt     cr7,c000b5a8 <system_call_exception+0xd4>
> c000b528:	80 6a 00 88 	lwz     r3,136(r10)
> c000b52c:	3d 40 c0 5c 	lis     r10,-16292
> c000b530:	55 29 10 3a 	rlwinm  r9,r9,2,0,29
> c000b534:	39 4a 41 e4 	addi    r10,r10,16868
> c000b538:	7d 2a 48 2e 	lwzx    r9,r10,r9
> c000b53c:	7d 29 03 a6 	mtctr   r9
> c000b540:	4e 80 04 20 	bctr
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/interrupt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 107ec39f05cb..205902052112 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  			return regs->gpr[3];
>  		}
>  		return -ENOSYS;
> +	} else {
> +		/* Restore r3 from orig_gpr3 to free up a volatile reg */
> +		r3 = regs->orig_gpr3;
>  	}
>  
>  	/* May be faster to do array_index_nospec? */
> -- 
> 2.25.0
> 
> 

WARNING: multiple messages have this Message-ID (diff)
From: Nicholas Piggin <npiggin@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Michael Ellerman <mpe@ellerman.id.au>,
	msuchanek@suse.de, Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
Date: Tue, 09 Feb 2021 11:55:17 +1000	[thread overview]
Message-ID: <1612834634.qle1lc7n6y.astroid@bobo.none> (raw)
In-Reply-To: <981edfd50d4c980634b74c4bb76b765c499a87ec.1612796617.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> When r3 is not modified, reload it from regs->orig_r3 to free
> volatile registers. This avoids a stack frame for the likely part
> of system_call_exception()

This doesn't on my 64s build, but it does reduce one non volatile
register save/restore. With quite a bit more register pressure
reduction 64s can avoid the stack frame as well.

It's a cool trick but quite code and compiler specific so I don't know 
how worthwhile it is to keep considering we're calling out into random
kernel C code after this.

Maybe just keep it PPC32 specific for the moment, will have to do more
tuning for 64 and we have other stuff to do there first.

If you are happy to make it 32-bit only then

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Before the patch:
> 
> c000b4d4 <system_call_exception>:
> c000b4d4:	7c 08 02 a6 	mflr    r0
> c000b4d8:	94 21 ff e0 	stwu    r1,-32(r1)
> c000b4dc:	93 e1 00 1c 	stw     r31,28(r1)
> c000b4e0:	90 01 00 24 	stw     r0,36(r1)
> c000b4e4:	90 6a 00 88 	stw     r3,136(r10)
> c000b4e8:	81 6a 00 84 	lwz     r11,132(r10)
> c000b4ec:	69 6b 00 02 	xori    r11,r11,2
> c000b4f0:	55 6b ff fe 	rlwinm  r11,r11,31,31,31
> c000b4f4:	0f 0b 00 00 	twnei   r11,0
> c000b4f8:	81 6a 00 a0 	lwz     r11,160(r10)
> c000b4fc:	55 6b 07 fe 	clrlwi  r11,r11,31
> c000b500:	0f 0b 00 00 	twnei   r11,0
> c000b504:	7c 0c 42 e6 	mftb    r0
> c000b508:	83 e2 00 08 	lwz     r31,8(r2)
> c000b50c:	81 82 00 28 	lwz     r12,40(r2)
> c000b510:	90 02 00 24 	stw     r0,36(r2)
> c000b514:	7d 8c f8 50 	subf    r12,r12,r31
> c000b518:	7c 0c 02 14 	add     r0,r12,r0
> c000b51c:	90 02 00 08 	stw     r0,8(r2)
> c000b520:	7c 10 13 a6 	mtspr   80,r0
> c000b524:	81 62 00 70 	lwz     r11,112(r2)
> c000b528:	71 60 86 91 	andi.   r0,r11,34449
> c000b52c:	40 82 00 34 	bne     c000b560 <system_call_exception+0x8c>
> c000b530:	2b 89 01 b6 	cmplwi  cr7,r9,438
> c000b534:	41 9d 00 64 	bgt     cr7,c000b598 <system_call_exception+0xc4>
> c000b538:	3d 40 c0 5c 	lis     r10,-16292
> c000b53c:	55 29 10 3a 	rlwinm  r9,r9,2,0,29
> c000b540:	39 4a 41 e8 	addi    r10,r10,16872
> c000b544:	80 01 00 24 	lwz     r0,36(r1)
> c000b548:	7d 2a 48 2e 	lwzx    r9,r10,r9
> c000b54c:	7c 08 03 a6 	mtlr    r0
> c000b550:	7d 29 03 a6 	mtctr   r9
> c000b554:	83 e1 00 1c 	lwz     r31,28(r1)
> c000b558:	38 21 00 20 	addi    r1,r1,32
> c000b55c:	4e 80 04 20 	bctr
> 
> After the patch:
> 
> c000b4d4 <system_call_exception>:
> c000b4d4:	81 6a 00 84 	lwz     r11,132(r10)
> c000b4d8:	90 6a 00 88 	stw     r3,136(r10)
> c000b4dc:	69 6b 00 02 	xori    r11,r11,2
> c000b4e0:	55 6b ff fe 	rlwinm  r11,r11,31,31,31
> c000b4e4:	0f 0b 00 00 	twnei   r11,0
> c000b4e8:	80 6a 00 a0 	lwz     r3,160(r10)
> c000b4ec:	54 63 07 fe 	clrlwi  r3,r3,31
> c000b4f0:	0f 03 00 00 	twnei   r3,0
> c000b4f4:	7d 6c 42 e6 	mftb    r11
> c000b4f8:	81 82 00 08 	lwz     r12,8(r2)
> c000b4fc:	80 02 00 28 	lwz     r0,40(r2)
> c000b500:	91 62 00 24 	stw     r11,36(r2)
> c000b504:	7c 00 60 50 	subf    r0,r0,r12
> c000b508:	7d 60 5a 14 	add     r11,r0,r11
> c000b50c:	91 62 00 08 	stw     r11,8(r2)
> c000b510:	7c 10 13 a6 	mtspr   80,r0
> c000b514:	80 62 00 70 	lwz     r3,112(r2)
> c000b518:	70 6b 86 91 	andi.   r11,r3,34449
> c000b51c:	40 82 00 28 	bne     c000b544 <system_call_exception+0x70>
> c000b520:	2b 89 01 b6 	cmplwi  cr7,r9,438
> c000b524:	41 9d 00 84 	bgt     cr7,c000b5a8 <system_call_exception+0xd4>
> c000b528:	80 6a 00 88 	lwz     r3,136(r10)
> c000b52c:	3d 40 c0 5c 	lis     r10,-16292
> c000b530:	55 29 10 3a 	rlwinm  r9,r9,2,0,29
> c000b534:	39 4a 41 e4 	addi    r10,r10,16868
> c000b538:	7d 2a 48 2e 	lwzx    r9,r10,r9
> c000b53c:	7d 29 03 a6 	mtctr   r9
> c000b540:	4e 80 04 20 	bctr
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/interrupt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 107ec39f05cb..205902052112 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  			return regs->gpr[3];
>  		}
>  		return -ENOSYS;
> +	} else {
> +		/* Restore r3 from orig_gpr3 to free up a volatile reg */
> +		r3 = regs->orig_gpr3;
>  	}
>  
>  	/* May be faster to do array_index_nospec? */
> -- 
> 2.25.0
> 
> 

  reply	other threads:[~2021-02-09  1:56 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 15:10 [PATCH v5 00/22] powerpc/32: Implement C syscall entry/exit Christophe Leroy
2021-02-08 15:10 ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 01/22] powerpc/32s: Add missing call to kuep_lock on syscall entry Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 02/22] powerpc/32: Always enable data translation " Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 03/22] powerpc/32: On syscall entry, enable instruction translation at the same time as data Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 04/22] powerpc/32: Reorder instructions to avoid using CTR in syscall entry Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 05/22] powerpc/irq: Add helper to set regs->softe Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:11   ` Nicholas Piggin
2021-02-09  1:11     ` Nicholas Piggin
2021-02-09  5:57     ` Christophe Leroy
2021-02-09  5:57       ` Christophe Leroy
2021-02-09  7:47       ` Nicholas Piggin
2021-02-09  7:47         ` Nicholas Piggin
2021-02-09  6:18     ` Christophe Leroy
2021-02-09  6:18       ` Christophe Leroy
2021-02-09  7:49       ` Nicholas Piggin
2021-02-09  7:49         ` Nicholas Piggin
2021-03-05  8:54         ` Christophe Leroy
2021-03-05  8:54           ` Christophe Leroy
2021-03-08  8:47           ` Nicholas Piggin
2021-03-08  8:47             ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 06/22] powerpc/irq: Rework helpers that manipulate MSR[EE/RI] Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:14   ` Nicholas Piggin
2021-02-09  1:14     ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 07/22] powerpc/irq: Add stub irq_soft_mask_return() for PPC32 Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:19   ` Nicholas Piggin
2021-02-09  1:19     ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 08/22] powerpc/syscall: Rename syscall_64.c into interrupt.c Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:19   ` Nicholas Piggin
2021-02-09  1:19     ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 09/22] powerpc/syscall: Make interrupt.c buildable on PPC32 Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:27   ` Nicholas Piggin
2021-02-09  1:27     ` Nicholas Piggin
2021-02-09  6:02     ` Christophe Leroy
2021-02-09  6:02       ` Christophe Leroy
2021-02-09  7:50       ` Nicholas Piggin
2021-02-09  7:50         ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 10/22] powerpc/syscall: Use is_compat_task() Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:29   ` Nicholas Piggin
2021-02-09  1:29     ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 11/22] powerpc/syscall: Save r3 in regs->orig_r3 Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:29   ` Nicholas Piggin
2021-02-09  1:29     ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 12/22] powerpc/syscall: Change condition to check MSR_RI Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:36   ` Nicholas Piggin
2021-02-09  1:36     ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 13/22] powerpc/32: Always save non volatile GPRs at syscall entry Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 14/22] powerpc/syscall: implement system call entry/exit logic in C for PPC32 Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 15/22] powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception() Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:55   ` Nicholas Piggin [this message]
2021-02-09  1:55     ` Nicholas Piggin
2021-02-09 16:13     ` Christophe Leroy
2021-02-09 16:13       ` Christophe Leroy
2021-02-10  1:56       ` Nicholas Piggin
2021-02-10  1:56         ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 17/22] powerpc/syscall: Do not check unsupported scv vector on PPC32 Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  2:00   ` Nicholas Piggin
2021-02-09  2:00     ` Nicholas Piggin
2021-02-09  6:13     ` Christophe Leroy
2021-02-09  6:13       ` Christophe Leroy
2021-02-09  7:56       ` Nicholas Piggin
2021-02-09  7:56         ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 18/22] powerpc/syscall: Remove FULL_REGS verification in system_call_exception Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  2:02   ` Nicholas Piggin
2021-02-09  2:02     ` Nicholas Piggin
2021-02-09 14:31     ` Christophe Leroy
2021-02-09 14:31       ` Christophe Leroy
2021-02-10  1:57       ` Nicholas Piggin
2021-02-10  1:57         ` Nicholas Piggin
2021-02-08 15:10 ` [PATCH v5 19/22] powerpc/syscall: Optimise checks in beginning of system_call_exception() Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  2:06   ` Nicholas Piggin
2021-02-09  2:06     ` Nicholas Piggin
2021-02-09 14:32     ` Christophe Leroy
2021-02-09 14:32       ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  2:36   ` Nicholas Piggin
2021-02-09  2:36     ` Nicholas Piggin
2021-02-09 13:50     ` Segher Boessenkool
2021-02-09 13:50       ` Segher Boessenkool
2021-02-09 14:31       ` David Laight
2021-02-09 14:31         ` David Laight
2021-02-09 17:03         ` Christophe Leroy
2021-02-09 17:03           ` Christophe Leroy
2021-02-09 17:16           ` David Laight
2021-02-09 17:16             ` David Laight
2021-02-10  2:00           ` Nicholas Piggin
2021-02-10  2:00             ` Nicholas Piggin
2021-02-10  8:45             ` Christophe Leroy
2021-02-10  8:45               ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 21/22] powerpc/32: Remove the counter in global_dbcr0 Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-08 15:10 ` [PATCH v5 22/22] powerpc/32: Handle bookE debugging in C in syscall entry/exit Christophe Leroy
2021-02-08 15:10   ` Christophe Leroy
2021-02-09  1:03 ` [PATCH v5 00/22] powerpc/32: Implement C " Nicholas Piggin
2021-02-09  1:03   ` Nicholas Piggin
2021-02-12  0:19 ` Michael Ellerman
2021-02-12  0:19   ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1612834634.qle1lc7n6y.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=msuchanek@suse.de \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.