* [PATCH v3 1/2] powerpc/irq: bring back ksp_limit management in C functions.
@ 2019-09-18 15:48 Christophe Leroy
  2019-09-18 15:48 ` [PATCH v3 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq() Christophe Leroy
From: Christophe Leroy @ 2019-09-18 15:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, segher,
	npiggin
  Cc: linux-kernel, linuxppc-dev

Commit cbc9565ee826 ("powerpc: Remove ksp_limit on ppc64") moved the
PPC32 ksp_limit handling into the assembly functions call_do_softirq()
and call_do_irq(), as those functions are implemented separately for
PPC32 and PPC64.

In preparation for replacing these functions with inline assembly,
partially revert that commit to bring back the ksp_limit assignment
in the callers.

To get and set ksp_limit without a forest of #ifdef CONFIG_PPC32
blocks, use helpers that are no-ops on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

---
v2: added a forward declaration of struct task_struct to avoid a build failure.
v3: included linux/sched.h; a forward declaration is not enough.
---
 arch/powerpc/include/asm/irq.h | 22 ++++++++++++++++++++++
 arch/powerpc/kernel/irq.c      | 14 +++++++++++++-
 arch/powerpc/kernel/misc_32.S  | 14 --------------
 3 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 814dfab7e392..0c6469983c66 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -10,6 +10,7 @@
 #include <linux/threads.h>
 #include <linux/list.h>
 #include <linux/radix-tree.h>
+#include <linux/sched.h>
 
 #include <asm/types.h>
 #include <linux/atomic.h>
@@ -64,5 +65,26 @@ extern void __do_irq(struct pt_regs *regs);
 
 int irq_choose_cpu(const struct cpumask *mask);
 
+#ifdef CONFIG_PPC32
+static inline unsigned long get_ksp_limit(struct task_struct *tsk)
+{
+	return tsk->thread.ksp_limit;
+}
+
+static inline void set_ksp_limit(struct task_struct *tsk, unsigned long limit)
+{
+	tsk->thread.ksp_limit = limit;
+}
+#else
+static inline unsigned long get_ksp_limit(struct task_struct *tsk)
+{
+	return 0;
+}
+
+static inline void set_ksp_limit(struct task_struct *tsk, unsigned long limit)
+{
+}
+#endif
+
 #endif /* _ASM_IRQ_H */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5645bc9cbc09..04204be49577 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -646,6 +646,7 @@ void do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
 	void *cursp, *irqsp, *sirqsp;
+	unsigned long saved_ksp_limit = get_ksp_limit(current);
 
 	/* Switch to the irq stack to handle this */
 	cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
@@ -658,9 +659,15 @@ void do_IRQ(struct pt_regs *regs)
 		set_irq_regs(old_regs);
 		return;
 	}
+	/* Adjust the stack limit */
+	set_ksp_limit(current, (unsigned long)irqsp);
+
 	/* Switch stack and call */
 	call_do_irq(regs, irqsp);
 
+	/* Restore stack limit */
+	set_ksp_limit(current, saved_ksp_limit);
+
 	set_irq_regs(old_regs);
 }
 
@@ -681,7 +688,12 @@ void *hardirq_ctx[NR_CPUS] __read_mostly;
 
 void do_softirq_own_stack(void)
 {
-	call_do_softirq(softirq_ctx[smp_processor_id()]);
+	void *irqsp = softirq_ctx[smp_processor_id()];
+	unsigned long saved_ksp_limit = get_ksp_limit(current);
+
+	set_ksp_limit(current, (unsigned long)irqsp);
+	call_do_softirq(irqsp);
+	set_ksp_limit(current, saved_ksp_limit);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 82df4b09e79f..a5422f7782b3 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -33,23 +33,14 @@
 
 	.text
 
-/*
- * We store the saved ksp_limit in the unused part
- * of the STACK_FRAME_OVERHEAD
- */
 _GLOBAL(call_do_softirq)
 	mflr	r0
 	stw	r0,4(r1)
-	lwz	r10,THREAD+KSP_LIMIT(r2)
-	stw	r3, THREAD+KSP_LIMIT(r2)
 	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
 	mr	r1,r3
-	stw	r10,8(r1)
 	bl	__do_softirq
-	lwz	r10,8(r1)
 	lwz	r1,0(r1)
 	lwz	r0,4(r1)
-	stw	r10,THREAD+KSP_LIMIT(r2)
 	mtlr	r0
 	blr
 
@@ -59,16 +50,11 @@ _GLOBAL(call_do_softirq)
 _GLOBAL(call_do_irq)
 	mflr	r0
 	stw	r0,4(r1)
-	lwz	r10,THREAD+KSP_LIMIT(r2)
-	stw	r4, THREAD+KSP_LIMIT(r2)
 	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
 	mr	r1,r4
-	stw	r10,8(r1)
 	bl	__do_irq
-	lwz	r10,8(r1)
 	lwz	r1,0(r1)
 	lwz	r0,4(r1)
-	stw	r10,THREAD+KSP_LIMIT(r2)
 	mtlr	r0
 	blr
 
-- 
2.13.3



* [PATCH v3 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()
  2019-09-18 15:48 [PATCH v3 1/2] powerpc/irq: bring back ksp_limit management in C functions Christophe Leroy
@ 2019-09-18 15:48 ` Christophe Leroy
  2019-09-18 16:39   ` Segher Boessenkool
From: Christophe Leroy @ 2019-09-18 15:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, segher,
	npiggin
  Cc: linux-kernel, linuxppc-dev

call_do_irq() and call_do_softirq() are quite similar on PPC32 and
PPC64 and are simple enough to be worth inlining.

Inlining them avoids an mflr/mtlr pair plus a save/reload on stack.

This is inspired by the s390 arch. Several other arches do more or
less the same. The way the sparc arch does it seems odd, though.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

---
v2: no change.
v3: no change.
---
 arch/powerpc/include/asm/irq.h |  2 --
 arch/powerpc/kernel/irq.c      | 26 ++++++++++++++++++++++++++
 arch/powerpc/kernel/misc_32.S  | 25 -------------------------
 arch/powerpc/kernel/misc_64.S  | 22 ----------------------
 4 files changed, 26 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 0c6469983c66..10476d5283dc 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -57,8 +57,6 @@ extern void *mcheckirq_ctx[NR_CPUS];
 extern void *hardirq_ctx[NR_CPUS];
 extern void *softirq_ctx[NR_CPUS];
 
-void call_do_softirq(void *sp);
-void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 04204be49577..b028c49f9635 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -642,6 +642,20 @@ void __do_irq(struct pt_regs *regs)
 	irq_exit();
 }
 
+static inline void call_do_irq(struct pt_regs *regs, void *sp)
+{
+	register unsigned long r3 asm("r3") = (unsigned long)regs;
+
+	asm volatile(
+		"	"PPC_STLU"	1, %2(%1);\n"
+		"	mr		1, %1;\n"
+		"	bl		%3;\n"
+		"	"PPC_LL"	1, 0(1);\n" : "+r"(r3) :
+		"b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_irq) :
+		"lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
+		"r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
+}
+
 void do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
@@ -686,6 +700,18 @@ void *mcheckirq_ctx[NR_CPUS] __read_mostly;
 void *softirq_ctx[NR_CPUS] __read_mostly;
 void *hardirq_ctx[NR_CPUS] __read_mostly;
 
+static inline void call_do_softirq(const void *sp)
+{
+	asm volatile(
+		"	"PPC_STLU"	1, %1(%0);\n"
+		"	mr		1, %0;\n"
+		"	bl		%2;\n"
+		"	"PPC_LL"	1, 0(1);\n" : :
+		"b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_softirq) :
+		"lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
+		"r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
+}
+
 void do_softirq_own_stack(void)
 {
 	void *irqsp = softirq_ctx[smp_processor_id()];
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index a5422f7782b3..307307b57743 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -33,31 +33,6 @@
 
 	.text
 
-_GLOBAL(call_do_softirq)
-	mflr	r0
-	stw	r0,4(r1)
-	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
-	mr	r1,r3
-	bl	__do_softirq
-	lwz	r1,0(r1)
-	lwz	r0,4(r1)
-	mtlr	r0
-	blr
-
-/*
- * void call_do_irq(struct pt_regs *regs, void *sp);
- */
-_GLOBAL(call_do_irq)
-	mflr	r0
-	stw	r0,4(r1)
-	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
-	mr	r1,r4
-	bl	__do_irq
-	lwz	r1,0(r1)
-	lwz	r0,4(r1)
-	mtlr	r0
-	blr
-
 /*
  * This returns the high 64 bits of the product of two 64-bit numbers.
  */
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index b55a7b4cb543..69fd714a5236 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -27,28 +27,6 @@
 
 	.text
 
-_GLOBAL(call_do_softirq)
-	mflr	r0
-	std	r0,16(r1)
-	stdu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
-	mr	r1,r3
-	bl	__do_softirq
-	ld	r1,0(r1)
-	ld	r0,16(r1)
-	mtlr	r0
-	blr
-
-_GLOBAL(call_do_irq)
-	mflr	r0
-	std	r0,16(r1)
-	stdu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
-	mr	r1,r4
-	bl	__do_irq
-	ld	r1,0(r1)
-	ld	r0,16(r1)
-	mtlr	r0
-	blr
-
 	.section	".toc","aw"
 PPC64_CACHES:
 	.tc		ppc64_caches[TC],ppc64_caches
-- 
2.13.3



* Re: [PATCH v3 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()
  2019-09-18 15:48 ` [PATCH v3 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq() Christophe Leroy
@ 2019-09-18 16:39   ` Segher Boessenkool
  2019-09-19  5:23     ` Christophe Leroy
From: Segher Boessenkool @ 2019-09-18 16:39 UTC
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	npiggin, linux-kernel, linuxppc-dev

Hi Christophe,

On Wed, Sep 18, 2019 at 03:48:20PM +0000, Christophe Leroy wrote:
> call_do_irq() and call_do_softirq() are quite similar on PPC32 and
> PPC64 and are simple enough to be worth inlining.
> 
> Inlining them avoids an mflr/mtlr pair plus a save/reload on stack.

But you hardcode the calling sequence in inline asm, which for various
reasons is not a great idea.

> +static inline void call_do_irq(struct pt_regs *regs, void *sp)
> +{
> +	register unsigned long r3 asm("r3") = (unsigned long)regs;
> +
> +	asm volatile(
> +		"	"PPC_STLU"	1, %2(%1);\n"
> +		"	mr		1, %1;\n"
> +		"	bl		%3;\n"
> +		"	"PPC_LL"	1, 0(1);\n" : "+r"(r3) :
> +		"b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_irq) :
> +		"lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
> +		"r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
> +}

I realise the original code had this...  Loading the old stack pointer
value back from the stack creates a bottleneck (via the store->load
forwarding it requires).  It could just use
  addi 1,1,-(%2)
here, which can also be written as
  addi 1,1,%n2
(that is portable to all architectures btw).
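That is, for a frame that is pushed and popped on the same stack, the
usual idiom is (sketch only):
  stwu 1,-16(1)   # push a frame
  ...
  addi 1,1,16     # pop it again without reloading the saved SP
which avoids the load (and the store->load forwarding stall) completely.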


Please write the  "+r"(r3)  on the next line?  Not on the same line as
the multi-line template.  This makes things more readable.


I don't know if using functions as an "i" works properly...  It probably
does, it's just not something that you see often :-)
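That is, whether GCC prints the bare symbol name for something like this
minimal sketch (the clobbers a real call would need are omitted here):
  asm volatile("bl %0" : : "i" (__do_softirq) : "lr");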


What about r2?  Various ABIs handle that differently.  This might make
it impossible to share the implementation between 32-bit and 64-bit for
this.  But worst case we could add it to the clobber list; that will
always work.


So anyway, it looks to me like it will work.  Nice cleanup.  Would be
better if you could do the call to __do_irq from C code, but maybe we
cannot have everything ;-)

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>


Segher


* Re: [PATCH v3 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()
  2019-09-18 16:39   ` Segher Boessenkool
@ 2019-09-19  5:23     ` Christophe Leroy
  2019-09-19 19:05       ` Segher Boessenkool
From: Christophe Leroy @ 2019-09-19  5:23 UTC
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	npiggin, linux-kernel, linuxppc-dev



On 18/09/2019 at 18:39, Segher Boessenkool wrote:
> Hi Christophe,
> 
> On Wed, Sep 18, 2019 at 03:48:20PM +0000, Christophe Leroy wrote:
>> call_do_irq() and call_do_softirq() are quite similar on PPC32 and
>> PPC64 and are simple enough to be worth inlining.
>>
>> Inlining them avoids an mflr/mtlr pair plus a save/reload on stack.
> 
> But you hardcode the calling sequence in inline asm, which for various
> reasons is not a great idea.
> 
>> +static inline void call_do_irq(struct pt_regs *regs, void *sp)
>> +{
>> +	register unsigned long r3 asm("r3") = (unsigned long)regs;
>> +
>> +	asm volatile(
>> +		"	"PPC_STLU"	1, %2(%1);\n"
>> +		"	mr		1, %1;\n"
>> +		"	bl		%3;\n"
>> +		"	"PPC_LL"	1, 0(1);\n" : "+r"(r3) :
>> +		"b"(sp), "i"(THREAD_SIZE - STACK_FRAME_OVERHEAD), "i"(__do_irq) :
>> +		"lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", "cr7",
>> +		"r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
>> +}
> 
> I realise the original code had this...  Loading the old stack pointer
> value back from the stack creates a bottleneck (via the store->load
> forwarding it requires).  It could just use
>    addi 1,1,-(%2)
> here, which can also be written as
>    addi 1,1,%n2
> (that is portable to all architectures btw).

No, we switched stacks before the bl call: we replaced r1 with r3 after
saving the old r1 into the stack pointed to by r3. Now we have to restore
the original r1.

> 
> 
> Please write the  "+r"(r3)  on the next line?  Not on the same line as
> the multi-line template.  This makes things more readable.
> 
> 
> I don't know if using functions as an "i" works properly...  It probably
> does, it's just not something that you see often :-)
> 
> 
> What about r2?  Various ABIs handle that differently.  This might make
> it impossible to share the implementation between 32-bit and 64-bit for
> this.  But worst case we could add it to the clobber list; that will
> always work.

Isn't r2 non-volatile on all ABIs ?

> 
> 
> So anyway, it looks to me like it will work.  Nice cleanup.  Would be
> better if you could do the call to __do_irq from C code, but maybe we
> cannot have everything ;-)

sparc does it the following way; is there no risk that GCC adds unwanted
code in between that is not aware that the stack pointer has changed ?

void do_softirq_own_stack(void)
{
	void *orig_sp, *sp = softirq_stack[smp_processor_id()];

	sp += THREAD_SIZE - 192 - STACK_BIAS;

	__asm__ __volatile__("mov %%sp, %0\n\t"
			     "mov %1, %%sp"
			     : "=&r" (orig_sp)
			     : "r" (sp));
	__do_softirq();
	__asm__ __volatile__("mov %0, %%sp"
			     : : "r" (orig_sp));
}

If the above is no risk, then can we do the same on powerpc ?
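
On powerpc that would be something like this (completely untested sketch,
mirroring the sparc code above):

void do_softirq_own_stack(void)
{
	void *orig_sp, *sp = softirq_ctx[smp_processor_id()];

	sp += THREAD_SIZE - STACK_FRAME_OVERHEAD;

	/* switch r1 to the softirq stack, saving the old value */
	asm volatile("mr %0, 1 ; mr 1, %1" : "=&r" (orig_sp) : "r" (sp));
	__do_softirq();
	/* switch back to the original stack */
	asm volatile("mr 1, %0" : : "r" (orig_sp));
}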

> 
> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

Thanks for the review.

Christophe


* Re: [PATCH v3 2/2] powerpc/irq: inline call_do_irq() and call_do_softirq()
  2019-09-19  5:23     ` Christophe Leroy
@ 2019-09-19 19:05       ` Segher Boessenkool
From: Segher Boessenkool @ 2019-09-19 19:05 UTC
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	npiggin, linux-kernel, linuxppc-dev

On Thu, Sep 19, 2019 at 07:23:18AM +0200, Christophe Leroy wrote:
> On 18/09/2019 at 18:39, Segher Boessenkool wrote:
> >I realise the original code had this...  Loading the old stack pointer
> >value back from the stack creates a bottleneck (via the store->load
> >forwarding it requires).  It could just use
> >   addi 1,1,-(%2)
> >here, which can also be written as
> >   addi 1,1,%n2
> >(that is portable to all architectures btw).
> 
> No, we switched stacks before the bl call: we replaced r1 with r3 after
> saving the old r1 into the stack pointed to by r3. Now we have to restore
> the original r1.

Yeah wow, I missed that once again.  Whoops.

Add a comment for this?  Just before the asm maybe, "we temporarily switch
r1 to a different stack" or something.

> >What about r2?  Various ABIs handle that differently.  This might make
> >it impossible to share the implementation between 32-bit and 64-bit for
> >this.  But worst case we could add it to the clobber list; that will
> >always work.
> 
> Isn't r2 non-volatile on all ABIs ?

It is not.  On ELFv2 it is (or will be) optionally volatile, but like on
ELFv1 it already has special rules as well: the linker is responsible for
restoring it if it is non-volatile, and for that there needs to be a nop
after the bl, etc.
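
For example, for a call that may cross modules the compiler emits
  bl some_func
  nop
and if the callee needs a different TOC, the linker rewrites the nop into
a load that restores r2 (e.g.  ld 2,24(1)  on ELFv2).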

But the existing code was in a similar situation and we survived that, so
I think we should be fine this way too.  And it won't be too hard to change
again if needed.

> >So anyway, it looks to me like it will work.  Nice cleanup.  Would be
> >better if you could do the call to __do_irq from C code, but maybe we
> >cannot have everything ;-)
> 
> sparc does it the following way; is there no risk that GCC adds unwanted
> code in between that is not aware that the stack pointer has changed ?
> 
> void do_softirq_own_stack(void)
> {
> 	void *orig_sp, *sp = softirq_stack[smp_processor_id()];
> 
> 	sp += THREAD_SIZE - 192 - STACK_BIAS;
> 
> 	__asm__ __volatile__("mov %%sp, %0\n\t"
> 			     "mov %1, %%sp"
> 			     : "=&r" (orig_sp)
> 			     : "r" (sp));
> 	__do_softirq();
> 	__asm__ __volatile__("mov %0, %%sp"
> 			     : : "r" (orig_sp));
> }
> 
> If the above is no risk, then can we do the same on powerpc ?

No, that is quite a bad idea: it depends on the stack pointer not being
used in any way between the two asms, which this code does not guarantee
(what if it is inlined, for example?).

Doing the stack juggling and the actual call in a single asm is much safer
and more correct.  It's just that you then need asm for the actual call
that works for all ABIs you support, etc. :-)


Segher


