All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] MIPS: memset: Fix `noreorder' issues
@ 2018-10-02 11:50 Maciej W. Rozycki
  2018-10-02 11:50 ` [PATCH 1/2] MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression Maciej W. Rozycki
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Maciej W. Rozycki @ 2018-10-02 11:50 UTC (permalink / raw)
  To: Ralf Baechle, Paul Burton; +Cc: linux-mips, linux-kernel

Hi,

 A recent change broke CPU_DADDI_WORKAROUNDS support in memset.S, due to a 
delay-slot instruction expanding to multiple hardware operations for the 
affected configurations.

 The underlying cause is the excessive use of the `noreorder' assembly 
mode, while it is only needed in couple of places where either there is a 
data dependency between a branch and its delay slot instruction, or there 
is a section switch involved that would prevent automatic delay slot 
scheduling.

 These changes address both problems and for clarity, not to mix multiple 
conceptually separate changes and to make backporting easier I made them a 
small patch series.  See individual change descriptions for details.

 This has been build-time and run-time verified with 32-bit and 64-bit 
DECstation configurations, build-time verified with big-endian and 
little-endian 64-bit SWARM configurations.  Build-time verification was 
made by running `objdump -d arch/mips/lib/memset.o' with a pristine and 
and a patched build to make sure there has been no change in machine code 
generation, except for the delay-slot multiple instruction with the 64-bit 
CPU_DADDI_WORKAROUNDS DECstation configuration.

 Please apply.

  Maciej

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression
  2018-10-02 11:50 [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Maciej W. Rozycki
@ 2018-10-02 11:50 ` Maciej W. Rozycki
  2018-10-02 11:50 ` [PATCH 2/2] MIPS: memset: Limit excessive `noreorder' assembly mode use Maciej W. Rozycki
  2018-10-11 16:36 ` [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Paul Burton
  2 siblings, 0 replies; 4+ messages in thread
From: Maciej W. Rozycki @ 2018-10-02 11:50 UTC (permalink / raw)
  To: Ralf Baechle, Paul Burton; +Cc: linux-mips, linux-kernel, stable

Fix a commit 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for 
small_memset") regression and remove assembly warnings:

arch/mips/lib/memset.S: Assembler messages:
arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot

triggering with the CPU_DADDI_WORKAROUNDS option set and this code:

	PTR_SUBU	a2, t1, a0
	jr		ra
	 PTR_ADDIU	a2, 1

This is because with that option in place the DADDIU instruction, which 
the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn 
expands to an LI/DADDU (or actually ADDIU/DADDU) sequence:

 13c:	01a4302f 	dsubu	a2,t1,a0
 140:	03e00008 	jr	ra
 144:	24010001 	li	at,1
 148:	00c1302d 	daddu	a2,a2,at
	...

Correct this by switching off the `noreorder' assembly mode and letting 
GAS schedule this jump's delay slot, as there is nothing special about 
it that would require manual scheduling.  With this change in place 
correct code is produced:

 13c:	01a4302f 	dsubu	a2,t1,a0
 140:	24010001 	li	at,1
 144:	03e00008 	jr	ra
 148:	00c1302d 	daddu	a2,a2,at
	...

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset")
Cc: stable@vger.kernel.org # 4.17+
---
 arch/mips/lib/memset.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

linux-mips-memset-jr-ra-nodaddi-fix.patch
Index: linux-20180930-3maxp-defconfig/arch/mips/lib/memset.S
===================================================================
--- linux-20180930-3maxp-defconfig.orig/arch/mips/lib/memset.S
+++ linux-20180930-3maxp-defconfig/arch/mips/lib/memset.S
@@ -280,9 +280,11 @@
 	 * unset_bytes = end_addr - current_addr + 1
 	 *      a2     =    t1    -      a0      + 1
 	 */
+	.set		reorder
 	PTR_SUBU	a2, t1, a0
+	PTR_ADDIU	a2, 1
 	jr		ra
-	 PTR_ADDIU	a2, 1
+	.set		noreorder
 
 	.endm
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] MIPS: memset: Limit excessive `noreorder' assembly mode use
  2018-10-02 11:50 [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Maciej W. Rozycki
  2018-10-02 11:50 ` [PATCH 1/2] MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression Maciej W. Rozycki
@ 2018-10-02 11:50 ` Maciej W. Rozycki
  2018-10-11 16:36 ` [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Paul Burton
  2 siblings, 0 replies; 4+ messages in thread
From: Maciej W. Rozycki @ 2018-10-02 11:50 UTC (permalink / raw)
  To: Ralf Baechle, Paul Burton; +Cc: linux-mips, linux-kernel

Rewrite to use the `reorder' assembly mode and remove manually scheduled 
delay slots except where GAS cannot schedule a delay-slot instruction 
due to a data dependency or a section switch (as is the case with the EX 
macro).  No change in machine code produced.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
 arch/mips/lib/memset.S |   48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

linux-mips-memset-reorder.patch
Index: linux-20180930-3maxp-defconfig/arch/mips/lib/memset.S
===================================================================
--- linux-20180930-3maxp-defconfig.orig/arch/mips/lib/memset.S
+++ linux-20180930-3maxp-defconfig/arch/mips/lib/memset.S
@@ -78,7 +78,6 @@
 #endif
 	.endm
 
-	.set	noreorder
 	.align	5
 
 	/*
@@ -94,13 +93,16 @@
 	.endif
 
 	sltiu		t0, a2, STORSIZE	/* very small region? */
+	.set		noreorder
 	bnez		t0, .Lsmall_memset\@
 	 andi		t0, a0, STORMASK	/* aligned? */
+	.set		reorder
 
 #ifdef CONFIG_CPU_MICROMIPS
 	move		t8, a1			/* used by 'swp' instruction */
 	move		t9, a1
 #endif
+	.set		noreorder
 #ifndef CONFIG_CPU_DADDI_WORKAROUNDS
 	beqz		t0, 1f
 	 PTR_SUBU	t0, STORSIZE		/* alignment in bytes */
@@ -111,6 +113,7 @@
 	 PTR_SUBU	t0, AT			/* alignment in bytes */
 	.set		at
 #endif
+	.set		reorder
 
 #ifndef CONFIG_CPU_MIPSR6
 	R10KCBARRIER(0(ra))
@@ -125,8 +128,10 @@
 #else /* CONFIG_CPU_MIPSR6 */
 #define STORE_BYTE(N)				\
 	EX(sb, a1, N(a0), .Lbyte_fixup\@);	\
+	.set		noreorder;		\
 	beqz		t0, 0f;			\
-	PTR_ADDU	t0, 1;
+	 PTR_ADDU	t0, 1;			\
+	.set		reorder;
 
 	PTR_ADDU	a2, t0			/* correct size */
 	PTR_ADDU	t0, 1
@@ -148,16 +153,14 @@
 #endif /* CONFIG_CPU_MIPSR6 */
 1:	ori		t1, a2, 0x3f		/* # of full blocks */
 	xori		t1, 0x3f
+	andi		t0, a2, 0x40-STORSIZE
 	beqz		t1, .Lmemset_partial\@	/* no block to fill */
-	 andi		t0, a2, 0x40-STORSIZE
 
 	PTR_ADDU	t1, a0			/* end address */
-	.set		reorder
 1:	PTR_ADDIU	a0, 64
 	R10KCBARRIER(0(ra))
 	f_fill64 a0, -64, FILL64RG, .Lfwd_fixup\@, \mode
 	bne		t1, a0, 1b
-	.set		noreorder
 
 .Lmemset_partial\@:
 	R10KCBARRIER(0(ra))
@@ -173,20 +176,18 @@
 	PTR_SUBU	t1, AT
 	.set		at
 #endif
+	PTR_ADDU	a0, t0			/* dest ptr */
 	jr		t1
-	 PTR_ADDU	a0, t0			/* dest ptr */
 
-	.set		push
-	.set		noreorder
-	.set		nomacro
 	/* ... but first do longs ... */
 	f_fill64 a0, -64, FILL64RG, .Lpartial_fixup\@, \mode
-2:	.set		pop
-	andi		a2, STORMASK		/* At most one long to go */
+2:	andi		a2, STORMASK		/* At most one long to go */
 
+	.set		noreorder
 	beqz		a2, 1f
 #ifndef CONFIG_CPU_MIPSR6
 	 PTR_ADDU	a0, a2			/* What's left */
+	.set		reorder
 	R10KCBARRIER(0(ra))
 #ifdef __MIPSEB__
 	EX(LONG_S_R, a1, -1(a0), .Llast_fixup\@)
@@ -195,6 +196,7 @@
 #endif
 #else
 	 PTR_SUBU	t0, $0, a2
+	.set		reorder
 	move		a2, zero		/* No remaining longs */
 	PTR_ADDIU	t0, 1
 	STORE_BYTE(0)
@@ -210,20 +212,22 @@
 #endif
 0:
 #endif
-1:	jr		ra
-	 move		a2, zero
+1:	move		a2, zero
+	jr		ra
 
 .Lsmall_memset\@:
+	PTR_ADDU	t1, a0, a2
 	beqz		a2, 2f
-	 PTR_ADDU	t1, a0, a2
 
 1:	PTR_ADDIU	a0, 1			/* fill bytewise */
 	R10KCBARRIER(0(ra))
+	.set		noreorder
 	bne		t1, a0, 1b
 	 EX(sb, a1, -1(a0), .Lsmall_fixup\@)
+	.set		reorder
 
-2:	jr		ra			/* done */
-	 move		a2, zero
+2:	move		a2, zero
+	jr		ra			/* done */
 	.if __memset == 1
 	END(memset)
 	.set __memset, 0
@@ -237,14 +241,13 @@
 	 *      a2     =             a2                -              t0                   + 1
 	 */
 	PTR_SUBU	a2, t0
+	PTR_ADDIU	a2, 1
 	jr		ra
-	 PTR_ADDIU	a2, 1
 #endif /* CONFIG_CPU_MIPSR6 */
 
 .Lfirst_fixup\@:
 	/* unset_bytes already in a2 */
 	jr	ra
-	 nop
 
 .Lfwd_fixup\@:
 	/*
@@ -255,8 +258,8 @@
 	andi		a2, 0x3f
 	LONG_L		t0, THREAD_BUADDR(t0)
 	LONG_ADDU	a2, t1
+	LONG_SUBU	a2, t0
 	jr		ra
-	 LONG_SUBU	a2, t0
 
 .Lpartial_fixup\@:
 	/*
@@ -267,24 +270,21 @@
 	andi		a2, STORMASK
 	LONG_L		t0, THREAD_BUADDR(t0)
 	LONG_ADDU	a2, a0
+	LONG_SUBU	a2, t0
 	jr		ra
-	 LONG_SUBU	a2, t0
 
 .Llast_fixup\@:
 	/* unset_bytes already in a2 */
 	jr		ra
-	 nop
 
 .Lsmall_fixup\@:
 	/*
 	 * unset_bytes = end_addr - current_addr + 1
 	 *      a2     =    t1    -      a0      + 1
 	 */
-	.set		reorder
 	PTR_SUBU	a2, t1, a0
 	PTR_ADDIU	a2, 1
 	jr		ra
-	.set		noreorder
 
 	.endm
 
@@ -298,8 +298,8 @@
 
 LEAF(memset)
 EXPORT_SYMBOL(memset)
+	move		v0, a0			/* result */
 	beqz		a1, 1f
-	 move		v0, a0			/* result */
 
 	andi		a1, 0xff		/* spread fillword */
 	LONG_SLL		t1, a1, 8

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2] MIPS: memset: Fix `noreorder' issues
  2018-10-02 11:50 [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Maciej W. Rozycki
  2018-10-02 11:50 ` [PATCH 1/2] MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression Maciej W. Rozycki
  2018-10-02 11:50 ` [PATCH 2/2] MIPS: memset: Limit excessive `noreorder' assembly mode use Maciej W. Rozycki
@ 2018-10-11 16:36 ` Paul Burton
  2 siblings, 0 replies; 4+ messages in thread
From: Paul Burton @ 2018-10-11 16:36 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips, linux-kernel

Hi Maciej,

On Tue, Oct 02, 2018 at 12:50:00PM +0100, Maciej W. Rozycki wrote:
>  A recent change broke CPU_DADDI_WORKAROUNDS support in memset.S, due to a 
> delay-slot instruction expanding to multiple hardware operations for the 
> affected configurations.
> 
>  The underlying cause is the excessive use of the `noreorder' assembly 
> mode, while it is only needed in couple of places where either there is a 
> data dependency between a branch and its delay slot instruction, or there 
> is a section switch involved that would prevent automatic delay slot 
> scheduling.
> 
>  These changes address both problems and for clarity, not to mix multiple 
> conceptually separate changes and to make backporting easier I made them a 
> small patch series.  See individual change descriptions for details.
> 
>  This has been build-time and run-time verified with 32-bit and 64-bit 
> DECstation configurations, build-time verified with big-endian and 
> little-endian 64-bit SWARM configurations.  Build-time verification was 
> made by running `objdump -d arch/mips/lib/memset.o' with a pristine and 
> and a patched build to make sure there has been no change in machine code 
> generation, except for the delay-slot multiple instruction with the 64-bit 
> CPU_DADDI_WORKAROUNDS DECstation configuration.

Thanks - I applied patch 1 to mips-fixes & it's already been merged to
master.

Patch 2 applied to mips-next for 4.20.

Paul

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-10-11 16:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-02 11:50 [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Maciej W. Rozycki
2018-10-02 11:50 ` [PATCH 1/2] MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression Maciej W. Rozycki
2018-10-02 11:50 ` [PATCH 2/2] MIPS: memset: Limit excessive `noreorder' assembly mode use Maciej W. Rozycki
2018-10-11 16:36 ` [PATCH 0/2] MIPS: memset: Fix `noreorder' issues Paul Burton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.