All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-09-29 15:26 ` Maciej W. Rozycki
  0 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2017-09-29 15:26 UTC (permalink / raw)
  To: Ralf Baechle, James Hogan; +Cc: linux-mips

Optimize `__read_64bit_c0_split' and reduce the instruction count by 1, 
observing that a DSLL/DSRA pair by 32, is equivalent to SLL by 0, which 
architecturally truncates the value requested to 32 bits on 64-bit MIPS 
hardware regardless of whether the input operand is or is not a properly 
sign-extended 32-bit value.

Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
---
 Tested by compilation only to verify syntax correctnes as I do not know 
if this execution path is actually used by any configuration (suggestions 
welcome).  I believe it to be technically correct though, being 
sufficiently straightforward to verify by proofreading, and an obvious 
improvement.

 Therefore, please apply.

 NB if this turns out indeed used, then we might have to do something 
about DMFC0 hazard avoidance for the sake of MIPS III support, and also
choose to use an MFC0/MFHC0 instruction pair instead on MIPS64r5+.

  Maciej

---
 arch/mips/include/asm/mipsregs.h |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

linux-mips-read-64bit-c0-split-sll.diff
Index: linux-sfr-test/arch/mips/include/asm/mipsregs.h
===================================================================
--- linux-sfr-test.orig/arch/mips/include/asm/mipsregs.h	2017-07-08 15:32:02.000000000 +0100
+++ linux-sfr-test/arch/mips/include/asm/mipsregs.h	2017-09-29 01:02:01.390974000 +0100
@@ -1344,19 +1344,17 @@ do {									\
 	if (sel == 0)							\
 		__asm__ __volatile__(					\
 			".set\tmips64\n\t"				\
-			"dmfc0\t%M0, " #source "\n\t"			\
-			"dsll\t%L0, %M0, 32\n\t"			\
-			"dsra\t%M0, %M0, 32\n\t"			\
-			"dsra\t%L0, %L0, 32\n\t"			\
+			"dmfc0\t%L0, " #source "\n\t"			\
+			"dsra\t%M0, %L0, 32\n\t"			\
+			"sll\t%L0, %L0, 0\n\t"				\
 			".set\tmips0"					\
 			: "=r" (__val));				\
 	else								\
 		__asm__ __volatile__(					\
 			".set\tmips64\n\t"				\
-			"dmfc0\t%M0, " #source ", " #sel "\n\t"		\
-			"dsll\t%L0, %M0, 32\n\t"			\
-			"dsra\t%M0, %M0, 32\n\t"			\
-			"dsra\t%L0, %L0, 32\n\t"			\
+			"dmfc0\t%L0, " #source ", " #sel "\n\t"		\
+			"dsra\t%M0, %L0, 32\n\t"			\
+			"sll\t%L0, %L0, 0\n\t"				\
 			".set\tmips0"					\
 			: "=r" (__val));				\
 	local_irq_restore(__flags);					\

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-09-29 15:26 ` Maciej W. Rozycki
  0 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2017-09-29 15:26 UTC (permalink / raw)
  To: Ralf Baechle, James Hogan; +Cc: linux-mips

Optimize `__read_64bit_c0_split' and reduce the instruction count by 1, 
observing that a DSLL/DSRA pair by 32, is equivalent to SLL by 0, which 
architecturally truncates the value requested to 32 bits on 64-bit MIPS 
hardware regardless of whether the input operand is or is not a properly 
sign-extended 32-bit value.

Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
---
 Tested by compilation only to verify syntax correctnes as I do not know 
if this execution path is actually used by any configuration (suggestions 
welcome).  I believe it to be technically correct though, being 
sufficiently straightforward to verify by proofreading, and an obvious 
improvement.

 Therefore, please apply.

 NB if this turns out indeed used, then we might have to do something 
about DMFC0 hazard avoidance for the sake of MIPS III support, and also
choose to use an MFC0/MFHC0 instruction pair instead on MIPS64r5+.

  Maciej

---
 arch/mips/include/asm/mipsregs.h |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

linux-mips-read-64bit-c0-split-sll.diff
Index: linux-sfr-test/arch/mips/include/asm/mipsregs.h
===================================================================
--- linux-sfr-test.orig/arch/mips/include/asm/mipsregs.h	2017-07-08 15:32:02.000000000 +0100
+++ linux-sfr-test/arch/mips/include/asm/mipsregs.h	2017-09-29 01:02:01.390974000 +0100
@@ -1344,19 +1344,17 @@ do {									\
 	if (sel == 0)							\
 		__asm__ __volatile__(					\
 			".set\tmips64\n\t"				\
-			"dmfc0\t%M0, " #source "\n\t"			\
-			"dsll\t%L0, %M0, 32\n\t"			\
-			"dsra\t%M0, %M0, 32\n\t"			\
-			"dsra\t%L0, %L0, 32\n\t"			\
+			"dmfc0\t%L0, " #source "\n\t"			\
+			"dsra\t%M0, %L0, 32\n\t"			\
+			"sll\t%L0, %L0, 0\n\t"				\
 			".set\tmips0"					\
 			: "=r" (__val));				\
 	else								\
 		__asm__ __volatile__(					\
 			".set\tmips64\n\t"				\
-			"dmfc0\t%M0, " #source ", " #sel "\n\t"		\
-			"dsll\t%L0, %M0, 32\n\t"			\
-			"dsra\t%M0, %M0, 32\n\t"			\
-			"dsra\t%L0, %L0, 32\n\t"			\
+			"dmfc0\t%L0, " #source ", " #sel "\n\t"		\
+			"dsra\t%M0, %L0, 32\n\t"			\
+			"sll\t%L0, %L0, 0\n\t"				\
 			".set\tmips0"					\
 			: "=r" (__val));				\
 	local_irq_restore(__flags);					\

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-09-29 19:59   ` James Hogan
  0 siblings, 0 replies; 8+ messages in thread
From: James Hogan @ 2017-09-29 19:59 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]

On Fri, Sep 29, 2017 at 04:26:31PM +0100, Maciej W. Rozycki wrote:
> Optimize `__read_64bit_c0_split' and reduce the instruction count by 1, 
> observing that a DSLL/DSRA pair by 32, is equivalent to SLL by 0, which 
> architecturally truncates the value requested to 32 bits on 64-bit MIPS 
> hardware regardless of whether the input operand is or is not a properly 
> sign-extended 32-bit value.
> 
> Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
> ---
>  Tested by compilation only to verify syntax correctnes as I do not know 
> if this execution path is actually used by any configuration (suggestions 
> welcome).  I believe it to be technically correct though, being 
> sufficiently straightforward to verify by proofreading, and an obvious 
> improvement.

Agreed, I did something similar locally too but didn't bother to submit
it :).

> 
>  Therefore, please apply.
> 
>  NB if this turns out indeed used, then we might have to do something 
> about DMFC0 hazard avoidance for the sake of MIPS III support, and also
> choose to use an MFC0/MFHC0 instruction pair instead on MIPS64r5+.

This would have to depend on MVH bit, but in practice I suspect it isn't
worthwhile doing it here instead of in a separate macro to use depending
on the register.

Using MVH would have the advantage of avoiding the potential window when
a 32-bit EJTAG or Cache error handler potentially canonicalises register
state I suppose.

That's another advantage of this patch actually, it reduces the size of
that window to a single instruction.

Reviewed-by: James Hogan <james.hogan@imgtec.com>

Cheers
James

> 
>   Maciej
> 
> ---
>  arch/mips/include/asm/mipsregs.h |   14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> linux-mips-read-64bit-c0-split-sll.diff
> Index: linux-sfr-test/arch/mips/include/asm/mipsregs.h
> ===================================================================
> --- linux-sfr-test.orig/arch/mips/include/asm/mipsregs.h	2017-07-08 15:32:02.000000000 +0100
> +++ linux-sfr-test/arch/mips/include/asm/mipsregs.h	2017-09-29 01:02:01.390974000 +0100
> @@ -1344,19 +1344,17 @@ do {									\
>  	if (sel == 0)							\
>  		__asm__ __volatile__(					\
>  			".set\tmips64\n\t"				\
> -			"dmfc0\t%M0, " #source "\n\t"			\
> -			"dsll\t%L0, %M0, 32\n\t"			\
> -			"dsra\t%M0, %M0, 32\n\t"			\
> -			"dsra\t%L0, %L0, 32\n\t"			\
> +			"dmfc0\t%L0, " #source "\n\t"			\
> +			"dsra\t%M0, %L0, 32\n\t"			\
> +			"sll\t%L0, %L0, 0\n\t"				\
>  			".set\tmips0"					\
>  			: "=r" (__val));				\
>  	else								\
>  		__asm__ __volatile__(					\
>  			".set\tmips64\n\t"				\
> -			"dmfc0\t%M0, " #source ", " #sel "\n\t"		\
> -			"dsll\t%L0, %M0, 32\n\t"			\
> -			"dsra\t%M0, %M0, 32\n\t"			\
> -			"dsra\t%L0, %L0, 32\n\t"			\
> +			"dmfc0\t%L0, " #source ", " #sel "\n\t"		\
> +			"dsra\t%M0, %L0, 32\n\t"			\
> +			"sll\t%L0, %L0, 0\n\t"				\
>  			".set\tmips0"					\
>  			: "=r" (__val));				\
>  	local_irq_restore(__flags);					\

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-09-29 19:59   ` James Hogan
  0 siblings, 0 replies; 8+ messages in thread
From: James Hogan @ 2017-09-29 19:59 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]

On Fri, Sep 29, 2017 at 04:26:31PM +0100, Maciej W. Rozycki wrote:
> Optimize `__read_64bit_c0_split' and reduce the instruction count by 1, 
> observing that a DSLL/DSRA pair by 32, is equivalent to SLL by 0, which 
> architecturally truncates the value requested to 32 bits on 64-bit MIPS 
> hardware regardless of whether the input operand is or is not a properly 
> sign-extended 32-bit value.
> 
> Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
> ---
>  Tested by compilation only to verify syntax correctnes as I do not know 
> if this execution path is actually used by any configuration (suggestions 
> welcome).  I believe it to be technically correct though, being 
> sufficiently straightforward to verify by proofreading, and an obvious 
> improvement.

Agreed, I did something similar locally too but didn't bother to submit
it :).

> 
>  Therefore, please apply.
> 
>  NB if this turns out indeed used, then we might have to do something 
> about DMFC0 hazard avoidance for the sake of MIPS III support, and also
> choose to use an MFC0/MFHC0 instruction pair instead on MIPS64r5+.

This would have to depend on MVH bit, but in practice I suspect it isn't
worthwhile doing it here instead of in a separate macro to use depending
on the register.

Using MVH would have the advantage of avoiding the potential window when
a 32-bit EJTAG or Cache error handler potentially canonicalises register
state I suppose.

That's another advantage of this patch actually, it reduces the size of
that window to a single instruction.

Reviewed-by: James Hogan <james.hogan@imgtec.com>

Cheers
James

> 
>   Maciej
> 
> ---
>  arch/mips/include/asm/mipsregs.h |   14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> linux-mips-read-64bit-c0-split-sll.diff
> Index: linux-sfr-test/arch/mips/include/asm/mipsregs.h
> ===================================================================
> --- linux-sfr-test.orig/arch/mips/include/asm/mipsregs.h	2017-07-08 15:32:02.000000000 +0100
> +++ linux-sfr-test/arch/mips/include/asm/mipsregs.h	2017-09-29 01:02:01.390974000 +0100
> @@ -1344,19 +1344,17 @@ do {									\
>  	if (sel == 0)							\
>  		__asm__ __volatile__(					\
>  			".set\tmips64\n\t"				\
> -			"dmfc0\t%M0, " #source "\n\t"			\
> -			"dsll\t%L0, %M0, 32\n\t"			\
> -			"dsra\t%M0, %M0, 32\n\t"			\
> -			"dsra\t%L0, %L0, 32\n\t"			\
> +			"dmfc0\t%L0, " #source "\n\t"			\
> +			"dsra\t%M0, %L0, 32\n\t"			\
> +			"sll\t%L0, %L0, 0\n\t"				\
>  			".set\tmips0"					\
>  			: "=r" (__val));				\
>  	else								\
>  		__asm__ __volatile__(					\
>  			".set\tmips64\n\t"				\
> -			"dmfc0\t%M0, " #source ", " #sel "\n\t"		\
> -			"dsll\t%L0, %M0, 32\n\t"			\
> -			"dsra\t%M0, %M0, 32\n\t"			\
> -			"dsra\t%L0, %L0, 32\n\t"			\
> +			"dmfc0\t%L0, " #source ", " #sel "\n\t"		\
> +			"dsra\t%M0, %L0, 32\n\t"			\
> +			"sll\t%L0, %L0, 0\n\t"				\
>  			".set\tmips0"					\
>  			: "=r" (__val));				\
>  	local_irq_restore(__flags);					\

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-09-29 22:39     ` Maciej W. Rozycki
  0 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2017-09-29 22:39 UTC (permalink / raw)
  To: James Hogan; +Cc: Ralf Baechle, linux-mips

On Fri, 29 Sep 2017, James Hogan wrote:

> >  NB if this turns out indeed used, then we might have to do something 
> > about DMFC0 hazard avoidance for the sake of MIPS III support, and also
> > choose to use an MFC0/MFHC0 instruction pair instead on MIPS64r5+.
> 
> This would have to depend on MVH bit, but in practice I suspect it isn't
> worthwhile doing it here instead of in a separate macro to use depending
> on the register.

 Boot-time patching would be more appropriate IMO.

> Using MVH would have the advantage of avoiding the potential window when
> a 32-bit EJTAG or Cache error handler potentially canonicalises register
> state I suppose.
> 
> That's another advantage of this patch actually, it reduces the size of
> that window to a single instruction.

 That still sounds scary and justifies the use of MFHC0 (and MTHC0 in 
`__write_64bit_c0_split') where available more than just the minuscule 
performance improvement.

 Thanks for your review.

  Maciej

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-09-29 22:39     ` Maciej W. Rozycki
  0 siblings, 0 replies; 8+ messages in thread
From: Maciej W. Rozycki @ 2017-09-29 22:39 UTC (permalink / raw)
  To: James Hogan; +Cc: Ralf Baechle, linux-mips

On Fri, 29 Sep 2017, James Hogan wrote:

> >  NB if this turns out indeed used, then we might have to do something 
> > about DMFC0 hazard avoidance for the sake of MIPS III support, and also
> > choose to use an MFC0/MFHC0 instruction pair instead on MIPS64r5+.
> 
> This would have to depend on MVH bit, but in practice I suspect it isn't
> worthwhile doing it here instead of in a separate macro to use depending
> on the register.

 Boot-time patching would be more appropriate IMO.

> Using MVH would have the advantage of avoiding the potential window when
> a 32-bit EJTAG or Cache error handler potentially canonicalises register
> state I suppose.
> 
> That's another advantage of this patch actually, it reduces the size of
> that window to a single instruction.

 That still sounds scary and justifies the use of MFHC0 (and MTHC0 in 
`__write_64bit_c0_split') where available more than just the minuscule 
performance improvement.

 Thanks for your review.

  Maciej

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-11-08 22:06   ` James Hogan
  0 siblings, 0 replies; 8+ messages in thread
From: James Hogan @ 2017-11-08 22:06 UTC (permalink / raw)
  To: Maciej W. Rozycki ; +Cc: Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 863 bytes --]

On Fri, Sep 29, 2017 at 04:26:31PM +0100, Maciej W. Rozycki wrote:
> Optimize `__read_64bit_c0_split' and reduce the instruction count by 1, 
> observing that a DSLL/DSRA pair by 32, is equivalent to SLL by 0, which 
> architecturally truncates the value requested to 32 bits on 64-bit MIPS 
> hardware regardless of whether the input operand is or is not a properly 
> sign-extended 32-bit value.
> 
> Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
> ---
>  Tested by compilation only to verify syntax correctnes as I do not know 
> if this execution path is actually used by any configuration (suggestions 
> welcome).  I believe it to be technically correct though, being 
> sufficiently straightforward to verify by proofreading, and an obvious 
> improvement.
> 
>  Therefore, please apply.

Thanks, Applied for 4.15.

Cheers
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split'
@ 2017-11-08 22:06   ` James Hogan
  0 siblings, 0 replies; 8+ messages in thread
From: James Hogan @ 2017-11-08 22:06 UTC (permalink / raw)
  To: Maciej W. Rozycki ; +Cc: Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 863 bytes --]

On Fri, Sep 29, 2017 at 04:26:31PM +0100, Maciej W. Rozycki wrote:
> Optimize `__read_64bit_c0_split' and reduce the instruction count by 1, 
> observing that a DSLL/DSRA pair by 32, is equivalent to SLL by 0, which 
> architecturally truncates the value requested to 32 bits on 64-bit MIPS 
> hardware regardless of whether the input operand is or is not a properly 
> sign-extended 32-bit value.
> 
> Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
> ---
>  Tested by compilation only to verify syntax correctnes as I do not know 
> if this execution path is actually used by any configuration (suggestions 
> welcome).  I believe it to be technically correct though, being 
> sufficiently straightforward to verify by proofreading, and an obvious 
> improvement.
> 
>  Therefore, please apply.

Thanks, Applied for 4.15.

Cheers
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-11-08 22:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-29 15:26 [PATCH] MIPS: Use SLL by 0 for 32-bit truncation in `__read_64bit_c0_split' Maciej W. Rozycki
2017-09-29 15:26 ` Maciej W. Rozycki
2017-09-29 19:59 ` James Hogan
2017-09-29 19:59   ` James Hogan
2017-09-29 22:39   ` Maciej W. Rozycki
2017-09-29 22:39     ` Maciej W. Rozycki
2017-11-08 22:06 ` James Hogan
2017-11-08 22:06   ` James Hogan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.