qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions
@ 2019-10-04 13:43 Stefan Brankovic
  2019-10-04 13:43 ` Stefan Brankovic
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Brankovic @ 2019-10-04 13:43 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: richard.hendreson, mark.cave-ayland, pc, amarkovic,
	stefan.brankovic, alex.bennee, david

This patch fixes bug in optimized vsl/vsr instructions reported by Mark
Cave-Ayland and Paul Clarke. Sorry for not responding earlier, I was absent
last couple of days. I also integrated some suggestions made by Aleksandar
Markovic. New soultion is tested and still has noticable performance
improvement compared to old helper implementation.

V1 of this patch was not sent to qemu-devel and I am now sending V2 to
appropriate email adresses.

Stefan Brankovic (1):
  target/ppc: Fix for optimized vsl/vsr instructions

 target/ppc/translate/vmx-impl.inc.c | 84 ++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 44 deletions(-)

-- 
2.7.4



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions
  2019-10-04 13:43 [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions Stefan Brankovic
@ 2019-10-04 13:43 ` Stefan Brankovic
  2019-10-04 18:45   ` Aleksandar Markovic
  2019-10-04 18:52   ` Paul Clarke
  0 siblings, 2 replies; 5+ messages in thread
From: Stefan Brankovic @ 2019-10-04 13:43 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: richard.hendreson, mark.cave-ayland, pc, amarkovic,
	stefan.brankovic, alex.bennee, david

In previous implementation, invocation of TCG shift function could request
shift of TCG variable by 64 bits when variable 'sh' is 0, which is not
supported in TCG (values can be shifted by 0 to 63 bits). This patch fixes
this by using two separate invocation of TCG shift functions, with maximum
shift amount of 32.

Name of variable 'shifted' is changed to 'carry' so variable naming
is similar to old helper implementation.

Variables 'avrA' and 'avrB' are replaced with variable 'avr'.

Fixes: 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
Reported-by: Paul Clark <pc@us.ibm.com>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Suggested-by: Aleksandar Markovic <aleksandar.markovic@rt-rk.com>
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
---
 target/ppc/translate/vmx-impl.inc.c | 84 ++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 44 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 2472a52..81d5a7a 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -590,40 +590,38 @@ static void trans_vsl(DisasContext *ctx)
     int VT = rD(ctx->opcode);
     int VA = rA(ctx->opcode);
     int VB = rB(ctx->opcode);
-    TCGv_i64 avrA = tcg_temp_new_i64();
-    TCGv_i64 avrB = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
     TCGv_i64 sh = tcg_temp_new_i64();
-    TCGv_i64 shifted = tcg_temp_new_i64();
+    TCGv_i64 carry = tcg_temp_new_i64();
     TCGv_i64 tmp = tcg_temp_new_i64();
 
-    /* Place bits 125-127 of vB in sh. */
-    get_avr64(avrB, VB, false);
-    tcg_gen_andi_i64(sh, avrB, 0x07ULL);
+    /* Place bits 125-127 of vB in 'sh'. */
+    get_avr64(avr, VB, false);
+    tcg_gen_andi_i64(sh, avr, 0x07ULL);
 
     /*
-     * Save highest sh bits of lower doubleword element of vA in variable
-     * shifted and perform shift on lower doubleword.
+     * Save highest 'sh' bits of lower doubleword element of vA in variable
+     * 'carry' and perform shift on lower doubleword.
      */
-    get_avr64(avrA, VA, false);
-    tcg_gen_subfi_i64(tmp, 64, sh);
-    tcg_gen_shr_i64(shifted, avrA, tmp);
-    tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
-    tcg_gen_shl_i64(avrA, avrA, sh);
-    set_avr64(VT, avrA, false);
+    get_avr64(avr, VA, false);
+    tcg_gen_subfi_i64(tmp, 32, sh);
+    tcg_gen_shri_i64(carry, avr, 32);
+    tcg_gen_shr_i64(carry, carry, tmp);
+    tcg_gen_shl_i64(avr, avr, sh);
+    set_avr64(VT, avr, false);
 
     /*
      * Perform shift on higher doubleword element of vA and replace lowest
-     * sh bits with shifted.
+     * 'sh' bits with 'carry'.
      */
-    get_avr64(avrA, VA, true);
-    tcg_gen_shl_i64(avrA, avrA, sh);
-    tcg_gen_or_i64(avrA, avrA, shifted);
-    set_avr64(VT, avrA, true);
+    get_avr64(avr, VA, true);
+    tcg_gen_shl_i64(avr, avr, sh);
+    tcg_gen_or_i64(avr, avr, carry);
+    set_avr64(VT, avr, true);
 
-    tcg_temp_free_i64(avrA);
-    tcg_temp_free_i64(avrB);
+    tcg_temp_free_i64(avr);
     tcg_temp_free_i64(sh);
-    tcg_temp_free_i64(shifted);
+    tcg_temp_free_i64(carry);
     tcg_temp_free_i64(tmp);
 }
 
@@ -639,39 +637,37 @@ static void trans_vsr(DisasContext *ctx)
     int VT = rD(ctx->opcode);
     int VA = rA(ctx->opcode);
     int VB = rB(ctx->opcode);
-    TCGv_i64 avrA = tcg_temp_new_i64();
-    TCGv_i64 avrB = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
     TCGv_i64 sh = tcg_temp_new_i64();
-    TCGv_i64 shifted = tcg_temp_new_i64();
+    TCGv_i64 carry = tcg_temp_new_i64();
     TCGv_i64 tmp = tcg_temp_new_i64();
 
-    /* Place bits 125-127 of vB in sh. */
-    get_avr64(avrB, VB, false);
-    tcg_gen_andi_i64(sh, avrB, 0x07ULL);
+    /* Place bits 125-127 of vB in 'sh'. */
+    get_avr64(avr, VB, false);
+    tcg_gen_andi_i64(sh, avr, 0x07ULL);
 
     /*
-     * Save lowest sh bits of higher doubleword element of vA in variable
-     * shifted and perform shift on higher doubleword.
+     * Save lowest 'sh' bits of higher doubleword element of vA in variable
+     * 'carry' and perform shift on higher doubleword.
      */
-    get_avr64(avrA, VA, true);
-    tcg_gen_subfi_i64(tmp, 64, sh);
-    tcg_gen_shl_i64(shifted, avrA, tmp);
-    tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
-    tcg_gen_shr_i64(avrA, avrA, sh);
-    set_avr64(VT, avrA, true);
+    get_avr64(avr, VA, true);
+    tcg_gen_subfi_i64(tmp, 32, sh);
+    tcg_gen_shli_i64(carry, avr, 32);
+    tcg_gen_shl_i64(carry, carry, tmp);
+    tcg_gen_shr_i64(avr, avr, sh);
+    set_avr64(VT, avr, true);
     /*
      * Perform shift on lower doubleword element of vA and replace highest
-     * sh bits with shifted.
+     * 'sh' bits with 'carry'.
      */
-    get_avr64(avrA, VA, false);
-    tcg_gen_shr_i64(avrA, avrA, sh);
-    tcg_gen_or_i64(avrA, avrA, shifted);
-    set_avr64(VT, avrA, false);
+    get_avr64(avr, VA, false);
+    tcg_gen_shr_i64(avr, avr, sh);
+    tcg_gen_or_i64(avr, avr, carry);
+    set_avr64(VT, avr, false);
 
-    tcg_temp_free_i64(avrA);
-    tcg_temp_free_i64(avrB);
+    tcg_temp_free_i64(avr);
     tcg_temp_free_i64(sh);
-    tcg_temp_free_i64(shifted);
+    tcg_temp_free_i64(carry);
     tcg_temp_free_i64(tmp);
 }
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions
  2019-10-04 13:43 ` Stefan Brankovic
@ 2019-10-04 18:45   ` Aleksandar Markovic
  2019-10-04 18:52   ` Paul Clarke
  1 sibling, 0 replies; 5+ messages in thread
From: Aleksandar Markovic @ 2019-10-04 18:45 UTC (permalink / raw)
  To: Stefan Brankovic
  Cc: richard.hendreson, Mark Cave-Ayland, QEMU Developers,
	open list:ppc4xx, Paul A. Clarke, Aleksandar Markovic,
	Alex Bennée, David Gibson

> Reported-by: Paul Clark <pc@us.ibm.com>

Stefan,

Paul's full name is Paul A. Clarke.

Thanks for the fix!

Aleksandar


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re:  [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions
  2019-10-04 13:43 ` Stefan Brankovic
  2019-10-04 18:45   ` Aleksandar Markovic
@ 2019-10-04 18:52   ` Paul Clarke
  2019-10-06  0:48     ` David Gibson
  1 sibling, 1 reply; 5+ messages in thread
From: Paul Clarke @ 2019-10-04 18:52 UTC (permalink / raw)
  To: Stefan Brankovic, qemu-devel, qemu-ppc
  Cc: alex.bennee, richard.hendreson, mark.cave-ayland, amarkovic, david

On 10/4/19 8:43 AM, Stefan Brankovic wrote:
> In previous implementation, invocation of TCG shift function could request
> shift of TCG variable by 64 bits when variable 'sh' is 0, which is not
> supported in TCG (values can be shifted by 0 to 63 bits). This patch fixes
> this by using two separate invocation of TCG shift functions, with maximum
> shift amount of 32.
> 
> Name of variable 'shifted' is changed to 'carry' so variable naming
> is similar to old helper implementation.
> 
> Variables 'avrA' and 'avrB' are replaced with variable 'avr'.
> 
> Fixes: 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
> Reported-by: Paul Clark <pc@us.ibm.com>

Preferred: "Paul A. Clarke" (for historical consistency)

> Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Suggested-by: Aleksandar Markovic <aleksandar.markovic@rt-rk.com>
> Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>

Applying this patch on top of dce5a787c05fe1a3e54d92871cdeba2af6798e0d eliminated the failures that I reported in https://bugs.launchpad.net/qemu/+bug/1841990 associated with vsl/vsr.

Tested-by: Paul A. Clarke  <pc@us.ibm.com>

PC


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions
  2019-10-04 18:52   ` Paul Clarke
@ 2019-10-06  0:48     ` David Gibson
  0 siblings, 0 replies; 5+ messages in thread
From: David Gibson @ 2019-10-06  0:48 UTC (permalink / raw)
  To: Paul Clarke
  Cc: richard.hendreson, mark.cave-ayland, qemu-devel, qemu-ppc,
	amarkovic, Stefan Brankovic, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

On Fri, Oct 04, 2019 at 01:52:41PM -0500, Paul Clarke wrote:
> On 10/4/19 8:43 AM, Stefan Brankovic wrote:
> > In previous implementation, invocation of TCG shift function could request
> > shift of TCG variable by 64 bits when variable 'sh' is 0, which is not
> > supported in TCG (values can be shifted by 0 to 63 bits). This patch fixes
> > this by using two separate invocation of TCG shift functions, with maximum
> > shift amount of 32.
> > 
> > Name of variable 'shifted' is changed to 'carry' so variable naming
> > is similar to old helper implementation.
> > 
> > Variables 'avrA' and 'avrB' are replaced with variable 'avr'.
> > 
> > Fixes: 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
> > Reported-by: Paul Clark <pc@us.ibm.com>
> 
> Preferred: "Paul A. Clarke" (for historical consistency)
> 
> > Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> > Suggested-by: Aleksandar Markovic <aleksandar.markovic@rt-rk.com>
> > Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
> 
> Applying this patch on top of dce5a787c05fe1a3e54d92871cdeba2af6798e0d eliminated the failures that I reported in https://bugs.launchpad.net/qemu/+bug/1841990 associated with vsl/vsr.
> 
> Tested-by: Paul A. Clarke  <pc@us.ibm.com>

I've applied this to ppc-for-4.2, adjusting the Reported-by line as
suggested above.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-10-06  0:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-04 13:43 [PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions Stefan Brankovic
2019-10-04 13:43 ` Stefan Brankovic
2019-10-04 18:45   ` Aleksandar Markovic
2019-10-04 18:52   ` Paul Clarke
2019-10-06  0:48     ` David Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).