* [Qemu-devel] [PATCH 00/14] VSX Stage 4
@ 2013-11-06 20:31 Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 01/14] VSX Stage 4: Add VSX 2.07 Flag Tom Musta
                   ` (15 more replies)
  0 siblings, 16 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This is the fourth and final series of patches that add emulation support
to QEMU for the PowerPC Vector Scalar Extension (VSX).

This series adds the instructions newly introduced in Power ISA V2.07:
3 scalar load instructions, 2 scalar store instructions, 7 standard
single-precision scalar arithmetic instructions, 8 scalar single-precision
fused multiply-add instructions, 2 integer-to-single-precision conversion
instructions and 3 vector logical instructions.

The single-precision scalar arithmetic instructions all interpret the most
significant 64 bits of a VSR as a single-precision floating-point number
stored in double-precision format (similar to the standard PowerPC
single-precision floating-point instructions).  Thus a common theme in the
supporting code is rounding an intermediate double-precision number to
single precision.
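
The rounding step described above can be sketched in plain C.  This is only
an illustration using host float/double arithmetic; the patches themselves
go through QEMU's softfloat routines, and the names round_to_single and
double_bits are hypothetical.  It assumes the host uses IEEE 754 binary32
and binary64 for float and double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 and double is binary64. */
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "IEEE 754 float/double expected");

/* Round a double-precision intermediate to single precision, then widen it
 * again so the value is still held in double-precision format. */
static double round_to_single(double intermediate)
{
    float narrowed = (float)intermediate; /* rounds in the current mode */
    return (double)narrowed;              /* exact: every float fits */
}

/* Return the raw bit pattern of a double, for inspecting the result. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof(bits));
    return bits;
}
```

After rounding, a normal value keeps only 23 fraction bits, so the low 29
fraction bits of the double-precision encoding are zero.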

Tom Musta (14):
  VSX Stage 4: Add VSX 2.07 Flag
  VSX Stage 4: Refactor lxsdx
  VSX Stage 4: Add lxsiwax, lxsiwzx and lxsspx
  VSX Stage 4: Refactor stxsdx
  VSX Stage 4: Add stxsiwx and stxsspx
  VSX Stage 4: Add xsaddsp and xssubsp
  VSX Stage 4: Add xsmulsp
  VSX Stage 4: Add xsdivsp
  VSX Stage 4: Add xsresp
  VSX Stage 4: Add xssqrtsp
  VSX Stage 4: add xsrsqrtesp
  VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  VSX Stage 4: Add xscvsxdsp and xscvuxdsp
  VSX Stage 4: Add xxleqv, xxlnand and xxlorc

 target-ppc/cpu.h            |    4 +-
 target-ppc/fpu_helper.c     |  191 ++++++++++++++++++++++++++++---------------
 target-ppc/helper.h         |   18 ++++
 target-ppc/translate.c      |  110 +++++++++++++++++++------
 target-ppc/translate_init.c |    2 +-
 5 files changed, 232 insertions(+), 93 deletions(-)

* [Qemu-devel] [PATCH 01/14] VSX Stage 4: Add VSX 2.07 Flag
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 02/14] VSX Stage 4: Refactor lxsdx Tom Musta
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds a flag to identify those VSX instructions that are
new to Power ISA V2.07.  The flag is added to the Power 8 processor
initialization so that the P8 models understand how to decode and
emulate instructions in this category.
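
As a rough sketch of how such a category flag gates decoding, the following
standalone C models a CPU that advertises a bitmask of instruction
categories.  The DEMO_* names, struct demo_cpu_model and
demo_insn_supported are hypothetical stand-ins, not QEMU's actual types or
lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PPC2_* instruction-category bits. */
enum {
    DEMO_VSX    = 1u << 0,  /* VSX instructions before ISA 2.07 */
    DEMO_VSX207 = 1u << 1,  /* VSX instructions new in ISA 2.07 */
};
static_assert((DEMO_VSX & DEMO_VSX207) == 0,
              "category bits must not overlap");

/* A CPU model advertises the instruction categories it implements. */
struct demo_cpu_model {
    uint64_t insns_flags2;
};

/* An instruction is decodable only if the model sets its category bit. */
static bool demo_insn_supported(const struct demo_cpu_model *cpu,
                                uint64_t category)
{
    return (cpu->insns_flags2 & category) != 0;
}
```

A model that sets only DEMO_VSX would reject a DEMO_VSX207 instruction,
while one that sets both bits would accept it.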

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/cpu.h            |    4 +++-
 target-ppc/translate_init.c |    2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index bb84767..0abc848 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1875,9 +1875,11 @@ enum {
     PPC2_DBRX          = 0x0000000000000010ULL,
     /* Book I 2.05 PowerPC specification                                     */
     PPC2_ISA205        = 0x0000000000000020ULL,
+    /* VSX additions in ISA 2.07                                             */
+    PPC2_VSX207        = 0x0000000000000040ULL,
 
 #define PPC_TCG_INSNS2 (PPC2_BOOKE206 | PPC2_VSX | PPC2_PRCNTL | PPC2_DBRX | \
-  PPC2_ISA205)
+                        PPC2_ISA205 | PPC2_VSX207)
 };
 
 /*****************************************************************************/
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 13457ec..e14ab63 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7270,7 +7270,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
                        PPC_64B | PPC_ALTIVEC |
                        PPC_SEGMENT_64B | PPC_SLBI |
                        PPC_POPCNTB | PPC_POPCNTWD;
-    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
+    pcc->insns_flags2 = PPC2_VSX | PPC2_VSX207 | PPC2_DFP | PPC2_DBRX;
     pcc->msr_mask = 0x800000000284FF36ULL;
     pcc->mmu_model = POWERPC_MMU_2_06;
 #if defined(CONFIG_SOFTMMU)
-- 
1.7.1

* [Qemu-devel] [PATCH 02/14] VSX Stage 4: Refactor lxsdx
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 01/14] VSX Stage 4: Add VSX 2.07 Flag Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 03/14] VSX Stage 4: Add lxsiwax, lxsiwzx and lxsspx Tom Musta
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch refactors the lxsdx generator.  Reusable code is isolated
into a macro.  The macro will be used in subsequent patches in this
series to implement other scalar load instructions.
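
The refactoring technique is ordinary token pasting: one macro stamps out a
function per instruction and selects the memory operation by name.  A
minimal standalone sketch follows; the demo_* helpers are hypothetical and
operate on host memory rather than generating TCG ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loaders standing in for the gen_qemu_* operations. */
static uint64_t demo_ld64(const uint64_t *mem)  { return *mem; }
static uint64_t demo_ld32u(const uint64_t *mem) { return *mem & 0xFFFFFFFFull; }

/* One macro emits a generator per instruction; 'operation' is pasted
 * onto the demo_ prefix to pick the loader. */
#define DEMO_LOAD_SCALAR(name, operation)                 \
static uint64_t demo_gen_##name(const uint64_t *mem)      \
{                                                         \
    /* Shared checks live here, once, for every load. */  \
    assert(mem != NULL);                                  \
    return demo_##operation(mem);                         \
}

DEMO_LOAD_SCALAR(lxsdx, ld64)
DEMO_LOAD_SCALAR(lxsiwzx, ld32u)
```

Adding another load then costs one line, which is the pattern the following
patches rely on.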

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/translate.c |   31 +++++++++++++++++--------------
 1 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 52d7165..2541b5f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7006,20 +7006,23 @@ static inline TCGv_i64 cpu_vsrl(int n)
     }
 }
 
-static void gen_lxsdx(DisasContext *ctx)
-{
-    TCGv EA;
-    if (unlikely(!ctx->vsx_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VSXU);
-        return;
-    }
-    gen_set_access_type(ctx, ACCESS_INT);
-    EA = tcg_temp_new();
-    gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
-    /* NOTE: cpu_vsrl is undefined */
-    tcg_temp_free(EA);
-}
+#define VSX_LOAD_SCALAR(name, operation)                      \
+static void gen_##name(DisasContext *ctx)                     \
+{                                                             \
+    TCGv EA;                                                  \
+    if (unlikely(!ctx->vsx_enabled)) {                        \
+        gen_exception(ctx, POWERPC_EXCP_VSXU);                \
+        return;                                               \
+    }                                                         \
+    gen_set_access_type(ctx, ACCESS_INT);                     \
+    EA = tcg_temp_new();                                      \
+    gen_addr_reg_index(ctx, EA);                              \
+    gen_qemu_##operation(ctx, cpu_vsrh(xT(ctx->opcode)), EA); \
+    /* NOTE: cpu_vsrl is undefined */                         \
+    tcg_temp_free(EA);                                        \
+}
+
+VSX_LOAD_SCALAR(lxsdx, ld64)
 
 static void gen_lxvd2x(DisasContext *ctx)
 {
-- 
1.7.1

* [Qemu-devel] [PATCH 03/14] VSX Stage 4: Add lxsiwax, lxsiwzx and lxsspx
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 01/14] VSX Stage 4: Add VSX 2.07 Flag Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 02/14] VSX Stage 4: Refactor lxsdx Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 04/14] VSX Stage 4: Refactor stxsdx Tom Musta
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the scalar load instructions introduced in ISA
V2.07:

  - Load VSX Scalar as Integer Word Algebraic Indexed (lxsiwax)
  - Load VSX Scalar as Integer Word and Zero Indexed (lxsiwzx)
  - Load VSX Scalar Single-Precision Indexed (lxsspx)
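
The three loads differ only in how the 32-bit word from memory is widened
to the 64-bit scalar.  The sketch below illustrates that difference in
standalone C; the demo_* functions are hypothetical, assume IEEE 754 host
floats, and ignore details such as signaling NaN handling that the real
ld32fs operation must get right.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 and double is binary64. */
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "IEEE 754 float/double expected");

/* lxsiwax-style: "algebraic" means the word is sign-extended. */
static uint64_t demo_load_word_algebraic(uint32_t word)
{
    return word | ((word & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull);
}

/* lxsiwzx-style: the word is zero-extended. */
static uint64_t demo_load_word_zero(uint32_t word)
{
    return (uint64_t)word;
}

/* lxsspx-style: treat the word as a single-precision number and return
 * the bit pattern of the same value in double-precision format. */
static uint64_t demo_load_single(uint32_t word)
{
    float f;
    double d;
    uint64_t bits;

    memcpy(&f, &word, sizeof(f));
    d = (double)f;
    memcpy(&bits, &d, sizeof(bits));
    return bits;
}
```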

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/translate.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 2541b5f..ad40d27 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7023,6 +7023,9 @@ static void gen_##name(DisasContext *ctx)                     \
 }
 
 VSX_LOAD_SCALAR(lxsdx, ld64)
+VSX_LOAD_SCALAR(lxsiwax, ld32s)
+VSX_LOAD_SCALAR(lxsiwzx, ld32u)
+VSX_LOAD_SCALAR(lxsspx, ld32fs)
 
 static void gen_lxvd2x(DisasContext *ctx)
 {
@@ -10036,6 +10039,9 @@ GEN_VAFORM_PAIRED(vsel, vperm, 21),
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23),
 
 GEN_HANDLER_E(lxsdx, 0x1F, 0x0C, 0x12, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(lxsiwax, 0x1F, 0x0C, 0x02, 0, PPC_NONE, PPC2_VSX207),
+GEN_HANDLER_E(lxsiwzx, 0x1F, 0x0C, 0x00, 0, PPC_NONE, PPC2_VSX207),
+GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
-- 
1.7.1

* [Qemu-devel] [PATCH 04/14] VSX Stage 4: Refactor stxsdx
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (2 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 03/14] VSX Stage 4: Add lxsiwax, lxsiwzx and lxsspx Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 05/14] VSX Stage 4: Add stxsiwx and stxsspx Tom Musta
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch refactors the stxsdx instruction.  Reusable code is
extracted into a macro which will be used in subsequent patches
in this series.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/translate.c |   27 +++++++++++++++------------
 1 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ad40d27..52e487d 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7086,20 +7086,23 @@ static void gen_lxvw4x(DisasContext *ctx)
     tcg_temp_free(tmp);
 }
 
-static void gen_stxsdx(DisasContext *ctx)
-{
-    TCGv EA;
-    if (unlikely(!ctx->vsx_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VSXU);
-        return;
-    }
-    gen_set_access_type(ctx, ACCESS_INT);
-    EA = tcg_temp_new();
-    gen_addr_reg_index(ctx, EA);
-    gen_qemu_st64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
-    tcg_temp_free(EA);
+#define VSX_STORE_SCALAR(name, operation)                     \
+static void gen_##name(DisasContext *ctx)                     \
+{                                                             \
+    TCGv EA;                                                  \
+    if (unlikely(!ctx->vsx_enabled)) {                        \
+        gen_exception(ctx, POWERPC_EXCP_VSXU);                \
+        return;                                               \
+    }                                                         \
+    gen_set_access_type(ctx, ACCESS_INT);                     \
+    EA = tcg_temp_new();                                      \
+    gen_addr_reg_index(ctx, EA);                              \
+    gen_qemu_##operation(ctx, cpu_vsrh(xS(ctx->opcode)), EA); \
+    tcg_temp_free(EA);                                        \
 }
 
+VSX_STORE_SCALAR(stxsdx, st64)
+
 static void gen_stxvd2x(DisasContext *ctx)
 {
     TCGv EA;
-- 
1.7.1

* [Qemu-devel] [PATCH 05/14] VSX Stage 4: Add stxsiwx and stxsspx
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (3 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 04/14] VSX Stage 4: Refactor stxsdx Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 06/14] VSX Stage 4: Add xsaddsp and xssubsp Tom Musta
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds two store scalar instructions:

  - Store VSX Scalar as Integer Word Indexed (stxsiwx)
  - Store VSX Scalar Single-Precision Indexed (stxsspx)

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/translate.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 52e487d..62604fd 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7102,6 +7102,8 @@ static void gen_##name(DisasContext *ctx)                     \
 }
 
 VSX_STORE_SCALAR(stxsdx, st64)
+VSX_STORE_SCALAR(stxsiwx, st32)
+VSX_STORE_SCALAR(stxsspx, st32fs)
 
 static void gen_stxvd2x(DisasContext *ctx)
 {
@@ -10050,6 +10052,8 @@ GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
+GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
 
-- 
1.7.1

* [Qemu-devel] [PATCH 06/14] VSX Stage 4: Add xsaddsp and xssubsp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (4 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 05/14] VSX Stage 4: Add stxsiwx and stxsspx Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 07/14] VSX Stage 4: Add xsmulsp Tom Musta
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Add Single-Precision (xsaddsp) and
VSX Scalar Subtract Single-Precision (xssubsp) instructions.

The existing VSX_ADD_SUB macro is modified to support the rounding
of the (intermediate) result to single-precision.
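
The shape of that change, an extra macro parameter that switches on a
round-to-single step, can be sketched in standalone C.  DEMO_ADD_SUB and
the demo_* functions are hypothetical, and host float/double arithmetic
stands in for the softfloat calls used by the real helper.

```c
#include <assert.h>

/* Assumption: float is IEEE 754 binary32 and double is binary64. */
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "IEEE 754 float/double expected");

/* 'r2sp' is a compile-time constant, so the rounding branch is dropped
 * entirely from the double-precision variants. */
#define DEMO_ADD_SUB(name, op, r2sp)                      \
static double demo_##name(double a, double b)             \
{                                                         \
    double result = a op b;                               \
    if (r2sp) {                                           \
        float tmp32 = (float)result;                      \
        result = (double)tmp32;                           \
    }                                                     \
    return result;                                        \
}

DEMO_ADD_SUB(xsadddp, +, 0)
DEMO_ADD_SUB(xsaddsp, +, 1)
DEMO_ADD_SUB(xssubdp, -, 0)
DEMO_ADD_SUB(xssubsp, -, 1)
```

With the default round-to-nearest mode, adding 2^-30 to 1.0 survives in the
double-precision variant but is rounded away in the single-precision one.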

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   21 ++++++++++++++-------
 target-ppc/helper.h     |    3 +++
 target-ppc/translate.c  |    6 ++++++
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index f3d02cc..7b4958a 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1768,7 +1768,7 @@ static void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
  *   fld   - vsr_t field (f32 or f64)
  *   sfprf - set FPRF
  */
-#define VSX_ADD_SUB(name, op, nels, tp, fld, sfprf)                          \
+#define VSX_ADD_SUB(name, op, nels, tp, fld, sfprf, r2sp)                    \
 void helper_##name(CPUPPCState *env, uint32_t opcode)                        \
 {                                                                            \
     ppc_vsr_t xt, xa, xb;                                                    \
@@ -1794,6 +1794,11 @@ void helper_##name(CPUPPCState *env, uint32_t opcode)                        \
             }                                                                \
         }                                                                    \
                                                                              \
+        if (r2sp) {                                                          \
+            float32 tmp32 = float64_to_float32(xt.fld[i], &env->fp_status);  \
+            xt.fld[i] = float32_to_float64(tmp32, &env->fp_status);          \
+        }                                                                    \
+                                                                             \
         if (sfprf) {                                                         \
             helper_compute_fprf(env, xt.fld[i], sfprf);                      \
         }                                                                    \
@@ -1802,12 +1807,14 @@ void helper_##name(CPUPPCState *env, uint32_t opcode)                        \
     helper_float_check_status(env);                                          \
 }
 
-VSX_ADD_SUB(xsadddp, add, 1, float64, f64, 1)
-VSX_ADD_SUB(xvadddp, add, 2, float64, f64, 0)
-VSX_ADD_SUB(xvaddsp, add, 4, float32, f32, 0)
-VSX_ADD_SUB(xssubdp, sub, 1, float64, f64, 1)
-VSX_ADD_SUB(xvsubdp, sub, 2, float64, f64, 0)
-VSX_ADD_SUB(xvsubsp, sub, 4, float32, f32, 0)
+VSX_ADD_SUB(xsadddp, add, 1, float64, f64, 1, 0)
+VSX_ADD_SUB(xsaddsp, add, 1, float64, f64, 1, 1)
+VSX_ADD_SUB(xvadddp, add, 2, float64, f64, 0, 0)
+VSX_ADD_SUB(xvaddsp, add, 4, float32, f32, 0, 0)
+VSX_ADD_SUB(xssubdp, sub, 1, float64, f64, 1, 0)
+VSX_ADD_SUB(xssubsp, sub, 1, float64, f64, 1, 1)
+VSX_ADD_SUB(xvsubdp, sub, 2, float64, f64, 0, 0)
+VSX_ADD_SUB(xvsubsp, sub, 4, float32, f32, 0, 0)
 
 /* VSX_MUL - VSX floating point multiply
  *   op    - instruction mnemonic
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 0276b02..696b9d3 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -286,6 +286,9 @@ DEF_HELPER_2(xsrdpim, void, env, i32)
 DEF_HELPER_2(xsrdpip, void, env, i32)
 DEF_HELPER_2(xsrdpiz, void, env, i32)
 
+DEF_HELPER_2(xsaddsp, void, env, i32)
+DEF_HELPER_2(xssubsp, void, env, i32)
+
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
 DEF_HELPER_2(xvmuldp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 62604fd..bd639cc 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7341,6 +7341,9 @@ GEN_VSX_HELPER_2(xsrdpim, 0x12, 0x07, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsrdpip, 0x12, 0x06, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsrdpiz, 0x12, 0x05, 0, PPC2_VSX)
 
+GEN_VSX_HELPER_2(xsaddsp, 0x00, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xssubsp, 0x00, 0x01, 0, PPC2_VSX207)
+
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvmuldp, 0x00, 0x0E, 0, PPC2_VSX)
@@ -10148,6 +10151,9 @@ GEN_XX2FORM(xsrdpim, 0x12, 0x07, PPC2_VSX),
 GEN_XX2FORM(xsrdpip, 0x12, 0x06, PPC2_VSX),
 GEN_XX2FORM(xsrdpiz, 0x12, 0x05, PPC2_VSX),
 
+GEN_XX3FORM(xsaddsp, 0x00, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xssubsp, 0x00, 0x01, PPC2_VSX207),
+
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
 GEN_XX3FORM(xvmuldp, 0x00, 0x0E, PPC2_VSX),
-- 
1.7.1

* [Qemu-devel] [PATCH 07/14] VSX Stage 4: Add xsmulsp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (5 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 06/14] VSX Stage 4: Add xsaddsp and xssubsp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 08/14] VSX Stage 4: Add xsdivsp Tom Musta
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Multiply Single-Precision (xsmulsp)
instruction.

The existing VSX_MUL macro is modified to support rounding of the
intermediate result to single precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   14 ++++++++++----
 target-ppc/helper.h     |    1 +
 target-ppc/translate.c  |    2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 7b4958a..e5127b2 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1823,7 +1823,7 @@ VSX_ADD_SUB(xvsubsp, sub, 4, float32, f32, 0, 0)
  *   fld   - vsr_t field (f32 or f64)
  *   sfprf - set FPRF
  */
-#define VSX_MUL(op, nels, tp, fld, sfprf)                                    \
+#define VSX_MUL(op, nels, tp, fld, sfprf, r2sp)                              \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
 {                                                                            \
     ppc_vsr_t xt, xa, xb;                                                    \
@@ -1850,6 +1850,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
             }                                                                \
         }                                                                    \
                                                                              \
+        if (r2sp) {                                                          \
+            float32 tmp32 = float64_to_float32(xt.fld[i], &env->fp_status);  \
+            xt.fld[i] = float32_to_float64(tmp32, &env->fp_status);          \
+        }                                                                    \
+                                                                             \
         if (sfprf) {                                                         \
             helper_compute_fprf(env, xt.fld[i], sfprf);                      \
         }                                                                    \
@@ -1859,9 +1864,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
     helper_float_check_status(env);                                          \
 }
 
-VSX_MUL(xsmuldp, 1, float64, f64, 1)
-VSX_MUL(xvmuldp, 2, float64, f64, 0)
-VSX_MUL(xvmulsp, 4, float32, f32, 0)
+VSX_MUL(xsmuldp, 1, float64, f64, 1, 0)
+VSX_MUL(xsmulsp, 1, float64, f64, 1, 1)
+VSX_MUL(xvmuldp, 2, float64, f64, 0, 0)
+VSX_MUL(xvmulsp, 4, float32, f32, 0, 0)
 
 /* VSX_DIV - VSX floating point divide
  *   op    - instruction mnemonic
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 696b9d3..0ccdc96 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -288,6 +288,7 @@ DEF_HELPER_2(xsrdpiz, void, env, i32)
 
 DEF_HELPER_2(xsaddsp, void, env, i32)
 DEF_HELPER_2(xssubsp, void, env, i32)
+DEF_HELPER_2(xsmulsp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index bd639cc..450ab88 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7343,6 +7343,7 @@ GEN_VSX_HELPER_2(xsrdpiz, 0x12, 0x05, 0, PPC2_VSX)
 
 GEN_VSX_HELPER_2(xsaddsp, 0x00, 0x00, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssubsp, 0x00, 0x01, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmulsp, 0x00, 0x02, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10153,6 +10154,7 @@ GEN_XX2FORM(xsrdpiz, 0x12, 0x05, PPC2_VSX),
 
 GEN_XX3FORM(xsaddsp, 0x00, 0x00, PPC2_VSX207),
 GEN_XX3FORM(xssubsp, 0x00, 0x01, PPC2_VSX207),
+GEN_XX3FORM(xsmulsp, 0x00, 0x02, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1

* [Qemu-devel] [PATCH 08/14] VSX Stage 4: Add xsdivsp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (6 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 07/14] VSX Stage 4: Add xsmulsp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 09/14] VSX Stage 4: Add xsresp Tom Musta
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Divide Single Precision (xsdivsp)
instruction.

The existing VSX_DIV macro is modified to support rounding of the
intermediate double precision result to single precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   14 ++++++++++----
 target-ppc/helper.h     |    1 +
 target-ppc/translate.c  |    2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index e5127b2..1133591 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1876,7 +1876,7 @@ VSX_MUL(xvmulsp, 4, float32, f32, 0, 0)
  *   fld   - vsr_t field (f32 or f64)
  *   sfprf - set FPRF
  */
-#define VSX_DIV(op, nels, tp, fld, sfprf)                                     \
+#define VSX_DIV(op, nels, tp, fld, sfprf, r2sp)                               \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
 {                                                                             \
     ppc_vsr_t xt, xa, xb;                                                     \
@@ -1905,6 +1905,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
             }                                                                 \
         }                                                                     \
                                                                               \
+        if (r2sp) {                                                           \
+            float32 tmp32 = float64_to_float32(xt.fld[i], &env->fp_status);   \
+            xt.fld[i] = float32_to_float64(tmp32, &env->fp_status);           \
+        }                                                                     \
+                                                                              \
         if (sfprf) {                                                          \
             helper_compute_fprf(env, xt.fld[i], sfprf);                       \
         }                                                                     \
@@ -1914,9 +1919,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
     helper_float_check_status(env);                                           \
 }
 
-VSX_DIV(xsdivdp, 1, float64, f64, 1)
-VSX_DIV(xvdivdp, 2, float64, f64, 0)
-VSX_DIV(xvdivsp, 4, float32, f32, 0)
+VSX_DIV(xsdivdp, 1, float64, f64, 1, 0)
+VSX_DIV(xsdivsp, 1, float64, f64, 1, 1)
+VSX_DIV(xvdivdp, 2, float64, f64, 0, 0)
+VSX_DIV(xvdivsp, 4, float32, f32, 0, 0)
 
 /* VSX_RE  - VSX floating point reciprocal estimate
  *   op    - instruction mnemonic
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 0ccdc96..308f97c 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -289,6 +289,7 @@ DEF_HELPER_2(xsrdpiz, void, env, i32)
 DEF_HELPER_2(xsaddsp, void, env, i32)
 DEF_HELPER_2(xssubsp, void, env, i32)
 DEF_HELPER_2(xsmulsp, void, env, i32)
+DEF_HELPER_2(xsdivsp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 450ab88..896dbc2 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7344,6 +7344,7 @@ GEN_VSX_HELPER_2(xsrdpiz, 0x12, 0x05, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsaddsp, 0x00, 0x00, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssubsp, 0x00, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsmulsp, 0x00, 0x02, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10155,6 +10156,7 @@ GEN_XX2FORM(xsrdpiz, 0x12, 0x05, PPC2_VSX),
 GEN_XX3FORM(xsaddsp, 0x00, 0x00, PPC2_VSX207),
 GEN_XX3FORM(xssubsp, 0x00, 0x01, PPC2_VSX207),
 GEN_XX3FORM(xsmulsp, 0x00, 0x02, PPC2_VSX207),
+GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1

* [Qemu-devel] [PATCH 09/14] VSX Stage 4: Add xsresp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (7 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 08/14] VSX Stage 4: Add xsdivsp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 10/14] VSX Stage 4: Add xssqrtsp Tom Musta
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Reciprocal Estimate Single Precision
(xsresp) instruction.

The existing VSX_RE macro is modified to support rounding of the
intermediate double precision result to single precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   15 +++++++++++----
 target-ppc/helper.h     |    1 +
 target-ppc/translate.c  |    2 ++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 1133591..862f855 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1931,7 +1931,7 @@ VSX_DIV(xvdivsp, 4, float32, f32, 0, 0)
  *   fld   - vsr_t field (f32 or f64)
  *   sfprf - set FPRF
  */
-#define VSX_RE(op, nels, tp, fld, sfprf)                                      \
+#define VSX_RE(op, nels, tp, fld, sfprf, r2sp)                                \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
 {                                                                             \
     ppc_vsr_t xt, xb;                                                         \
@@ -1946,6 +1946,12 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
                 fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf);    \
         }                                                                     \
         xt.fld[i] = tp##_div(tp##_one, xb.fld[i], &env->fp_status);           \
+                                                                              \
+        if (r2sp) {                                                           \
+            float32 tmp32 = float64_to_float32(xt.fld[i], &env->fp_status);   \
+            xt.fld[i] = float32_to_float64(tmp32, &env->fp_status);           \
+        }                                                                     \
+                                                                              \
         if (sfprf) {                                                          \
             helper_compute_fprf(env, xt.fld[0], sfprf);                       \
         }                                                                     \
@@ -1955,9 +1961,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
     helper_float_check_status(env);                                           \
 }
 
-VSX_RE(xsredp, 1, float64, f64, 1)
-VSX_RE(xvredp, 2, float64, f64, 0)
-VSX_RE(xvresp, 4, float32, f32, 0)
+VSX_RE(xsredp, 1, float64, f64, 1, 0)
+VSX_RE(xsresp, 1, float64, f64, 1, 1)
+VSX_RE(xvredp, 2, float64, f64, 0, 0)
+VSX_RE(xvresp, 4, float32, f32, 0, 0)
 
 /* VSX_SQRT - VSX floating point square root
  *   op    - instruction mnemonic
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 308f97c..b1cf3c0 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -290,6 +290,7 @@ DEF_HELPER_2(xsaddsp, void, env, i32)
 DEF_HELPER_2(xssubsp, void, env, i32)
 DEF_HELPER_2(xsmulsp, void, env, i32)
 DEF_HELPER_2(xsdivsp, void, env, i32)
+DEF_HELPER_2(xsresp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 896dbc2..c4c57a1 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7345,6 +7345,7 @@ GEN_VSX_HELPER_2(xsaddsp, 0x00, 0x00, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssubsp, 0x00, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsmulsp, 0x00, 0x02, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10157,6 +10158,7 @@ GEN_XX3FORM(xsaddsp, 0x00, 0x00, PPC2_VSX207),
 GEN_XX3FORM(xssubsp, 0x00, 0x01, PPC2_VSX207),
 GEN_XX3FORM(xsmulsp, 0x00, 0x02, PPC2_VSX207),
 GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
+GEN_XX2FORM(xsresp,  0x14, 0x01, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1


* [Qemu-devel] [PATCH 10/14] VSX Stage 4: Add xssqrtsp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (8 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 09/14] VSX Stage 4: Add xsresp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 11/14] VSX Stage 4: add xsrsqrtesp Tom Musta
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Square Root Single Precision (xssqrtsp)
instruction.

The existing VSX_SQRT() macro is modified to support rounding of the
intermediate double-precision result to single-precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   14 ++++++++++----
 target-ppc/helper.h     |    1 +
 target-ppc/translate.c  |    2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 862f855..c4e52ea 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1973,7 +1973,7 @@ VSX_RE(xvresp, 4, float32, f32, 0, 0)
  *   fld   - vsr_t field (f32 or f64)
  *   sfprf - set FPRF
  */
-#define VSX_SQRT(op, nels, tp, fld, sfprf)                                   \
+#define VSX_SQRT(op, nels, tp, fld, sfprf, r2sp)                             \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
 {                                                                            \
     ppc_vsr_t xt, xb;                                                        \
@@ -1997,6 +1997,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
             }                                                                \
         }                                                                    \
                                                                              \
+        if (r2sp) {                                                          \
+            float32 tmp32 = float64_to_float32(xt.fld[i], &env->fp_status);  \
+            xt.fld[i] = float32_to_float64(tmp32, &env->fp_status);          \
+        }                                                                    \
+                                                                             \
         if (sfprf) {                                                         \
             helper_compute_fprf(env, xt.fld[i], sfprf);                      \
         }                                                                    \
@@ -2006,9 +2011,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
     helper_float_check_status(env);                                          \
 }
 
-VSX_SQRT(xssqrtdp, 1, float64, f64, 1)
-VSX_SQRT(xvsqrtdp, 2, float64, f64, 0)
-VSX_SQRT(xvsqrtsp, 4, float32, f32, 0)
+VSX_SQRT(xssqrtdp, 1, float64, f64, 1, 0)
+VSX_SQRT(xssqrtsp, 1, float64, f64, 1, 1)
+VSX_SQRT(xvsqrtdp, 2, float64, f64, 0, 0)
+VSX_SQRT(xvsqrtsp, 4, float32, f32, 0, 0)
 
 /* VSX_RSQRTE - VSX floating point reciprocal square root estimate
  *   op    - instruction mnemonic
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index b1cf3c0..0192043 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -291,6 +291,7 @@ DEF_HELPER_2(xssubsp, void, env, i32)
 DEF_HELPER_2(xsmulsp, void, env, i32)
 DEF_HELPER_2(xsdivsp, void, env, i32)
 DEF_HELPER_2(xsresp, void, env, i32)
+DEF_HELPER_2(xssqrtsp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index c4c57a1..b9cd35b 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7346,6 +7346,7 @@ GEN_VSX_HELPER_2(xssubsp, 0x00, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsmulsp, 0x00, 0x02, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10159,6 +10160,7 @@ GEN_XX3FORM(xssubsp, 0x00, 0x01, PPC2_VSX207),
 GEN_XX3FORM(xsmulsp, 0x00, 0x02, PPC2_VSX207),
 GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
 GEN_XX2FORM(xsresp,  0x14, 0x01, PPC2_VSX207),
+GEN_XX2FORM(xssqrtsp,  0x16, 0x00, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1


* [Qemu-devel] [PATCH 11/14] VSX Stage 4: add xsrsqrtesp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (9 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 10/14] VSX Stage 4: Add xssqrtsp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds Tom Musta
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Reciprocal Square Root Estimate
Single Precision (xsrsqrtesp) instruction.

The existing VSX_RSQRTE() macro is modified to support rounding
of the intermediate double-precision result to single precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   14 ++++++++++----
 target-ppc/helper.h     |    1 +
 target-ppc/translate.c  |    2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index c4e52ea..e08f317 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2023,7 +2023,7 @@ VSX_SQRT(xvsqrtsp, 4, float32, f32, 0, 0)
  *   fld   - vsr_t field (f32 or f64)
  *   sfprf - set FPRF
  */
-#define VSX_RSQRTE(op, nels, tp, fld, sfprf)                                 \
+#define VSX_RSQRTE(op, nels, tp, fld, sfprf, r2sp)                           \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
 {                                                                            \
     ppc_vsr_t xt, xb;                                                        \
@@ -2048,6 +2048,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
             }                                                                \
         }                                                                    \
                                                                              \
+        if (r2sp) {                                                          \
+            float32 tmp32 = float64_to_float32(xt.fld[i], &env->fp_status);  \
+            xt.fld[i] = float32_to_float64(tmp32, &env->fp_status);          \
+        }                                                                    \
+                                                                             \
         if (sfprf) {                                                         \
             helper_compute_fprf(env, xt.fld[i], sfprf);                      \
         }                                                                    \
@@ -2057,9 +2062,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
     helper_float_check_status(env);                                          \
 }
 
-VSX_RSQRTE(xsrsqrtedp, 1, float64, f64, 1)
-VSX_RSQRTE(xvrsqrtedp, 2, float64, f64, 0)
-VSX_RSQRTE(xvrsqrtesp, 4, float32, f32, 0)
+VSX_RSQRTE(xsrsqrtedp, 1, float64, f64, 1, 0)
+VSX_RSQRTE(xsrsqrtesp, 1, float64, f64, 1, 1)
+VSX_RSQRTE(xvrsqrtedp, 2, float64, f64, 0, 0)
+VSX_RSQRTE(xvrsqrtesp, 4, float32, f32, 0, 0)
 
 static inline int ppc_float32_get_unbiased_exp(float32 f)
 {
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 0192043..84c6ee1 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -292,6 +292,7 @@ DEF_HELPER_2(xsmulsp, void, env, i32)
 DEF_HELPER_2(xsdivsp, void, env, i32)
 DEF_HELPER_2(xsresp, void, env, i32)
 DEF_HELPER_2(xssqrtsp, void, env, i32)
+DEF_HELPER_2(xsrsqrtesp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index b9cd35b..ae80289 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7347,6 +7347,7 @@ GEN_VSX_HELPER_2(xsmulsp, 0x00, 0x02, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10161,6 +10162,7 @@ GEN_XX3FORM(xsmulsp, 0x00, 0x02, PPC2_VSX207),
 GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
 GEN_XX2FORM(xsresp,  0x14, 0x01, PPC2_VSX207),
 GEN_XX2FORM(xssqrtsp,  0x16, 0x00, PPC2_VSX207),
+GEN_XX2FORM(xsrsqrtesp,  0x14, 0x00, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1


* [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (10 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 11/14] VSX Stage 4: add xsrsqrtesp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-07 23:28   ` Richard Henderson
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 13/14] VSX Stage 4: Add xscvsxdsp and xscvuxdsp Tom Musta
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the Single Precision VSX Scalar Fused Multiply-Add
instructions: xsmaddasp, xsmaddmsp, xsmsubasp, xsmsubmsp, xsnmaddasp,
xsnmaddmsp, xsnmsubasp, xsnmsubmsp.

The existing VSX_MADD() macro is modified to support rounding of the
intermediate double precision result to single precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   70 +++++++++++++++++++++++++++++------------------
 target-ppc/helper.h     |    8 +++++
 target-ppc/translate.c  |   16 +++++++++++
 3 files changed, 67 insertions(+), 27 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index e08f317..c86778f 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2198,7 +2198,7 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, f32, -126, 23)
  *   afrm  - A form (1=A, 0=M)
  *   sfprf - set FPRF
  */
-#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf)                    \
+#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf, r2sp)              \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
 {                                                                             \
     ppc_vsr_t xt_in, xa, xb, xt_out;                                          \
@@ -2248,6 +2248,13 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
                 fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf);     \
             }                                                                 \
         }                                                                     \
+                                                                              \
+        if (r2sp) {                                                           \
+            float32 tmp32 = float64_to_float32(xt_out.fld[i],                 \
+                                               &env->fp_status);              \
+            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);       \
+        }                                                                     \
+                                                                              \
         if (sfprf) {                                                          \
             helper_compute_fprf(env, xt_out.fld[i], sfprf);                   \
         }                                                                     \
@@ -2261,32 +2268,41 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
 #define NMADD_FLGS float_muladd_negate_result
 #define NMSUB_FLGS (float_muladd_negate_c | float_muladd_negate_result)
 
-VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1)
-VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1)
-VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1)
-VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1)
-VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1)
-VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1)
-VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1)
-VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1)
-
-VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0)
-VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0)
-VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0)
-VSX_MADD(xvnmaddmdp, 2, float64, f64, NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0)
-VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0)
-
-VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0)
-VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0)
-VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0)
-VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0)
-VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0)
+VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1, 0)
+VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1, 0)
+VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1, 0)
+VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1, 0)
+VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1, 0)
+VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1, 0)
+VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1, 0)
+VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1, 0)
+
+VSX_MADD(xsmaddasp, 1, float64, f64, MADD_FLGS, 1, 1, 1)
+VSX_MADD(xsmaddmsp, 1, float64, f64, MADD_FLGS, 0, 1, 1)
+VSX_MADD(xsmsubasp, 1, float64, f64, MSUB_FLGS, 1, 1, 1)
+VSX_MADD(xsmsubmsp, 1, float64, f64, MSUB_FLGS, 0, 1, 1)
+VSX_MADD(xsnmaddasp, 1, float64, f64, NMADD_FLGS, 1, 1, 1)
+VSX_MADD(xsnmaddmsp, 1, float64, f64, NMADD_FLGS, 0, 1, 1)
+VSX_MADD(xsnmsubasp, 1, float64, f64, NMSUB_FLGS, 1, 1, 1)
+VSX_MADD(xsnmsubmsp, 1, float64, f64, NMSUB_FLGS, 0, 1, 1)
+
+VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0, 0)
+VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0, 0)
+VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0, 0)
+VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0, 0)
+VSX_MADD(xvnmaddmdp, 2, float64, f64, NMADD_FLGS, 0, 0, 0)
+VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0, 0)
+
+VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0, 0)
+VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0, 0)
+VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0, 0)
+VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0, 0)
+VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0, 0)
+VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0, 0)
 
 #define VSX_SCALAR_CMP(op, ordered)                                      \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                      \
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 84c6ee1..655b670 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -293,6 +293,14 @@ DEF_HELPER_2(xsdivsp, void, env, i32)
 DEF_HELPER_2(xsresp, void, env, i32)
 DEF_HELPER_2(xssqrtsp, void, env, i32)
 DEF_HELPER_2(xsrsqrtesp, void, env, i32)
+DEF_HELPER_2(xsmaddasp, void, env, i32)
+DEF_HELPER_2(xsmaddmsp, void, env, i32)
+DEF_HELPER_2(xsmsubasp, void, env, i32)
+DEF_HELPER_2(xsmsubmsp, void, env, i32)
+DEF_HELPER_2(xsnmaddasp, void, env, i32)
+DEF_HELPER_2(xsnmaddmsp, void, env, i32)
+DEF_HELPER_2(xsnmsubasp, void, env, i32)
+DEF_HELPER_2(xsnmsubmsp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ae80289..672cf0a 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7348,6 +7348,14 @@ GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmaddasp, 0x04, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmaddmsp, 0x04, 0x01, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmsubasp, 0x04, 0x02, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmsubmsp, 0x04, 0x03, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmaddasp, 0x04, 0x10, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmaddmsp, 0x04, 0x11, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmsubasp, 0x04, 0x12, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmsubmsp, 0x04, 0x13, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10163,6 +10171,14 @@ GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
 GEN_XX2FORM(xsresp,  0x14, 0x01, PPC2_VSX207),
 GEN_XX2FORM(xssqrtsp,  0x16, 0x00, PPC2_VSX207),
 GEN_XX2FORM(xsrsqrtesp,  0x14, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xsmaddasp, 0x04, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xsmaddmsp, 0x04, 0x01, PPC2_VSX207),
+GEN_XX3FORM(xsmsubasp, 0x04, 0x02, PPC2_VSX207),
+GEN_XX3FORM(xsmsubmsp, 0x04, 0x03, PPC2_VSX207),
+GEN_XX3FORM(xsnmaddasp, 0x04, 0x10, PPC2_VSX207),
+GEN_XX3FORM(xsnmaddmsp, 0x04, 0x11, PPC2_VSX207),
+GEN_XX3FORM(xsnmsubasp, 0x04, 0x12, PPC2_VSX207),
+GEN_XX3FORM(xsnmsubmsp, 0x04, 0x13, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1


* [Qemu-devel] [PATCH 13/14] VSX Stage 4: Add xscvsxdsp and xscvuxdsp
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (11 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 14/14] VSX Stage 4: Add xxleqv, xxlnand and xxlorc Tom Musta
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Scalar Convert Unsigned Integer Doubleword
to Floating Point Format and Round to Single Precision (xscvuxdsp)
and VSX Scalar Convert Signed Integer Doubleword to Floating Point
Format and Round to Single Precision (xscvsxdsp) instructions.

The existing integer to floating point conversion macro (VSX_CVT_INT_TO_FP)
is modified to support the rounding of the intermediate floating point
result to single precision.

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   29 ++++++++++++++++++-----------
 target-ppc/helper.h     |    2 ++
 target-ppc/translate.c  |    4 ++++
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index c86778f..30de9da 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2567,7 +2567,7 @@ VSX_CVT_FP_TO_INT(xvcvspuxws, 4, float32, uint32, f32[j], u32[i], i, 0)
  *   jdef  - definition of the j index (i or 2*i)
  *   sfprf - set FPRF
  */
-#define VSX_CVT_INT_TO_FP(op, nels, stp, ttp, sfld, tfld, jdef, sfprf)  \
+#define VSX_CVT_INT_TO_FP(op, nels, stp, ttp, sfld, tfld, jdef, sfprf, r2sp) \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                     \
 {                                                                       \
     ppc_vsr_t xt, xb;                                                   \
@@ -2579,6 +2579,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                     \
     for (i = 0; i < nels; i++) {                                        \
         int j = jdef;                                                   \
         xt.tfld = stp##_to_##ttp(xb.sfld, &env->fp_status);             \
+        if (r2sp) {                                                     \
+            float32 tmp32 = float64_to_float32(xt.tfld,                 \
+                                               &env->fp_status);        \
+            xt.tfld = float32_to_float64(tmp32, &env->fp_status);       \
+        }                                                               \
         if (sfprf) {                                                    \
             helper_compute_fprf(env, xt.tfld, sfprf);                   \
         }                                                               \
@@ -2588,20 +2593,22 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                     \
     helper_float_check_status(env);                                     \
 }
 
-VSX_CVT_INT_TO_FP(xscvsxddp, 1, int64, float64, u64[j], f64[i], i, 1)
-VSX_CVT_INT_TO_FP(xscvuxddp, 1, uint64, float64, u64[j], f64[i], i, 1)
-VSX_CVT_INT_TO_FP(xvcvsxddp, 2, int64, float64, u64[j], f64[i], i, 0)
-VSX_CVT_INT_TO_FP(xvcvuxddp, 2, uint64, float64, u64[j], f64[i], i, 0)
+VSX_CVT_INT_TO_FP(xscvsxddp, 1, int64, float64, u64[j], f64[i], i, 1, 0)
+VSX_CVT_INT_TO_FP(xscvuxddp, 1, uint64, float64, u64[j], f64[i], i, 1, 0)
+VSX_CVT_INT_TO_FP(xscvsxdsp, 1, int64, float64, u64[j], f64[i], i, 1, 1)
+VSX_CVT_INT_TO_FP(xscvuxdsp, 1, uint64, float64, u64[j], f64[i], i, 1, 1)
+VSX_CVT_INT_TO_FP(xvcvsxddp, 2, int64, float64, u64[j], f64[i], i, 0, 0)
+VSX_CVT_INT_TO_FP(xvcvuxddp, 2, uint64, float64, u64[j], f64[i], i, 0, 0)
 VSX_CVT_INT_TO_FP(xvcvsxwdp, 2, int32, float64, u32[j], f64[i], \
-                  2*i + JOFFSET, 0)
+                  2*i + JOFFSET, 0, 0)
 VSX_CVT_INT_TO_FP(xvcvuxwdp, 2, uint64, float64, u32[j], f64[i], \
-                  2*i + JOFFSET, 0)
+                  2*i + JOFFSET, 0, 0)
 VSX_CVT_INT_TO_FP(xvcvsxdsp, 2, int64, float32, u64[i], f32[j], \
-                  2*i + JOFFSET, 0)
+                  2*i + JOFFSET, 0, 0)
 VSX_CVT_INT_TO_FP(xvcvuxdsp, 2, uint64, float32, u64[i], f32[j], \
-                  2*i + JOFFSET, 0)
-VSX_CVT_INT_TO_FP(xvcvsxwsp, 4, int32, float32, u32[j], f32[i], i, 0)
-VSX_CVT_INT_TO_FP(xvcvuxwsp, 4, uint32, float32, u32[j], f32[i], i, 0)
+                  2*i + JOFFSET, 0, 0)
+VSX_CVT_INT_TO_FP(xvcvsxwsp, 4, int32, float32, u32[j], f32[i], i, 0, 0)
+VSX_CVT_INT_TO_FP(xvcvuxwsp, 4, uint32, float32, u32[j], f32[i], i, 0, 0)
 
 /* For "use current rounding mode", define a value that will not be one of
  * the existing rounding model enums.
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 655b670..6250eba 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -279,6 +279,8 @@ DEF_HELPER_2(xscvdpsxws, void, env, i32)
 DEF_HELPER_2(xscvdpuxds, void, env, i32)
 DEF_HELPER_2(xscvdpuxws, void, env, i32)
 DEF_HELPER_2(xscvsxddp, void, env, i32)
+DEF_HELPER_2(xscvuxdsp, void, env, i32)
+DEF_HELPER_2(xscvsxdsp, void, env, i32)
 DEF_HELPER_2(xscvuxddp, void, env, i32)
 DEF_HELPER_2(xsrdpi, void, env, i32)
 DEF_HELPER_2(xsrdpic, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 672cf0a..e13bb8f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7356,6 +7356,8 @@ GEN_VSX_HELPER_2(xsnmaddasp, 0x04, 0x10, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsnmaddmsp, 0x04, 0x11, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsnmsubasp, 0x04, 0x12, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsnmsubmsp, 0x04, 0x13, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xscvsxdsp, 0x10, 0x13, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xscvuxdsp, 0x10, 0x12, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10179,6 +10181,8 @@ GEN_XX3FORM(xsnmaddasp, 0x04, 0x10, PPC2_VSX207),
 GEN_XX3FORM(xsnmaddmsp, 0x04, 0x11, PPC2_VSX207),
 GEN_XX3FORM(xsnmsubasp, 0x04, 0x12, PPC2_VSX207),
 GEN_XX3FORM(xsnmsubmsp, 0x04, 0x13, PPC2_VSX207),
+GEN_XX2FORM(xscvsxdsp, 0x10, 0x13, PPC2_VSX207),
+GEN_XX2FORM(xscvuxdsp, 0x10, 0x12, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1


* [Qemu-devel] [PATCH 14/14] VSX Stage 4: Add xxleqv, xxlnand and xxlorc
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (12 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 13/14] VSX Stage 4: Add xscvsxdsp and xscvuxdsp Tom Musta
@ 2013-11-06 20:31 ` Tom Musta
  2013-11-08  0:23 ` [Qemu-devel] [PATCH 00/14] VSX Stage 4 Richard Henderson
  2013-11-08 15:55 ` Andreas Färber
  15 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-06 20:31 UTC (permalink / raw)
  To: qemu-devel, tommusta; +Cc: qemu-ppc

This patch adds the VSX Logical instructions that are new with
ISA V2.07:

  - VSX Logical Equivalence (xxleqv)
  - VSX Logical NAND (xxlnand)
  - VSX Logical ORC (xxlorc)
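
As an illustration only, the bitwise semantics of the three operations can
be sketched in plain C on one 64-bit half of a VSR.  The function names
below are hypothetical and are not part of QEMU; the patch itself maps each
instruction onto the corresponding TCG operation.

```c
#include <stdint.h>

/* Hypothetical helpers, not QEMU code: each one shows the per-bit result
 * of one logical operation on a single 64-bit half of a vector register. */

/* Equivalence (XNOR): a result bit is 1 where the two inputs agree. */
static inline uint64_t eqv64(uint64_t a, uint64_t b)
{
    return ~(a ^ b);
}

/* NAND: a result bit is 0 only where both inputs are 1. */
static inline uint64_t nand64(uint64_t a, uint64_t b)
{
    return ~(a & b);
}

/* OR with complement: the first input ORed with the inverted second. */
static inline uint64_t orc64(uint64_t a, uint64_t b)
{
    return a | ~b;
}
```

A 128-bit VSR would apply the same operation to both of its 64-bit halves.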

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/translate.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index e13bb8f..1f7e499 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7451,6 +7451,9 @@ VSX_LOGICAL(xxlandc, tcg_gen_andc_tl)
 VSX_LOGICAL(xxlor, tcg_gen_or_tl)
 VSX_LOGICAL(xxlxor, tcg_gen_xor_tl)
 VSX_LOGICAL(xxlnor, tcg_gen_nor_tl)
+VSX_LOGICAL(xxleqv, tcg_gen_eqv_tl)
+VSX_LOGICAL(xxlnand, tcg_gen_nand_tl)
+VSX_LOGICAL(xxlorc, tcg_gen_orc_tl)
 
 #define VSX_XXMRG(name, high)                               \
 static void glue(gen_, name)(DisasContext * ctx)            \
@@ -10267,6 +10270,9 @@ VSX_LOGICAL(xxlandc, 0x8, 0x11, PPC2_VSX),
 VSX_LOGICAL(xxlor, 0x8, 0x12, PPC2_VSX),
 VSX_LOGICAL(xxlxor, 0x8, 0x13, PPC2_VSX),
 VSX_LOGICAL(xxlnor, 0x8, 0x14, PPC2_VSX),
+VSX_LOGICAL(xxleqv, 0x8, 0x17, PPC2_VSX207),
+VSX_LOGICAL(xxlnand, 0x8, 0x16, PPC2_VSX207),
+VSX_LOGICAL(xxlorc, 0x8, 0x15, PPC2_VSX207),
 GEN_XX3FORM(xxmrghw, 0x08, 0x02, PPC2_VSX),
 GEN_XX3FORM(xxmrglw, 0x08, 0x06, PPC2_VSX),
 GEN_XX2FORM(xxspltw, 0x08, 0x0A, PPC2_VSX),
-- 
1.7.1


* Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds Tom Musta
@ 2013-11-07 23:28   ` Richard Henderson
  2013-11-07 23:30     ` Richard Henderson
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2013-11-07 23:28 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: qemu-ppc

On 11/07/2013 06:31 AM, Tom Musta wrote:
>          }                                                                     \
> +                                                                              \
> +        if (r2sp) {                                                           \
> +            float32 tmp32 = float64_to_float32(xt_out.fld[i],                 \
> +                                               &env->fp_status);              \
> +            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);       \
> +        }                                                                     \
> +                                                                              \

You can't get correct results for a single-precision fma from a
double-precision fma and merely rounding the results.

See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.


r~


* Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-07 23:28   ` Richard Henderson
@ 2013-11-07 23:30     ` Richard Henderson
  2013-11-08  0:13       ` Richard Henderson
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2013-11-07 23:30 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: qemu-ppc

On 11/08/2013 09:28 AM, Richard Henderson wrote:
> On 11/07/2013 06:31 AM, Tom Musta wrote:
>>          }                                                                     \
>> +                                                                              \
>> +        if (r2sp) {                                                           \
>> +            float32 tmp32 = float64_to_float32(xt_out.fld[i],                 \
>> +                                               &env->fp_status);              \
>> +            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);       \
>> +        }                                                                     \
>> +                                                                              \
> 
> You can't get correct results for a single-precision fma from a
> double-precision fma and merely rounding the results.
> 
> See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.

Blah, nevermind.  That would be using separate add+mul in double-precision, not
using a double-precision fma primitive.


r~


* Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-07 23:30     ` Richard Henderson
@ 2013-11-08  0:13       ` Richard Henderson
  2013-11-13 20:49         ` Tom Musta
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2013-11-08  0:13 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: qemu-ppc

On 11/08/2013 09:30 AM, Richard Henderson wrote:
> On 11/08/2013 09:28 AM, Richard Henderson wrote:
>> On 11/07/2013 06:31 AM, Tom Musta wrote:
>>>          }                                                                     \
>>> +                                                                              \
>>> +        if (r2sp) {                                                           \
>>> +            float32 tmp32 = float64_to_float32(xt_out.fld[i],                 \
>>> +                                               &env->fp_status);              \
>>> +            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);       \
>>> +        }                                                                     \
>>> +                                                                              \
>>
>> You can't get correct results for a single-precision fma from a
>> double-precision fma and merely rounding the results.
>>
>> See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.
> 
> Blah, nevermind.  That would be using separate add+mul in double-precision, not
> using a double-precision fma primitive.

Hmph.  I was right the first time.  See

> http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/

for example inputs that suffer from double-rounding.

What's needed in each of the examples are infinite precision values containing
55 bits.  This is easy to accomplish with fma.

Two 23-bit inputs can create a product with 46 significant bits.  One can
append 23 more significant bits by choosing an exponent for the addend that
does not overlap the product.  Thus one can create (almost) every intermediate
result with up to 69 consecutive bits (the exception being products without
factors that can be represented in 23-bits).

I'm too lazy to decompose the examples therein to actual single-precision
inputs, but I'm certain it can be done.

Thus you *do* need the round-to-zero plus inexact solution that glibc uses.
(Or to perform the fma in 128-bits and round once, but I think that would be
way more intrusive wrt the rest of the code, and more expensive than necessary.)
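
A minimal sketch of the round-to-odd idea, for illustration only.  It is not
the glibc code and not a proposed QEMU patch: glibc switches the rounding mode
to round-toward-zero and ORs in the inexact flag, whereas this sketch gets the
same intermediate value from a TwoSum error term so that it stays portable C.
The function name is hypothetical.  It assumes IEEE double arithmetic with no
excess precision and no -ffast-math, and it does not model any status flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: single-precision fma built from double arithmetic. */
static float fmaf_via_round_to_odd(float x, float y, float z)
{
    /* 24-bit x 24-bit significands: the product is exact in a double. */
    double p = (double)x * (double)y;
    double zd = (double)z;

    /* Exact cancellation: let the float add choose the sign of zero. */
    if (p == -zd) {
        return (float)p + z;
    }

    /* TwoSum: s is the rounded sum, err is the exact rounding error. */
    double s = p + zd;
    double t = s - p;
    double err = (p - (s - t)) + (zd - t);

    uint64_t bits;
    memcpy(&bits, &s, sizeof bits);
    int finite = ((bits >> 52) & 0x7ff) != 0x7ff;

    /*
     * Round to odd: if the sum was inexact and s has an even significand,
     * step to the neighbouring double on the side of the exact value.
     * That neighbour always has an odd significand.
     */
    if (finite && err != 0.0 && (bits & 1) == 0) {
        int away_from_zero = (err > 0.0) == (s > 0.0);
        bits = away_from_zero ? (bits | 1) : (bits - 1);
        memcpy(&s, &bits, sizeof s);
    }

    /* The only rounding that matters: double (odd sticky bit) to float. */
    return (float)s;
}
```

The sticky low bit keeps the intermediate value from ever sitting exactly on a
single-precision halfway point unless the exact result does, so the final
narrowing rounds the same way a single rounding of the exact result would.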


r~


* Re: [Qemu-devel] [PATCH 00/14] VSX Stage 4
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (13 preceding siblings ...)
  2013-11-06 20:31 ` [Qemu-devel] [PATCH 14/14] VSX Stage 4: Add xxleqv, xxlnand and xxlorc Tom Musta
@ 2013-11-08  0:23 ` Richard Henderson
  2013-11-08 14:53   ` Tom Musta
  2013-11-13 14:35   ` Tom Musta
  2013-11-08 15:55 ` Andreas Färber
  15 siblings, 2 replies; 25+ messages in thread
From: Richard Henderson @ 2013-11-08  0:23 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: qemu-ppc

On 11/07/2013 06:31 AM, Tom Musta wrote:
> The single-precision scalar arithmetic instructions all interpret the most
> significant 64 bits of a VSR as a single precision floating point number
> stored in double precision format (similar to the standard PowerPC floating
> point single precision instructions).  Thus a common theme in the supporting
> code is rounding of an intermediate double-precision number to single 
> precision.

Modulo my comments wrt the actual computation of fma, the patches all look fine.

But I'd like to also mention a pre-existing flaw/niggle in the ppc port.

The conversions to/from in-register representation for the single-precision
values should never raise exceptions.  Yet we always use

    d.d = float32_to_float64(f.f, &env->fp_status);
    f.f = float64_to_float32(d.d, &env->fp_status);

The use of env->fp_status is either wrong or extremely misleading.  It sure
looks like the operation affects global cpu state.  It may be that that state
is never copied back to the "real" fpscr and so doesn't actually affect cpu
state, but how can I see that for sure?

I think it would be better to implement ConvertSPtoDP_NP and ConvertSP64toSP
exactly as written in the spec.

Or at minimum use a dummy fp_status that's not associated with env.  It should
not matter what the "real" rounding mode is in either case, since values that
are not exactly representable as single-precision values give undefined results.
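
To illustrate what a status-free conversion could look like, here is a sketch
that widens single-precision bits to double-precision bits with integer
operations only.  It is written from the IEEE 754 encodings, not copied from
the ConvertSPtoDP_NP pseudo-code, and the function name is hypothetical.  NaN
payloads are copied without quieting, which is an assumption about the desired
behaviour.

```c
#include <stdint.h>

/* Hypothetical helper: binary32 bits to binary64 bits, no fp_status used. */
static uint64_t sp_bits_to_dp_bits(uint32_t w)
{
    uint64_t sign = (uint64_t)(w >> 31) << 63;
    uint32_t exp = (w >> 23) & 0xff;
    uint32_t frac = w & 0x7fffff;

    if (exp == 0xff) {
        /* Infinity or NaN: maximum exponent, fraction copied unchanged. */
        return sign | (0x7ffULL << 52) | ((uint64_t)frac << 29);
    }
    if (exp == 0) {
        if (frac == 0) {
            return sign;                /* signed zero */
        }
        /* Denormal: normalize, since every such value is a normal double. */
        int e = -126;
        while (!(frac & 0x800000)) {
            frac <<= 1;
            e--;
        }
        frac &= 0x7fffff;               /* drop the now-implicit leading 1 */
        return sign | ((uint64_t)(e + 1023) << 52) | ((uint64_t)frac << 29);
    }
    /* Normal number: rebias the exponent and widen the fraction. */
    return sign | ((uint64_t)(exp - 127 + 1023) << 52) | ((uint64_t)frac << 29);
}
```

Since no float_status is touched, it is obvious at the call site that the
conversion cannot raise an exception or depend on the rounding mode.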


r~


* Re: [Qemu-devel] [PATCH 00/14] VSX Stage 4
  2013-11-08  0:23 ` [Qemu-devel] [PATCH 00/14] VSX Stage 4 Richard Henderson
@ 2013-11-08 14:53   ` Tom Musta
  2013-11-13 14:35   ` Tom Musta
  1 sibling, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-08 14:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc

On 11/7/2013 6:23 PM, Richard Henderson wrote:
> On 11/07/2013 06:31 AM, Tom Musta wrote:
>> The single-precision scalar arithmetic instructions all interpret the most
>> significant 64 bits of a VSR as a single precision floating point number
>> stored in double precision format (similar to the standard PowerPC floating
>> point single precision instructions).  Thus a common theme in the supporting
>> code is rounding of an intermediate double-precision number to single 
>> precision.
> 
> Modulo my comments wrt the actual computation of fma, the patches all look fine.
> 
> But I'd like to also mention a pre-existing flaw/niggle in the ppc port.
> 
> The conversions to/from in-register representation for the single-precision
> values should never raise exceptions.  Yet we always use
> 
>     d.d = float32_to_float64(f.f, &env->fp_status);
>     f.f = float64_to_float32(d.d, &env->fp_status);
> 
> The use of env->fp_status is either wrong or extremely misleading.  It sure
> looks like the operation affects global cpu state.  It may be that that state
> is never copied back to the "real" fpscr and so doesn't actually affect cpu
> state, but how can I see that for sure?
> 
> I think it would be better to implement ConvertSPtoDP_NP and ConvertSP64toSP
> exactly as written in the spec.
> 
> Or at minimum use a dummy fp_status that's not associated with env.  It should
> not matter what the "real" rounding mode is in either case, since values that
> are not exactly representable as single-precision values give undefined results.

Richard:

Thanks for your comments.  I concur with this comment on fp_status.

I am looking into the comments on fused multiply-add.


* Re: [Qemu-devel] [PATCH 00/14] VSX Stage 4
  2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
                   ` (14 preceding siblings ...)
  2013-11-08  0:23 ` [Qemu-devel] [PATCH 00/14] VSX Stage 4 Richard Henderson
@ 2013-11-08 15:55 ` Andreas Färber
  15 siblings, 0 replies; 25+ messages in thread
From: Andreas Färber @ 2013-11-08 15:55 UTC (permalink / raw)
  To: Tom Musta; +Cc: qemu-ppc, qemu-devel, Alexander Graf

Hi,

Am 06.11.2013 21:31, schrieb Tom Musta:
> This is the fourth and final series of patches that add emulation support
> to QEMU for the PowerPC Vector Scalar Extension (VSX). 
[...]
>  target-ppc/cpu.h            |    4 +-
>  target-ppc/fpu_helper.c     |  191 ++++++++++++++++++++++++++++---------------
>  target-ppc/helper.h         |   18 ++++
>  target-ppc/translate.c      |  110 +++++++++++++++++++------
>  target-ppc/translate_init.c |    2 +-
>  5 files changed, 232 insertions(+), 93 deletions(-)

Please make it clear from all your mail (and future commit) subjects
that this is about PowerPC. Traditionally that would be "target-ppc: ",
Alex sometimes uses "PPC: ".

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


* Re: [Qemu-devel] [PATCH 00/14] VSX Stage 4
  2013-11-08  0:23 ` [Qemu-devel] [PATCH 00/14] VSX Stage 4 Richard Henderson
  2013-11-08 14:53   ` Tom Musta
@ 2013-11-13 14:35   ` Tom Musta
  1 sibling, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-13 14:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc

On 11/7/2013 6:23 PM, Richard Henderson wrote:
> Modulo my comments wrt the actual computation of fma, the patches all look fine.
> 
> But I'd like to also mention a pre-existing flaw/niggle in the ppc port.
> 
> The conversions to/from in-register representation for the single-precision
> values should never raise exceptions.  Yet we always use
> 
>     d.d = float32_to_float64(f.f, &env->fp_status);
>     f.f = float64_to_float32(d.d, &env->fp_status);
> 
> The use of env->fp_status is either wrong or extremely misleading.  It sure
> looks like the operation affects global cpu state.  It may be that that state
> is never copied back to the "real" fpscr and so doesn't actually affect cpu
> state, but how can I see that for sure?
> 
> I think it would be better to implement ConvertSPtoDP_NP and ConvertSP64toSP
> exactly as written in the spec.
> 
> Or at minimum use a dummy fp_status that's not associated with env.  It should
> not matter what the "real" rounding mode is in either case, since values that
> are not exactly representable as single-precision values give undefined results.

I've looked more closely at the code and have performed some experiments.
Several status flags are set by the float32_to_float64() call, and they are
copied back near the end of these routines via helper_float_check_status().
So I think this is all necessary.

That said, rather than repeating the float32_to_float64() / float64_to_float32()
pattern everywhere, I should have reused the existing helper_frsp() routine.

So I will be publishing a V2.


* Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-08  0:13       ` Richard Henderson
@ 2013-11-13 20:49         ` Tom Musta
  2013-11-13 23:14           ` Richard Henderson
  0 siblings, 1 reply; 25+ messages in thread
From: Tom Musta @ 2013-11-13 20:49 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc

On 11/7/2013 6:13 PM, Richard Henderson wrote:
> On 11/08/2013 09:30 AM, Richard Henderson wrote:
>> On 11/08/2013 09:28 AM, Richard Henderson wrote:
>>> On 11/07/2013 06:31 AM, Tom Musta wrote:
>>>>          }                                                                     \
>>>> +                                                                              \
>>>> +        if (r2sp) {                                                           \
>>>> +            float32 tmp32 = float64_to_float32(xt_out.fld[i],                 \
>>>> +                                               &env->fp_status);              \
>>>> +            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);       \
>>>> +        }                                                                     \
>>>> +                                                                              \
>>>
>>> You can't get correct results for a single-precision fma from a
>>> double-precision fma and merely rounding the results.
>>>
>>> See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.
>>
>> Blah, nevermind.  That would be using separate add+mul in double-precision, not
>> using a double-precision fma primitive.
> 
> Hmph.  I was right the first time.  See
> 
>> http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/
> 
> for example inputs that suffer from double-rounding.
> 
> What's needed in each of the examples are infinite precision values containing
> 55 bits.  This is easy to accomplish with fma.
> 
> Two 23-bit inputs can create a product with 46 significant bits.  One can
> append 23 more significant bits by choosing an exponent for the addend that
> does not overlap the product.  Thus one can create (almost) every intermediate
> result with up to 69 consecutive bits (the exception being products without
> factors that can be represented in 23-bits).
> 
> I'm too lazy to decompose the examples therein to actual single-precision
> inputs, but I'm certain it can be done.
> 
> Thus you *do* need the round-to-zero plus inexact solution that glibc uses.
> (Or to perform the fma in 128-bits and round once, but I think that would be
> way more intrusive wrt the rest of the code, and more expensive than necessary.)

I have reviewed the code and the spec and I cannot see a flaw.  The sequence is
effectively this:

  - float64_muladd   - performs proper FMA for 64-bit numbers
  - float64_to_float32 - converts to single precision, including proper rounding
  - float32_to_float64

The implementation of float64_muladd would seem to provide enough mantissa bits
for proper handling of the case you describe.  The only rounding occurs in the
second step.

I have also done quite a bit of random and targeted random testing using Power
hardware to produce expected results.  The targeted random tests followed your
suggestion above: generate AxB + C where abs(exp(A) - exp(B)) = 23 and
abs(exp(A) - exp(C)) = 46.  Several million test patterns have been generated
and played back through QEMU without any miscompares in the numerical results.


* Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-13 20:49         ` Tom Musta
@ 2013-11-13 23:14           ` Richard Henderson
  2013-11-14 20:58             ` Tom Musta
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2013-11-13 23:14 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: qemu-ppc

On 11/14/2013 06:49 AM, Tom Musta wrote:
> I have reviewed the code and the spec and I cannot see a flaw.  The sequence is
> effectively this:
> 
>   - float64_muladd   - performs proper FMA for 64-bit numbers
>   - float64_to_float32 - converts to single precision, including proper rounding
>   - float32_to_float64
> 
> The implementation of float64_muladd would seem to provide enough mantissa bits
> for proper handling of the case you describe.  The only rounding occurs in the
> second step.
> 
> I have also done quite a bit of random and targeted random testing using Power
> hardware to produce expected results.  The targeted random tests followed your
> suggestion above: generate AxB + C where abs(exp(A) - exp(B)) = 23 and
> abs(exp(A) - exp(C)) = 46.  Several million test patterns have been generated
> and played back through QEMU without any miscompares in the numerical results.

Here's an example that produces wrong results when rounding to double first.
Replace the portable math.h calls with ppc asm as necessary.


r~


$ cat z.c
#include <stdio.h>
#include <math.h>

float a = 65281;
float b = 257;
float c = 0x1p-29f;

int main()
{
    double dd = fma(a, b, c);
    float d = dd;
    float e = fmaf(a, b, c);

    printf("a = %a\n", a);
    printf("b = %a\n", b);
    printf("c = %a\n", c);
    printf("dd= %a\n", dd);
    printf("d = %a\n", d);
    printf("e = %a\n", e);
    return 0;
}
$ ./a.out
a = 0x1.fe02p+15
b = 0x1.01p+8
c = 0x1p-29
dd= 0x1.000001p+24
d = 0x1p+24
e = 0x1.000002p+24
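
The output above can be checked by hand.  The following is a worked sketch of
the arithmetic, assuming round-to-nearest, ties-to-even at both precisions;
RN_53 and RN_24 are shorthand introduced here for rounding to a 53-bit and a
24-bit significand.

```latex
\begin{align*}
a \cdot b &= 65281 \times 257 = 16777217 = 2^{24} + 1 \\
a \cdot b + c &= 2^{24} + 1 + 2^{-29} \quad \text{(exact)} \\
dd = \mathrm{RN}_{53}(a b + c) &= 2^{24} + 1
  \quad \text{(double ulp here is } 2^{-28}\text{, so } 2^{-29} \text{ is a tie; round to even)} \\
d = \mathrm{RN}_{24}(dd) &= 2^{24}
  \quad \text{(float ulp here is } 2\text{, so } 2^{24} + 1 \text{ is a tie; round to even)} \\
e = \mathrm{RN}_{24}(a b + c) &= 2^{24} + 2
  \quad \text{(the exact value lies just above the halfway point)}
\end{align*}
```

The first rounding discards the 2^-29 that would have broken the tie in the
second rounding, which is why d and e differ by one single-precision ulp.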


* Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds
  2013-11-13 23:14           ` Richard Henderson
@ 2013-11-14 20:58             ` Tom Musta
  0 siblings, 0 replies; 25+ messages in thread
From: Tom Musta @ 2013-11-14 20:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc

On 11/13/2013 5:14 PM, Richard Henderson wrote:
> On 11/14/2013 06:49 AM, Tom Musta wrote:
>> I have also done quite a bit of random and targeted random testing using Power
>> hardware to produce expected results.  The targeted random tests followed your
>> suggestion above: generate AxB + C where abs(exp(A) - exp(B)) = 23 and
>> abs(exp(A) - exp(C)) = 46.  Several million test patterns have been generated
>> and played back through QEMU without any miscompares in the numerical results.
> 
> Here's an example that produces wrong results when rounding to double first.
> Replace the portable math.h calls with ppc asm as necessary.
> 
> <snip>
> r~
> 

Thanks, Richard.  You have convinced me.


end of thread, other threads:[~2013-11-14 20:59 UTC | newest]

Thread overview: 25+ messages
2013-11-06 20:31 [Qemu-devel] [PATCH 00/14] VSX Stage 4 Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 01/14] VSX Stage 4: Add VSX 2.07 Flag Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 02/14] VSX Stage 4: Refactor lxsdx Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 03/14] VSX Stage 4: Add lxsiwax, lxsiwzx and lxsspx Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 04/14] VSX Stage 4: Refactor stxsdx Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 05/14] VSX Stage 4: Add stxsiwx and stxsspx Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 06/14] VSX Stage 4: Add xsaddsp and xssubsp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 07/14] VSX Stage 4: Add xsmulsp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 08/14] VSX Stage 4: Add xsdivsp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 09/14] VSX Stage 4: Add xsresp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 10/14] VSX Stage 4: Add xssqrtsp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 11/14] VSX Stage 4: add xsrsqrtesp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds Tom Musta
2013-11-07 23:28   ` Richard Henderson
2013-11-07 23:30     ` Richard Henderson
2013-11-08  0:13       ` Richard Henderson
2013-11-13 20:49         ` Tom Musta
2013-11-13 23:14           ` Richard Henderson
2013-11-14 20:58             ` Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 13/14] VSX Stage 4: Add xscvsxdsp and xscvuxdsp Tom Musta
2013-11-06 20:31 ` [Qemu-devel] [PATCH 14/14] VSX Stage 4: Add xxleqv, xxlnand and xxlorc Tom Musta
2013-11-08  0:23 ` [Qemu-devel] [PATCH 00/14] VSX Stage 4 Richard Henderson
2013-11-08 14:53   ` Tom Musta
2013-11-13 14:35   ` Tom Musta
2013-11-08 15:55 ` Andreas Färber
