All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending)
@ 2016-09-16 10:51 Nikunj A Dadhania
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: implement darn instruction Nikunj A Dadhania
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16 10:51 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

This series contains 4 new instructions for POWER9 ISA3.0
Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.

Patches:
    01:  darn: Deliver a random number
    02:  lxvw4x - improve implementation
    03:  stxv4x - improve implementation
    04:  lxvh8x:  Load VSX Vector Halfword*8
         stxvh8x:  Store VSX Vector Halfword*8
    05:  lxvb16x: Load VSX Vector Byte*16
         stxvb16x: Store VSX Vector Byte*16

Changelog:
v2: 
* Fix lxvw4x/stxv4x translation as LE/BE were both similar 
  one in tcg and other as helper
* Rename bswap32x2 to deposit32x2 as it does not need to 
  swap content(32bit)
* stxvh8x had a bug as David suggested.

v1: 
* More load/store cleanups in byte reverse routines
* ld64/st64 converted to newer macro and updated call sites
* Cleanup load with reservation and store conditional
* Return invalid random for darn instruction

v0:
* darn - read /dev/random to get the random number
* xxspltib - make is PPC64 only
* Consolidate load/store operations and use macros to generate qemu_st/ld
* Simplify load/store vsx endian manipulation

Nikunj A Dadhania (4):
  target-ppc: improve lxvw4x implementation
  target-ppc: improve stxvw4x implementation
  target-ppc: add lxvh8x and stxvh8x
  target-ppc: add lxvb16x and stxvb16x

Ravi Bangoria (1):
  target-ppc: implement darn instruction

 target-ppc/helper.h                 |   4 +
 target-ppc/int_helper.c             |  16 ++++
 target-ppc/mem_helper.c             |  11 +++
 target-ppc/translate.c              |  18 +++++
 target-ppc/translate/vsx-impl.inc.c | 146 ++++++++++++++++++++++++++++++------
 target-ppc/translate/vsx-ops.inc.c  |   4 +
 6 files changed, 175 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v3 1/5] target-ppc: implement darn instruction
  2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
@ 2016-09-16 10:51 ` Nikunj A Dadhania
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16 10:51 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh, Ravi Bangoria

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

darn: Deliver A Random Number

Currently return invalid random number for all the case. This needs
proper algorithm to provide cryptographically suitable random data.
Reading from /dev/random can block and that is not an expected behaviour
while the cpu instruction is getting executed. Moreover, /dev/random
would only work for linux-user

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h     |  2 ++
 target-ppc/int_helper.c | 16 ++++++++++++++++
 target-ppc/translate.c  | 18 ++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index e75d070..966f2ce 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -50,6 +50,8 @@ DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
+DEF_HELPER_0(darn32, tl)
+DEF_HELPER_0(darn64, tl)
 #endif
 
 DEF_HELPER_FLAGS_1(cntlsw32, TCG_CALL_NO_RWG_SE, i32, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 291fba0..51a9ac5 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -182,6 +182,22 @@ target_ulong helper_cnttzd(target_ulong t)
 {
     return ctz64(t);
 }
+
+/* Return invalid random number.
+ *
+ * FIXME: Add rng backend or other mechanism to get cryptographically suitable
+ * random number
+ */
+target_ulong helper_darn32(void)
+{
+    return -1;
+}
+
+target_ulong helper_darn64(void)
+{
+    return -1;
+}
+
 #endif
 
 #if defined(TARGET_PPC64)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index e747c1f..eb681de 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -528,6 +528,8 @@ EXTRACT_HELPER(FPW, 16, 1);
 
 /* addpcis */
 EXTRACT_HELPER_DXFORM(DX, 10, 6, 6, 5, 16, 1, 1, 0, 0)
+/* darn */
+EXTRACT_HELPER(L, 16, 2);
 
 /***                            Jump target decoding                       ***/
 /* Immediate address */
@@ -1895,6 +1897,21 @@ static void gen_cnttzd(DisasContext *ctx)
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
     }
 }
+
+/* darn */
+static void gen_darn(DisasContext *ctx)
+{
+    int l = L(ctx->opcode);
+
+    if (l == 0) {
+        gen_helper_darn32(cpu_gpr[rD(ctx->opcode)]);
+    } else if (l <= 2) {
+        /* Return 64-bit random for both CRN and RRN */
+        gen_helper_darn64(cpu_gpr[rD(ctx->opcode)]);
+    } else {
+        tcg_gen_movi_i64(cpu_gpr[rD(ctx->opcode)], -1);
+    }
+}
 #endif
 
 /***                             Integer rotate                            ***/
@@ -6216,6 +6233,7 @@ GEN_HANDLER_E(prtyw, 0x1F, 0x1A, 0x04, 0x0000F801, PPC_NONE, PPC2_ISA205),
 GEN_HANDLER(popcntd, 0x1F, 0x1A, 0x0F, 0x0000F801, PPC_POPCNTWD),
 GEN_HANDLER(cntlzd, 0x1F, 0x1A, 0x01, 0x00000000, PPC_64B),
 GEN_HANDLER_E(cnttzd, 0x1F, 0x1A, 0x11, 0x00000000, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(darn, 0x1F, 0x13, 0x17, 0x001CF801, PPC_NONE, PPC2_ISA300),
 GEN_HANDLER_E(prtyd, 0x1F, 0x1A, 0x05, 0x0000F801, PPC_NONE, PPC2_ISA205),
 GEN_HANDLER_E(bpermd, 0x1F, 0x1C, 0x07, 0x00000001, PPC_NONE, PPC2_PERM_ISA206),
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: implement darn instruction Nikunj A Dadhania
@ 2016-09-16 10:51 ` Nikunj A Dadhania
  2016-09-19  6:19   ` David Gibson
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16 10:51 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Load 8byte at a time and manipulate.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h                 |  1 +
 target-ppc/mem_helper.c             |  5 +++++
 target-ppc/translate/vsx-impl.inc.c | 19 +++++--------------
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 966f2ce..9f6705d 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
 DEF_HELPER_3(lvebx, void, env, avr, tl)
 DEF_HELPER_3(lvehx, void, env, avr, tl)
 DEF_HELPER_3(lvewx, void, env, avr, tl)
+DEF_HELPER_1(deposit32x2, i64, i64)
 DEF_HELPER_3(stvebx, void, env, avr, tl)
 DEF_HELPER_3(stvehx, void, env, avr, tl)
 DEF_HELPER_3(stvewx, void, env, avr, tl)
diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
index 6548715..86e493e 100644
--- a/target-ppc/mem_helper.c
+++ b/target-ppc/mem_helper.c
@@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
 #undef I
 #undef LVE
 
+uint64_t helper_deposit32x2(uint64_t x)
+{
+    return deposit64((x >> 32), 32, 32, (x));
+}
+
 #undef HI_IDX
 #undef LO_IDX
 
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index eee6052..df278df 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
 static void gen_lxvw4x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 tmp;
     TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
     TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
     if (unlikely(!ctx->vsx_enabled)) {
@@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
-    tmp = tcg_temp_new_i64();
 
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld32u_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, xth, EA);
-    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
-
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, xtl, EA);
-    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
-
+    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+    gen_helper_deposit32x2(xth, xth);
+    tcg_gen_addi_tl(EA, EA, 8);
+    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+    gen_helper_deposit32x2(xtl, xtl);
     tcg_temp_free(EA);
-    tcg_temp_free_i64(tmp);
 }
 
 #define VSX_STORE_SCALAR(name, operation)                     \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v3 3/5] target-ppc: improve stxvw4x implementation
  2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: implement darn instruction Nikunj A Dadhania
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
@ 2016-09-16 10:51 ` Nikunj A Dadhania
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x Nikunj A Dadhania
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16 10:51 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Manipulate data and store 8bytes instead of 4bytes.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate/vsx-impl.inc.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index df278df..a3868f6 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -133,7 +133,8 @@ static void gen_stxvd2x(DisasContext *ctx)
 
 static void gen_stxvw4x(DisasContext *ctx)
 {
-    TCGv_i64 tmp;
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -142,21 +143,13 @@ static void gen_stxvw4x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    tmp = tcg_temp_new_i64();
-
-    tcg_gen_shri_i64(tmp, cpu_vsrh(xS(ctx->opcode)), 32);
-    gen_qemu_st32_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
-
-    tcg_gen_shri_i64(tmp, cpu_vsrl(xS(ctx->opcode)), 32);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
 
+    gen_helper_deposit32x2(xsh, xsh);
+    tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
+    tcg_gen_addi_tl(EA, EA, 8);
+    gen_helper_deposit32x2(xsl, xsl);
+    tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
     tcg_temp_free(EA);
-    tcg_temp_free_i64(tmp);
 }
 
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x
  2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
@ 2016-09-16 10:51 ` Nikunj A Dadhania
  2016-09-19  6:33   ` David Gibson
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x Nikunj A Dadhania
  2016-09-19  6:51 ` [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) David Gibson
  5 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16 10:51 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

lxvh8x:  Load VSX Vector Halfword*8
stxvh8x:  Store VSX Vector Halfword*8

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h                 |  1 +
 target-ppc/mem_helper.c             |  6 ++++
 target-ppc/translate/vsx-impl.inc.c | 59 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  2 ++
 4 files changed, 68 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 9f6705d..ae91c3b 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -298,6 +298,7 @@ DEF_HELPER_3(lvebx, void, env, avr, tl)
 DEF_HELPER_3(lvehx, void, env, avr, tl)
 DEF_HELPER_3(lvewx, void, env, avr, tl)
 DEF_HELPER_1(deposit32x2, i64, i64)
+DEF_HELPER_1(bswap16x4, i64, i64)
 DEF_HELPER_3(stvebx, void, env, avr, tl)
 DEF_HELPER_3(stvehx, void, env, avr, tl)
 DEF_HELPER_3(stvewx, void, env, avr, tl)
diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
index 86e493e..26d5617 100644
--- a/target-ppc/mem_helper.c
+++ b/target-ppc/mem_helper.c
@@ -290,6 +290,12 @@ uint64_t helper_deposit32x2(uint64_t x)
     return deposit64((x >> 32), 32, 32, (x));
 }
 
+uint64_t helper_bswap16x4(uint64_t x)
+{
+    uint64_t m = 0x00ff00ff00ff00ffull;
+    return ((x & m) << 8) | ((x >> 8) & m);
+}
+
 #undef HI_IDX
 #undef LO_IDX
 
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index a3868f6..9e7588d 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -93,6 +93,36 @@ static void gen_lxvw4x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_lxvh8x(DisasContext *ctx)
+{
+    TCGv EA;
+    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+
+    if (ctx->le_mode) {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+        gen_helper_bswap16x4(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+        gen_helper_bswap16x4(xtl, xtl);
+    } else {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_deposit32x2(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_deposit32x2(xtl, xtl);
+    }
+    tcg_temp_free(EA);
+}
+
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
@@ -152,6 +182,35 @@ static void gen_stxvw4x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_stxvh8x(DisasContext *ctx)
+{
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
+    TCGv EA;
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    if (ctx->le_mode) {
+        gen_helper_bswap16x4(xsh, xsh);
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        gen_helper_bswap16x4(xsl, xsl);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        gen_helper_deposit32x2(xsh, xsh);
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        gen_helper_deposit32x2(xsl, xsl);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
+    }
+    tcg_temp_free(EA);
+}
+
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
 static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 414b73b..21f9064 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -7,6 +7,7 @@ GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
@@ -15,6 +16,7 @@ GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
 
 GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x
  2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
                   ` (3 preceding siblings ...)
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x Nikunj A Dadhania
@ 2016-09-16 10:51 ` Nikunj A Dadhania
  2016-09-19  6:35   ` David Gibson
  2016-09-19  6:51 ` [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) David Gibson
  5 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16 10:51 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

lxvb16x: Load VSX Vector Byte*16
stxvb16x: Store VSX Vector Byte*16

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate/vsx-impl.inc.c | 55 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  2 ++
 2 files changed, 57 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 9e7588d..306cc55 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -123,6 +123,33 @@ static void gen_lxvh8x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_lxvb16x(DisasContext *ctx)
+{
+    TCGv EA;
+    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    if (ctx->le_mode) {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_deposit32x2(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_deposit32x2(xtl, xtl);
+    }
+    tcg_temp_free(EA);
+}
+
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
@@ -211,6 +238,34 @@ static void gen_stxvh8x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_stxvb16x(DisasContext *ctx)
+{
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
+    TCGv EA;
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+
+    if (ctx->le_mode) {
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        gen_helper_deposit32x2(xsh, xsh);
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        gen_helper_deposit32x2(xsl, xsl);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
+    }
+    tcg_temp_free(EA);
+}
+
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
 static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 21f9064..f5afa0f 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -8,6 +8,7 @@ GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
+GEN_HANDLER_E(lxvb16x, 0x1F, 0x0C, 0x1B, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
@@ -17,6 +18,7 @@ GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
+GEN_HANDLER_E(stxvb16x, 0x1F, 0x0C, 0x1F, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
@ 2016-09-19  6:19   ` David Gibson
  2016-09-19  6:50     ` David Gibson
  2016-09-19  8:32     ` Nikunj A Dadhania
  0 siblings, 2 replies; 18+ messages in thread
From: David Gibson @ 2016-09-19  6:19 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3311 bytes --]

On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> Load 8byte at a time and manipulate.
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h                 |  1 +
>  target-ppc/mem_helper.c             |  5 +++++
>  target-ppc/translate/vsx-impl.inc.c | 19 +++++--------------
>  3 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 966f2ce..9f6705d 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
>  DEF_HELPER_3(lvebx, void, env, avr, tl)
>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>  DEF_HELPER_3(lvewx, void, env, avr, tl)
> +DEF_HELPER_1(deposit32x2, i64, i64)
>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>  DEF_HELPER_3(stvewx, void, env, avr, tl)
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index 6548715..86e493e 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
>  #undef I
>  #undef LVE
>  
> +uint64_t helper_deposit32x2(uint64_t x)
> +{
> +    return deposit64((x >> 32), 32, 32, (x));
> +}

It seems a shame to drop out to a helper for something this simple.
How hard would it be to implement this.. wordswap, I guess you'd call
it.. in tcg ops?

I'm also not particularly fond of the deposit32x2 name, though a
better one doesn't quickly come to mind.

> +
>  #undef HI_IDX
>  #undef LO_IDX
>  
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index eee6052..df278df 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>  static void gen_lxvw4x(DisasContext *ctx)
>  {
>      TCGv EA;
> -    TCGv_i64 tmp;
>      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>      }
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
> -    tmp = tcg_temp_new_i64();
>  
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, xth, EA);
> -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> -
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> -
> +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> +    gen_helper_deposit32x2(xth, xth);
> +    tcg_gen_addi_tl(EA, EA, 8);
> +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> +    gen_helper_deposit32x2(xtl, xtl);
>      tcg_temp_free(EA);
> -    tcg_temp_free_i64(tmp);
>  }
>  
>  #define VSX_STORE_SCALAR(name, operation)                     \

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x Nikunj A Dadhania
@ 2016-09-19  6:33   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-09-19  6:33 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 6581 bytes --]


On Fri, Sep 16, 2016 at 04:21:50PM +0530, Nikunj A Dadhania wrote:
> lxvh8x:  Load VSX Vector Halfword*8
> stxvh8x:  Store VSX Vector Halfword*8
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h                 |  1 +
>  target-ppc/mem_helper.c             |  6 ++++
>  target-ppc/translate/vsx-impl.inc.c | 59 +++++++++++++++++++++++++++++++++++++
>  target-ppc/translate/vsx-ops.inc.c  |  2 ++
>  4 files changed, 68 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 9f6705d..ae91c3b 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -298,6 +298,7 @@ DEF_HELPER_3(lvebx, void, env, avr, tl)
>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>  DEF_HELPER_3(lvewx, void, env, avr, tl)
>  DEF_HELPER_1(deposit32x2, i64, i64)
> +DEF_HELPER_1(bswap16x4, i64, i64)
>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>  DEF_HELPER_3(stvewx, void, env, avr, tl)
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index 86e493e..26d5617 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -290,6 +290,12 @@ uint64_t helper_deposit32x2(uint64_t x)
>      return deposit64((x >> 32), 32, 32, (x));
>  }
>  
> +uint64_t helper_bswap16x4(uint64_t x)
> +{
> +    uint64_t m = 0x00ff00ff00ff00ffull;
> +    return ((x & m) << 8) | ((x >> 8) & m);
> +}
> +
>  #undef HI_IDX
>  #undef LO_IDX
>  
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index a3868f6..9e7588d 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -93,6 +93,36 @@ static void gen_lxvw4x(DisasContext *ctx)
>      tcg_temp_free(EA);
>  }
>  
> +static void gen_lxvh8x(DisasContext *ctx)
> +{
> +    TCGv EA;
> +    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> +    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
> +        gen_helper_bswap16x4(xth, xth);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
> +        gen_helper_bswap16x4(xtl, xtl);
> +    } else {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_deposit32x2(xth, xth);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_deposit32x2(xtl, xtl);

The BE path still looks wrong.  In fact, it's almost as wrong as it
could possibly be.

The LE load means the bytes of each halfword will be in the wrong
order.  It also means lower-address halfwords will end up in less
significant halfwords internally, which is also wrong.  Finally the
deposit32x2 will get pairs of halfwords in the right relative order,
but the order of the halfwords within the pair will still be wrong.

I really think you need to make yourself some testcases for these...

I think what you actually want is a BE load in all cases (which gets
the halfword order correct), then apply the 16x4 bswap if and only if
you're in LE mode, to get the bytes within each halfword in the right
order.

> +    }
> +    tcg_temp_free(EA);
> +}
> +
>  #define VSX_STORE_SCALAR(name, operation)                     \
>  static void gen_##name(DisasContext *ctx)                     \
>  {                                                             \
> @@ -152,6 +182,35 @@ static void gen_stxvw4x(DisasContext *ctx)
>      tcg_temp_free(EA);
>  }
>  
> +static void gen_stxvh8x(DisasContext *ctx)
> +{
> +    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> +    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
> +    TCGv EA;
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +    if (ctx->le_mode) {
> +        gen_helper_bswap16x4(xsh, xsh);
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        gen_helper_bswap16x4(xsl, xsl);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        gen_helper_deposit32x2(xsh, xsh);
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        gen_helper_deposit32x2(xsl, xsl);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);

Same problems as the load side.

> +    }
> +    tcg_temp_free(EA);
> +}
> +
>  #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
>  static void gen_##name(DisasContext *ctx)                       \
>  {                                                               \
> diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
> index 414b73b..21f9064 100644
> --- a/target-ppc/translate/vsx-ops.inc.c
> +++ b/target-ppc/translate/vsx-ops.inc.c
> @@ -7,6 +7,7 @@ GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
> +GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
>  
>  GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
> @@ -15,6 +16,7 @@ GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
> +GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
>  
>  GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x Nikunj A Dadhania
@ 2016-09-19  6:35   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-09-19  6:35 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 4812 bytes --]

On Fri, Sep 16, 2016 at 04:21:51PM +0530, Nikunj A Dadhania wrote:
> lxvb16x: Load VSX Vector Byte*16
> stxvb16x: Store VSX Vector Byte*16
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate/vsx-impl.inc.c | 55 +++++++++++++++++++++++++++++++++++++
>  target-ppc/translate/vsx-ops.inc.c  |  2 ++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index 9e7588d..306cc55 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -123,6 +123,33 @@ static void gen_lxvh8x(DisasContext *ctx)
>      tcg_temp_free(EA);
>  }
>  
> +static void gen_lxvb16x(DisasContext *ctx)
> +{
> +    TCGv EA;
> +    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> +    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_deposit32x2(xth, xth);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_deposit32x2(xtl, xtl);

Again, not correct.  Here just a BE load should suffice, you don't
need any byte or word swapping at all.  The instruction is defined as
loading bytes at increasing address into bytes of decreasing
significance.  This is exactly a big-endian load.

> +    }
> +    tcg_temp_free(EA);
> +}
> +
>  #define VSX_STORE_SCALAR(name, operation)                     \
>  static void gen_##name(DisasContext *ctx)                     \
>  {                                                             \
> @@ -211,6 +238,34 @@ static void gen_stxvh8x(DisasContext *ctx)
>      tcg_temp_free(EA);
>  }
>  
> +static void gen_stxvb16x(DisasContext *ctx)
> +{
> +    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> +    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
> +    TCGv EA;
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        gen_helper_deposit32x2(xsh, xsh);
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        gen_helper_deposit32x2(xsl, xsl);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);

Ditto.

> +    }
> +    tcg_temp_free(EA);
> +}
> +
>  #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
>  static void gen_##name(DisasContext *ctx)                       \
>  {                                                               \
> diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
> index 21f9064..f5afa0f 100644
> --- a/target-ppc/translate/vsx-ops.inc.c
> +++ b/target-ppc/translate/vsx-ops.inc.c
> @@ -8,6 +8,7 @@ GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
> +GEN_HANDLER_E(lxvb16x, 0x1F, 0x0C, 0x1B, 0, PPC_NONE, PPC2_ISA300),
>  
>  GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
> @@ -17,6 +18,7 @@ GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
> +GEN_HANDLER_E(stxvb16x, 0x1F, 0x0C, 0x1F, 0, PPC_NONE, PPC2_ISA300),
>  
>  GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-19  6:19   ` David Gibson
@ 2016-09-19  6:50     ` David Gibson
  2016-09-19 10:36       ` Nikunj A Dadhania
  2016-09-19  8:32     ` Nikunj A Dadhania
  1 sibling, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-09-19  6:50 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3748 bytes --]

On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> > Load 8byte at a time and manipulate.
> > 
> > Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> > ---
> >  target-ppc/helper.h                 |  1 +
> >  target-ppc/mem_helper.c             |  5 +++++
> >  target-ppc/translate/vsx-impl.inc.c | 19 +++++--------------
> >  3 files changed, 11 insertions(+), 14 deletions(-)
> > 
> > diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> > index 966f2ce..9f6705d 100644
> > --- a/target-ppc/helper.h
> > +++ b/target-ppc/helper.h
> > @@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
> >  DEF_HELPER_3(lvebx, void, env, avr, tl)
> >  DEF_HELPER_3(lvehx, void, env, avr, tl)
> >  DEF_HELPER_3(lvewx, void, env, avr, tl)
> > +DEF_HELPER_1(deposit32x2, i64, i64)
> >  DEF_HELPER_3(stvebx, void, env, avr, tl)
> >  DEF_HELPER_3(stvehx, void, env, avr, tl)
> >  DEF_HELPER_3(stvewx, void, env, avr, tl)
> > diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> > index 6548715..86e493e 100644
> > --- a/target-ppc/mem_helper.c
> > +++ b/target-ppc/mem_helper.c
> > @@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
> >  #undef I
> >  #undef LVE
> >  
> > +uint64_t helper_deposit32x2(uint64_t x)
> > +{
> > +    return deposit64((x >> 32), 32, 32, (x));
> > +}
> 
> It seems a shame to drop out to a helper for something this simple.
> How hard would it be to implement this.. wordswap, I guess you'd call
> it.. in tcg ops?
> 
> I'm also not particularly fond of the deposit32x2 name, though a
> better one doesn't quickly come to mind.
> 
> > +
> >  #undef HI_IDX
> >  #undef LO_IDX
> >  
> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> > index eee6052..df278df 100644
> > --- a/target-ppc/translate/vsx-impl.inc.c
> > +++ b/target-ppc/translate/vsx-impl.inc.c
> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
> >  static void gen_lxvw4x(DisasContext *ctx)
> >  {
> >      TCGv EA;
> > -    TCGv_i64 tmp;
> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> >      if (unlikely(!ctx->vsx_enabled)) {
> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
> >      }
> >      gen_set_access_type(ctx, ACCESS_INT);
> >      EA = tcg_temp_new();
> > -    tmp = tcg_temp_new_i64();
> >  
> >      gen_addr_reg_index(ctx, EA);
> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> > -    tcg_gen_addi_tl(EA, EA, 4);
> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> > -
> > -    tcg_gen_addi_tl(EA, EA, 4);
> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> > -    tcg_gen_addi_tl(EA, EA, 4);
> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> > -
> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> > +    gen_helper_deposit32x2(xth, xth);
> > +    tcg_gen_addi_tl(EA, EA, 8);
> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> > +    gen_helper_deposit32x2(xtl, xtl);

..and I think this is wrong for BE mode.  The deposit32x2 will get the
words in the right order, but the bytes within each word will be wrong
because of the LE mode load on a BE setup.


> >      tcg_temp_free(EA);
> > -    tcg_temp_free_i64(tmp);
> >  }
> >  
> >  #define VSX_STORE_SCALAR(name, operation)                     \
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending)
  2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
                   ` (4 preceding siblings ...)
  2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x Nikunj A Dadhania
@ 2016-09-19  6:51 ` David Gibson
  2016-09-19  8:30   ` Nikunj A Dadhania
  5 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-09-19  6:51 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 816 bytes --]

On Fri, Sep 16, 2016 at 04:21:46PM +0530, Nikunj A Dadhania wrote:
> This series contains 4 new instructions for POWER9 ISA3.0
> Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.
> 
> Patches:
>     01:  darn: Deliver a random number

I've applied this one.

>     02:  lxvw4x - improve implementation
>     03:  stxv4x - improve implementation
>     04:  lxvh8x:  Load VSX Vector Halfword*8
>          stxvh8x:  Store VSX Vector Halfword*8
>     05:  lxvb16x: Load VSX Vector Byte*16
>          stxvb16x: Store VSX Vector Byte*16

I think the remainder still have bugs in BE mode.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending)
  2016-09-19  6:51 ` [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) David Gibson
@ 2016-09-19  8:30   ` Nikunj A Dadhania
  0 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-19  8:30 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Fri, Sep 16, 2016 at 04:21:46PM +0530, Nikunj A Dadhania wrote:
>> This series contains 4 new instructions for POWER9 ISA3.0
>> Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.
>> 
>> Patches:
>>     01:  darn: Deliver a random number
>
> I've applied this one.
>
>>     02:  lxvw4x - improve implementation
>>     03:  stxv4x - improve implementation
>>     04:  lxvh8x:  Load VSX Vector Halfword*8
>>          stxvh8x:  Store VSX Vector Halfword*8
>>     05:  lxvb16x: Load VSX Vector Byte*16
>>          stxvb16x: Store VSX Vector Byte*16
>
> I think the remainder still have bugs in BE mode.

Let me get back to the ISA and re-confirm, its a bit confusing.

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-19  6:19   ` David Gibson
  2016-09-19  6:50     ` David Gibson
@ 2016-09-19  8:32     ` Nikunj A Dadhania
  1 sibling, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-19  8:32 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> Load 8byte at a time and manipulate.
>> 
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  target-ppc/helper.h                 |  1 +
>>  target-ppc/mem_helper.c             |  5 +++++
>>  target-ppc/translate/vsx-impl.inc.c | 19 +++++--------------
>>  3 files changed, 11 insertions(+), 14 deletions(-)
>> 
>> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
>> index 966f2ce..9f6705d 100644
>> --- a/target-ppc/helper.h
>> +++ b/target-ppc/helper.h
>> @@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
>>  DEF_HELPER_3(lvebx, void, env, avr, tl)
>>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>>  DEF_HELPER_3(lvewx, void, env, avr, tl)
>> +DEF_HELPER_1(deposit32x2, i64, i64)
>>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>>  DEF_HELPER_3(stvewx, void, env, avr, tl)
>> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
>> index 6548715..86e493e 100644
>> --- a/target-ppc/mem_helper.c
>> +++ b/target-ppc/mem_helper.c
>> @@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
>>  #undef I
>>  #undef LVE
>>  
>> +uint64_t helper_deposit32x2(uint64_t x)
>> +{
>> +    return deposit64((x >> 32), 32, 32, (x));
>> +}
>
> It seems a shame to drop out to a helper for something this simple.
> How hard would it be to implement this.. wordswap, I guess you'd call
> it.. in tcg ops?

There is a tcg_ops for deposit64, I need to do the shifting and call it.
I will change that.

> I'm also not particularly fond of the deposit32x2 name, though a
> better one doesn't quickly come to mind.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-19  6:50     ` David Gibson
@ 2016-09-19 10:36       ` Nikunj A Dadhania
  2016-09-20  4:34         ` David Gibson
  0 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-19 10:36 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:
> [ Unknown signature status ]
> On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
>> > index eee6052..df278df 100644
>> > --- a/target-ppc/translate/vsx-impl.inc.c
>> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >  static void gen_lxvw4x(DisasContext *ctx)
>> >  {
>> >      TCGv EA;
>> > -    TCGv_i64 tmp;
>> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >      if (unlikely(!ctx->vsx_enabled)) {
>> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >      }
>> >      gen_set_access_type(ctx, ACCESS_INT);
>> >      EA = tcg_temp_new();
>> > -    tmp = tcg_temp_new_i64();
>> >  
>> >      gen_addr_reg_index(ctx, EA);
>> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> > -    tcg_gen_addi_tl(EA, EA, 4);
>> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
>> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> > -
>> > -    tcg_gen_addi_tl(EA, EA, 4);
>> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> > -    tcg_gen_addi_tl(EA, EA, 4);
>> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
>> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> > -
>> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> > +    gen_helper_deposit32x2(xth, xth);
>> > +    tcg_gen_addi_tl(EA, EA, 8);
>> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> > +    gen_helper_deposit32x2(xtl, xtl);
>
> ..and I think this is wrong for BE mode.  The deposit32x2 will get the
> words in the right order, but the bytes within each word will be wrong
> because of the LE mode load on a BE setup.

Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
The order within the word is not changed. Snippet of the test code at
the end of email. Can share full code if needed (maybe will do it in
kvm-unit-test)

Fedora24VM BE:

    [fedora@cloudimg ~]$ uname -a
    Linux cloudimg.localdomain 4.5.5-300.fc24.ppc64 #1 SMP Tue May 24 12:24:54 UTC 2016 ppc64 ppc64 ppc64 GNU/Linux
    [fedora@cloudimg ~]$ ./lxv_x
    VRT32 = 00010203 20212223 30313233 40414243 
    
    [fedora@cloudimg ~]$ ./stxv_x 
     E0E1E2E3  E4E5E6E7  F0F1F2F3  F4F5F6F7 


TCG Result BE:
==============
    $ ./ppc64-linux-user/qemu-ppc64  -cpu POWER9 lxv_x
    VRT32 = 00010203 20212223 30313233 40414243 
    
    $ ./ppc64-linux-user/qemu-ppc64  -cpu POWER9 stxv_x
     E0E1E2E3  E4E5E6E7  F0F1F2F3  F4F5F6F7


Fedora24VM LE:
==============
    [fedora@cloudimg ~]$ uname -a
    Linux cloudimg.localdomain 4.5.5-300.fc24.ppc64le #1 SMP Tue May 24 12:23:26 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
    [fedora@cloudimg ~]$ ./lxv_x 
    VRT32 = 40414243 30313233 20212223 00010203 
    
    [fedora@cloudimg ~]$ ./stxv_x 
     F4F5F6F7  F0F1F2F3  E4E5E6E7  E0E1E2E3 

TCG Result LE:
==============
    $ ./ppc64le-linux-user/qemu-ppc64le  -cpu POWER9 lxv_x
    VRT32 = 40414243 30313233 20212223 00010203 
    
    $ ./ppc64le-linux-user/qemu-ppc64le  -cpu POWER9 stxv_x
     F4F5F6F7  F0F1F2F3  E4E5E6E7  E0E1E2E3 

Regards,
Nikunj


vsx.h:
======
#define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))

typedef union {
    __vector uint32_t v;
    uint32_t a[U32_SIZE];
} vuint32_t;

static void vec_put_u32(__vector uint32_t v) {
    int i;
    vuint32_t u;

    for (u.v = v, i = 0; i < U32_SIZE; ++i) {
        printf("%08x ", u.a[i]);
    }

    printf("\n");
}

static void print4x4(uint32_t *p)
{
    int i;
    if (!p)
        return;
    for(i = 0; i < 4; i++)
        printf(" %08X ", p[i]);
    printf("\n");
}

lxv_x.c:
========
  uint32_t rb32[4] = {0x00010203, 0x20212223, 0x30313233, 0x40414243};
  vuint32_t vrt32;
  
  asm("lxvw4x %x0, 0, %1 \n\t" \
      : "=ws"(vrt32) : "r"(&rb32));
  printf("VRT32 = "); vec_put_u32(vrt32);

stxv_x.c:
=========
  vuint32_t vrt32;

  vrt32.a[0] = 0xE0E1E2E3;
  vrt32.a[1] = 0xE4E5E6E7;
  vrt32.a[2] = 0xF0F1F2F3;
  vrt32.a[3] = 0xF4F5F6F7;

  asm("stxvw4x %x0, 0, %1 \n\t" \
      : : "ws"(vrt32.v), "r"(&rb32));
  print4x4(rb32);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-19 10:36       ` Nikunj A Dadhania
@ 2016-09-20  4:34         ` David Gibson
  2016-09-20 17:10           ` Nikunj A Dadhania
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-09-20  4:34 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 5881 bytes --]

On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> > [ Unknown signature status ]
> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> >> > index eee6052..df278df 100644
> >> > --- a/target-ppc/translate/vsx-impl.inc.c
> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
> >> >  static void gen_lxvw4x(DisasContext *ctx)
> >> >  {
> >> >      TCGv EA;
> >> > -    TCGv_i64 tmp;
> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> >> >      if (unlikely(!ctx->vsx_enabled)) {
> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
> >> >      }
> >> >      gen_set_access_type(ctx, ACCESS_INT);
> >> >      EA = tcg_temp_new();
> >> > -    tmp = tcg_temp_new_i64();
> >> >  
> >> >      gen_addr_reg_index(ctx, EA);
> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> >> > -
> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> >> > -
> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> >> > +    gen_helper_deposit32x2(xth, xth);
> >> > +    tcg_gen_addi_tl(EA, EA, 8);
> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> >> > +    gen_helper_deposit32x2(xtl, xtl);
> >
> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
> > words in the right order, but the bytes within each word will be wrong
> > because of the LE mode load on a BE setup.
> 
> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
> The order within the word is not changed. Snippet of the test code at
> the end of email. Can share full code if needed (maybe will do it in
> kvm-unit-test)

Ugh.. now I'm confused.  I would not have expected the results you've
seen from these tests.  But I still can't understand *how* the
emulation could be correct: IIUC MO_LEQ would mean it loads the 8
bytes as a single 64-bit LE integer.  Which should be the same as
loading one 32-bit LE integer into the low half of the target
register, then a 32-bit LE integer into the high half ot the target
register.

As I said above, the deposit32x2 will swap the order of the two ints,
but it won't byteswap the individual int32s which should have been BE
in memory.

Can you find the flaw in my reasoning?

> Fedora24VM BE:
> 
>     [fedora@cloudimg ~]$ uname -a
>     Linux cloudimg.localdomain 4.5.5-300.fc24.ppc64 #1 SMP Tue May 24 12:24:54 UTC 2016 ppc64 ppc64 ppc64 GNU/Linux
>     [fedora@cloudimg ~]$ ./lxv_x
>     VRT32 = 00010203 20212223 30313233 40414243 
>     
>     [fedora@cloudimg ~]$ ./stxv_x 
>      E0E1E2E3  E4E5E6E7  F0F1F2F3  F4F5F6F7 
> 
> 
> TCG Result BE:
> ==============
>     $ ./ppc64-linux-user/qemu-ppc64  -cpu POWER9 lxv_x
>     VRT32 = 00010203 20212223 30313233 40414243 
>     
>     $ ./ppc64-linux-user/qemu-ppc64  -cpu POWER9 stxv_x
>      E0E1E2E3  E4E5E6E7  F0F1F2F3  F4F5F6F7
> 
> 
> Fedora24VM LE:
> ==============
>     [fedora@cloudimg ~]$ uname -a
>     Linux cloudimg.localdomain 4.5.5-300.fc24.ppc64le #1 SMP Tue May 24 12:23:26 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
>     [fedora@cloudimg ~]$ ./lxv_x 
>     VRT32 = 40414243 30313233 20212223 00010203 
>     
>     [fedora@cloudimg ~]$ ./stxv_x 
>      F4F5F6F7  F0F1F2F3  E4E5E6E7  E0E1E2E3 
> 
> TCG Result LE:
> ==============
>     $ ./ppc64le-linux-user/qemu-ppc64le  -cpu POWER9 lxv_x
>     VRT32 = 40414243 30313233 20212223 00010203 
>     
>     $ ./ppc64le-linux-user/qemu-ppc64le  -cpu POWER9 stxv_x
>      F4F5F6F7  F0F1F2F3  E4E5E6E7  E0E1E2E3 
> 
> Regards,
> Nikunj
> 
> 
> vsx.h:
> ======
> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
> 
> typedef union {
>     __vector uint32_t v;
>     uint32_t a[U32_SIZE];
> } vuint32_t;

I am a little suspicious that whatever the compiler does to convert
the vector to an array via this union might be undoing a byte reverse.

I'd be more confident if you used VSX instructions to extract and
store separately one of the 32-bit subwords of the vector.

> 
> static void vec_put_u32(__vector uint32_t v) {
>     int i;
>     vuint32_t u;
> 
>     for (u.v = v, i = 0; i < U32_SIZE; ++i) {
>         printf("%08x ", u.a[i]);
>     }
> 
>     printf("\n");
> }
> 
> static void print4x4(uint32_t *p)
> {
>     int i;
>     if (!p)
>         return;
>     for(i = 0; i < 4; i++)
>         printf(" %08X ", p[i]);
>     printf("\n");
> }
> 
> lxv_x.c:
> ========
>   uint32_t rb32[4] = {0x00010203, 0x20212223, 0x30313233, 0x40414243};
>   vuint32_t vrt32;
>   
>   asm("lxvw4x %x0, 0, %1 \n\t" \
>       : "=ws"(vrt32) : "r"(&rb32));
>   printf("VRT32 = "); vec_put_u32(vrt32);
> 
> stxv_x.c:
> =========
>   vuint32_t vrt32;
> 
>   vrt32.a[0] = 0xE0E1E2E3;
>   vrt32.a[1] = 0xE4E5E6E7;
>   vrt32.a[2] = 0xF0F1F2F3;
>   vrt32.a[3] = 0xF4F5F6F7;
> 
>   asm("stxvw4x %x0, 0, %1 \n\t" \
>       : : "ws"(vrt32.v), "r"(&rb32));
>   print4x4(rb32);
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-20  4:34         ` David Gibson
@ 2016-09-20 17:10           ` Nikunj A Dadhania
  2016-09-21  1:57             ` David Gibson
  0 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-20 17:10 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> > [ Unknown signature status ]
>> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
>> >> > index eee6052..df278df 100644
>> >> > --- a/target-ppc/translate/vsx-impl.inc.c
>> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >> >  static void gen_lxvw4x(DisasContext *ctx)
>> >> >  {
>> >> >      TCGv EA;
>> >> > -    TCGv_i64 tmp;
>> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >> >      if (unlikely(!ctx->vsx_enabled)) {
>> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >> >      }
>> >> >      gen_set_access_type(ctx, ACCESS_INT);
>> >> >      EA = tcg_temp_new();
>> >> > -    tmp = tcg_temp_new_i64();
>> >> >  
>> >> >      gen_addr_reg_index(ctx, EA);
>> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
>> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> >> > -
>> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
>> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> >> > -
>> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> >> > +    gen_helper_deposit32x2(xth, xth);
>> >> > +    tcg_gen_addi_tl(EA, EA, 8);
>> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> >> > +    gen_helper_deposit32x2(xtl, xtl);
>> >
>> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
>> > words in the right order, but the bytes within each word will be wrong
>> > because of the LE mode load on a BE setup.
>> 
>> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
>> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
>> The order within the word is not changed. Snippet of the test code at
>> the end of email. Can share full code if needed (maybe will do it in
>> kvm-unit-test)
>
> Ugh.. now I'm confused.  I would not have expected the results you've
> seen from these tests.  But I still can't understand *how* the
> emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> bytes as a single 64-bit LE integer.

For both the case LE/BE we do a LE read ...

> Which should be the same as
> loading one 32-bit LE integer into the low half of the target
> register, then a 32-bit LE integer into the high half ot the target
> register.

.. The 64-bit integer read is not same in these cases. The input itself
would be in the order of the format.

Input rb32[]:  00010203  20212223  30313233  40414243 

LE:
helper_deposit32x2: 2021222300010203 
helper_deposit32x2: 4041424330313233

BE
helper_deposit32x2: 2322212003020100
helper_deposit32x2: 4342414033323130

>
> As I said above, the deposit32x2 will swap the order of the two ints,
> but it won't byteswap the individual int32s which should have been BE
> in memory.
>
> Can you find the flaw in my reasoning?

One anomaly that I see in BE code generation: it also generates a
stxvw4x after lxvw4x. I am not sure why.

>>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
Input rb32[]:   00010203  20212223  30313233  40414243 

gen_lxvw4x: called
helper_deposit32x2: 2322212003020100
helper_deposit32x2: 4342414033323130
gen_stxvw4x: called
helper_deposit32x2: 0302010023222120
helper_deposit32x2: 3332313043424140
Output VRT32:  00010203 20212223 30313233 40414243 

>> vsx.h:
>> ======
>> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
>> 
>> typedef union {
>>     __vector uint32_t v;
>>     uint32_t a[U32_SIZE];
>> } vuint32_t;
>
> I am a little suspicious that whatever the compiler does to convert
> the vector to an array via this union might be undoing a byte reverse.
>
> I'd be more confident if you used VSX instructions to extract and
> store separately one of the 32-bit subwords of the vector.

I will try to figure those instructions.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-20 17:10           ` Nikunj A Dadhania
@ 2016-09-21  1:57             ` David Gibson
  2016-09-21  3:44               ` Nikunj A Dadhania
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-09-21  1:57 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 5571 bytes --]

On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > [ Unknown signature status ]
> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> > [ Unknown signature status ]
> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> >> >> > index eee6052..df278df 100644
> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
> >> >> >  static void gen_lxvw4x(DisasContext *ctx)
> >> >> >  {
> >> >> >      TCGv EA;
> >> >> > -    TCGv_i64 tmp;
> >> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> >> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> >> >> >      if (unlikely(!ctx->vsx_enabled)) {
> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
> >> >> >      }
> >> >> >      gen_set_access_type(ctx, ACCESS_INT);
> >> >> >      EA = tcg_temp_new();
> >> >> > -    tmp = tcg_temp_new_i64();
> >> >> >  
> >> >> >      gen_addr_reg_index(ctx, EA);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
> >> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> >> >> > -
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> >> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> >> >> > -
> >> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xth, xth);
> >> >> > +    tcg_gen_addi_tl(EA, EA, 8);
> >> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xtl, xtl);
> >> >
> >> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
> >> > words in the right order, but the bytes within each word will be wrong
> >> > because of the LE mode load on a BE setup.
> >> 
> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
> >> The order within the word is not changed. Snippet of the test code at
> >> the end of email. Can share full code if needed (maybe will do it in
> >> kvm-unit-test)
> >
> > Ugh.. now I'm confused.  I would not have expected the results you've
> > seen from these tests.  But I still can't understand *how* the
> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> > bytes as a single 64-bit LE integer.
> 
> For both the case LE/BE we do a LE read ...

.. and I can't see how that can be right for the BE case.

> > Which should be the same as
> > loading one 32-bit LE integer into the low half of the target
> > register, then a 32-bit LE integer into the high half ot the target
> > register.
> 
> .. The 64-bit integer read is not same in these cases. The input itself
> would be in the order of the format.
> 
> Input rb32[]:  00010203  20212223  30313233  40414243 
> 
> LE:
> helper_deposit32x2: 2021222300010203 
> helper_deposit32x2: 4041424330313233
> 
> BE
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130


Sorry.. I can't really follow the above, because I'm not sure if
you're displaying the bytes within each word in significance order, or
increasing-address order.

> >
> > As I said above, the deposit32x2 will swap the order of the two ints,
> > but it won't byteswap the individual int32s which should have been BE
> > in memory.
> >
> > Can you find the flaw in my reasoning?
> 
> One anomaly that I see in BE code generation: it also generates a
> stxvw4x after lxvw4x. I am not sure why.

Ah... see I'm wondering if it's using the stxvw4x to store back to the
union which you then get the results from.  If that's so it could
explain the results, since the bug I suspect is in lxvw4x would be
cancelled out by the corresponding bug in stxv4wx, which is exactly
why I'd prefer the approach to testing mentioned below.

> 
> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
> Input rb32[]:   00010203  20212223  30313233  40414243 
> 
> gen_lxvw4x: called
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130
> gen_stxvw4x: called
> helper_deposit32x2: 0302010023222120
> helper_deposit32x2: 3332313043424140
> Output VRT32:  00010203 20212223 30313233 40414243 
> 
> >> vsx.h:
> >> ======
> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
> >> 
> >> typedef union {
> >>     __vector uint32_t v;
> >>     uint32_t a[U32_SIZE];
> >> } vuint32_t;
> >
> > I am a little suspicious that whatever the compiler does to convert
> > the vector to an array via this union might be undoing a byte reverse.
> >
> > I'd be more confident if you used VSX instructions to extract and
> > store separately one of the 32-bit subwords of the vector.
> 
> I will try to figure those instructions.

Ok, thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
  2016-09-21  1:57             ` David Gibson
@ 2016-09-21  3:44               ` Nikunj A Dadhania
  0 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-21  3:44 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > [ Unknown signature status ]
>> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson <david@gibson.dropbear.id.au> writes:
>> >> > [ Unknown signature status ]
>> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
>> >> >> > index eee6052..df278df 100644
>> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
>> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >> >> >  static void gen_lxvw4x(DisasContext *ctx)
>> >> >> >  {
>> >> >> >      TCGv EA;
>> >> >> > -    TCGv_i64 tmp;
>> >> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >> >> >      if (unlikely(!ctx->vsx_enabled)) {
>> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >> >> >      }
>> >> >> >      gen_set_access_type(ctx, ACCESS_INT);
>> >> >> >      EA = tcg_temp_new();
>> >> >> > -    tmp = tcg_temp_new_i64();
>> >> >> >  
>> >> >> >      gen_addr_reg_index(ctx, EA);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
>> >> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> >> >> > -
>> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
>> >> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> >> >> > -
>> >> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> >> >> > +    gen_helper_deposit32x2(xth, xth);
>> >> >> > +    tcg_gen_addi_tl(EA, EA, 8);
>> >> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> >> >> > +    gen_helper_deposit32x2(xtl, xtl);
>> >> >
>> >> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
>> >> > words in the right order, but the bytes within each word will be wrong
>> >> > because of the LE mode load on a BE setup.
>> >> 
>> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
>> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
>> >> The order within the word is not changed. Snippet of the test code at
>> >> the end of email. Can share full code if needed (maybe will do it in
>> >> kvm-unit-test)
>> >
>> > Ugh.. now I'm confused.  I would not have expected the results you've
>> > seen from these tests.  But I still can't understand *how* the
>> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
>> > bytes as a single 64-bit LE integer.
>> 
>> For both the case LE/BE we do a LE read ...
>
> .. and I can't see how that can be right for the BE case.
>
>> > Which should be the same as
>> > loading one 32-bit LE integer into the low half of the target
>> > register, then a 32-bit LE integer into the high half ot the target
>> > register.
>> 
>> .. The 64-bit integer read is not same in these cases. The input itself
>> would be in the order of the format.
>> 
>> Input rb32[]:  00010203  20212223  30313233  40414243 
>> 
>> LE:
>> helper_deposit32x2: 2021222300010203 
>> helper_deposit32x2: 4041424330313233
>> 
>> BE
>> helper_deposit32x2: 2322212003020100
>> helper_deposit32x2: 4342414033323130
>
>
> Sorry.. I can't really follow the above, because I'm not sure if
> you're displaying the bytes within each word in significance order, or
> increasing-address order.

Ah, thats just a print inside the helper just to check what MO_LEQ returned:

 uint64_t helper_deposit32x2(uint64_t x)
 {
     fprintf(stderr, "%s: %016lx\n", __func__, x);
     return deposit64((x >> 32), 32, 32, (x));
 }

>
>> >
>> > As I said above, the deposit32x2 will swap the order of the two ints,
>> > but it won't byteswap the individual int32s which should have been BE
>> > in memory.
>> >
>> > Can you find the flaw in my reasoning?
>> 
>> One anomaly that I see in BE code generation: it also generates a
>> stxvw4x after lxvw4x. I am not sure why.
>
> Ah... see I'm wondering if it's using the stxvw4x to store back to the
> union which you then get the results from.  If that's so it could
> explain the results, since the bug I suspect is in lxvw4x would be
> cancelled out by the corresponding bug in stxv4wx, which is exactly
> why I'd prefer the approach to testing mentioned below.

Yes, I am investigating it.

>> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
>> Input rb32[]:   00010203  20212223  30313233  40414243 
>> 
>> gen_lxvw4x: called
>> helper_deposit32x2: 2322212003020100
>> helper_deposit32x2: 4342414033323130
>> gen_stxvw4x: called
>> helper_deposit32x2: 0302010023222120
>> helper_deposit32x2: 3332313043424140
>> Output VRT32:  00010203 20212223 30313233 40414243 
>> 
>> >> vsx.h:
>> >> ======
>> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
>> >> 
>> >> typedef union {
>> >>     __vector uint32_t v;
>> >>     uint32_t a[U32_SIZE];
>> >> } vuint32_t;
>> >
>> > I am a little suspicious that whatever the compiler does to convert
>> > the vector to an array via this union might be undoing a byte reverse.
>> >
>> > I'd be more confident if you used VSX instructions to extract and
>> > store separately one of the 32-bit subwords of the vector.
>> 
>> I will try to figure those instructions.
>
> Ok, thanks.
>

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-09-21  3:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-16 10:51 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) Nikunj A Dadhania
2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: implement darn instruction Nikunj A Dadhania
2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
2016-09-19  6:19   ` David Gibson
2016-09-19  6:50     ` David Gibson
2016-09-19 10:36       ` Nikunj A Dadhania
2016-09-20  4:34         ` David Gibson
2016-09-20 17:10           ` Nikunj A Dadhania
2016-09-21  1:57             ` David Gibson
2016-09-21  3:44               ` Nikunj A Dadhania
2016-09-19  8:32     ` Nikunj A Dadhania
2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x Nikunj A Dadhania
2016-09-19  6:33   ` David Gibson
2016-09-16 10:51 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x Nikunj A Dadhania
2016-09-19  6:35   ` David Gibson
2016-09-19  6:51 ` [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending) David Gibson
2016-09-19  8:30   ` Nikunj A Dadhania

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.