[PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask


* [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask
@ 2021-11-12 14:14 matheus.ferst
  2021-11-12 14:14 ` [PATCH v2 1/3] target/ppc: Implement Vector Expand Mask matheus.ferst
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

This is a small patch series just to allow Ubuntu 21.10 to boot with
-cpu POWER10. Glibc 2.34 is using vextractbm, so the init is killed by
SIGILL without the second patch of this series. The other two insns. are
included as they are somewhat close to Vector Extract Mask (at least in
pseudocode).

v2:
- Applied rth suggestions to VEXTRACT[BHWDQ]M and MTVSR[BHWDQ]M[I]

Matheus Ferst (3):
  target/ppc: Implement Vector Expand Mask
  target/ppc: Implement Vector Extract Mask
  target/ppc: Implement Vector Mask Move insns

 target/ppc/insn32.decode            |  28 ++++
 target/ppc/translate/vmx-impl.c.inc | 209 ++++++++++++++++++++++++++++
 2 files changed, 237 insertions(+)

-- 
2.25.1




* [PATCH v2 1/3] target/ppc: Implement Vector Expand Mask
  2021-11-12 14:14 [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask matheus.ferst
@ 2021-11-12 14:14 ` matheus.ferst
  2021-11-12 14:14 ` [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask matheus.ferst
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
vexpandbm: Vector Expand Byte Mask
vexpandhm: Vector Expand Halfword Mask
vexpandwm: Vector Expand Word Mask
vexpanddm: Vector Expand Doubleword Mask
vexpandqm: Vector Expand Quadword Mask

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            | 11 ++++++++++
 target/ppc/translate/vmx-impl.c.inc | 34 +++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index e135b8aba4..9a28f1d266 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -56,6 +56,9 @@
 &VX_uim4        vrt uim vrb
 @VX_uim4        ...... vrt:5 . uim:4 vrb:5 ...........  &VX_uim4
 
+&VX_tb          vrt vrb
+@VX_tb          ...... vrt:5 ..... vrb:5 ...........    &VX_tb
+
 &X              rt ra rb
 @X              ...... rt:5 ra:5 rb:5 .......... .      &X
 
@@ -408,6 +411,14 @@ VINSWVRX        000100 ..... ..... ..... 00110001111    @VX
 VSLDBI          000100 ..... ..... ..... 00 ... 010110  @VN
 VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
+## Vector Mask Manipulation Instructions
+
+VEXPANDBM       000100 ..... 00000 ..... 11001000010    @VX_tb
+VEXPANDHM       000100 ..... 00001 ..... 11001000010    @VX_tb
+VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
+VEXPANDDM       000100 ..... 00011 ..... 11001000010    @VX_tb
+VEXPANDQM       000100 ..... 00100 ..... 11001000010    @VX_tb
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index b361f73a67..58aca58f0f 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1505,6 +1505,40 @@ static bool trans_VSRDBI(DisasContext *ctx, arg_VN *a)
     return true;
 }
 
+static bool do_vexpand(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_sari(vece, avr_full_offset(a->vrt), avr_full_offset(a->vrb),
+                      (8 << vece) - 1, 16, 16);
+
+    return true;
+}
+
+TRANS(VEXPANDBM, do_vexpand, MO_8)
+TRANS(VEXPANDHM, do_vexpand, MO_16)
+TRANS(VEXPANDWM, do_vexpand, MO_32)
+TRANS(VEXPANDDM, do_vexpand, MO_64)
+
+static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    get_avr64(tmp, a->vrb, true);
+    tcg_gen_sari_i64(tmp, tmp, 63);
+    set_avr64(a->vrt, tmp, false);
+    set_avr64(a->vrt, tmp, true);
+
+    tcg_temp_free_i64(tmp);
+    return true;
+}
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)                           \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     {                                                                   \
-- 
2.25.1
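
The vexpand[bhwd]m translation above relies on an arithmetic right shift by (element width - 1), which replicates each element's most significant bit across the whole element. A minimal reference model in plain C, useful for cross-checking results on one 64-bit half of the register, might look like the following. The name expand_mask_ref is hypothetical and is not part of QEMU; the sketch avoids signed shifts so that it does not depend on implementation-defined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model (not QEMU code): expand the msb of each
 * element of one 64-bit half into an all-ones or all-zeros element.
 * vece: 0 = byte, 1 = halfword, 2 = word, 3 = doubleword.
 */
static uint64_t expand_mask_ref(uint64_t half, unsigned vece)
{
    assert(vece <= 3);

    const unsigned width = 8u << vece;
    const uint64_t ones =
        width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
    uint64_t out = 0;

    for (unsigned pos = 0; pos < 64; pos += width) {
        /* The msb of the element that starts at bit 'pos'. */
        uint64_t msb = (half >> (pos + width - 1)) & 1;

        if (msb) {
            out |= ones << pos;
        }
    }
    return out;
}
```

Calling it once per 64-bit half models the 128-bit register for the B/H/W/D forms. The quadword form instead replicates bit 63 of the high half into both halves, as trans_VEXPANDQM does.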




* [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask
  2021-11-12 14:14 [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask matheus.ferst
  2021-11-12 14:14 ` [PATCH v2 1/3] target/ppc: Implement Vector Expand Mask matheus.ferst
@ 2021-11-12 14:14 ` matheus.ferst
  2021-12-03 13:00   ` Richard Henderson
  2021-11-12 14:14 ` [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns matheus.ferst
  2021-12-03  8:34 ` [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask Cédric Le Goater
  3 siblings, 1 reply; 8+ messages in thread
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
vextractbm: Vector Extract Byte Mask
vextracthm: Vector Extract Halfword Mask
vextractwm: Vector Extract Word Mask
vextractdm: Vector Extract Doubleword Mask
vextractqm: Vector Extract Quadword Mask

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v2:
- Applied rth suggestion to do_vextractm
---
 target/ppc/insn32.decode            |  6 +++
 target/ppc/translate/vmx-impl.c.inc | 60 +++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 9a28f1d266..639ac22bf0 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -419,6 +419,12 @@ VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
 VEXPANDDM       000100 ..... 00011 ..... 11001000010    @VX_tb
 VEXPANDQM       000100 ..... 00100 ..... 11001000010    @VX_tb
 
+VEXTRACTBM      000100 ..... 01000 ..... 11001000010    @VX_tb
+VEXTRACTHM      000100 ..... 01001 ..... 11001000010    @VX_tb
+VEXTRACTWM      000100 ..... 01010 ..... 11001000010    @VX_tb
+VEXTRACTDM      000100 ..... 01011 ..... 11001000010    @VX_tb
+VEXTRACTQM      000100 ..... 01100 ..... 11001000010    @VX_tb
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 58aca58f0f..dd7337c2f2 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
     return true;
 }
 
+static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
+    TCGv_i64 t, b, tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    t = tcg_const_i64(0);
+    b = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+
+    for (int w = 0; w < 2; w++) {
+        get_avr64(b, a->vrb, w);
+
+        for (int i = 0; i < elem_count_half; i++) {
+            int in_bit = (i + 1) * elem_width - 1;
+            int out_bit = w * elem_count_half + i;
+
+            if (in_bit > out_bit) {
+                tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
+            } else {
+                tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
+            }
+            tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
+            tcg_gen_or_i64(t, t, tmp);
+        }
+    }
+    tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);
+
+    tcg_temp_free_i64(t);
+    tcg_temp_free_i64(b);
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
+TRANS(VEXTRACTBM, do_vextractm, MO_8)
+TRANS(VEXTRACTHM, do_vextractm, MO_16)
+TRANS(VEXTRACTWM, do_vextractm, MO_32)
+TRANS(VEXTRACTDM, do_vextractm, MO_64)
+
+static bool trans_VEXTRACTQM(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    get_avr64(tmp, a->vrb, true);
+    tcg_gen_shri_i64(tmp, tmp, 63);
+    tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], tmp);
+
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)                           \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     {                                                                   \
-- 
2.25.1
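
For comparison with the TCG sequence generated by do_vextractm above, the same in_bit/out_bit mapping can be written as ordinary C. This is only a sketch: extract_mask_ref is a hypothetical name, and hi/lo stand for the two 64-bit halves that get_avr64() returns for high = true and high = false.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model (not QEMU code): collect the msb of every
 * element of a 128-bit vector, given as two 64-bit halves, into the low
 * bits of the result.
 * vece: 0 = byte, 1 = halfword, 2 = word, 3 = doubleword.
 */
static uint64_t extract_mask_ref(uint64_t hi, uint64_t lo, unsigned vece)
{
    assert(vece <= 3);

    const unsigned elem_width = 8u << vece;
    const unsigned elem_count_half = 8u >> vece;
    const uint64_t half[2] = { lo, hi };   /* w = 0 is the low half */
    uint64_t t = 0;

    for (unsigned w = 0; w < 2; w++) {
        for (unsigned i = 0; i < elem_count_half; i++) {
            /* Same indices as in the patch. */
            unsigned in_bit = (i + 1) * elem_width - 1;
            unsigned out_bit = w * elem_count_half + i;

            t |= ((half[w] >> in_bit) & 1) << out_bit;
        }
    }
    return t;
}
```

For byte elements the result has 16 significant bits, for halfwords 8, for words 4, and for doublewords 2.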




* [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns
  2021-11-12 14:14 [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask matheus.ferst
  2021-11-12 14:14 ` [PATCH v2 1/3] target/ppc: Implement Vector Expand Mask matheus.ferst
  2021-11-12 14:14 ` [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask matheus.ferst
@ 2021-11-12 14:14 ` matheus.ferst
  2021-12-03 13:01   ` Richard Henderson
  2021-12-03  8:34 ` [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask Cédric Le Goater
  3 siblings, 1 reply; 8+ messages in thread
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
mtvsrbm: Move to VSR Byte Mask
mtvsrhm: Move to VSR Halfword Mask
mtvsrwm: Move to VSR Word Mask
mtvsrdm: Move to VSR Doubleword Mask
mtvsrqm: Move to VSR Quadword Mask
mtvsrbmi: Move to VSR Byte Mask Immediate

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v2:
- Applied rth suggestions to do_mtvsrm and trans_MTVSRBMI
---
 target/ppc/insn32.decode            |  11 +++
 target/ppc/translate/vmx-impl.c.inc | 115 ++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 639ac22bf0..f68931f4f3 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -40,6 +40,10 @@
 %ds_rtp         22:4   !function=times_2
 @DS_rtp         ...... ....0 ra:5 .............. ..             &D rt=%ds_rtp si=%ds_si
 
+&DX_b           vrt b
+%dx_b           6:10 16:5 0:1
+@DX_b           ...... vrt:5  ..... .......... ..... .          &DX_b b=%dx_b
+
 &DX             rt d
 %dx_d           6:s10 16:5 0:1
 @DX             ...... rt:5  ..... .......... ..... .   &DX d=%dx_d
@@ -413,6 +417,13 @@ VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
 ## Vector Mask Manipulation Instructions
 
+MTVSRBM         000100 ..... 10000 ..... 11001000010    @VX_tb
+MTVSRHM         000100 ..... 10001 ..... 11001000010    @VX_tb
+MTVSRWM         000100 ..... 10010 ..... 11001000010    @VX_tb
+MTVSRDM         000100 ..... 10011 ..... 11001000010    @VX_tb
+MTVSRQM         000100 ..... 10100 ..... 11001000010    @VX_tb
+MTVSRBMI        000100 ..... ..... .......... 01010 .   @DX_b
+
 VEXPANDBM       000100 ..... 00000 ..... 11001000010    @VX_tb
 VEXPANDHM       000100 ..... 00001 ..... 11001000010    @VX_tb
 VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index dd7337c2f2..404767e4ec 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1599,6 +1599,121 @@ static bool trans_VEXTRACTQM(DisasContext *ctx, arg_VX_tb *a)
     return true;
 }
 
+static bool do_mtvsrm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
+    uint64_t c;
+    int i, j;
+    TCGv_i64 hi, lo, t0, t1;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    hi = tcg_temp_new_i64();
+    lo = tcg_temp_new_i64();
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+    tcg_gen_extu_tl_i64(t0, cpu_gpr[a->vrb]);
+    tcg_gen_extract_i64(hi, t0, elem_count_half, elem_count_half);
+    tcg_gen_extract_i64(lo, t0, 0, elem_count_half);
+
+    /*
+     * Spread the bits into their respective elements.
+     * E.g. for bytes:
+     * 00000000000000000000000000000000000000000000000000000000abcdefgh
+     *   << 32 - 4
+     * 0000000000000000000000000000abcdefgh0000000000000000000000000000
+     *   |
+     * 0000000000000000000000000000abcdefgh00000000000000000000abcdefgh
+     *   << 16 - 2
+     * 00000000000000abcdefgh00000000000000000000abcdefgh00000000000000
+     *   |
+     * 00000000000000abcdefgh000000abcdefgh000000abcdefgh000000abcdefgh
+     *   << 8 - 1
+     * 0000000abcdefgh000000abcdefgh000000abcdefgh000000abcdefgh0000000
+     *   |
+     * 0000000abcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgh
+     *   & dup(1)
+     * 0000000a0000000b0000000c0000000d0000000e0000000f0000000g0000000h
+     *   * 0xff
+     * aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh
+     */
+    for (i = elem_count_half / 2, j = 32; i > 0; i >>= 1, j >>= 1) {
+        tcg_gen_shli_i64(t0, hi, j - i);
+        tcg_gen_shli_i64(t1, lo, j - i);
+        tcg_gen_or_i64(hi, hi, t0);
+        tcg_gen_or_i64(lo, lo, t1);
+    }
+
+    c = dup_const(vece, 1);
+    tcg_gen_andi_i64(hi, hi, c);
+    tcg_gen_andi_i64(lo, lo, c);
+
+    c = MAKE_64BIT_MASK(0, elem_width);
+    tcg_gen_muli_i64(hi, hi, c);
+    tcg_gen_muli_i64(lo, lo, c);
+
+    set_avr64(a->vrt, lo, false);
+    set_avr64(a->vrt, hi, true);
+
+    tcg_temp_free_i64(hi);
+    tcg_temp_free_i64(lo);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+
+    return true;
+}
+
+TRANS(MTVSRBM, do_mtvsrm, MO_8)
+TRANS(MTVSRHM, do_mtvsrm, MO_16)
+TRANS(MTVSRWM, do_mtvsrm, MO_32)
+TRANS(MTVSRDM, do_mtvsrm, MO_64)
+
+static bool trans_MTVSRQM(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    tcg_gen_ext_tl_i64(tmp, cpu_gpr[a->vrb]);
+    tcg_gen_sextract_i64(tmp, tmp, 0, 1);
+    set_avr64(a->vrt, tmp, false);
+    set_avr64(a->vrt, tmp, true);
+
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
+static bool trans_MTVSRBMI(DisasContext *ctx, arg_DX_b *a)
+{
+    const uint64_t mask = dup_const(MO_8, 1);
+    uint64_t hi, lo;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    hi = extract16(a->b, 8, 8);
+    lo = extract16(a->b, 0, 8);
+
+    for (int i = 4, j = 32; i > 0; i >>= 1, j >>= 1) {
+        hi |= hi << (j - i);
+        lo |= lo << (j - i);
+    }
+
+    hi = (hi & mask) * 0xFF;
+    lo = (lo & mask) * 0xFF;
+
+    set_avr64(a->vrt, tcg_constant_i64(hi), true);
+    set_avr64(a->vrt, tcg_constant_i64(lo), false);
+
+    return true;
+}
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)                           \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     {                                                                   \
-- 
2.25.1
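
The shift-and-or spreading trick described in the do_mtvsrm comment above can be checked on the host without any TCG involved. The sketch below repeats the byte-element computation from trans_MTVSRBMI for a single 8-bit group and compares it with a naive per-bit loop. All three function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Spread bit k of 'bits' into byte k of the result (0x00 or 0xFF). */
static uint64_t spread_byte_mask(uint8_t bits)
{
    uint64_t x = bits;

    /* Shifts of 28, 14 and 7 move bit k to bit 8 * k. */
    for (int i = 4, j = 32; i > 0; i >>= 1, j >>= 1) {
        x |= x << (j - i);
    }
    /* Keep the lsb of each byte, then widen each 1 into 0xFF. */
    return (x & UINT64_C(0x0101010101010101)) * 0xFF;
}

/* Straightforward version used only as a cross-check. */
static uint64_t spread_byte_mask_naive(uint8_t bits)
{
    uint64_t out = 0;

    for (unsigned k = 0; k < 8; k++) {
        if (bits & (1u << k)) {
            out |= UINT64_C(0xFF) << (8 * k);
        }
    }
    return out;
}

/* Exhaustive comparison over all 256 inputs. */
static void spread_byte_mask_selfcheck(void)
{
    for (unsigned v = 0; v < 256; v++) {
        assert(spread_byte_mask((uint8_t)v) ==
               spread_byte_mask_naive((uint8_t)v));
    }
}
```

The multiplication by 0xFF cannot carry between bytes, because every byte holds either 0 or 1 after the mask step.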




* Re: [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask
  2021-11-12 14:14 [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask matheus.ferst
                   ` (2 preceding siblings ...)
  2021-11-12 14:14 ` [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns matheus.ferst
@ 2021-12-03  8:34 ` Cédric Le Goater
  3 siblings, 0 replies; 8+ messages in thread
From: Cédric Le Goater @ 2021-12-03  8:34 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, david

Hello,

On 11/12/21 15:14, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> This is a small patch series just to allow Ubuntu 21.10 to boot with
> -cpu POWER10. Glibc 2.34 is using vextractbm, so the init is killed by
> SIGILL without the second patch of this series. The other two insns. are
> included as they are somewhat close to Vector Extract Mask (at least in
> pseudocode).
> 
> v2:
> - Applied rth suggestions to VEXTRACT[BHWDQ]M and MTVSR[BHWDQ]M[I]

I am planning to include these patches in the next ppc pull request
for QEMU 7.0, since they fix support for recent glibc/distros, unless
something still needs to be done for patches 2 and 3.

Thanks,

C.

> 
> Matheus Ferst (3):
>    target/ppc: Implement Vector Expand Mask
>    target/ppc: Implement Vector Extract Mask
>    target/ppc: Implement Vector Mask Move insns
> 
>   target/ppc/insn32.decode            |  28 ++++
>   target/ppc/translate/vmx-impl.c.inc | 209 ++++++++++++++++++++++++++++
>   2 files changed, 237 insertions(+)
> 




* Re: [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask
  2021-11-12 14:14 ` [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask matheus.ferst
@ 2021-12-03 13:00   ` Richard Henderson
  2021-12-03 13:21     ` Richard Henderson
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Henderson @ 2021-12-03 13:00 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 11/12/21 6:14 AM, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> vextractbm: Vector Extract Byte Mask
> vextracthm: Vector Extract Halfword Mask
> vextractwm: Vector Extract Word Mask
> vextractdm: Vector Extract Doubleword Mask
> vextractqm: Vector Extract Quadword Mask
> 
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v2:
> - Applied rth suggestion to do_vextractm
> ---
>   target/ppc/insn32.decode            |  6 +++
>   target/ppc/translate/vmx-impl.c.inc | 60 +++++++++++++++++++++++++++++
>   2 files changed, 66 insertions(+)
> 
> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> index 9a28f1d266..639ac22bf0 100644
> --- a/target/ppc/insn32.decode
> +++ b/target/ppc/insn32.decode
> @@ -419,6 +419,12 @@ VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
>   VEXPANDDM       000100 ..... 00011 ..... 11001000010    @VX_tb
>   VEXPANDQM       000100 ..... 00100 ..... 11001000010    @VX_tb
>   
> +VEXTRACTBM      000100 ..... 01000 ..... 11001000010    @VX_tb
> +VEXTRACTHM      000100 ..... 01001 ..... 11001000010    @VX_tb
> +VEXTRACTWM      000100 ..... 01010 ..... 11001000010    @VX_tb
> +VEXTRACTDM      000100 ..... 01011 ..... 11001000010    @VX_tb
> +VEXTRACTQM      000100 ..... 01100 ..... 11001000010    @VX_tb
> +
>   # VSX Load/Store Instructions
>   
>   LXV             111101 ..... ..... ............ . 001   @DQ_TSX
> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
> index 58aca58f0f..dd7337c2f2 100644
> --- a/target/ppc/translate/vmx-impl.c.inc
> +++ b/target/ppc/translate/vmx-impl.c.inc
> @@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
>       return true;
>   }
>   
> +static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
> +{
> +    const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
> +    TCGv_i64 t, b, tmp;
> +
> +    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
> +    REQUIRE_VECTOR(ctx);
> +
> +    t = tcg_const_i64(0);
> +    b = tcg_temp_new_i64();
> +    tmp = tcg_temp_new_i64();
> +
> +    for (int w = 0; w < 2; w++) {
> +        get_avr64(b, a->vrb, w);
> +
> +        for (int i = 0; i < elem_count_half; i++) {
> +            int in_bit = (i + 1) * elem_width - 1;
> +            int out_bit = w * elem_count_half + i;
> +
> +            if (in_bit > out_bit) {
> +                tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
> +            } else {
> +                tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
> +            }
> +            tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
> +            tcg_gen_or_i64(t, t, tmp);
> +        }
> +    }
> +    tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);

Pardon me.  I realized after the fact that we can run the same algorithm as
for mtvsrm (in the next patch) in reverse.

   & dup(1)
.......a.......b.......c.......d.......e.......f.......g.......h
   >> 32 - 4
...................................a.......b.......c.......d....
   |
.......a.......b.......c.......d...a...e...b...f...c...g...d...h
   >> 16 - 2
.....................a.......b.......c.......d...a...e...b...f..
   |
.......a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.b.d.f.h
   >> 8 - 1
..............a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.
   |
.......a......ab.....abc....abcd...abcde..abcdef.abcdefgabcdefgh
   & 0xff
........................................................abcdefgh

where one of the two final masks can be done via deposit:

     tcg_gen_andi_i64(hi, hi, 0xff);
     tcg_gen_deposit_i64(lo, lo, hi, 8, 56);

Which will reduce the instruction count of this implementation by half.


r~



* Re: [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns
  2021-11-12 14:14 ` [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns matheus.ferst
@ 2021-12-03 13:01   ` Richard Henderson
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2021-12-03 13:01 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 11/12/21 6:14 AM, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> mtvsrbm: Move to VSR Byte Mask
> mtvsrhm: Move to VSR Halfword Mask
> mtvsrwm: Move to VSR Word Mask
> mtvsrdm: Move to VSR Doubleword Mask
> mtvsrqm: Move to VSR Quadword Mask
> mtvsrbmi: Move to VSR Byte Mask Immediate
> 
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v2:
> - Applied rth suggestions to do_mtvsrm and trans_MTVSRBMI

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask
  2021-12-03 13:00   ` Richard Henderson
@ 2021-12-03 13:21     ` Richard Henderson
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2021-12-03 13:21 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 12/3/21 5:00 AM, Richard Henderson wrote:
> On 11/12/21 6:14 AM, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> Implement the following PowerISA v3.1 instructions:
>> vextractbm: Vector Extract Byte Mask
>> vextracthm: Vector Extract Halfword Mask
>> vextractwm: Vector Extract Word Mask
>> vextractdm: Vector Extract Doubleword Mask
>> vextractqm: Vector Extract Quadword Mask
>>
>> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> v2:
>> - Applied rth suggestion to do_vextractm
>> ---
>>   target/ppc/insn32.decode            |  6 +++
>>   target/ppc/translate/vmx-impl.c.inc | 60 +++++++++++++++++++++++++++++
>>   2 files changed, 66 insertions(+)
>>
>> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
>> index 9a28f1d266..639ac22bf0 100644
>> --- a/target/ppc/insn32.decode
>> +++ b/target/ppc/insn32.decode
>> @@ -419,6 +419,12 @@ VEXPANDWM       000100 ..... 00010 ..... 11001000010    @VX_tb
>>   VEXPANDDM       000100 ..... 00011 ..... 11001000010    @VX_tb
>>   VEXPANDQM       000100 ..... 00100 ..... 11001000010    @VX_tb
>> +VEXTRACTBM      000100 ..... 01000 ..... 11001000010    @VX_tb
>> +VEXTRACTHM      000100 ..... 01001 ..... 11001000010    @VX_tb
>> +VEXTRACTWM      000100 ..... 01010 ..... 11001000010    @VX_tb
>> +VEXTRACTDM      000100 ..... 01011 ..... 11001000010    @VX_tb
>> +VEXTRACTQM      000100 ..... 01100 ..... 11001000010    @VX_tb
>> +
>>   # VSX Load/Store Instructions
>>   LXV             111101 ..... ..... ............ . 001   @DQ_TSX
>> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
>> index 58aca58f0f..dd7337c2f2 100644
>> --- a/target/ppc/translate/vmx-impl.c.inc
>> +++ b/target/ppc/translate/vmx-impl.c.inc
>> @@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
>>       return true;
>>   }
>> +static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
>> +{
>> +    const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
>> +    TCGv_i64 t, b, tmp;
>> +
>> +    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
>> +    REQUIRE_VECTOR(ctx);
>> +
>> +    t = tcg_const_i64(0);
>> +    b = tcg_temp_new_i64();
>> +    tmp = tcg_temp_new_i64();
>> +
>> +    for (int w = 0; w < 2; w++) {
>> +        get_avr64(b, a->vrb, w);
>> +
>> +        for (int i = 0; i < elem_count_half; i++) {
>> +            int in_bit = (i + 1) * elem_width - 1;
>> +            int out_bit = w * elem_count_half + i;
>> +
>> +            if (in_bit > out_bit) {
>> +                tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
>> +            } else {
>> +                tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
>> +            }
>> +            tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
>> +            tcg_gen_or_i64(t, t, tmp);
>> +        }
>> +    }
>> +    tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);
> 
> Pardon me.  I realized after the fact that we can run the same algorithm as for mtvsrm (in 
> the next patch) in reverse.
> 
>    & dup(1)
> .......a.......b.......c.......d.......e.......f.......g.......h
>    >> 32 - 4
> ...................................a.......b.......c.......d....
>    |
> .......a.......b.......c.......d...a...e...b...f...c...g...d...h
>    >> 16 - 2
> .....................a.......b.......c.......d...a...e...b...f..
>    |
> .......a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.b.d.f.h
>    >> 8 - 1
> ..............a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.
>    |
> .......a......ab.....abc....abcd...abcde..abcdef.abcdefgabcdefgh
>    & 0xff
> ........................................................abcdefgh
> 
> where one of the two final masks can be done via deposit:
> 
>      tcg_gen_andi_i64(hi, hi, 0xff);
>      tcg_gen_deposit_i64(lo, lo, hi, 8, 56);
> 
> Which will reduce the instruction count of this implementation by half.

Oops, ENOCOFFEE.  Of course the input bit comes from the msb of the element,
not the lsb.  Three different options:

(1) Begin with a shift of elem_count_half - 1, then do the above,

(2) Change the initial mask to the msb, then extract from elem_count_half - 1.

(3) Do left shifts so that we collect the bits at the msb of
     the word.  This probably results in the easiest concatenation
     in the end:

     tcg_gen_shri_i64(hi, hi, 64 - elem_count_half);
     tcg_gen_extract2_i64(lo, lo, hi, 64 - 2 * elem_count_half);


r~
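
As an illustration of the reversed algorithm with the msb correction, here is a host-side sketch for byte elements only. It is one reading of option (1): for bytes the pre-shift is 7, which moves each element's msb into its lsb before the gather steps run. The function names are hypothetical, and this is plain C rather than the TCG sequence a real patch would emit.

```c
#include <assert.h>
#include <stdint.h>

/* Gather the msb of byte k of 'half' into bit k of the result. */
static uint8_t gather_byte_msbs(uint64_t half)
{
    /* Move each byte's msb to its lsb, then keep only those bits. */
    uint64_t x = (half >> 7) & UINT64_C(0x0101010101010101);

    /* Right shifts of 28, 14 and 7 bring bit 8 * k down to bit k. */
    for (int i = 4, j = 32; i > 0; i >>= 1, j >>= 1) {
        x |= x >> (j - i);
    }
    return (uint8_t)(x & 0xFF);
}

/* Straightforward version used only as a cross-check. */
static uint8_t gather_byte_msbs_naive(uint64_t half)
{
    uint8_t out = 0;

    for (unsigned k = 0; k < 8; k++) {
        out |= (uint8_t)(((half >> (8 * k + 7)) & 1) << k);
    }
    return out;
}

/* Combine both halves: the high half supplies result bits 15..8. */
static uint16_t extract_byte_mask(uint64_t hi, uint64_t lo)
{
    return (uint16_t)((gather_byte_msbs(hi) << 8) | gather_byte_msbs(lo));
}

/* Compare both versions on a pseudo-random sample (xorshift64). */
static void gather_byte_msbs_selfcheck(void)
{
    uint64_t v = UINT64_C(0x9E3779B97F4A7C15);

    for (int n = 0; n < 10000; n++) {
        v ^= v << 13;
        v ^= v >> 7;
        v ^= v << 17;
        assert(gather_byte_msbs(v) == gather_byte_msbs_naive(v));
    }
}
```

Each input bit has exactly one path to its destination, so the OR steps never merge two different source bits into the same output position within the low eight bits.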


