* [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
To: qemu-devel, qemu-ppc
Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david
From: Matheus Ferst <matheus.ferst@eldorado.org.br>
This is a small patch series that allows Ubuntu 21.10 to boot with
-cpu POWER10. Glibc 2.34 uses vextractbm, so without the second patch
of this series init is killed by SIGILL. The instructions from the
other two patches (Vector Expand Mask and Vector Mask Move) are
included because they are similar to Vector Extract Mask, at least in
the ISA pseudocode.
v2:
- Applied rth suggestions to VEXTRACT[BHWDQ]M and MTVSR[BHWDQ]M[I]
Matheus Ferst (3):
target/ppc: Implement Vector Expand Mask
target/ppc: Implement Vector Extract Mask
target/ppc: Implement Vector Mask Move insns
target/ppc/insn32.decode | 28 ++++
target/ppc/translate/vmx-impl.c.inc | 209 ++++++++++++++++++++++++++++
2 files changed, 237 insertions(+)
--
2.25.1
* [PATCH v2 1/3] target/ppc: Implement Vector Expand Mask
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
To: qemu-devel, qemu-ppc
Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david
From: Matheus Ferst <matheus.ferst@eldorado.org.br>
Implement the following PowerISA v3.1 instructions:
vexpandbm: Vector Expand Byte Mask
vexpandhm: Vector Expand Halfword Mask
vexpandwm: Vector Expand Word Mask
vexpanddm: Vector Expand Doubleword Mask
vexpandqm: Vector Expand Quadword Mask
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
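A host-side C model of the expand semantics may help when reading the translation. This is a sketch, not QEMU code. It assumes the 128-bit VR is held as two uint64_t halves, and the vr128 type and helper names are invented for illustration. It replicates each element's most significant bit, which is what the arithmetic shift by (8 << vece) - 1 in do_vexpand produces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VR as two doublewords (hi = bits 127..64). */
typedef struct {
    uint64_t hi, lo;
} vr128;

/*
 * Replicate the most significant bit of every element across the element,
 * i.e. the result of an arithmetic right shift by (elem_width - 1).
 * vece: 0 = byte, 1 = halfword, 2 = word, 3 = doubleword.
 */
static uint64_t expand_dw(uint64_t x, unsigned vece)
{
    assert(vece <= 3);
    const unsigned width = 8u << vece;
    const uint64_t ones = width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
    uint64_t r = 0;

    for (unsigned pos = 0; pos < 64; pos += width) {
        if ((x >> (pos + width - 1)) & 1) {
            r |= ones << pos;
        }
    }
    return r;
}

/* vexpand{b,h,w,d}m: each doubleword is handled independently. */
static vr128 vexpandm(vr128 b, unsigned vece)
{
    vr128 t = { expand_dw(b.hi, vece), expand_dw(b.lo, vece) };
    return t;
}

/* vexpandqm: bit 127 is copied to all 128 bits (sari by 63 on the high half). */
static vr128 vexpandqm(vr128 b)
{
    const uint64_t m = (b.hi >> 63) ? UINT64_MAX : 0;
    vr128 t = { m, m };
    return t;
}
```

The quadword case mirrors trans_VEXPANDQM: the high doubleword is shifted and then stored to both halves of the target.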
target/ppc/insn32.decode | 11 ++++++++++
target/ppc/translate/vmx-impl.c.inc | 34 +++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
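All of the new @VX_tb lines in this series share primary opcode 4 and the 11-bit extended opcode 0b11001000010 (0x642). The 5-bit field in bits 20..16 (counting from the lsb) selects the operation and element size. For vextract*m the first register field is actually a GPR, and for mtvsr*m the second one is, even though @VX_tb names them vrt/vrb. As an illustration only, here is a hypothetical decoder that extracts the same fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not QEMU API. Bit numbers count from the lsb. */
typedef struct {
    unsigned vrt;   /* bits 25..21 */
    unsigned sel;   /* bits 20..16: operation and element size */
    unsigned vrb;   /* bits 15..11 */
} vx_tb_fields;

enum vx_tb_op { VX_TB_EXPAND, VX_TB_EXTRACT, VX_TB_MTVSR, VX_TB_OTHER };

/* Return true and fill *f if insn has opcode 4 and XO 0b11001000010. */
static bool decode_vx_tb(uint32_t insn, vx_tb_fields *f)
{
    assert(f);
    if ((insn >> 26) != 4 || (insn & 0x7ff) != 0x642) {
        return false;
    }
    f->vrt = (insn >> 21) & 0x1f;
    f->sel = (insn >> 16) & 0x1f;
    f->vrb = (insn >> 11) & 0x1f;
    return true;
}

/*
 * Group selected by the middle field, per the decode lines in this series;
 * within a group, sel & 7 is 0..4 for byte..quadword.
 */
static enum vx_tb_op vx_tb_group(unsigned sel)
{
    if (sel <= 4) {
        return VX_TB_EXPAND;
    }
    if (sel >= 8 && sel <= 12) {
        return VX_TB_EXTRACT;
    }
    if (sel >= 16 && sel <= 20) {
        return VX_TB_MTVSR;
    }
    return VX_TB_OTHER;   /* not handled by this series */
}
```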
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index e135b8aba4..9a28f1d266 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -56,6 +56,9 @@
&VX_uim4 vrt uim vrb
@VX_uim4 ...... vrt:5 . uim:4 vrb:5 ........... &VX_uim4
+&VX_tb vrt vrb
+@VX_tb ...... vrt:5 ..... vrb:5 ........... &VX_tb
+
&X rt ra rb
@X ...... rt:5 ra:5 rb:5 .......... . &X
@@ -408,6 +411,14 @@ VINSWVRX 000100 ..... ..... ..... 00110001111 @VX
VSLDBI 000100 ..... ..... ..... 00 ... 010110 @VN
VSRDBI 000100 ..... ..... ..... 01 ... 010110 @VN
+## Vector Mask Manipulation Instructions
+
+VEXPANDBM 000100 ..... 00000 ..... 11001000010 @VX_tb
+VEXPANDHM 000100 ..... 00001 ..... 11001000010 @VX_tb
+VEXPANDWM 000100 ..... 00010 ..... 11001000010 @VX_tb
+VEXPANDDM 000100 ..... 00011 ..... 11001000010 @VX_tb
+VEXPANDQM 000100 ..... 00100 ..... 11001000010 @VX_tb
+
# VSX Load/Store Instructions
LXV 111101 ..... ..... ............ . 001 @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index b361f73a67..58aca58f0f 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1505,6 +1505,40 @@ static bool trans_VSRDBI(DisasContext *ctx, arg_VN *a)
return true;
}
+static bool do_vexpand(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ tcg_gen_gvec_sari(vece, avr_full_offset(a->vrt), avr_full_offset(a->vrb),
+ (8 << vece) - 1, 16, 16);
+
+ return true;
+}
+
+TRANS(VEXPANDBM, do_vexpand, MO_8)
+TRANS(VEXPANDHM, do_vexpand, MO_16)
+TRANS(VEXPANDWM, do_vexpand, MO_32)
+TRANS(VEXPANDDM, do_vexpand, MO_64)
+
+static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
+{
+ TCGv_i64 tmp;
+
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ tmp = tcg_temp_new_i64();
+
+ get_avr64(tmp, a->vrb, true);
+ tcg_gen_sari_i64(tmp, tmp, 63);
+ set_avr64(a->vrt, tmp, false);
+ set_avr64(a->vrt, tmp, true);
+
+ tcg_temp_free_i64(tmp);
+ return true;
+}
+
#define GEN_VAFORM_PAIRED(name0, name1, opc2) \
static void glue(gen_, name0##_##name1)(DisasContext *ctx) \
{ \
--
2.25.1
* [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
To: qemu-devel, qemu-ppc
Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david
From: Matheus Ferst <matheus.ferst@eldorado.org.br>
Implement the following PowerISA v3.1 instructions:
vextractbm: Vector Extract Byte Mask
vextracthm: Vector Extract Halfword Mask
vextractwm: Vector Extract Word Mask
vextractdm: Vector Extract Doubleword Mask
vextractqm: Vector Extract Quadword Mask
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v2:
- Applied rth suggestion to do_vextractm
---
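The following is a plain-C reference for the bit mapping that the in_bit/out_bit loop in do_vextractm generates. It is a sketch in which vextractm_ref is a hypothetical name and the source VR is passed as its two doublewords. The msb of element i of the low doubleword goes to result bit i, and the high doubleword's elements follow starting at bit elem_count_half.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model of vextract{b,h,w,d}m: gather the most
 * significant bit of each element of hi:lo into the low bits of the result.
 */
static uint64_t vextractm_ref(uint64_t hi, uint64_t lo, unsigned vece)
{
    assert(vece <= 3);
    const unsigned width = 8u << vece, count_half = 8u >> vece;
    const uint64_t src[2] = { lo, hi };   /* w = 0: low dw, w = 1: high dw */
    uint64_t t = 0;

    for (unsigned w = 0; w < 2; w++) {
        for (unsigned i = 0; i < count_half; i++) {
            unsigned in_bit = (i + 1) * width - 1;   /* msb of element i */
            unsigned out_bit = w * count_half + i;
            t |= ((src[w] >> in_bit) & 1) << out_bit;
        }
    }
    return t;
}

/* vextractqm: the result is just bit 127. */
static uint64_t vextractqm_ref(uint64_t hi)
{
    return hi >> 63;
}
```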
target/ppc/insn32.decode | 6 +++
target/ppc/translate/vmx-impl.c.inc | 60 +++++++++++++++++++++++++++++
2 files changed, 66 insertions(+)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 9a28f1d266..639ac22bf0 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -419,6 +419,12 @@ VEXPANDWM 000100 ..... 00010 ..... 11001000010 @VX_tb
VEXPANDDM 000100 ..... 00011 ..... 11001000010 @VX_tb
VEXPANDQM 000100 ..... 00100 ..... 11001000010 @VX_tb
+VEXTRACTBM 000100 ..... 01000 ..... 11001000010 @VX_tb
+VEXTRACTHM 000100 ..... 01001 ..... 11001000010 @VX_tb
+VEXTRACTWM 000100 ..... 01010 ..... 11001000010 @VX_tb
+VEXTRACTDM 000100 ..... 01011 ..... 11001000010 @VX_tb
+VEXTRACTQM 000100 ..... 01100 ..... 11001000010 @VX_tb
+
# VSX Load/Store Instructions
LXV 111101 ..... ..... ............ . 001 @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 58aca58f0f..dd7337c2f2 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
return true;
}
+static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+ const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
+ TCGv_i64 t, b, tmp;
+
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ t = tcg_const_i64(0);
+ b = tcg_temp_new_i64();
+ tmp = tcg_temp_new_i64();
+
+ for (int w = 0; w < 2; w++) {
+ get_avr64(b, a->vrb, w);
+
+ for (int i = 0; i < elem_count_half; i++) {
+ int in_bit = (i + 1) * elem_width - 1;
+ int out_bit = w * elem_count_half + i;
+
+ if (in_bit > out_bit) {
+ tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
+ } else {
+ tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
+ }
+ tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
+ tcg_gen_or_i64(t, t, tmp);
+ }
+ }
+ tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);
+
+ tcg_temp_free_i64(t);
+ tcg_temp_free_i64(b);
+ tcg_temp_free_i64(tmp);
+
+ return true;
+}
+
+TRANS(VEXTRACTBM, do_vextractm, MO_8)
+TRANS(VEXTRACTHM, do_vextractm, MO_16)
+TRANS(VEXTRACTWM, do_vextractm, MO_32)
+TRANS(VEXTRACTDM, do_vextractm, MO_64)
+
+static bool trans_VEXTRACTQM(DisasContext *ctx, arg_VX_tb *a)
+{
+ TCGv_i64 tmp;
+
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ tmp = tcg_temp_new_i64();
+
+ get_avr64(tmp, a->vrb, true);
+ tcg_gen_shri_i64(tmp, tmp, 63);
+ tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], tmp);
+
+ tcg_temp_free_i64(tmp);
+
+ return true;
+}
+
#define GEN_VAFORM_PAIRED(name0, name1, opc2) \
static void glue(gen_, name0##_##name1)(DisasContext *ctx) \
{ \
--
2.25.1
* [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns
From: matheus.ferst @ 2021-11-12 14:14 UTC (permalink / raw)
To: qemu-devel, qemu-ppc
Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david
From: Matheus Ferst <matheus.ferst@eldorado.org.br>
Implement the following PowerISA v3.1 instructions:
mtvsrbm: Move to VSR Byte Mask
mtvsrhm: Move to VSR Halfword Mask
mtvsrwm: Move to VSR Word Mask
mtvsrdm: Move to VSR Doubleword Mask
mtvsrqm: Move to VSR Quadword Mask
mtvsrbmi: Move to VSR Byte Mask Immediate
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v2:
- Applied rth suggestions to do_mtvsrm and trans_MTVSRBMI
---
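The spreading trick in do_mtvsrm can be checked on the host with ordinary 64-bit arithmetic. The sketch below uses made-up helper names. It follows the same shift sequence as the comment in the patch ((32 - 4), (16 - 2), (8 - 1) for bytes), keeps the lsb of each element with a dup(1) mask, and multiplies by an all-ones element to widen each bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not QEMU API. Mask of the low n bits, 1 <= n <= 64. */
static uint64_t low_mask(unsigned n)
{
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}

/* Replicate a width-bit value across 64 bits (similar in spirit to dup_const). */
static uint64_t dup_elem(unsigned width, uint64_t v)
{
    uint64_t r = 0;
    for (unsigned pos = 0; pos < 64; pos += width) {
        r |= v << pos;
    }
    return r;
}

/* Spread bit k of the low count_half bits to element k, then widen to all-ones. */
static uint64_t mtvsrm_half(uint64_t bits, unsigned vece)
{
    assert(vece <= 3);
    const unsigned width = 8u << vece, count_half = 8u >> vece;
    uint64_t x = bits & low_mask(count_half);

    for (unsigned i = count_half / 2, j = 32; i > 0; i >>= 1, j >>= 1) {
        x |= x << (j - i);
    }
    x &= dup_elem(width, 1);        /* keep only the lsb of each element */
    return x * low_mask(width);     /* 1 -> all ones; no carries between elements */
}

/* mtvsr{b,h,w,d}m: the low 2 * count_half bits of rb supply the mask. */
static void mtvsrm(uint64_t rb, unsigned vece, uint64_t *hi, uint64_t *lo)
{
    const unsigned count_half = 8u >> vece;
    *hi = mtvsrm_half(rb >> count_half, vece);
    *lo = mtvsrm_half(rb, vece);
}
```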
target/ppc/insn32.decode | 11 +++
target/ppc/translate/vmx-impl.c.inc | 115 ++++++++++++++++++++++++++++
2 files changed, 126 insertions(+)
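For mtvsrbmi, the 16-bit immediate is split across three instruction fields, and %dx_b concatenates them with the first listed field as the most significant. The sketch below shows that reassembly together with a naive byte expansion to compare trans_MTVSRBMI against. dx_b_imm, dx_b_encode and mtvsrbmi_ref are hypothetical helpers for testing, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Reassemble b from "%dx_b 6:10 16:5 0:1" (hypothetical helper). */
static unsigned dx_b_imm(uint32_t insn)
{
    unsigned b0 = (insn >> 6) & 0x3ff;   /* 10 bits at 15..6 -> b[15:6] */
    unsigned b1 = (insn >> 16) & 0x1f;   /* 5 bits at 20..16 -> b[5:1]  */
    unsigned b2 = insn & 1;              /* 1 bit at 0       -> b[0]    */
    return (b0 << 6) | (b1 << 1) | b2;
}

/* Inverse, for building test encodings with XO 0b01010 in bits 5..1. */
static uint32_t dx_b_encode(unsigned vrt, unsigned b)
{
    assert(vrt < 32 && b <= 0xffff);
    return (4u << 26) | (vrt << 21) | (((b >> 1) & 0x1f) << 16) |
           ((b >> 6) << 6) | (0xau << 1) | (b & 1);
}

/* Naive expansion: byte k of the 128-bit result is 0xff iff bit k of b is set. */
static void mtvsrbmi_ref(unsigned b, uint64_t *hi, uint64_t *lo)
{
    *hi = *lo = 0;
    for (unsigned k = 0; k < 16; k++) {
        if ((b >> k) & 1) {
            uint64_t *dw = k < 8 ? lo : hi;
            *dw |= UINT64_C(0xff) << (8 * (k % 8));
        }
    }
}
```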
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 639ac22bf0..f68931f4f3 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -40,6 +40,10 @@
%ds_rtp 22:4 !function=times_2
@DS_rtp ...... ....0 ra:5 .............. .. &D rt=%ds_rtp si=%ds_si
+&DX_b vrt b
+%dx_b 6:10 16:5 0:1
+@DX_b ...... vrt:5 ..... .......... ..... . &DX_b b=%dx_b
+
&DX rt d
%dx_d 6:s10 16:5 0:1
@DX ...... rt:5 ..... .......... ..... . &DX d=%dx_d
@@ -413,6 +417,13 @@ VSRDBI 000100 ..... ..... ..... 01 ... 010110 @VN
## Vector Mask Manipulation Instructions
+MTVSRBM 000100 ..... 10000 ..... 11001000010 @VX_tb
+MTVSRHM 000100 ..... 10001 ..... 11001000010 @VX_tb
+MTVSRWM 000100 ..... 10010 ..... 11001000010 @VX_tb
+MTVSRDM 000100 ..... 10011 ..... 11001000010 @VX_tb
+MTVSRQM 000100 ..... 10100 ..... 11001000010 @VX_tb
+MTVSRBMI 000100 ..... ..... .......... 01010 . @DX_b
+
VEXPANDBM 000100 ..... 00000 ..... 11001000010 @VX_tb
VEXPANDHM 000100 ..... 00001 ..... 11001000010 @VX_tb
VEXPANDWM 000100 ..... 00010 ..... 11001000010 @VX_tb
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index dd7337c2f2..404767e4ec 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1599,6 +1599,121 @@ static bool trans_VEXTRACTQM(DisasContext *ctx, arg_VX_tb *a)
return true;
}
+static bool do_mtvsrm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+ const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
+ uint64_t c;
+ int i, j;
+ TCGv_i64 hi, lo, t0, t1;
+
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ hi = tcg_temp_new_i64();
+ lo = tcg_temp_new_i64();
+ t0 = tcg_temp_new_i64();
+ t1 = tcg_temp_new_i64();
+
+ tcg_gen_extu_tl_i64(t0, cpu_gpr[a->vrb]);
+ tcg_gen_extract_i64(hi, t0, elem_count_half, elem_count_half);
+ tcg_gen_extract_i64(lo, t0, 0, elem_count_half);
+
+ /*
+ * Spread the bits into their respective elements.
+ * E.g. for bytes:
+ * 00000000000000000000000000000000000000000000000000000000abcdefgh
+ * << 32 - 4
+ * 0000000000000000000000000000abcdefgh0000000000000000000000000000
+ * |
+ * 0000000000000000000000000000abcdefgh00000000000000000000abcdefgh
+ * << 16 - 2
+ * 00000000000000abcdefgh00000000000000000000abcdefgh00000000000000
+ * |
+ * 00000000000000abcdefgh000000abcdefgh000000abcdefgh000000abcdefgh
+ * << 8 - 1
+ * 0000000abcdefgh000000abcdefgh000000abcdefgh000000abcdefgh0000000
+ * |
+ * 0000000abcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgXbcdefgh
+ * & dup(1)
+ * 0000000a0000000b0000000c0000000d0000000e0000000f0000000g0000000h
+ * * 0xff
+ * aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh
+ */
+ for (i = elem_count_half / 2, j = 32; i > 0; i >>= 1, j >>= 1) {
+ tcg_gen_shli_i64(t0, hi, j - i);
+ tcg_gen_shli_i64(t1, lo, j - i);
+ tcg_gen_or_i64(hi, hi, t0);
+ tcg_gen_or_i64(lo, lo, t1);
+ }
+
+ c = dup_const(vece, 1);
+ tcg_gen_andi_i64(hi, hi, c);
+ tcg_gen_andi_i64(lo, lo, c);
+
+ c = MAKE_64BIT_MASK(0, elem_width);
+ tcg_gen_muli_i64(hi, hi, c);
+ tcg_gen_muli_i64(lo, lo, c);
+
+ set_avr64(a->vrt, lo, false);
+ set_avr64(a->vrt, hi, true);
+
+ tcg_temp_free_i64(hi);
+ tcg_temp_free_i64(lo);
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+
+ return true;
+}
+
+TRANS(MTVSRBM, do_mtvsrm, MO_8)
+TRANS(MTVSRHM, do_mtvsrm, MO_16)
+TRANS(MTVSRWM, do_mtvsrm, MO_32)
+TRANS(MTVSRDM, do_mtvsrm, MO_64)
+
+static bool trans_MTVSRQM(DisasContext *ctx, arg_VX_tb *a)
+{
+ TCGv_i64 tmp;
+
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ tmp = tcg_temp_new_i64();
+
+ tcg_gen_ext_tl_i64(tmp, cpu_gpr[a->vrb]);
+ tcg_gen_sextract_i64(tmp, tmp, 0, 1);
+ set_avr64(a->vrt, tmp, false);
+ set_avr64(a->vrt, tmp, true);
+
+ tcg_temp_free_i64(tmp);
+
+ return true;
+}
+
+static bool trans_MTVSRBMI(DisasContext *ctx, arg_DX_b *a)
+{
+ const uint64_t mask = dup_const(MO_8, 1);
+ uint64_t hi, lo;
+
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+ REQUIRE_VECTOR(ctx);
+
+ hi = extract16(a->b, 8, 8);
+ lo = extract16(a->b, 0, 8);
+
+ for (int i = 4, j = 32; i > 0; i >>= 1, j >>= 1) {
+ hi |= hi << (j - i);
+ lo |= lo << (j - i);
+ }
+
+ hi = (hi & mask) * 0xFF;
+ lo = (lo & mask) * 0xFF;
+
+ set_avr64(a->vrt, tcg_constant_i64(hi), true);
+ set_avr64(a->vrt, tcg_constant_i64(lo), false);
+
+ return true;
+}
+
#define GEN_VAFORM_PAIRED(name0, name1, opc2) \
static void glue(gen_, name0##_##name1)(DisasContext *ctx) \
{ \
--
2.25.1
* Re: [PATCH v2 0/3] target/ppc: Implement Vector Expand/Extract Mask and Vector Mask
From: Cédric Le Goater @ 2021-12-03 8:34 UTC (permalink / raw)
To: matheus.ferst, qemu-devel, qemu-ppc
Cc: danielhb413, richard.henderson, groug, david
Hello,
On 11/12/21 15:14, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>
> This is a small patch series that allows Ubuntu 21.10 to boot with
> -cpu POWER10. Glibc 2.34 uses vextractbm, so without the second patch
> of this series init is killed by SIGILL. The instructions from the
> other two patches (Vector Expand Mask and Vector Mask Move) are
> included because they are similar to Vector Extract Mask, at least in
> the ISA pseudocode.
>
> v2:
> - Applied rth suggestions to VEXTRACT[BHWDQ]M and MTVSR[BHWDQ]M[I]
I am planning to include these patches in the next ppc pull request
for QEMU 7.0 since they fix support for recent glibc/distros, unless
something still needs to be done for patches 2 and 3.
Thanks,
C.
>
> Matheus Ferst (3):
> target/ppc: Implement Vector Expand Mask
> target/ppc: Implement Vector Extract Mask
> target/ppc: Implement Vector Mask Move insns
>
> target/ppc/insn32.decode | 28 ++++
> target/ppc/translate/vmx-impl.c.inc | 209 ++++++++++++++++++++++++++++
> 2 files changed, 237 insertions(+)
>
* Re: [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask
From: Richard Henderson @ 2021-12-03 13:00 UTC (permalink / raw)
To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david
On 11/12/21 6:14 AM, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>
> Implement the following PowerISA v3.1 instructions:
> vextractbm: Vector Extract Byte Mask
> vextracthm: Vector Extract Halfword Mask
> vextractwm: Vector Extract Word Mask
> vextractdm: Vector Extract Doubleword Mask
> vextractqm: Vector Extract Quadword Mask
>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v2:
> - Applied rth suggestion to do_vextractm
> ---
> target/ppc/insn32.decode | 6 +++
> target/ppc/translate/vmx-impl.c.inc | 60 +++++++++++++++++++++++++++++
> 2 files changed, 66 insertions(+)
>
> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> index 9a28f1d266..639ac22bf0 100644
> --- a/target/ppc/insn32.decode
> +++ b/target/ppc/insn32.decode
> @@ -419,6 +419,12 @@ VEXPANDWM 000100 ..... 00010 ..... 11001000010 @VX_tb
> VEXPANDDM 000100 ..... 00011 ..... 11001000010 @VX_tb
> VEXPANDQM 000100 ..... 00100 ..... 11001000010 @VX_tb
>
> +VEXTRACTBM 000100 ..... 01000 ..... 11001000010 @VX_tb
> +VEXTRACTHM 000100 ..... 01001 ..... 11001000010 @VX_tb
> +VEXTRACTWM 000100 ..... 01010 ..... 11001000010 @VX_tb
> +VEXTRACTDM 000100 ..... 01011 ..... 11001000010 @VX_tb
> +VEXTRACTQM 000100 ..... 01100 ..... 11001000010 @VX_tb
> +
> # VSX Load/Store Instructions
>
> LXV 111101 ..... ..... ............ . 001 @DQ_TSX
> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
> index 58aca58f0f..dd7337c2f2 100644
> --- a/target/ppc/translate/vmx-impl.c.inc
> +++ b/target/ppc/translate/vmx-impl.c.inc
> @@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
> return true;
> }
>
> +static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
> +{
> + const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
> + TCGv_i64 t, b, tmp;
> +
> + REQUIRE_INSNS_FLAGS2(ctx, ISA310);
> + REQUIRE_VECTOR(ctx);
> +
> + t = tcg_const_i64(0);
> + b = tcg_temp_new_i64();
> + tmp = tcg_temp_new_i64();
> +
> + for (int w = 0; w < 2; w++) {
> + get_avr64(b, a->vrb, w);
> +
> + for (int i = 0; i < elem_count_half; i++) {
> + int in_bit = (i + 1) * elem_width - 1;
> + int out_bit = w * elem_count_half + i;
> +
> + if (in_bit > out_bit) {
> + tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
> + } else {
> + tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
> + }
> + tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
> + tcg_gen_or_i64(t, t, tmp);
> + }
> + }
> + tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);
Pardon me. I realized after the fact that we can run the same algorithm as for mtvsrm (in
the next patch) in reverse.
& dup(1)
.......a.......b.......c.......d.......e.......f.......g.......h
>> 32 - 4
...................................a.......b.......c.......d....
|
.......a.......b.......c.......d...a...e...b...f...c...g...d...h
>> 16 - 2
.....................a.......b.......c.......d...a...e...b...f..
|
.......a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.b.d.f.h
>> 8 - 1
..............a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.
|
.......a......ab.....abc....abcd...abcde..abcdef.abcdefgabcdefgh
& 0xff
........................................................abcdefgh
where one of the two final masks can be done via deposit:
tcg_gen_andi_i64(hi, hi, 0xff);
tcg_gen_deposit_i64(lo, lo, hi, 8, 56);
Which will reduce the instruction count of this implementation by half.
r~
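The right-shift gather in the diagram above can be modelled for byte elements with plain uint64_t arithmetic, with the deposit step written out using masks. This is a sketch with invented names. As drawn, it collects the least significant bit of each byte, so a byte whose only set bit is the msb contributes nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the diagram: fold each byte's lsb into bits 7..0. */
static uint64_t gather_lsb_bytes(uint64_t x)
{
    x &= UINT64_C(0x0101010101010101);             /* & dup(1) */
    for (unsigned i = 4, j = 32; i > 0; i >>= 1, j >>= 1) {
        x |= x >> (j - i);                         /* >> 28, >> 14, >> 7 */
    }
    return x;   /* only bits 7..0 are meaningful */
}

/* Combine both doublewords; the deposit(lo, lo, hi, 8, 56) step uses masks. */
static unsigned gather_lsb_128(uint64_t hi, uint64_t lo)
{
    hi = gather_lsb_bytes(hi) & 0xff;              /* andi hi, 0xff */
    lo = gather_lsb_bytes(lo);
    lo = (lo & 0xff) | (hi << 8);                  /* deposit into bits 63..8 */
    assert(lo <= 0xffff);                          /* 16-bit result */
    return (unsigned)lo;
}
```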
* Re: [PATCH v2 3/3] target/ppc: Implement Vector Mask Move insns
From: Richard Henderson @ 2021-12-03 13:01 UTC (permalink / raw)
To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david
On 11/12/21 6:14 AM, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>
> Implement the following PowerISA v3.1 instructions:
> mtvsrbm: Move to VSR Byte Mask
> mtvsrhm: Move to VSR Halfword Mask
> mtvsrwm: Move to VSR Word Mask
> mtvsrdm: Move to VSR Doubleword Mask
> mtvsrqm: Move to VSR Quadword Mask
> mtvsrbmi: Move to VSR Byte Mask Immediate
>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v2:
> - Applied rth suggestions to do_mtvsrm and trans_MTVSRBMI
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v2 2/3] target/ppc: Implement Vector Extract Mask
From: Richard Henderson @ 2021-12-03 13:21 UTC (permalink / raw)
To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david
On 12/3/21 5:00 AM, Richard Henderson wrote:
> On 11/12/21 6:14 AM, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> Implement the following PowerISA v3.1 instructions:
>> vextractbm: Vector Extract Byte Mask
>> vextracthm: Vector Extract Halfword Mask
>> vextractwm: Vector Extract Word Mask
>> vextractdm: Vector Extract Doubleword Mask
>> vextractqm: Vector Extract Quadword Mask
>>
>> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> v2:
>> - Applied rth suggestion to do_vextractm
>> ---
>> target/ppc/insn32.decode | 6 +++
>> target/ppc/translate/vmx-impl.c.inc | 60 +++++++++++++++++++++++++++++
>> 2 files changed, 66 insertions(+)
>>
>> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
>> index 9a28f1d266..639ac22bf0 100644
>> --- a/target/ppc/insn32.decode
>> +++ b/target/ppc/insn32.decode
>> @@ -419,6 +419,12 @@ VEXPANDWM 000100 ..... 00010 ..... 11001000010 @VX_tb
>> VEXPANDDM 000100 ..... 00011 ..... 11001000010 @VX_tb
>> VEXPANDQM 000100 ..... 00100 ..... 11001000010 @VX_tb
>> +VEXTRACTBM 000100 ..... 01000 ..... 11001000010 @VX_tb
>> +VEXTRACTHM 000100 ..... 01001 ..... 11001000010 @VX_tb
>> +VEXTRACTWM 000100 ..... 01010 ..... 11001000010 @VX_tb
>> +VEXTRACTDM 000100 ..... 01011 ..... 11001000010 @VX_tb
>> +VEXTRACTQM 000100 ..... 01100 ..... 11001000010 @VX_tb
>> +
>> # VSX Load/Store Instructions
>> LXV 111101 ..... ..... ............ . 001 @DQ_TSX
>> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
>> index 58aca58f0f..dd7337c2f2 100644
>> --- a/target/ppc/translate/vmx-impl.c.inc
>> +++ b/target/ppc/translate/vmx-impl.c.inc
>> @@ -1539,6 +1539,66 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb *a)
>> return true;
>> }
>> +static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
>> +{
>> + const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece;
>> + TCGv_i64 t, b, tmp;
>> +
>> + REQUIRE_INSNS_FLAGS2(ctx, ISA310);
>> + REQUIRE_VECTOR(ctx);
>> +
>> + t = tcg_const_i64(0);
>> + b = tcg_temp_new_i64();
>> + tmp = tcg_temp_new_i64();
>> +
>> + for (int w = 0; w < 2; w++) {
>> + get_avr64(b, a->vrb, w);
>> +
>> + for (int i = 0; i < elem_count_half; i++) {
>> + int in_bit = (i + 1) * elem_width - 1;
>> + int out_bit = w * elem_count_half + i;
>> +
>> + if (in_bit > out_bit) {
>> + tcg_gen_shri_i64(tmp, b, in_bit - out_bit);
>> + } else {
>> + tcg_gen_shli_i64(tmp, b, out_bit - in_bit);
>> + }
>> + tcg_gen_andi_i64(tmp, tmp, 1 << out_bit);
>> + tcg_gen_or_i64(t, t, tmp);
>> + }
>> + }
>> + tcg_gen_trunc_i64_tl(cpu_gpr[a->vrt], t);
>
> Pardon me. I realized after the fact that we can run the same algorithm as for mtvsrm (in
> the next patch) in reverse.
>
> & dup(1)
> .......a.......b.......c.......d.......e.......f.......g.......h
> >> 32 - 4
> ...................................a.......b.......c.......d....
> |
> .......a.......b.......c.......d...a...e...b...f...c...g...d...h
> >> 16 - 2
> .....................a.......b.......c.......d...a...e...b...f..
> |
> .......a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.b.d.f.h
> >> 8 - 1
> ..............a.......b.....a.c.....b.d...a.c.e...b.d.f.a.c.e.g.
> |
> .......a......ab.....abc....abcd...abcde..abcdef.abcdefgabcdefgh
> & 0xff
> ........................................................abcdefgh
>
> where one of the two final masks can be done via deposit:
>
> tcg_gen_andi_i64(hi, hi, 0xff);
> tcg_gen_deposit_i64(lo, lo, hi, 8, 56);
>
> Which will reduce the instruction count of this implementation by half.
Oops, ENOCOFFEE. Of course the input bit comes from the msb of the element, not the lsb.
Three different options:
(1) Begin with a shift of elem_count_half - 1, then do the above,
(2) Change the initial mask to the msb, then extract from elem_count_half - 1.
(3) Do left shifts so that we collect the bits at the msb of
the word. This probably results in the easiest concatenation
in the end:
tcg_gen_shri_i64(hi, hi, 64 - elem_count_half);
tcg_gen_extract2_i64(lo, lo, hi, 64 - 2 * elem_count_half);
r~
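Option (3) is straightforward to model on the host. The sketch below uses invented helper names, and its extract2 mimics the funnel shift performed by tcg_gen_extract2_i64. It masks each element's msb, gathers the bits with left shifts, and is compared against a per-bit reference. In this model the concatenation gives the right result only with an offset of 64 - elem_count_half. With 64 - 2 * elem_count_half, the low half's collected bits end up elem_count_half positions too high, and the gather residue below them leaks into the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host version of tcg_gen_extract2_i64(ret, al, ah, ofs). */
static uint64_t extract2(uint64_t al, uint64_t ah, unsigned ofs)
{
    return ofs == 0 ? al : (al >> ofs) | (ah << (64 - ofs));
}

/* The msb of every width-bit element, e.g. 0x8080...80 for bytes. */
static uint64_t dup_msb(unsigned width)
{
    uint64_t r = 0;
    for (unsigned pos = width - 1; pos < 64; pos += width) {
        r |= UINT64_C(1) << pos;
    }
    return r;
}

/* Per-bit reference: msb of element i of lo -> bit i, of hi -> bit n + i. */
static uint64_t vextractm_ref(uint64_t hi, uint64_t lo, unsigned vece)
{
    const unsigned width = 8u << vece, n = 8u >> vece;
    uint64_t t = 0;
    for (unsigned i = 0; i < n; i++) {
        t |= ((lo >> ((i + 1) * width - 1)) & 1) << i;
        t |= ((hi >> ((i + 1) * width - 1)) & 1) << (n + i);
    }
    return t;
}

/* Option (3): collect the msbs at the top of each doubleword with left shifts. */
static uint64_t vextractm_gather(uint64_t hi, uint64_t lo, unsigned vece)
{
    assert(vece <= 3);
    const unsigned width = 8u << vece, n = 8u >> vece;   /* n = elem_count_half */
    const uint64_t mask = dup_msb(width);

    hi &= mask;
    lo &= mask;
    for (unsigned i = n / 2, j = 32; i > 0; i >>= 1, j >>= 1) {
        hi |= hi << (j - i);
        lo |= lo << (j - i);
    }
    /* Element k's msb now sits at bit 64 - n + k of its doubleword. */
    hi >>= 64 - n;
    /* Offset 64 - n puts lo's top n bits at n-1..0 and hi's n bits above them. */
    return extract2(lo, hi, 64 - n);
}
```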