From: David Hildenbrand <david@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>, qemu-devel@nongnu.org
Cc: qemu-s390x@nongnu.org, dmiller423@gmail.com
Subject: Re: [PATCH v3 04/11] target/s390x: vxeh2: Update for changes to vector shifts
Date: Mon, 21 Mar 2022 12:15:53 +0100
Message-ID: <93302366-470f-2684-c788-d5eb9e0f861e@redhat.com>
In-Reply-To: <20220308015358.188499-5-richard.henderson@linaro.org>

On 08.03.22 02:53, Richard Henderson wrote:
> From: David Miller <dmiller423@gmail.com>
> 
> Prior to vector enhancements 2, the shift count was supposed to be equal
> for each byte lest the result be unpredictable, which allowed us to assume
> that the shift count was the same, and optimize accordingly.
> 
> With vector enhancements 2, the shift count is allowed to be different
> for each byte, and we must cope with that.
> 
> Signed-off-by: David Miller <dmiller423@gmail.com>
> Message-Id: <20220307020327.3003-4-dmiller423@gmail.com>
> [rth: Split out of larger patch; simplify shift/merge code.]
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h               |  3 ++
>  target/s390x/tcg/vec_int_helper.c   | 58 ++++++++++++++++++++++
>  target/s390x/tcg/translate_vx.c.inc | 77 ++++++++++++-----------------
>  target/s390x/tcg/insn-data.def      | 12 ++---
>  4 files changed, 99 insertions(+), 51 deletions(-)
> 
> diff --git a/target/s390x/helper.h b/target/s390x/helper.h
> index 7412130883..bf33d86f74 100644
> --- a/target/s390x/helper.h
> +++ b/target/s390x/helper.h
> @@ -203,8 +203,11 @@ DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
> +DEF_HELPER_FLAGS_4(gvec_vsl_ve2, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
> +DEF_HELPER_FLAGS_4(gvec_vsra_ve2, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
> +DEF_HELPER_FLAGS_4(gvec_vsrl_ve2, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
>  DEF_HELPER_4(gvec_vtm, void, ptr, cptr, env, i32)
> diff --git a/target/s390x/tcg/vec_int_helper.c b/target/s390x/tcg/vec_int_helper.c
> index 5561b3ed90..a881d5d267 100644
> --- a/target/s390x/tcg/vec_int_helper.c
> +++ b/target/s390x/tcg/vec_int_helper.c
> @@ -540,18 +540,76 @@ void HELPER(gvec_vsl)(void *v1, const void *v2, uint64_t count,
>      s390_vec_shl(v1, v2, count);
>  }
>  
> +void HELPER(gvec_vsl_ve2)(void *v1, const void *v2, const void *v3,
> +                          uint32_t desc)
> +{
> +    S390Vector tmp;
> +    uint32_t sh, e0, e1 = 0;

int i;

> +
> +    for (int i = 15; i >= 0; --i, e1 = e0 << 24) {

I'd only do "e1 = e0" here and do the shift for the rol32 ...

> +        e0 = s390_vec_read_element8(v2, i);
> +        sh = s390_vec_read_element8(v3, i) & 7;
> +
> +        s390_vec_write_element8(&tmp, i, rol32(e0 | e1, sh));

... here

s390_vec_write_element8(&tmp, i, rol32(e0 | e1 << 24, sh));
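As a sanity check that moving the shift does not change the result, here is a minimal standalone sketch of both forms. It is not QEMU code: the vector is modelled as a plain uint8_t[16] with element 0 as the leftmost byte, and model_rol32(), model_vsl_patch() and model_vsl_review() are hypothetical names standing in for rol32() and the helper body.

```c
#include <assert.h>   /* for the checks described in the usage note */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rol32(); sh is 0..7 here, and the mask keeps
 * the right shift defined when sh == 0. */
static uint32_t model_rol32(uint32_t x, unsigned sh)
{
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* Form used in the patch: e1 is pre-shifted in the loop increment. */
static void model_vsl_patch(uint8_t d[16], const uint8_t a[16],
                            const uint8_t c[16])
{
    uint8_t tmp[16];
    uint32_t sh, e0, e1 = 0;
    int i;

    for (i = 15; i >= 0; --i, e1 = e0 << 24) {
        e0 = a[i];
        sh = c[i] & 7;
        tmp[i] = (uint8_t)model_rol32(e0 | e1, sh);
    }
    memcpy(d, tmp, sizeof(tmp));
}

/* Form suggested in the review: e1 holds the raw neighbour byte and the
 * shift into bits 24..31 happens at the point of use. */
static void model_vsl_review(uint8_t d[16], const uint8_t a[16],
                             const uint8_t c[16])
{
    uint8_t tmp[16];
    uint32_t sh, e0, e1 = 0;
    int i;

    for (i = 15; i >= 0; --i, e1 = e0) {
        e0 = a[i];
        sh = c[i] & 7;
        tmp[i] = (uint8_t)model_rol32(e0 | (e1 << 24), sh);
    }
    memcpy(d, tmp, sizeof(tmp));
}
```

Feeding both functions the same inputs (for example every source byte 0x81 with every count 1) should give identical output: the rotate moves the top bits of the right-hand neighbour into the low bits of the result byte, and the last byte, which has no neighbour, gets zeros shifted in.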

> +    }
> +
> +    *(S390Vector *)v1 = tmp;
> +}
> +
>  void HELPER(gvec_vsra)(void *v1, const void *v2, uint64_t count,
>                         uint32_t desc)
>  {
>      s390_vec_sar(v1, v2, count);
>  }
>  
> +void HELPER(gvec_vsra_ve2)(void *v1, const void *v2, const void *v3,
> +                           uint32_t desc)
> +{
> +    S390Vector tmp;
> +    uint32_t sh, e0, e1;
> +    int i = 0;
> +
> +    e0 = s390_vec_read_element8(v2, 0);
> +    e1 = -(e0 >> 7) << 8;
> +
> +    for (;;) {
> +        sh = s390_vec_read_element8(v3, i) & 7;
> +
> +        s390_vec_write_element8(&tmp, i, (e0 | e1) >> sh);
> +
> +        if (++i >= 16) {
> +            break;
> +        }
> +
> +        e1 = e0 << 8;
> +        e0 = s390_vec_read_element8(v2, i);
> +    }

Can't we use the following, which resembles the other helpers, or am I
missing something?

S390Vector tmp;
uint32_t sh, e0, e1 = 0;

/* Only byte 0 is special. */
e0 = (int32_t)(int8_t)s390_vec_read_element8(v2, 0);
sh = s390_vec_read_element8(v3, 0) & 7;
s390_vec_write_element8(&tmp, 0, e0 >> sh);

e1 = e0;
for (int i = 1; i < 16; ++i, e1 = e0) {
	e0 = s390_vec_read_element8(v2, i);
	sh = s390_vec_read_element8(v3, i) & 7;
	s390_vec_write_element8(&tmp, i, (e0 | e1 << 8) >> sh);
}

*(S390Vector *)v1 = tmp;
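
To compare the two arithmetic-shift variants outside of QEMU, a rough standalone model might look like the following. It assumes a plain uint8_t[16] with element 0 leftmost in place of S390Vector, and it writes the byte-0 accesses against index 0. The function names are hypothetical, and the int8_t cast relies on the usual two's-complement conversion.

```c
#include <assert.h>   /* for the checks described in the usage note */
#include <stdint.h>
#include <string.h>

/* Loop structure as in the patch: seed e1 with the replicated sign bit. */
static void model_vsra_patch(uint8_t d[16], const uint8_t a[16],
                             const uint8_t c[16])
{
    uint8_t tmp[16];
    uint32_t sh, e0, e1;
    int i = 0;

    e0 = a[0];
    e1 = -(e0 >> 7) << 8;      /* 0xffffff00 if the sign bit is set, else 0 */

    for (;;) {
        sh = c[i] & 7;
        tmp[i] = (uint8_t)((e0 | e1) >> sh);
        if (++i >= 16) {
            break;
        }
        e1 = e0 << 8;
        e0 = a[i];
    }
    memcpy(d, tmp, sizeof(tmp));
}

/* Structure suggested in the review: sign-extend byte 0, then run the
 * same loop as the logical right shift for bytes 1..15. */
static void model_vsra_review(uint8_t d[16], const uint8_t a[16],
                              const uint8_t c[16])
{
    uint8_t tmp[16];
    uint32_t sh, e0, e1;
    int i;

    /* Only byte 0 sees the sign bits; a logical shift by at most 7 of the
     * sign-extended value still leaves the low 8 bits correct. */
    e0 = (uint32_t)(int32_t)(int8_t)a[0];
    sh = c[0] & 7;
    tmp[0] = (uint8_t)(e0 >> sh);

    e1 = e0;
    for (i = 1; i < 16; ++i, e1 = e0) {
        e0 = a[i];
        sh = c[i] & 7;
        tmp[i] = (uint8_t)((e0 | (e1 << 8)) >> sh);
    }
    memcpy(d, tmp, sizeof(tmp));
}
```

Both functions should agree on any input. With a = {0x80, 0x01, 0x00, ...} and every count set to 1, the expected bytes are 0xc0, 0x00, 0x80, then zeros: byte 0 is sign-filled, and each later byte takes its top bit from the low bit of its left-hand neighbour.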


> +
> +    *(S390Vector *)v1 = tmp;
> +}
> +
>  void HELPER(gvec_vsrl)(void *v1, const void *v2, uint64_t count,
>                         uint32_t desc)
>  {
>      s390_vec_shr(v1, v2, count);
>  }
>  
> +void HELPER(gvec_vsrl_ve2)(void *v1, const void *v2, const void *v3,
> +                           uint32_t desc)
> +{
> +    S390Vector tmp;
> +    uint32_t sh, e0, e1 = 0;
> +
> +    for (int i = 0; i < 16; ++i, e1 = e0 << 8) {

Ditto, I'd do the shift below ...

> +        e0 = s390_vec_read_element8(v2, i);
> +        sh = s390_vec_read_element8(v3, i) & 7;
> +
> +        s390_vec_write_element8(&tmp, i, (e0 | e1) >> sh);

s390_vec_write_element8(&tmp, i, (e0 | e1 << 8) >> sh);
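
One way to convince oneself that the per-byte loop is a superset of the old behaviour: when every byte carries the same count, it should reduce to a plain 128-bit logical right shift. The sketch below checks that under simplifying assumptions: the vector is a uint8_t[16] with element 0 leftmost, and model_vsrl_bytes() and model_shr128() are hypothetical names, not QEMU functions.

```c
#include <assert.h>   /* for the checks described in the usage note */
#include <stdint.h>
#include <string.h>

/* Per-byte logical right shift, with the neighbour shifted at use. */
static void model_vsrl_bytes(uint8_t d[16], const uint8_t a[16],
                             const uint8_t c[16])
{
    uint8_t tmp[16];
    uint32_t sh, e0, e1 = 0;
    int i;

    for (i = 0; i < 16; ++i, e1 = e0) {
        e0 = a[i];
        sh = c[i] & 7;
        tmp[i] = (uint8_t)((e0 | (e1 << 8)) >> sh);
    }
    memcpy(d, tmp, sizeof(tmp));
}

/* Reference: shift the whole 128-bit value right by sh bits (0..7). */
static void model_shr128(uint8_t d[16], const uint8_t a[16], unsigned sh)
{
    uint64_t hi = 0, lo = 0;
    int i;

    /* Assemble the two 64-bit halves, leftmost byte most significant. */
    for (i = 0; i < 8; ++i) {
        hi = (hi << 8) | a[i];
        lo = (lo << 8) | a[8 + i];
    }
    if (sh) {                  /* avoid an undefined shift by 64 */
        lo = (lo >> sh) | (hi << (64 - sh));
        hi >>= sh;
    }
    for (i = 7; i >= 0; --i) {
        d[i] = (uint8_t)hi;
        hi >>= 8;
        d[8 + i] = (uint8_t)lo;
        lo >>= 8;
    }
}
```

For each count from 0 to 7, filling all sixteen count bytes with that value and calling model_vsrl_bytes() should match model_shr128() on the same source bytes. With differing counts there is no single whole-vector shift to compare against, which is the case the new helpers exist for.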

> +    }
> +
> +    *(S390Vector *)v1 = tmp;
> +}
> +
>  #define DEF_VSCBI(BITS)                                                        \
>  void HELPER(gvec_vscbi##BITS)(void *v1, const void *v2, const void *v3,        \
>                                uint32_t desc)                                   \
> diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
> index d514e8b218..967f6213d8 100644
> --- a/target/s390x/tcg/translate_vx.c.inc
> +++ b/target/s390x/tcg/translate_vx.c.inc
> @@ -2018,21 +2018,42 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
>      return DISAS_NEXT;
>  }
>  
> +static DisasJumpType gen_vsh_bit_byte(DisasContext *s, DisasOps *o,
> +                                      gen_helper_gvec_2i *gen,
> +                                      gen_helper_gvec_3 *gen_ve2)
> +{
> +    bool byte = s->insn->data;

Nit: I'd have called this "by_byte".


-- 
Thanks,

David / dhildenb



