qemu-devel.nongnu.org archive mirror
* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
       [not found] <1566959818-38369-1-git-send-email-zhiwei_liu@c-sky.com>
@ 2019-08-28  9:08 ` Alex Bennée
  2019-08-28 16:39   ` Richard Henderson
  2019-08-29 13:35   ` liuzhiwei
  2019-08-28 18:54 ` Richard Henderson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Alex Bennée @ 2019-08-28  9:08 UTC (permalink / raw)
  To: liuzhiwei
  Cc: peter.maydell, palmer, qemu-riscv, sagark, kbastian, riku.voipio,
	qemu-devel, laurent, Alistair.Francis, aurelien


liuzhiwei <zhiwei_liu@c-sky.com> writes:

> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
> ---
>  fpu/softfloat.c                         |   119 +
>  include/fpu/softfloat.h                 |     4 +

Changes to softfloat should be in a separate patch, but see below.

>  linux-user/riscv/cpu_loop.c             |     8 +-
>  target/riscv/Makefile.objs              |     2 +-
>  target/riscv/cpu.h                      |    30 +
>  target/riscv/cpu_bits.h                 |    15 +
>  target/riscv/cpu_helper.c               |     7 +
>  target/riscv/csr.c                      |    65 +-
>  target/riscv/helper.h                   |   354 +
>  target/riscv/insn32.decode              |   374 +-
>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>  target/riscv/translate.c                |     1 +
>  target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++

This is likely too big to be reviewed. Is it possible to split the patch
up into more discrete chunks, for example the support pieces first and
then maybe one class of instructions at a time?

>  13 files changed, 28017 insertions(+), 9 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>  create mode 100644 target/riscv/vector_helper.c
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 2ba36ec..da155ea 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -433,6 +433,16 @@ static inline int extractFloat16Exp(float16 a)
>  }
>
>  /*----------------------------------------------------------------------------
> +| Returns the sign bit of the half-precision floating-point value `a'.
> +*----------------------------------------------------------------------------*/
> +
> +static inline flag extractFloat16Sign(float16 a)
> +{
> +    return float16_val(a) >> 0xf;
> +}
> +

We are trying to avoid this sort of bit fiddling for new code when we
already have generic decompose functions that can extract all the parts
into a common format.

> +
> +/*----------------------------------------------------------------------------
>  | Returns the fraction bits of the single-precision floating-point value `a'.
>  *----------------------------------------------------------------------------*/
>
> @@ -4790,6 +4800,35 @@ int float32_eq(float32 a, float32 b, float_status *status)
>  }
>
>  /*----------------------------------------------------------------------------
> +| Returns 1 if the half-precision floating-point value `a' is less than
> +| or equal to the corresponding value `b', and 0 otherwise.  The invalid
> +| exception is raised if either operand is a NaN.  The comparison is performed
> +| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +*----------------------------------------------------------------------------*/
> +
> +int float16_le(float16 a, float16 b, float_status *status)
> +{
> +    flag aSign, bSign;
> +    uint16_t av, bv;
> +    a = float16_squash_input_denormal(a, status);
> +    b = float16_squash_input_denormal(b, status);
> +
> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
> +       ) {
> +        float_raise(float_flag_invalid, status);
> +        return 0;
> +    }
> +    aSign = extractFloat16Sign( a );
> +    bSign = extractFloat16Sign( b );
> +    av = float16_val(a);
> +    bv = float16_val(b);
> +    if ( aSign != bSign ) return aSign || ( (uint16_t) ( ( av | bv )<<1 ) == 0 );
> +    return ( av == bv ) || ( aSign ^ ( av < bv ) );
> +
> +}

What does this provide that:

  float16_compare(a, b, status) == float_relation_less;

doesn't?
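
For reference, a minimal sketch of how the ordered predicates fall out of
a single compare result. It assumes the float_relation_* values from
include/fpu/softfloat.h (less = -1, equal = 0, greater = 1, unordered = 2)
and uses a hypothetical stand-in, sketch_compare(), on native float so that
it builds outside QEMU; in the tree the call would be
float16_compare(a, b, status). Exception flags are not modelled. Note that
a less-or-equal test has to accept the equal relation as well as less.

```c
#include <math.h>
#include <stdbool.h>

/* Same values as QEMU's float_relation_* enum, renamed for the sketch. */
enum {
    sketch_relation_less      = -1,
    sketch_relation_equal     =  0,
    sketch_relation_greater   =  1,
    sketch_relation_unordered =  2
};

/* Hypothetical stand-in for float16_compare(); raises no flags. */
int sketch_compare(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return sketch_relation_unordered;
    }
    if (a < b) {
        return sketch_relation_less;
    }
    if (a > b) {
        return sketch_relation_greater;
    }
    return sketch_relation_equal;   /* also covers +0 versus -0 */
}

bool sketch_lt(float a, float b)
{
    return sketch_compare(a, b) == sketch_relation_less;
}

bool sketch_le(float a, float b)
{
    int r = sketch_compare(a, b);
    return r == sketch_relation_less || r == sketch_relation_equal;
}

bool sketch_eq(float a, float b)
{
    return sketch_compare(a, b) == sketch_relation_equal;
}

bool sketch_unordered(float a, float b)
{
    return sketch_compare(a, b) == sketch_relation_unordered;
}
```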

> +
> +/*----------------------------------------------------------------------------
>  | Returns 1 if the single-precision floating-point value `a' is less than
>  | or equal to the corresponding value `b', and 0 otherwise.  The invalid
>  | exception is raised if either operand is a NaN.  The comparison is performed
> @@ -4825,6 +4864,35 @@ int float32_le(float32 a, float32 b, float_status *status)
>  | to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>  *----------------------------------------------------------------------------*/
>
> +int float16_lt(float16 a, float16 b, float_status *status)
> +{
> +    flag aSign, bSign;
> +    uint16_t av, bv;
> +    a = float16_squash_input_denormal(a, status);
> +    b = float16_squash_input_denormal(b, status);
> +
> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
> +       ) {
> +        float_raise(float_flag_invalid, status);
> +        return 0;
> +    }
> +    aSign = extractFloat16Sign( a );
> +    bSign = extractFloat16Sign( b );
> +    av = float16_val(a);
> +    bv = float16_val(b);
> +    if ( aSign != bSign ) return aSign && ( (uint16_t) ( ( av | bv )<<1 ) != 0 );
> +    return ( av != bv ) && ( aSign ^ ( av < bv ) );
> +
> +}
> +
> +/*----------------------------------------------------------------------------
> +| Returns 1 if the single-precision floating-point value `a' is less than
> +| the corresponding value `b', and 0 otherwise.  The invalid exception is
> +| raised if either operand is a NaN.  The comparison is performed according
> +| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +*----------------------------------------------------------------------------*/
> +
>  int float32_lt(float32 a, float32 b, float_status *status)
>  {
>      flag aSign, bSign;
> @@ -4869,6 +4937,32 @@ int float32_unordered(float32 a, float32 b, float_status *status)
>  }
>
>  /*----------------------------------------------------------------------------
> +| Returns 1 if the half-precision floating-point value `a' is equal to
> +| the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
> +| exception.  The comparison is performed according to the IEC/IEEE Standard
> +| for Binary Floating-Point Arithmetic.
> +*----------------------------------------------------------------------------*/
> +
> +int float16_eq_quiet(float16 a, float16 b, float_status *status)
> +{
> +    a = float16_squash_input_denormal(a, status);
> +    b = float16_squash_input_denormal(b, status);
> +
> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
> +       ) {
> +        if (float16_is_signaling_nan(a, status)
> +         || float16_is_signaling_nan(b, status)) {
> +            float_raise(float_flag_invalid, status);
> +        }
> +        return 0;
> +    }
> +    return ( float16_val(a) == float16_val(b) ) ||
> +            ( (uint16_t) ( ( float16_val(a) | float16_val(b) )<<1 ) == 0 );
> +}
> +

See also float16_compare_quiet.

> +
> +/*----------------------------------------------------------------------------
>  | Returns 1 if the single-precision floating-point value `a' is equal to
>  | the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
>  | exception.  The comparison is performed according to the IEC/IEEE Standard
> @@ -4958,6 +5052,31 @@ int float32_lt_quiet(float32 a, float32 b, float_status *status)
>  }
>
>  /*----------------------------------------------------------------------------
> +| Returns 1 if the half-precision floating-point values `a' and `b' cannot
> +| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
> +| comparison is performed according to the IEC/IEEE Standard for Binary
> +| Floating-Point Arithmetic.
> +*----------------------------------------------------------------------------*/
> +
> +int float16_unordered_quiet(float16 a, float16 b, float_status *status)
> +{
> +    a = float16_squash_input_denormal(a, status);
> +    b = float16_squash_input_denormal(b, status);
> +
> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
> +       ) {
> +        if (float16_is_signaling_nan(a, status)
> +         || float16_is_signaling_nan(b, status)) {
> +            float_raise(float_flag_invalid, status);
> +        }
> +        return 1;
> +    }
> +    return 0;
> +}
> +
> +
> +/*----------------------------------------------------------------------------
>  | Returns 1 if the single-precision floating-point values `a' and `b' cannot
>  | be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
>  | comparison is performed according to the IEC/IEEE Standard for Binary
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 3ff3fa5..3b0754c 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status *status);
>  float16 float16_sqrt(float16, float_status *status);
>  int float16_compare(float16, float16, float_status *status);
>  int float16_compare_quiet(float16, float16, float_status *status);
> +int float16_unordered_quiet(float16, float16, float_status *status);
> +int float16_le(float16, float16, float_status *status);
> +int float16_lt(float16, float16, float_status *status);
> +int float16_eq_quiet(float16, float16, float_status *status);
>
>  int float16_is_quiet_nan(float16, float_status *status);
>  int float16_is_signaling_nan(float16, float_status *status);
> diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
> index 12aa3c0..b01548a 100644
> --- a/linux-user/riscv/cpu_loop.c
> +++ b/linux-user/riscv/cpu_loop.c
> @@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
>          signum = 0;
>          sigcode = 0;
>          sigaddr = 0;
> -
> +        if (env->foflag) {
> +            if (env->vfp.vl != 0) {
> +                env->foflag = false;
> +                env->pc += 4;
> +                continue;
> +            }
> +        }

What is this trying to do?

>          switch (trapnr) {
>          case EXCP_INTERRUPT:
>              /* just indicate that signals should be handled asap */
> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
> index b1c79bc..d577cef 100644
> --- a/target/riscv/Makefile.objs
> +++ b/target/riscv/Makefile.objs
> @@ -1,4 +1,4 @@
> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o pmp.o
> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o pmp.o
>
>  DECODETREE = $(SRC_PATH)/scripts/decodetree.py
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 0adb307..5a93aa2 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -67,6 +67,7 @@
>  #define RVC RV('C')
>  #define RVS RV('S')
>  #define RVU RV('U')
> +#define RVV RV('V')
>
>  /* S extension denotes that Supervisor mode exists, however it is possible
>     to have a core that support S mode but does not have an MMU and there
> @@ -93,9 +94,38 @@ typedef struct CPURISCVState CPURISCVState;
>
>  #include "pmp.h"
>
> +#define VLEN 128
> +#define VUNIT(x) (VLEN / x)
> +

If you want to do vectors I suggest you look at the TCGvec types for
passing pointers to vector registers to helpers. In this case you will
want to ensure your vector registers are properly aligned.

>  struct CPURISCVState {
>      target_ulong gpr[32];
>      uint64_t fpr[32]; /* assume both F and D extensions */
> +
> +    /* vector coprocessor state.  */
> +    struct {
> +        union VECTOR {
> +            float64  f64[VUNIT(64)];
> +            float32  f32[VUNIT(32)];
> +            float16  f16[VUNIT(16)];
> +            target_ulong ul[VUNIT(sizeof(target_ulong))];
> +            uint64_t u64[VUNIT(64)];
> +            int64_t  s64[VUNIT(64)];
> +            uint32_t u32[VUNIT(32)];
> +            int32_t  s32[VUNIT(32)];
> +            uint16_t u16[VUNIT(16)];
> +            int16_t  s16[VUNIT(16)];
> +            uint8_t  u8[VUNIT(8)];
> +            int8_t   s8[VUNIT(8)];
> +        } vreg[32];
> +        target_ulong vxrm;
> +        target_ulong vxsat;
> +        target_ulong vl;
> +        target_ulong vstart;
> +        target_ulong vtype;
> +        float_status fp_status;
> +    } vfp;
> +
> +    bool         foflag;

Again I have no idea what foflag is here.

>      target_ulong pc;
>      target_ulong load_res;
>      target_ulong load_val;
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 11f971a..9eb43ec 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -29,6 +29,14 @@
>  #define FSR_NXA             (FPEXC_NX << FSR_AEXC_SHIFT)
>  #define FSR_AEXC            (FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
>
> +/* Vector Fixed-Point round model */
> +#define FSR_VXRM_SHIFT      9
> +#define FSR_VXRM            (0x3 << FSR_VXRM_SHIFT)
> +
> +/* Vector Fixed-Point saturation flag */
> +#define FSR_VXSAT_SHIFT     8
> +#define FSR_VXSAT           (0x1 << FSR_VXSAT_SHIFT)
> +
>  /* Control and Status Registers */
>
>  /* User Trap Setup */
> @@ -48,6 +56,13 @@
>  #define CSR_FRM             0x002
>  #define CSR_FCSR            0x003
>
> +/* User Vector CSRs */
> +#define CSR_VSTART          0x008
> +#define CSR_VXSAT           0x009
> +#define CSR_VXRM            0x00a
> +#define CSR_VL              0xc20
> +#define CSR_VTYPE           0xc21
> +
>  /* User Timers and Counters */
>  #define CSR_CYCLE           0xc00
>  #define CSR_TIME            0xc01
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index e32b612..405caf6 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>          [PRV_H] = RISCV_EXCP_H_ECALL,
>          [PRV_M] = RISCV_EXCP_M_ECALL
>      };
> +    if (env->foflag) {
> +        if (env->vfp.vl != 0) {
> +            env->foflag = false;
> +            env->pc += 4;
> +            return;
> +        }
> +    }
>
>      if (!async) {
>          /* set tval to badaddr for traps with address information */
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index e0d4586..a6131ff 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -87,12 +87,12 @@ static int ctr(CPURISCVState *env, int csrno)
>      return 0;
>  }
>
> -#if !defined(CONFIG_USER_ONLY)
>  static int any(CPURISCVState *env, int csrno)
>  {
>      return 0;
>  }
>
> +#if !defined(CONFIG_USER_ONLY)
>  static int smode(CPURISCVState *env, int csrno)
>  {
>      return -!riscv_has_ext(env, RVS);
> @@ -158,8 +158,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
>          return -1;
>      }
>  #endif
> -    *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
> -        | (env->frm << FSR_RD_SHIFT);
> +    *val = (env->vfp.vxrm << FSR_VXRM_SHIFT)
> +            | (env->vfp.vxsat << FSR_VXSAT_SHIFT)
> +            | (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
> +            | (env->frm << FSR_RD_SHIFT);
>      return 0;
>  }
>
> @@ -172,10 +174,60 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
>      env->mstatus |= MSTATUS_FS;
>  #endif
>      env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
> +    env->vfp.vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
> +    env->vfp.vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
>      riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
>      return 0;
>  }
>
> +static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vfp.vtype;
> +    return 0;
> +}
> +
> +static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vfp.vl;
> +    return 0;
> +}
> +
> +static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vfp.vxrm;
> +    return 0;
> +}
> +
> +static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vfp.vxsat;
> +    return 0;
> +}
> +
> +static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
> +{
> +    *val = env->vfp.vstart;
> +    return 0;
> +}
> +
> +static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +    env->vfp.vxrm = val;
> +    return 0;
> +}
> +
> +static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +    env->vfp.vxsat = val;
> +    return 0;
> +}
> +
> +static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
> +{
> +    env->vfp.vstart = val;
> +    return 0;
> +}

A fixed return value makes me think these should be void functions.

> +
>  /* User Timers and Counters */
>  static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
>  {
> @@ -873,7 +925,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>      [CSR_FFLAGS] =              { fs,   read_fflags,      write_fflags      },
>      [CSR_FRM] =                 { fs,   read_frm,         write_frm         },
>      [CSR_FCSR] =                { fs,   read_fcsr,        write_fcsr        },
> -
> +    /* Vector CSRs */
> +    [CSR_VSTART] =              { any,   read_vstart,     write_vstart      },
> +    [CSR_VXSAT] =               { any,   read_vxsat,      write_vxsat       },
> +    [CSR_VXRM] =                { any,   read_vxrm,       write_vxrm        },
> +    [CSR_VL] =                  { any,   read_vl                            },
> +    [CSR_VTYPE] =               { any,   read_vtype                         },
>      /* User Timers and Counters */
>      [CSR_CYCLE] =               { ctr,  read_instret                        },
>      [CSR_INSTRET] =             { ctr,  read_instret                        },
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index debb22a..fee02c0 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -76,3 +76,357 @@ DEF_HELPER_2(mret, tl, env, tl)
>  DEF_HELPER_1(wfi, void, env)
>  DEF_HELPER_1(tlb_flush, void, env)
>  #endif
> +/* Vector functions */

Think about how you could split this patch up to introduce a group of
instructions at a time. This will make it a lot easier to review.

I'm going to leave review of the specifics to the RISCV maintainers but
I suspect they will want to wait until a v2 of the series. However it
looks like a good first pass at implementing vectors.

--
Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28  9:08 ` [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1 Alex Bennée
@ 2019-08-28 16:39   ` Richard Henderson
  2019-08-29 13:35   ` liuzhiwei
  1 sibling, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2019-08-28 16:39 UTC (permalink / raw)
  To: Alex Bennée, liuzhiwei
  Cc: peter.maydell, palmer, qemu-riscv, sagark, kbastian, riku.voipio,
	qemu-devel, laurent, Alistair.Francis, aurelien

On 8/28/19 2:08 AM, Alex Bennée wrote:
> If you want to do vectors I suggest you look at the TCGvec types for
> passing pointers to vector registers to helpers. In this case you will
> want to ensure your vector registers are properly aligned.

The risc-v vector extension is very different from any other existing vector
extension.  In particular, the locations of the vector elements vary
dynamically.  Except for certain special cases I doubt that risc-v can make
direct use of the generic TCG vector support.


r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
       [not found] <1566959818-38369-1-git-send-email-zhiwei_liu@c-sky.com>
  2019-08-28  9:08 ` [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1 Alex Bennée
@ 2019-08-28 18:54 ` Richard Henderson
  2019-08-28 20:43   ` Richard Henderson
                     ` (2 more replies)
  2019-08-28 21:34 ` Alistair Francis
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 27+ messages in thread
From: Richard Henderson @ 2019-08-28 18:54 UTC (permalink / raw)
  To: liuzhiwei, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien

On 8/27/19 7:36 PM, liuzhiwei wrote:
> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
> ---
>  fpu/softfloat.c                         |   119 +
>  include/fpu/softfloat.h                 |     4 +
>  linux-user/riscv/cpu_loop.c             |     8 +-
>  target/riscv/Makefile.objs              |     2 +-
>  target/riscv/cpu.h                      |    30 +
>  target/riscv/cpu_bits.h                 |    15 +
>  target/riscv/cpu_helper.c               |     7 +
>  target/riscv/csr.c                      |    65 +-
>  target/riscv/helper.h                   |   354 +
>  target/riscv/insn32.decode              |   374 +-
>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>  target/riscv/translate.c                |     1 +
>  target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>  13 files changed, 28017 insertions(+), 9 deletions(-)

As Alex mentioned, this is *far* too big to be presented as a single patch.

> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 3ff3fa5..3b0754c 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status *status);
>  float16 float16_sqrt(float16, float_status *status);
>  int float16_compare(float16, float16, float_status *status);
>  int float16_compare_quiet(float16, float16, float_status *status);
> +int float16_unordered_quiet(float16, float16, float_status *status);
> +int float16_le(float16, float16, float_status *status);
> +int float16_lt(float16, float16, float_status *status);
> +int float16_eq_quiet(float16, float16, float_status *status);

As Alex mentioned, none of these changes are required, as all
functionality is provided by float16_compare{,_quiet}.

> diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
> index 12aa3c0..b01548a 100644
> --- a/linux-user/riscv/cpu_loop.c
> +++ b/linux-user/riscv/cpu_loop.c
> @@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
>          signum = 0;
>          sigcode = 0;
>          sigaddr = 0;
> -
> +        if (env->foflag) {
> +            if (env->vfp.vl != 0) {
> +                env->foflag = false;
> +                env->pc += 4;
> +                continue;
> +            }

This is most definitely not the correct way to implement first-fault.

You need to have a look at target/arm/sve_helper.c, e.g. sve_ldff1_r,
where we test pages for validity with tlb_vaddr_to_host.
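
The first-fault semantics can be sketched without any QEMU types. The
block below is only a model: sketch_page_valid() is a hypothetical stand-in
for the tlb_vaddr_to_host() probe, and the memory layout, page size and
all names are made up for illustration. Element 0 takes its fault
normally; a fault on any later element ends the load and shortens vl
instead of trapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 16u   /* tiny "pages" keep the example small */
#define SKETCH_MEM_SIZE  64u

uint8_t sketch_mem[SKETCH_MEM_SIZE];

/* One flag per page; false models an unmapped page (page 2 here). */
bool sketch_page_ok[SKETCH_MEM_SIZE / SKETCH_PAGE_SIZE] = {
    true, true, false, true
};

/* Non-faulting probe: is this address backed by a mapped page? */
bool sketch_page_valid(size_t addr)
{
    return addr < SKETCH_MEM_SIZE && sketch_page_ok[addr / SKETCH_PAGE_SIZE];
}

/*
 * Byte-element first-fault load.
 * Returns false if element 0 would fault; the caller then raises the
 * exception as for an ordinary load.  Otherwise elements are loaded up
 * to the first invalid address and *vl is reduced to the number loaded.
 */
bool sketch_ldff_u8(uint8_t *vd, size_t base, size_t *vl)
{
    size_t i;

    if (*vl == 0) {
        return true;
    }
    if (!sketch_page_valid(base)) {
        return false;
    }
    for (i = 0; i < *vl; i++) {
        if (!sketch_page_valid(base + i)) {
            *vl = i;          /* truncate the vector length, no trap */
            break;
        }
        vd[i] = sketch_mem[base + i];
    }
    return true;
}
```

The point of the model is that the decision is made inside the load
itself, by probing before each access, rather than by patching up the pc
after an exception has already been delivered.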

> +    /* vector coprocessor state.  */
> +    struct {
> +        union VECTOR {
> +            float64  f64[VUNIT(64)];
> +            float32  f32[VUNIT(32)];
> +            float16  f16[VUNIT(16)];
> +            target_ulong ul[VUNIT(sizeof(target_ulong))];
> +            uint64_t u64[VUNIT(64)];
> +            int64_t  s64[VUNIT(64)];
> +            uint32_t u32[VUNIT(32)];
> +            int32_t  s32[VUNIT(32)];
> +            uint16_t u16[VUNIT(16)];
> +            int16_t  s16[VUNIT(16)];
> +            uint8_t  u8[VUNIT(8)];
> +            int8_t   s8[VUNIT(8)];
> +        } vreg[32];
> +        target_ulong vxrm;
> +        target_ulong vxsat;
> +        target_ulong vl;
> +        target_ulong vstart;
> +        target_ulong vtype;
> +        float_status fp_status;
> +    } vfp;

You've obviously copied "vfp" from target/arm.  Drop that.  It makes no sense
in the context of risc-v.

I'm not sure that vreg[].element[] really makes the most sense in the context
of how risc-v rearranges its elements.  It will almost certainly fail clang
validators, if enabled, since you'll be indexing beyond the end of vreg[n] into
vreg[n+1].

It might be best to have a single array:

    union {
        uint64_t u64[32 * VLEN / 64];
        ...
        uint8_t u8[32 * VLEN / 8];
    } velt;

This is clearer to the compiler that this is a single block of memory that we
can index as we please.
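
A minimal sketch of that layout, assuming VLEN is in bits as in the patch.
The type and accessor names (SketchVRegFile, sketch_elt_u16) are
hypothetical, and host byte order is ignored. With one flat array, element
i of the register group starting at v[n] is an ordinary in-bounds index
even when LMUL > 1 lets i run past the first register.

```c
#include <stdint.h>

#define VLEN 128               /* bits per vector register, as in the patch */
#define SKETCH_NREGS 32

typedef union {
    uint64_t u64[SKETCH_NREGS * VLEN / 64];
    uint32_t u32[SKETCH_NREGS * VLEN / 32];
    uint16_t u16[SKETCH_NREGS * VLEN / 16];
    uint8_t  u8[SKETCH_NREGS * VLEN / 8];
} SketchVRegFile;

/* Pointer to 16-bit element i of the register group starting at v[n]. */
uint16_t *sketch_elt_u16(SketchVRegFile *rf, int n, int i)
{
    return &rf->u16[n * (VLEN / 16) + i];
}
```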

Note that float64/float32/float16 are legacy.  They will always be equivalent
to the unsigned integer types of the same size.

Is there really any vector operation at all that is dependent on XLEN?  If not,
then there is no reason to confuse things by including target_ulong.


> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index e32b612..405caf6 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>          [PRV_H] = RISCV_EXCP_H_ECALL,
>          [PRV_M] = RISCV_EXCP_M_ECALL
>      };
> +    if (env->foflag) {
> +        if (env->vfp.vl != 0) {
> +            env->foflag = false;
> +            env->pc += 4;
> +            return;
> +        }
> +    }

Again, not the way to implement first-fault.

In particular, you haven't even verified that do_interrupt has been called on
behalf of a RISCV_EXCP_LOAD_PAGE_FAULT.  This could be a timer tick.

> +#define MAX_U8      ((uint8_t)0xff)
> +#define MIN_U8      ((uint8_t)0x0)
> +#define MAX_S8      ((int8_t)0x7f)
> +#define MIN_S8      ((int8_t)0x80)
> +#define SIGNBIT16   (1 << 15)
> +#define MAX_U16     ((uint16_t)0xffff)
> +#define MIN_U16     ((uint16_t)0x0)
> +#define MAX_S16     ((int16_t)0x7fff)
> +#define MIN_S16     ((int16_t)0x8000)
> +#define SIGNBIT32   (1 << 31)
> +#define MAX_U32     ((uint32_t)0xffffffff)
> +#define MIN_U32     ((uint32_t)0x0)
> +#define MAX_S32     ((int32_t)0x7fffffff)
> +#define MIN_S32     ((int32_t)0x80000000)
> +#define SIGNBIT64   ((uint64_t)1 << 63)
> +#define MAX_U64     ((uint64_t)0xffffffffffffffff)
> +#define MIN_U64     ((uint64_t)0x0)
> +#define MAX_S64     ((int64_t)0x7fffffffffffffff)
> +#define MIN_S64     ((int64_t)0x8000000000000000)

Why are you replicating INT8_MIN et al?
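
The limits in <stdint.h> can be used directly wherever those macros
appear. As an illustration, a saturating signed 8-bit add written against
INT8_MAX and INT8_MIN; the function name and the sat out-parameter (a
stand-in for a vxsat-style flag) are hypothetical.

```c
#include <stdint.h>

/* Saturating signed 8-bit add; sets *sat to 1 when the result is clamped. */
int8_t sketch_sadd8(int8_t a, int8_t b, int *sat)
{
    int r = a + b;            /* operands promote to int, so no overflow */

    if (r > INT8_MAX) {
        *sat = 1;
        return INT8_MAX;
    }
    if (r < INT8_MIN) {
        *sat = 1;
        return INT8_MIN;
    }
    return (int8_t)r;
}
```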


> +static target_ulong vector_get_index(CPURISCVState *env, int rs1, int rs2,
> +    int index, int mem, int width, int nf)
> +{
> +    target_ulong abs_off, base = env->gpr[rs1];
> +    target_long offset;
> +    switch (width) {
> +    case 8:
> +        offset = sign_extend(env->vfp.vreg[rs2].s8[index], 8) + nf * mem;
> +        break;
> +    case 16:
> +        offset = sign_extend(env->vfp.vreg[rs2].s16[index], 16) + nf * mem;
> +        break;
> +    case 32:
> +        offset = sign_extend(env->vfp.vreg[rs2].s32[index], 32) + nf * mem;
> +        break;
> +    case 64:
> +        offset = env->vfp.vreg[rs2].s64[index] + nf * mem;
> +        break;
> +    default:
> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());

This is broken.  You cannot use GETPC() anywhere except in the outermost
HELPER().  Otherwise you're not computing the return address back into the
code_gen_buffer, which is what is required to properly unwind the guest state.
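
A sketch of the usual pattern: sample the return address once in the
outermost helper and pass it down as a uintptr_t argument.
SKETCH_GETPC() is a stand-in built on the GCC/Clang builtin so that the
block compiles outside QEMU, and the function names and the recording
mock for the exception path are hypothetical. A real raise routine would
not return.

```c
#include <stdint.h>

/* Stand-in for GETPC(): return address of the function it is used in. */
#define SKETCH_GETPC() ((uintptr_t)__builtin_return_address(0))

/* Mock of an exception-raising routine: it only records its arguments. */
uintptr_t sketch_last_ra;
int sketch_last_excp;

void sketch_raise(int excp, uintptr_t ra)
{
    sketch_last_excp = excp;
    sketch_last_ra = ra;
}

/* Inner function: never samples the return address, only forwards ra. */
long sketch_get_index(const int8_t *idx, int i, int width, uintptr_t ra)
{
    switch (width) {
    case 8:
        return idx[i];
    default:
        sketch_raise(2 /* arbitrary code for "illegal instruction" */, ra);
        return 0;
    }
}

/* Outermost helper: the only place where the return address is sampled. */
__attribute__((noinline))
long sketch_helper_load_indexed(const int8_t *idx, int i, int width)
{
    uintptr_t ra = SKETCH_GETPC();

    return sketch_get_index(idx, i, width, ra);
}
```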


> +static inline bool vector_vtype_ill(CPURISCVState *env)
> +{
> +    if ((env->vfp.vtype >> (sizeof(target_ulong) - 1)) & 0x1) {
> +        return true;
> +    }
> +    return false;
> +}
> +
> +static inline void vector_vtype_set_ill(CPURISCVState *env)
> +{
> +    env->vfp.vtype = ((target_ulong)1) << (sizeof(target_ulong) - 1);
> +    return;
> +}
> +
> +static inline int vector_vtype_get_sew(CPURISCVState *env)
> +{
> +    return (env->vfp.vtype >> 2) & 0x7;
> +}
> +
> +static inline int vector_get_width(CPURISCVState *env)
> +{
> +    return  8 * (1 << vector_vtype_get_sew(env));
> +}
> +
> +static inline int vector_get_lmul(CPURISCVState *env)
> +{
> +    return 1 << (env->vfp.vtype & 0x3);
> +}
> +
> +static inline int vector_get_vlmax(CPURISCVState *env)
> +{
> +    return vector_get_lmul(env) * VLEN / vector_get_width(env);
> +}
> +
> +static inline int vector_elem_mask(CPURISCVState *env, uint32_t vm, int width,
> +    int lmul, int index)
> +{
> +    int mlen = width / lmul;
> +    int idx = (index * mlen) / 8;
> +    int pos = (index * mlen) % 8;
> +
> +    return vm || ((env->vfp.vreg[0].u8[idx] >> pos) & 0x1);
> +}

I would strongly encourage you place the components of vtype within tb_flags
via cpu_get_tb_cpu_state.  This would allow you to move quite a few checks from
run-time to translation-time.

Recall that translation happens once (per configuration), whereas execution
happens many times.  Obviously, the more configurations that we create, the
more translation that must happen.

But the vtypei argument to vsetvli is a good choice, because it is constant,
relates directly to the compiled code, and is unrelated to the length of the
data being processed.
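
As a sketch of what that could look like, the block below packs the
fields that the patch's own accessors read from vtype (vlmul in bits 1:0,
vsew in bits 4:2) plus a vill bit into a small flags word. The bit
positions inside the flags word and all SKETCH_/sketch_ names are made
up; in QEMU the value would be merged into the flags produced by
cpu_get_tb_cpu_state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the vector part of the flags word. */
#define SKETCH_TB_VLMUL_SHIFT 0   /* 2 bits */
#define SKETCH_TB_VSEW_SHIFT  2   /* 3 bits */
#define SKETCH_TB_VILL_SHIFT  5   /* 1 bit  */

uint32_t sketch_pack_vtype(uint64_t vtype, bool vill)
{
    uint32_t vlmul = vtype & 0x3;         /* field vector_get_lmul() reads */
    uint32_t vsew = (vtype >> 2) & 0x7;   /* field vector_vtype_get_sew() reads */

    return (vlmul << SKETCH_TB_VLMUL_SHIFT)
         | (vsew << SKETCH_TB_VSEW_SHIFT)
         | ((uint32_t)vill << SKETCH_TB_VILL_SHIFT);
}

/* Decoders, as a translator would use them on the per-TB flags. */
int sketch_tb_lmul(uint32_t flags)
{
    return 1 << ((flags >> SKETCH_TB_VLMUL_SHIFT) & 0x3);
}

int sketch_tb_sew_bits(uint32_t flags)
{
    return 8 << ((flags >> SKETCH_TB_VSEW_SHIFT) & 0x7);
}

bool sketch_tb_vill(uint32_t flags)
{
    return (flags >> SKETCH_TB_VILL_SHIFT) & 0x1;
}
```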

With that, you can verify at translation:

(1) vill
(2) v[n], for (n % lmul) != 0
(3) v[n] overlapping v[0] for masked/carry operations, with lmul > 1

and

(4) you can arrange the helpers so that instead of 1 helper that has to
    handle all SEW, you have N helpers, each handling a different SEW.

And with all of this done, I believe you no longer need to pass the register
number to the helper.  You can pass the address of v[n], which is much more
like how the tcg generic vector support works.
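
Checks (1) to (3) then reduce to plain integer tests on the decoded
register numbers. A sketch with hypothetical names; it follows the list
above and assumes, as in the patch, that v0 holds the mask.

```c
#include <stdbool.h>

/* (2): a register group must start at a multiple of LMUL. */
bool sketch_reg_aligned(int vreg, int lmul)
{
    return (vreg % lmul) == 0;
}

/*
 * (3): with LMUL > 1, the destination group of a masked operation must
 * not cover v0.  For an aligned group that is the case only when vd == 0.
 */
bool sketch_overlaps_v0(int vd, int lmul, bool masked)
{
    return masked && lmul > 1 && vd == 0;
}

/* All translation-time checks for a three-operand vector instruction. */
bool sketch_vcheck(bool vill, int vd, int vs1, int vs2, int lmul, bool masked)
{
    if (vill) {                                   /* (1) */
        return false;
    }
    if (!sketch_reg_aligned(vd, lmul) ||
        !sketch_reg_aligned(vs1, lmul) ||
        !sketch_reg_aligned(vs2, lmul)) {         /* (2) */
        return false;
    }
    if (sketch_overlaps_v0(vd, lmul, masked)) {   /* (3) */
        return false;
    }
    return true;
}
```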

Whether or not to include VL in tb_flags is a harder choice.  Certainly not the
exact value of VL, as that would lead to different translations for every loop
tail.  But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
single bit.  Knowing that this condition is true would allow some use of the
tcg generic vector support.

E.g. vadd.vv could be

    if (masked) {
        switch (SEW) {
        case MO_8:
            gen_helper_vadd8_mask(...);
            break;
        ...
        }
    } else if (vl_eq_vlmax) {
        tcg_gen_gvec_add(SEW, vreg_ofs(vd), vreg_ofs(vs2), vreg_ofs(vs1),
                         VLEN * LMUL, VLEN * LMUL);
    } else {
        switch (SEW) {
        case MO_8:
            gen_helper_vadd8(...);
            break;
        ...
        }
    }

Or, equivalently, pack pointers to the actual generator functions into a
structure so that this code structure can be shared between many instructions.
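
One possible shape for such a structure, sketched in plain C. The
generator functions here are trivial stubs that only return a code
identifying the chosen path; in the translator they would be the
gen_helper_* and tcg_gen_gvec_* calls shown above. All names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_MO_8, SKETCH_MO_16, SKETCH_MO_32, SKETCH_MO_64, SKETCH_MO_COUNT };

typedef int (*SketchGenFn)(void);

typedef struct {
    SketchGenFn masked[SKETCH_MO_COUNT];    /* per-SEW helper, mask applied  */
    SketchGenFn unmasked[SKETCH_MO_COUNT];  /* per-SEW helper, vl != vlmax   */
    SketchGenFn gvec;                       /* inline expansion, vl == vlmax */
} SketchVecOp;

/* Stub generators: 1xx = masked, 2xx = unmasked, 300 = gvec. */
#define SKETCH_STUB(name, code) static int name(void) { return code; }
SKETCH_STUB(sketch_vadd8_mask, 108)
SKETCH_STUB(sketch_vadd16_mask, 116)
SKETCH_STUB(sketch_vadd32_mask, 132)
SKETCH_STUB(sketch_vadd64_mask, 164)
SKETCH_STUB(sketch_vadd8, 208)
SKETCH_STUB(sketch_vadd16, 216)
SKETCH_STUB(sketch_vadd32, 232)
SKETCH_STUB(sketch_vadd64, 264)
SKETCH_STUB(sketch_vadd_gvec, 300)

const SketchVecOp sketch_vadd = {
    .masked = { sketch_vadd8_mask, sketch_vadd16_mask,
                sketch_vadd32_mask, sketch_vadd64_mask },
    .unmasked = { sketch_vadd8, sketch_vadd16, sketch_vadd32, sketch_vadd64 },
    .gvec = sketch_vadd_gvec,
};

/* Shared control flow: the same three-way choice for every instruction. */
int sketch_select(const SketchVecOp *op, int sew, bool masked, bool vl_eq_vlmax)
{
    if (masked) {
        return op->masked[sew]();
    }
    if (vl_eq_vlmax && op->gvec != NULL) {
        return op->gvec();
    }
    return op->unmasked[sew]();
}
```

Adding another instruction then means filling in one more table rather
than repeating the if/switch ladder.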

Bear in mind that all tcg gvec operations operate strictly upon lanes.  I.e.

   vd[x] = vs1[x] op vs2[x]

thus the actual arrangement of the elements in storage is irrelevant and SLEN
need not be considered here.


r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28 18:54 ` Richard Henderson
@ 2019-08-28 20:43   ` Richard Henderson
  2019-08-29 12:45     ` liuzhiwei
  2019-09-02  9:43   ` liuzhiwei
  2019-12-19  9:11   ` LIU Zhiwei
  2 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2019-08-28 20:43 UTC (permalink / raw)
  To: liuzhiwei, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien

On 8/28/19 11:54 AM, Richard Henderson wrote:
> But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
> single bit.

BTW, it is reasonable to check VSTART == 0 always.  Quoting the spec:

# Implementations are permitted to raise illegal instruction exceptions
# when attempting to execute a vector instruction with a value of vstart
# that the implementation can never produce when executing that same
# instruction with the same vtype setting.

Since QEMU will never interrupt a single instruction, each vector instruction
will always run to completion, which clears VSTART.  Since QEMU will never
produce a non-zero value of VSTART, it is allowed to trap on any non-zero
setting of VSTART.

I.e. it can be handled at translation time alongside VILL.
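
A minimal sketch of that check, assuming the translator has some snapshot of
the two pieces of state.  VecState and vector_insn_allowed are hypothetical
names, and how the snapshot reaches the translator is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the vector state consulted at translation time. */
typedef struct {
    uint64_t vstart;  /* current value of the vstart CSR */
    bool vill;        /* vtype.vill: the vtype setting is illegal */
} VecState;

/*
 * Returns true if a vector instruction may be translated normally,
 * false if an illegal instruction exception should be raised instead.
 */
static bool vector_insn_allowed(const VecState *s)
{
    return !s->vill && s->vstart == 0;
}
```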


r~


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
       [not found] <1566959818-38369-1-git-send-email-zhiwei_liu@c-sky.com>
  2019-08-28  9:08 ` [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1 Alex Bennée
  2019-08-28 18:54 ` Richard Henderson
@ 2019-08-28 21:34 ` Alistair Francis
  2019-08-29 12:00   ` liuzhiwei
       [not found] ` <CAL1e-=iHangj7w+HgJ+FM=iqRLmaY-_CYeUv0gx+c8bpScb9RQ@mail.gmail.com>
       [not found] ` <CAEiOBXXofjrY2=sjuMDb9dTV2fk9yUVKnr+qmf+7mg9vki6OCw@mail.gmail.com>
  4 siblings, 1 reply; 27+ messages in thread
From: Alistair Francis @ 2019-08-28 21:34 UTC (permalink / raw)
  To: liuzhiwei
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, Alex Bennée, Aurelien Jarno

On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>
> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
> ---
>  fpu/softfloat.c                         |   119 +
>  include/fpu/softfloat.h                 |     4 +
>  linux-user/riscv/cpu_loop.c             |     8 +-
>  target/riscv/Makefile.objs              |     2 +-
>  target/riscv/cpu.h                      |    30 +
>  target/riscv/cpu_bits.h                 |    15 +
>  target/riscv/cpu_helper.c               |     7 +
>  target/riscv/csr.c                      |    65 +-
>  target/riscv/helper.h                   |   354 +
>  target/riscv/insn32.decode              |   374 +-
>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>  target/riscv/translate.c                |     1 +
>  target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>  13 files changed, 28017 insertions(+), 9 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>  create mode 100644 target/riscv/vector_helper.c
>

Hello,

Thanks for the patch!

As others have pointed out you will need to split the patch up into
multiple smaller patches, otherwise it is too hard to review almost
30,000 lines of code.

Can you also include a cover letter with your patch series describing
how you are testing this? AFAIK vector extension support isn't in any
compiler so I'm assuming you are handwriting the assembly or have
toolchain patches. Either way it will help if you can share that so
others can test your implementation.

Alex and Richard have kindly started the review. Once you have
addressed their comments and split this patch up into smaller patches
you can send a v2 and we can go from there.

Once again thanks for doing this implementation for QEMU!

Alistair


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28 21:34 ` Alistair Francis
@ 2019-08-29 12:00   ` liuzhiwei
  2019-08-29 15:14     ` Richard Henderson
  2019-08-29 21:50     ` Alistair Francis
  0 siblings, 2 replies; 27+ messages in thread
From: liuzhiwei @ 2019-08-29 12:00 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, Alex Bennée, Aurelien Jarno

On 2019/8/29 5:34 AM, Alistair Francis wrote:
> On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   fpu/softfloat.c                         |   119 +
>>   include/fpu/softfloat.h                 |     4 +
>>   linux-user/riscv/cpu_loop.c             |     8 +-
>>   target/riscv/Makefile.objs              |     2 +-
>>   target/riscv/cpu.h                      |    30 +
>>   target/riscv/cpu_bits.h                 |    15 +
>>   target/riscv/cpu_helper.c               |     7 +
>>   target/riscv/csr.c                      |    65 +-
>>   target/riscv/helper.h                   |   354 +
>>   target/riscv/insn32.decode              |   374 +-
>>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>   target/riscv/translate.c                |     1 +
>>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>>   13 files changed, 28017 insertions(+), 9 deletions(-)
>>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>>   create mode 100644 target/riscv/vector_helper.c
>>
> Hello,
>
> Thanks for the patch!
>
> As others have pointed out you will need to split the patch up into
> multiple smaller patches, otherwise it is too hard to review almost
> 30,000 lines of code.

Hi, Alistair

I'm so sorry for the inconvenience. It will be a patch set with a cover 
letter in V2.

> Can you also include a cover letter with your patch series describing
> how you are testing this? AFAIK vector extension support isn't in any
> compiler so I'm assuming you are handwriting the assembly or have
> toolchain patches. Either way it will help if you can share that so
> others can test your implementation.

Yes, the tests are handwritten assembly. The assembler in Binutils
supports the vector extension. I first define a function such as
test_vadd_vv_8 in assembly, and then call it from a C program.

The function looks something like this:

/* vadd.vv */
TEST_FUNC(test_vadd_vv_8)
        vsetvli     t1, x0, e8, m2
        vlb.v       v6, (a4)
        vsb.v       v6, (a3)
        vsetvli     t1, a0, e8, m2
        vlb.v       v0, (a1)
        vlb.v       v2, (a2)
        vadd.vv     v4, v0, v2
        vsb.v       v4, (a3)
        ret
        .size   test_vadd_vv_8, .-test_vadd_vv_8

Testing takes more time than implementing the instructions. Is there a
better test method, or are there required test cases in QEMU?
Could you give me some advice on testing?
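
On the C side, the check can be a small reference model that the output
buffer is compared against. The sketch below is only an illustration: it
assumes, from the register usage above, that a0 is the element count, a1 and
a2 are the sources, a3 is the output buffer and a4 is its initial contents.
The names ref_vadd_vv_8 and first_mismatch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Expected contents of the output buffer: the first vl elements hold
 * a[i] + b[i] (wrapping modulo 2^8), the remaining elements up to vlmax
 * keep the initial contents.
 */
static void ref_vadd_vv_8(uint8_t *out, const uint8_t *init,
                          const uint8_t *a, const uint8_t *b,
                          size_t vl, size_t vlmax)
{
    assert(vl <= vlmax);
    memcpy(out, init, vlmax);
    for (size_t i = 0; i < vl; i++) {
        out[i] = (uint8_t)(a[i] + b[i]);
    }
}

/* Returns the index of the first differing element, or -1 if none. */
static long first_mismatch(const uint8_t *got, const uint8_t *want, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (got[i] != want[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

The buffer written by the assembly function would be passed as "got" and the
model's output as "want".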

Best Regards,

Zhiwei

> Alex and Richard have kindly started the review. Once you have
> addressed their comments and split this patch up into smaller patches
> you can send a v2 and we can go from there.
>
> Once again thanks for doing this implementation for QEMU!
>
> Alistair
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28 20:43   ` Richard Henderson
@ 2019-08-29 12:45     ` liuzhiwei
  2019-08-29 15:09       ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: liuzhiwei @ 2019-08-29 12:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien


On 2019/8/29 4:43 AM, Richard Henderson wrote:
> On 8/28/19 11:54 AM, Richard Henderson wrote:
>> But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
>> single bit.
> BTW, it is reasonable to check VSTART == 0 always.  Quoting the spec:
>
> # Implementations are permitted to raise illegal instruction exceptions
> # when attempting to execute a vector instruction with a value of vstart
> # that the implementation can never produce when executing that same
> # instruction with the same vtype setting.
>
> Since qemu will never interrupt a single instruction, each vector instruction
> will always run to completion, which clears VSTART.  Since QEMU will never
> produce a non-zero value of VSTART, it is allowed to trap on any non-zero
> setting of VSTART.
>
> I.e. it can be handled at translation time alongside VILL.

Hi, Richard

I am so sorry for the inconvenience. It is very kind of you to review
such horribly long code and give so many comments.

Even in QEMU, there may be some situations where VSTART != 0. For example,
a load instruction may raise a page fault exception partway through.
If VSTART == 0 on restart, the elements that had been loaded before the
exception will be loaded once again.

In particular, it may be incorrect for the instruction to resume execution
with VSTART == 0. Consider, when LMUL == 1,

    "vlb.v v0, (a0), v0.t"

As v0 is the mask register, once part of it has been modified, that part
can no longer be used as the mask.

It will take some time to address the other comments. After that I will
split the patch into a patch set with a cover letter in v2.

Thank you again for your review!

Best Regards,

Zhiwei

>
>
> r~
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28  9:08 ` [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1 Alex Bennée
  2019-08-28 16:39   ` Richard Henderson
@ 2019-08-29 13:35   ` liuzhiwei
  1 sibling, 0 replies; 27+ messages in thread
From: liuzhiwei @ 2019-08-29 13:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: peter.maydell, palmer, qemu-riscv, sagark, kbastian, riku.voipio,
	qemu-devel, laurent, Alistair.Francis, aurelien

Hi,  Alex

On 2019/8/28 5:08 PM, Alex Bennée wrote:
> liuzhiwei <zhiwei_liu@c-sky.com> writes:
>
>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   fpu/softfloat.c                         |   119 +
>>   include/fpu/softfloat.h                 |     4 +
> Changes to softfloat should be in a separate patch, but see bellow.
>
>>   linux-user/riscv/cpu_loop.c             |     8 +-
>>   target/riscv/Makefile.objs              |     2 +-
>>   target/riscv/cpu.h                      |    30 +
>>   target/riscv/cpu_bits.h                 |    15 +
>>   target/riscv/cpu_helper.c               |     7 +
>>   target/riscv/csr.c                      |    65 +-
>>   target/riscv/helper.h                   |   354 +
>>   target/riscv/insn32.decode              |   374 +-
>>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>   target/riscv/translate.c                |     1 +
>>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
> This is likely too big to be reviewed. Is it possible to split the patch
> up into more discrete chunks, for example support pieces and then maybe
> a class at a time?

Yes, a patch set with a cover letter will be sent later.

>
>>   13 files changed, 28017 insertions(+), 9 deletions(-)
>>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>>   create mode 100644 target/riscv/vector_helper.c
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 2ba36ec..da155ea 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -433,6 +433,16 @@ static inline int extractFloat16Exp(float16 a)
>>   }
>>
>>   /*----------------------------------------------------------------------------
>> +| Returns the sign bit of the half-precision floating-point value `a'.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static inline flag extractFloat16Sign(float16 a)
>> +{
>> +    return float16_val(a) >> 0xf;
>> +}
>> +
> We are trying to avoid this sort of bit fiddling for new code when we
> already have generic decompose functions that can extract all the parts
> into a common format.
>
>> +
>> +/*----------------------------------------------------------------------------
>>   | Returns the fraction bits of the single-precision floating-point value `a'.
>>   *----------------------------------------------------------------------------*/
>>
>> @@ -4790,6 +4800,35 @@ int float32_eq(float32 a, float32 b, float_status *status)
>>   }
>>
>>   /*----------------------------------------------------------------------------
>> +| Returns 1 if the half-precision floating-point value `a' is less than
>> +| or equal to the corresponding value `b', and 0 otherwise.  The invalid
>> +| exception is raised if either operand is a NaN.  The comparison is performed
>> +| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +*----------------------------------------------------------------------------*/
>> +
>> +int float16_le(float16 a, float16 b, float_status *status)
>> +{
>> +    flag aSign, bSign;
>> +    uint16_t av, bv;
>> +    a = float16_squash_input_denormal(a, status);
>> +    b = float16_squash_input_denormal(b, status);
>> +
>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>> +       ) {
>> +        float_raise(float_flag_invalid, status);
>> +        return 0;
>> +    }
>> +    aSign = extractFloat16Sign( a );
>> +    bSign = extractFloat16Sign( b );
>> +    av = float16_val(a);
>> +    bv = float16_val(b);
>> +    if ( aSign != bSign ) return aSign || ( (uint16_t) ( ( av | bv )<<1 ) == 0 );
>> +    return ( av == bv ) || ( aSign ^ ( av < bv ) );
>> +
>> +}
> What does this provide that:
>
>    float16_compare(a, b, status) == float_relation_less;
>
> doesn't?
>
>> +
>> +/*----------------------------------------------------------------------------
>>   | Returns 1 if the single-precision floating-point value `a' is less than
>>   | or equal to the corresponding value `b', and 0 otherwise.  The invalid
>>   | exception is raised if either operand is a NaN.  The comparison is performed
>> @@ -4825,6 +4864,35 @@ int float32_le(float32 a, float32 b, float_status *status)
>>   | to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>>   *----------------------------------------------------------------------------*/
>>
>> +int float16_lt(float16 a, float16 b, float_status *status)
>> +{
>> +    flag aSign, bSign;
>> +    uint16_t av, bv;
>> +    a = float16_squash_input_denormal(a, status);
>> +    b = float16_squash_input_denormal(b, status);
>> +
>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>> +       ) {
>> +        float_raise(float_flag_invalid, status);
>> +        return 0;
>> +    }
>> +    aSign = extractFloat16Sign( a );
>> +    bSign = extractFloat16Sign( b );
>> +    av = float16_val(a);
>> +    bv = float16_val(b);
>> +    if ( aSign != bSign ) return aSign && ( (uint16_t) ( ( av | bv )<<1 ) != 0 );
>> +    return ( av != bv ) && ( aSign ^ ( av < bv ) );
>> +
>> +}
>> +
>> +/*----------------------------------------------------------------------------
>> +| Returns 1 if the single-precision floating-point value `a' is less than
>> +| the corresponding value `b', and 0 otherwise.  The invalid exception is
>> +| raised if either operand is a NaN.  The comparison is performed according
>> +| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>> +*----------------------------------------------------------------------------*/
>> +
>>   int float32_lt(float32 a, float32 b, float_status *status)
>>   {
>>       flag aSign, bSign;
>> @@ -4869,6 +4937,32 @@ int float32_unordered(float32 a, float32 b, float_status *status)
>>   }
>>
>>   /*----------------------------------------------------------------------------
>> +| Returns 1 if the half-precision floating-point value `a' is equal to
>> +| the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
>> +| exception.  The comparison is performed according to the IEC/IEEE Standard
>> +| for Binary Floating-Point Arithmetic.
>> +*----------------------------------------------------------------------------*/
>> +
>> +int float16_eq_quiet(float16 a, float16 b, float_status *status)
>> +{
>> +    a = float16_squash_input_denormal(a, status);
>> +    b = float16_squash_input_denormal(b, status);
>> +
>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>> +       ) {
>> +        if (float16_is_signaling_nan(a, status)
>> +         || float16_is_signaling_nan(b, status)) {
>> +            float_raise(float_flag_invalid, status);
>> +        }
>> +        return 0;
>> +    }
>> +    return ( float16_val(a) == float16_val(b) ) ||
>> +            ( (uint16_t) ( ( float16_val(a) | float16_val(b) )<<1 ) == 0 );
>> +}
>> +
> See also float_16_compare_quiet
Thank you for reminding me. I didn't find the float16_compare and
float16_compare_quiet interfaces before.
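
A minimal sketch of how the four predicates could be derived from one
compare result. It is standalone: compare_stub uses host float as a stand-in
for float16_compare, the REL_* codes are a local copy that is assumed to
match QEMU's float_relation_* values, and the *_from_rel names are
hypothetical. Note that "less than or equal" has to accept both the less and
the equal relation.

```c
#include <stdbool.h>

/* Local relation codes, assumed to mirror QEMU's float_relation_* values. */
enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
};

/* Stand-in for float16_compare(): host float, no exception flags. */
static int compare_stub(float a, float b)
{
    if (a != a || b != b) {
        return REL_UNORDERED;   /* at least one operand is a NaN */
    }
    if (a < b) {
        return REL_LESS;
    }
    if (a > b) {
        return REL_GREATER;
    }
    return REL_EQUAL;           /* also covers -0.0 == +0.0 */
}

static bool lt_from_rel(int rel) { return rel == REL_LESS; }
static bool le_from_rel(int rel) { return rel == REL_LESS || rel == REL_EQUAL; }
static bool eq_from_rel(int rel) { return rel == REL_EQUAL; }
static bool unordered_from_rel(int rel) { return rel == REL_UNORDERED; }
```
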
>> +
>> +/*----------------------------------------------------------------------------
>>   | Returns 1 if the single-precision floating-point value `a' is equal to
>>   | the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
>>   | exception.  The comparison is performed according to the IEC/IEEE Standard
>> @@ -4958,6 +5052,31 @@ int float32_lt_quiet(float32 a, float32 b, float_status *status)
>>   }
>>
>>   /*----------------------------------------------------------------------------
>> +| Returns 1 if the half-precision floating-point values `a' and `b' cannot
>> +| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
>> +| comparison is performed according to the IEC/IEEE Standard for Binary
>> +| Floating-Point Arithmetic.
>> +*----------------------------------------------------------------------------*/
>> +
>> +int float16_unordered_quiet(float16 a, float16 b, float_status *status)
>> +{
>> +    a = float16_squash_input_denormal(a, status);
>> +    b = float16_squash_input_denormal(b, status);
>> +
>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>> +       ) {
>> +        if (float16_is_signaling_nan(a, status)
>> +         || float16_is_signaling_nan(b, status)) {
>> +            float_raise(float_flag_invalid, status);
>> +        }
>> +        return 1;
>> +    }
>> +    return 0;
>> +}
>> +
>> +
>> +/*----------------------------------------------------------------------------
>>   | Returns 1 if the single-precision floating-point values `a' and `b' cannot
>>   | be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
>>   | comparison is performed according to the IEC/IEEE Standard for Binary
>> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
>> index 3ff3fa5..3b0754c 100644
>> --- a/include/fpu/softfloat.h
>> +++ b/include/fpu/softfloat.h
>> @@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status *status);
>>   float16 float16_sqrt(float16, float_status *status);
>>   int float16_compare(float16, float16, float_status *status);
>>   int float16_compare_quiet(float16, float16, float_status *status);
>> +int float16_unordered_quiet(float16, float16, float_status *status);
>> +int float16_le(float16, float16, float_status *status);
>> +int float16_lt(float16, float16, float_status *status);
>> +int float16_eq_quiet(float16, float16, float_status *status);
>>
>>   int float16_is_quiet_nan(float16, float_status *status);
>>   int float16_is_signaling_nan(float16, float_status *status);
>> diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
>> index 12aa3c0..b01548a 100644
>> --- a/linux-user/riscv/cpu_loop.c
>> +++ b/linux-user/riscv/cpu_loop.c
>> @@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
>>           signum = 0;
>>           sigcode = 0;
>>           sigaddr = 0;
>> -
>> +        if (env->foflag) {
>> +            if (env->vfp.vl != 0) {
>> +                env->foflag = false;
>> +                env->pc += 4;
>> +                continue;
>> +            }
>> +        }
> What is this trying to do?
This handles the exception raised by a fault-only-first load instruction.
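
For reference, the behaviour as I understand it from the spec: a fault on
element 0 traps as usual, while a fault on a later element takes no trap and
instead shrinks vl to the index of that element. The sketch below models only
that rule, not the foflag mechanism in the patch; load_fault_only_first and
the two demo probes are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: is element i of the source readable? */
typedef bool (*readable_fn)(size_t i);

/* Demo probes used to exercise the sketch. */
static bool first_three_readable(size_t i) { return i < 3; }
static bool none_readable(size_t i) { (void)i; return false; }

/*
 * Returns true if a trap must be taken (element 0 faulted).
 * Otherwise loads what it can and, on a later fault, sets *vl to the
 * index of the faulting element.
 */
static bool load_fault_only_first(uint8_t *vd, const uint8_t *mem,
                                  size_t *vl, readable_fn readable)
{
    for (size_t i = 0; i < *vl; i++) {
        if (!readable(i)) {
            if (i == 0) {
                return true;    /* first element: real exception */
            }
            *vl = i;            /* later element: truncate, no trap */
            return false;
        }
        vd[i] = mem[i];
    }
    return false;
}
```
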
>
>>           switch (trapnr) {
>>           case EXCP_INTERRUPT:
>>               /* just indicate that signals should be handled asap */
>> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
>> index b1c79bc..d577cef 100644
>> --- a/target/riscv/Makefile.objs
>> +++ b/target/riscv/Makefile.objs
>> @@ -1,4 +1,4 @@
>> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o pmp.o
>> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o pmp.o
>>
>>   DECODETREE = $(SRC_PATH)/scripts/decodetree.py
>>
>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> index 0adb307..5a93aa2 100644
>> --- a/target/riscv/cpu.h
>> +++ b/target/riscv/cpu.h
>> @@ -67,6 +67,7 @@
>>   #define RVC RV('C')
>>   #define RVS RV('S')
>>   #define RVU RV('U')
>> +#define RVV RV('V')
>>
>>   /* S extension denotes that Supervisor mode exists, however it is possible
>>      to have a core that support S mode but does not have an MMU and there
>> @@ -93,9 +94,38 @@ typedef struct CPURISCVState CPURISCVState;
>>
>>   #include "pmp.h"
>>
>> +#define VLEN 128
>> +#define VUNIT(x) (VLEN / x)
>> +
> If you want to do vectors I suggest you look at the TCGvec types for
> passing pointers to vector registers to helpers. In this case you will
> want to ensure your vector registers are properly aligned.
>
>>   struct CPURISCVState {
>>       target_ulong gpr[32];
>>       uint64_t fpr[32]; /* assume both F and D extensions */
>> +
>> +    /* vector coprocessor state.  */
>> +    struct {
>> +        union VECTOR {
>> +            float64  f64[VUNIT(64)];
>> +            float32  f32[VUNIT(32)];
>> +            float16  f16[VUNIT(16)];
>> +            target_ulong ul[VUNIT(sizeof(target_ulong))];
>> +            uint64_t u64[VUNIT(64)];
>> +            int64_t  s64[VUNIT(64)];
>> +            uint32_t u32[VUNIT(32)];
>> +            int32_t  s32[VUNIT(32)];
>> +            uint16_t u16[VUNIT(16)];
>> +            int16_t  s16[VUNIT(16)];
>> +            uint8_t  u8[VUNIT(8)];
>> +            int8_t   s8[VUNIT(8)];
>> +        } vreg[32];
>> +        target_ulong vxrm;
>> +        target_ulong vxsat;
>> +        target_ulong vl;
>> +        target_ulong vstart;
>> +        target_ulong vtype;
>> +        float_status fp_status;
>> +    } vfp;
>> +
>> +    bool         foflag;
> Again I have no idea what foflag is here.
>
>>       target_ulong pc;
>>       target_ulong load_res;
>>       target_ulong load_val;
>> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
>> index 11f971a..9eb43ec 100644
>> --- a/target/riscv/cpu_bits.h
>> +++ b/target/riscv/cpu_bits.h
>> @@ -29,6 +29,14 @@
>>   #define FSR_NXA             (FPEXC_NX << FSR_AEXC_SHIFT)
>>   #define FSR_AEXC            (FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
>>
>> +/* Vector Fixed-Point round model */
>> +#define FSR_VXRM_SHIFT      9
>> +#define FSR_VXRM            (0x3 << FSR_VXRM_SHIFT)
>> +
>> +/* Vector Fixed-Point saturation flag */
>> +#define FSR_VXSAT_SHIFT     8
>> +#define FSR_VXSAT           (0x1 << FSR_VXSAT_SHIFT)
>> +
>>   /* Control and Status Registers */
>>
>>   /* User Trap Setup */
>> @@ -48,6 +56,13 @@
>>   #define CSR_FRM             0x002
>>   #define CSR_FCSR            0x003
>>
>> +/* User Vector CSRs */
>> +#define CSR_VSTART          0x008
>> +#define CSR_VXSAT           0x009
>> +#define CSR_VXRM            0x00a
>> +#define CSR_VL              0xc20
>> +#define CSR_VTYPE           0xc21
>> +
>>   /* User Timers and Counters */
>>   #define CSR_CYCLE           0xc00
>>   #define CSR_TIME            0xc01
>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>> index e32b612..405caf6 100644
>> --- a/target/riscv/cpu_helper.c
>> +++ b/target/riscv/cpu_helper.c
>> @@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>>           [PRV_H] = RISCV_EXCP_H_ECALL,
>>           [PRV_M] = RISCV_EXCP_M_ECALL
>>       };
>> +    if (env->foflag) {
>> +        if (env->vfp.vl != 0) {
>> +            env->foflag = false;
>> +            env->pc += 4;
>> +            return;
>> +        }
>> +    }
>>
>>       if (!async) {
>>           /* set tval to badaddr for traps with address information */
>> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
>> index e0d4586..a6131ff 100644
>> --- a/target/riscv/csr.c
>> +++ b/target/riscv/csr.c
>> @@ -87,12 +87,12 @@ static int ctr(CPURISCVState *env, int csrno)
>>       return 0;
>>   }
>>
>> -#if !defined(CONFIG_USER_ONLY)
>>   static int any(CPURISCVState *env, int csrno)
>>   {
>>       return 0;
>>   }
>>
>> +#if !defined(CONFIG_USER_ONLY)
>>   static int smode(CPURISCVState *env, int csrno)
>>   {
>>       return -!riscv_has_ext(env, RVS);
>> @@ -158,8 +158,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
>>           return -1;
>>       }
>>   #endif
>> -    *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
>> -        | (env->frm << FSR_RD_SHIFT);
>> +    *val = (env->vfp.vxrm << FSR_VXRM_SHIFT)
>> +            | (env->vfp.vxsat << FSR_VXSAT_SHIFT)
>> +            | (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
>> +            | (env->frm << FSR_RD_SHIFT);
>>       return 0;
>>   }
>>
>> @@ -172,10 +174,60 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
>>       env->mstatus |= MSTATUS_FS;
>>   #endif
>>       env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
>> +    env->vfp.vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
>> +    env->vfp.vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
>>       riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
>>       return 0;
>>   }
>>
>> +static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
>> +{
>> +    *val = env->vfp.vtype;
>> +    return 0;
>> +}
>> +
>> +static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
>> +{
>> +    *val = env->vfp.vl;
>> +    return 0;
>> +}
>> +
>> +static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
>> +{
>> +    *val = env->vfp.vxrm;
>> +    return 0;
>> +}
>> +
>> +static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
>> +{
>> +    *val = env->vfp.vxsat;
>> +    return 0;
>> +}
>> +
>> +static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
>> +{
>> +    *val = env->vfp.vstart;
>> +    return 0;
>> +}
>> +
>> +static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
>> +{
>> +    env->vfp.vxrm = val;
>> +    return 0;
>> +}
>> +
>> +static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
>> +{
>> +    env->vfp.vxsat = val;
>> +    return 0;
>> +}
>> +
>> +static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
>> +{
>> +    env->vfp.vstart = val;
>> +    return 0;
>> +}
> A fixed return value makes me think these should be void functions.
Good!
>
>> +
>>   /* User Timers and Counters */
>>   static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
>>   {
>> @@ -873,7 +925,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>>       [CSR_FFLAGS] =              { fs,   read_fflags,      write_fflags      },
>>       [CSR_FRM] =                 { fs,   read_frm,         write_frm         },
>>       [CSR_FCSR] =                { fs,   read_fcsr,        write_fcsr        },
>> -
>> +    /* Vector CSRs */
>> +    [CSR_VSTART] =              { any,   read_vstart,     write_vstart      },
>> +    [CSR_VXSAT] =               { any,   read_vxsat,      write_vxsat       },
>> +    [CSR_VXRM] =                { any,   read_vxrm,       write_vxrm        },
>> +    [CSR_VL] =                  { any,   read_vl                            },
>> +    [CSR_VTYPE] =               { any,   read_vtype                         },
>>       /* User Timers and Counters */
>>       [CSR_CYCLE] =               { ctr,  read_instret                        },
>>       [CSR_INSTRET] =             { ctr,  read_instret                        },
>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>> index debb22a..fee02c0 100644
>> --- a/target/riscv/helper.h
>> +++ b/target/riscv/helper.h
>> @@ -76,3 +76,357 @@ DEF_HELPER_2(mret, tl, env, tl)
>>   DEF_HELPER_1(wfi, void, env)
>>   DEF_HELPER_1(tlb_flush, void, env)
>>   #endif
>> +/* Vector functions */
> Think about how you could split this patch up to introduce a group of
> instructions at a time. This will make it a lot easier review.
>
> I'm going to leave review of the specifics to the RISCV maintainers but
> I suspect they will want to wait until a v2 of the series. However it
> looks like a good first pass at implementing vectors.
>
> --
> Alex Bennée

I will not change softfloat in v2 of the patch. Thank you again for your review!

Best Regards,

Zhiwei

>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 12:45     ` liuzhiwei
@ 2019-08-29 15:09       ` Richard Henderson
  2019-09-02  7:45         ` liuzhiwei
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2019-08-29 15:09 UTC (permalink / raw)
  To: liuzhiwei, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien

On 8/29/19 5:45 AM, liuzhiwei wrote:
> Even in qemu,  it may be some situations that VSTART != 0. For example, a load
> instruction leads to a page fault exception in a middle position. If VSTART ==
> 0,  some elements that had been loaded before the exception will be loaded once
> again.

Alternately, you can validate all of the pages before performing any memory
operations.  At which point there will never be an exception in the middle.

As it turns out, you *must* do this in order to allow watchpoints to work
correctly.  David Hildenbrand and I are at this moment fixing this aspect of
watchpoints for s390x.

See https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05979.html
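
A rough, standalone sketch of the "validate first" step.  The page size, the
probe callback and the names probe_range and only_page_zero are assumptions
for illustration only, not the interface used in the series linked above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical probe: may the page starting at page_base be accessed? */
typedef bool (*page_ok_fn)(uintptr_t page_base);

/* Demo probe: pretend only the page at address 0 is mapped. */
static bool only_page_zero(uintptr_t page_base)
{
    return page_base == 0;
}

/*
 * Returns true if every page touched by [addr, addr + len) passes the
 * probe.  Only then would the element accesses be performed, so no
 * fault can occur partway through the instruction.
 */
static bool probe_range(uintptr_t addr, size_t len, page_ok_fn page_ok)
{
    const uintptr_t mask = ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);

    if (len == 0) {
        return true;
    }
    uintptr_t last = (addr + len - 1) & mask;
    for (uintptr_t p = addr & mask; ; p += SKETCH_PAGE_SIZE) {
        if (!page_ok(p)) {
            return false;
        }
        if (p == last) {
            break;
        }
    }
    return true;
}
```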


r~


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 12:00   ` liuzhiwei
@ 2019-08-29 15:14     ` Richard Henderson
  2019-09-02  6:54       ` liuzhiwei
  2019-08-29 21:50     ` Alistair Francis
  1 sibling, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2019-08-29 15:14 UTC (permalink / raw)
  To: liuzhiwei, Alistair Francis
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, Alex Bennée, Aurelien Jarno

On 8/29/19 5:00 AM, liuzhiwei wrote:
> Maybe there is some better test method or some forced test cases in QEMU. Could
> you give me some advice for testing?

If you have hardware, or another simulator, RISU is very good
for testing these sorts of things.

See https://git.linaro.org/people/pmaydell/risu.git

You'll need to write new support for RISC-V, but it's not hard
and we can help out with that.


r~


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
       [not found]   ` <46ade3da-d642-bd19-7975-7dc228d401e4@c-sky.com>
@ 2019-08-29 18:32     ` Aleksandar Markovic
  0 siblings, 0 replies; 27+ messages in thread
From: Aleksandar Markovic @ 2019-08-29 18:32 UTC (permalink / raw)
  To: liuzhiwei
  Cc: Peter Maydell, Palmer Dabbelt, open list:RISC-V,
	Sagar Karandikar, Bastian Koppelmann, Riku Voipio,
	Laurent Vivier, QEMU Developers, Alistair Francis,
	Alex Bennée, Aurelien Jarno

On 29.08.2019 at 15:02, "liuzhiwei" <zhiwei_liu@c-sky.com> wrote:
>
>
> On 2019/8/29 3:20 AM, Aleksandar Markovic wrote:
>>
>>
>>
>> > On Wed, Aug 28, 2019 at 9:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>>>
>>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>>> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>>> ---
>>
>>
>> Such large patch and "Change-Id:
>> I3cf891bc400713b95f47ecca82b1bf773f3dcb25" is its entire commit message??
>> Horrible.
>
> Hi,  Aleksandar
>
> I am so sorry. A patch set with cover letter will be sent later.
>
> Best Regards,
>
> Zhiwei

OK, Zhiwei,

You'll soon get more used to participating in open source, and write much
better patches.

Try to follow guidelines described at
https://wiki.qemu.org/Contribute/SubmitAPatch
Thanks,
Aleksandar



>>
>> Aleksandar
>>
>>>
>>>  fpu/softfloat.c                         |   119 +
>>>  include/fpu/softfloat.h                 |     4 +
>>>  linux-user/riscv/cpu_loop.c             |     8 +-
>>>  target/riscv/Makefile.objs              |     2 +-
>>>  target/riscv/cpu.h                      |    30 +
>>>  target/riscv/cpu_bits.h                 |    15 +
>>>  target/riscv/cpu_helper.c               |     7 +
>>>  target/riscv/csr.c                      |    65 +-
>>>  target/riscv/helper.h                   |   354 +
>>>  target/riscv/insn32.decode              |   374 +-
>>>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>>  target/riscv/translate.c                |     1 +
>>>  target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>>>  13 files changed, 28017 insertions(+), 9 deletions(-)
>>>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>>>  create mode 100644 target/riscv/vector_helper.c
>>>
>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>> index 2ba36ec..da155ea 100644
>>> --- a/fpu/softfloat.c
>>> +++ b/fpu/softfloat.c
>>> @@ -433,6 +433,16 @@ static inline int extractFloat16Exp(float16 a)
>>>  }
>>>
>>>  /*----------------------------------------------------------------------------
>>> +| Returns the sign bit of the half-precision floating-point value `a'.
>>> +*----------------------------------------------------------------------------*/
>>> +
>>> +static inline flag extractFloat16Sign(float16 a)
>>> +{
>>> +    return float16_val(a) >> 0xf;
>>> +}
>>> +
>>> +
>>> +/*----------------------------------------------------------------------------
>>>  | Returns the fraction bits of the single-precision floating-point value `a'.
>>>  *----------------------------------------------------------------------------*/
>>>
>>> @@ -4790,6 +4800,35 @@ int float32_eq(float32 a, float32 b, float_status *status)
>>>  }
>>>
>>>  /*----------------------------------------------------------------------------
>>> +| Returns 1 if the half-precision floating-point value `a' is less than
>>> +| or equal to the corresponding value `b', and 0 otherwise.  The invalid
>>> +| exception is raised if either operand is a NaN.  The comparison is performed
>>> +| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>>> +*----------------------------------------------------------------------------*/
>>> +
>>> +int float16_le(float16 a, float16 b, float_status *status)
>>> +{
>>> +    flag aSign, bSign;
>>> +    uint16_t av, bv;
>>> +    a = float16_squash_input_denormal(a, status);
>>> +    b = float16_squash_input_denormal(b, status);
>>> +
>>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>>> +       ) {
>>> +        float_raise(float_flag_invalid, status);
>>> +        return 0;
>>> +    }
>>> +    aSign = extractFloat16Sign( a );
>>> +    bSign = extractFloat16Sign( b );
>>> +    av = float16_val(a);
>>> +    bv = float16_val(b);
>>> +    if ( aSign != bSign ) return aSign || ( (uint16_t) ( ( av | bv )<<1 ) == 0 );
>>> +    return ( av == bv ) || ( aSign ^ ( av < bv ) );
>>> +
>>> +}
>>> +
>>> +/*----------------------------------------------------------------------------
>>>  | Returns 1 if the single-precision floating-point value `a' is less than
>>>  | or equal to the corresponding value `b', and 0 otherwise.  The invalid
>>>  | exception is raised if either operand is a NaN.  The comparison is performed
>>> @@ -4825,6 +4864,35 @@ int float32_le(float32 a, float32 b, float_status *status)
>>>  | to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>>>  *----------------------------------------------------------------------------*/
>>>
>>> +int float16_lt(float16 a, float16 b, float_status *status)
>>> +{
>>> +    flag aSign, bSign;
>>> +    uint16_t av, bv;
>>> +    a = float16_squash_input_denormal(a, status);
>>> +    b = float16_squash_input_denormal(b, status);
>>> +
>>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>>> +       ) {
>>> +        float_raise(float_flag_invalid, status);
>>> +        return 0;
>>> +    }
>>> +    aSign = extractFloat16Sign( a );
>>> +    bSign = extractFloat16Sign( b );
>>> +    av = float16_val(a);
>>> +    bv = float16_val(b);
>>> +    if ( aSign != bSign ) return aSign && ( (uint16_t) ( ( av | bv )<<1 ) != 0 );
>>> +    return ( av != bv ) && ( aSign ^ ( av < bv ) );
>>> +
>>> +}
>>> +
>>> +/*----------------------------------------------------------------------------
>>> +| Returns 1 if the single-precision floating-point value `a' is less than
>>> +| the corresponding value `b', and 0 otherwise.  The invalid exception is
>>> +| raised if either operand is a NaN.  The comparison is performed according
>>> +| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>>> +*----------------------------------------------------------------------------*/
>>> +
>>>  int float32_lt(float32 a, float32 b, float_status *status)
>>>  {
>>>      flag aSign, bSign;
>>> @@ -4869,6 +4937,32 @@ int float32_unordered(float32 a, float32 b, float_status *status)
>>>  }
>>>
>>>  /*----------------------------------------------------------------------------
>>> +| Returns 1 if the half-precision floating-point value `a' is equal to
>>> +| the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
>>> +| exception.  The comparison is performed according to the IEC/IEEE Standard
>>> +| for Binary Floating-Point Arithmetic.
>>> +*----------------------------------------------------------------------------*/
>>> +
>>> +int float16_eq_quiet(float16 a, float16 b, float_status *status)
>>> +{
>>> +    a = float16_squash_input_denormal(a, status);
>>> +    b = float16_squash_input_denormal(b, status);
>>> +
>>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>>> +       ) {
>>> +        if (float16_is_signaling_nan(a, status)
>>> +         || float16_is_signaling_nan(b, status)) {
>>> +            float_raise(float_flag_invalid, status);
>>> +        }
>>> +        return 0;
>>> +    }
>>> +    return ( float16_val(a) == float16_val(b) ) ||
>>> +            ( (uint16_t) ( ( float16_val(a) | float16_val(b) )<<1 ) == 0 );
>>> +}
>>> +
>>> +
>>> +/*----------------------------------------------------------------------------
>>>  | Returns 1 if the single-precision floating-point value `a' is equal to
>>>  | the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
>>>  | exception.  The comparison is performed according to the IEC/IEEE Standard
>>> @@ -4958,6 +5052,31 @@ int float32_lt_quiet(float32 a, float32 b, float_status *status)
>>>  }
>>>
>>>  /*----------------------------------------------------------------------------
>>> +| Returns 1 if the half-precision floating-point values `a' and `b' cannot
>>> +| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
>>> +| comparison is performed according to the IEC/IEEE Standard for Binary
>>> +| Floating-Point Arithmetic.
>>> +*----------------------------------------------------------------------------*/
>>> +
>>> +int float16_unordered_quiet(float16 a, float16 b, float_status *status)
>>> +{
>>> +    a = float16_squash_input_denormal(a, status);
>>> +    b = float16_squash_input_denormal(b, status);
>>> +
>>> +    if (    ( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
>>> +         || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
>>> +       ) {
>>> +        if (float16_is_signaling_nan(a, status)
>>> +         || float16_is_signaling_nan(b, status)) {
>>> +            float_raise(float_flag_invalid, status);
>>> +        }
>>> +        return 1;
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +
>>> +/*----------------------------------------------------------------------------
>>>  | Returns 1 if the single-precision floating-point values `a' and `b' cannot
>>>  | be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
>>>  | comparison is performed according to the IEC/IEEE Standard for Binary
>>> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
>>> index 3ff3fa5..3b0754c 100644
>>> --- a/include/fpu/softfloat.h
>>> +++ b/include/fpu/softfloat.h
>>> @@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status *status);
>>>  float16 float16_sqrt(float16, float_status *status);
>>>  int float16_compare(float16, float16, float_status *status);
>>>  int float16_compare_quiet(float16, float16, float_status *status);
>>> +int float16_unordered_quiet(float16, float16, float_status *status);
>>> +int float16_le(float16, float16, float_status *status);
>>> +int float16_lt(float16, float16, float_status *status);
>>> +int float16_eq_quiet(float16, float16, float_status *status);
>>>
>>>  int float16_is_quiet_nan(float16, float_status *status);
>>>  int float16_is_signaling_nan(float16, float_status *status);
>>> diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
>>> index 12aa3c0..b01548a 100644
>>> --- a/linux-user/riscv/cpu_loop.c
>>> +++ b/linux-user/riscv/cpu_loop.c
>>> @@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
>>>          signum = 0;
>>>          sigcode = 0;
>>>          sigaddr = 0;
>>> -
>>> +        if (env->foflag) {
>>> +            if (env->vfp.vl != 0) {
>>> +                env->foflag = false;
>>> +                env->pc += 4;
>>> +                continue;
>>> +            }
>>> +        }
>>>          switch (trapnr) {
>>>          case EXCP_INTERRUPT:
>>>              /* just indicate that signals should be handled asap */
>>> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
>>> index b1c79bc..d577cef 100644
>>> --- a/target/riscv/Makefile.objs
>>> +++ b/target/riscv/Makefile.objs
>>> @@ -1,4 +1,4 @@
>>> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o pmp.o
>>> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o pmp.o
>>>
>>>  DECODETREE = $(SRC_PATH)/scripts/decodetree.py
>>>
>>> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>>> index 0adb307..5a93aa2 100644
>>> --- a/target/riscv/cpu.h
>>> +++ b/target/riscv/cpu.h
>>> @@ -67,6 +67,7 @@
>>>  #define RVC RV('C')
>>>  #define RVS RV('S')
>>>  #define RVU RV('U')
>>> +#define RVV RV('V')
>>>
>>>  /* S extension denotes that Supervisor mode exists, however it is possible
>>>     to have a core that support S mode but does not have an MMU and there
>>> @@ -93,9 +94,38 @@ typedef struct CPURISCVState CPURISCVState;
>>>
>>>  #include "pmp.h"
>>>
>>> +#define VLEN 128
>>> +#define VUNIT(x) (VLEN / x)
>>> +
>>>  struct CPURISCVState {
>>>      target_ulong gpr[32];
>>>      uint64_t fpr[32]; /* assume both F and D extensions */
>>> +
>>> +    /* vector coprocessor state.  */
>>> +    struct {
>>> +        union VECTOR {
>>> +            float64  f64[VUNIT(64)];
>>> +            float32  f32[VUNIT(32)];
>>> +            float16  f16[VUNIT(16)];
>>> +            target_ulong ul[VUNIT(sizeof(target_ulong))];
>>> +            uint64_t u64[VUNIT(64)];
>>> +            int64_t  s64[VUNIT(64)];
>>> +            uint32_t u32[VUNIT(32)];
>>> +            int32_t  s32[VUNIT(32)];
>>> +            uint16_t u16[VUNIT(16)];
>>> +            int16_t  s16[VUNIT(16)];
>>> +            uint8_t  u8[VUNIT(8)];
>>> +            int8_t   s8[VUNIT(8)];
>>> +        } vreg[32];
>>> +        target_ulong vxrm;
>>> +        target_ulong vxsat;
>>> +        target_ulong vl;
>>> +        target_ulong vstart;
>>> +        target_ulong vtype;
>>> +        float_status fp_status;
>>> +    } vfp;
>>> +
>>> +    bool         foflag;
>>>      target_ulong pc;
>>>      target_ulong load_res;
>>>      target_ulong load_val;
>>> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
>>> index 11f971a..9eb43ec 100644
>>> --- a/target/riscv/cpu_bits.h
>>> +++ b/target/riscv/cpu_bits.h
>>> @@ -29,6 +29,14 @@
>>>  #define FSR_NXA             (FPEXC_NX << FSR_AEXC_SHIFT)
>>>  #define FSR_AEXC            (FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
>>>
>>> +/* Vector Fixed-Point round model */
>>> +#define FSR_VXRM_SHIFT      9
>>> +#define FSR_VXRM            (0x3 << FSR_VXRM_SHIFT)
>>> +
>>> +/* Vector Fixed-Point saturation flag */
>>> +#define FSR_VXSAT_SHIFT     8
>>> +#define FSR_VXSAT           (0x1 << FSR_VXSAT_SHIFT)
>>> +
>>>  /* Control and Status Registers */
>>>
>>>  /* User Trap Setup */
>>> @@ -48,6 +56,13 @@
>>>  #define CSR_FRM             0x002
>>>  #define CSR_FCSR            0x003
>>>
>>> +/* User Vector CSRs */
>>> +#define CSR_VSTART          0x008
>>> +#define CSR_VXSAT           0x009
>>> +#define CSR_VXRM            0x00a
>>> +#define CSR_VL              0xc20
>>> +#define CSR_VTYPE           0xc21
>>> +
>>>  /* User Timers and Counters */
>>>  #define CSR_CYCLE           0xc00
>>>  #define CSR_TIME            0xc01
>>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>>> index e32b612..405caf6 100644
>>> --- a/target/riscv/cpu_helper.c
>>> +++ b/target/riscv/cpu_helper.c
>>> @@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>>>          [PRV_H] = RISCV_EXCP_H_ECALL,
>>>          [PRV_M] = RISCV_EXCP_M_ECALL
>>>      };
>>> +    if (env->foflag) {
>>> +        if (env->vfp.vl != 0) {
>>> +            env->foflag = false;
>>> +            env->pc += 4;
>>> +            return;
>>> +        }
>>> +    }
>>>
>>>      if (!async) {
>>>          /* set tval to badaddr for traps with address information */
>>> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
>>> index e0d4586..a6131ff 100644
>>> --- a/target/riscv/csr.c
>>> +++ b/target/riscv/csr.c
>>> @@ -87,12 +87,12 @@ static int ctr(CPURISCVState *env, int csrno)
>>>      return 0;
>>>  }
>>>
>>> -#if !defined(CONFIG_USER_ONLY)
>>>  static int any(CPURISCVState *env, int csrno)
>>>  {
>>>      return 0;
>>>  }
>>>
>>> +#if !defined(CONFIG_USER_ONLY)
>>>  static int smode(CPURISCVState *env, int csrno)
>>>  {
>>>      return -!riscv_has_ext(env, RVS);
>>> @@ -158,8 +158,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
>>>          return -1;
>>>      }
>>>  #endif
>>> -    *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
>>> -        | (env->frm << FSR_RD_SHIFT);
>>> +    *val = (env->vfp.vxrm << FSR_VXRM_SHIFT)
>>> +            | (env->vfp.vxsat << FSR_VXSAT_SHIFT)
>>> +            | (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
>>> +            | (env->frm << FSR_RD_SHIFT);
>>>      return 0;
>>>  }
>>>
>>> @@ -172,10 +174,60 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
>>>      env->mstatus |= MSTATUS_FS;
>>>  #endif
>>>      env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
>>> +    env->vfp.vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
>>> +    env->vfp.vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
>>>      riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
>>>      return 0;
>>>  }
>>>
>>> +static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
>>> +{
>>> +    *val = env->vfp.vtype;
>>> +    return 0;
>>> +}
>>> +
>>> +static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
>>> +{
>>> +    *val = env->vfp.vl;
>>> +    return 0;
>>> +}
>>> +
>>> +static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
>>> +{
>>> +    *val = env->vfp.vxrm;
>>> +    return 0;
>>> +}
>>> +
>>> +static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
>>> +{
>>> +    *val = env->vfp.vxsat;
>>> +    return 0;
>>> +}
>>> +
>>> +static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
>>> +{
>>> +    *val = env->vfp.vstart;
>>> +    return 0;
>>> +}
>>> +
>>> +static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
>>> +{
>>> +    env->vfp.vxrm = val;
>>> +    return 0;
>>> +}
>>> +
>>> +static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
>>> +{
>>> +    env->vfp.vxsat = val;
>>> +    return 0;
>>> +}
>>> +
>>> +static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
>>> +{
>>> +    env->vfp.vstart = val;
>>> +    return 0;
>>> +}
>>> +
>>>  /* User Timers and Counters */
>>>  static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
>>>  {
>>> @@ -873,7 +925,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>>>      [CSR_FFLAGS] =              { fs,   read_fflags,      write_fflags      },
>>>      [CSR_FRM] =                 { fs,   read_frm,         write_frm         },
>>>      [CSR_FCSR] =                { fs,   read_fcsr,        write_fcsr        },
>>> -
>>> +    /* Vector CSRs */
>>> +    [CSR_VSTART] =              { any,   read_vstart,     write_vstart      },
>>> +    [CSR_VXSAT] =               { any,   read_vxsat,      write_vxsat       },
>>> +    [CSR_VXRM] =                { any,   read_vxrm,       write_vxrm        },
>>> +    [CSR_VL] =                  { any,   read_vl                            },
>>> +    [CSR_VTYPE] =               { any,   read_vtype                         },
>>>      /* User Timers and Counters */
>>>      [CSR_CYCLE] =               { ctr,  read_instret                        },
>>>      [CSR_INSTRET] =             { ctr,  read_instret                        },
>>> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
>>> index debb22a..fee02c0 100644
>>> --- a/target/riscv/helper.h
>>> +++ b/target/riscv/helper.h
>>> @@ -76,3 +76,357 @@ DEF_HELPER_2(mret, tl, env, tl)
>>>  DEF_HELPER_1(wfi, void, env)
>>>  DEF_HELPER_1(tlb_flush, void, env)
>>>  #endif
>>> +/* Vector functions */
>>> +DEF_HELPER_5(vector_vlb_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlh_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlw_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vle_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlbu_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlhu_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlwu_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlbff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlhff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlwff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vleff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlbuff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlhuff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vlwuff_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsb_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsh_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsw_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vse_v, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlsb_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlsh_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlsw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlse_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlsbu_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlshu_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlswu_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vssb_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vssh_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vssw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsse_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxb_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxh_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxe_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxbu_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxhu_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vlxwu_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsxb_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsxh_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsxw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsxe_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsuxb_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsuxh_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsuxw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vsuxe_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoswapw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoswapd_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoaddw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoaddd_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoxorw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoxord_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoandw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoandd_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoorw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamoord_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamominw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamomind_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamomaxw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamomaxd_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamominuw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamominud_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamomaxuw_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_6(vector_vamomaxud_v, void, env, i32, i32, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vext_x_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfmv_f_s, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmv_s_x, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfmv_s_f, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vadc_vvm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vadc_vxm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vadc_vim, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmadc_vvm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmadc_vxm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmadc_vim, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vsbc_vvm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vsbc_vxm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmsbc_vvm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmsbc_vxm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmpopc_m, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmfirst_m, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vcompress_vm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmandnot_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmand_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmor_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmxor_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmornot_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmnand_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmnor_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmxnor_mm, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmsbf_m, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmsof_m, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vmsif_m, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_viota_m, void, env, i32, i32, i32)
>>> +DEF_HELPER_3(vector_vid_v, void, env, i32, i32)
>>> +DEF_HELPER_4(vector_vfcvt_xu_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfcvt_x_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfcvt_f_xu_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfcvt_f_x_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfwcvt_xu_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfwcvt_x_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfwcvt_f_xu_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfwcvt_f_x_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfwcvt_f_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfncvt_xu_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfncvt_x_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfncvt_f_xu_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfncvt_f_x_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfncvt_f_f_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfsqrt_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vfclass_v, void, env, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vadd_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vadd_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredsum_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfadd_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredand_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfredsum_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsub_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredor_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsub_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrsub_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrsub_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredxor_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfredosum_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vminu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vminu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredminu_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmin_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmin_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmin_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmin_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredmin_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfredmin_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmaxu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmaxu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredmaxu_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmax_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmax_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmax_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmax_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vredmax_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfredmax_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsgnj_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsgnj_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vand_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vand_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vand_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsgnjn_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsgnjn_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vor_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vor_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vor_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsgnjx_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfsgnjx_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vxor_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vxor_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vxor_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrgather_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrgather_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrgather_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vslideup_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vslideup_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vslide1up_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vslidedown_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vslidedown_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vslide1down_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmerge_vvm, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmerge_vxm, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmerge_vim, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmerge_vfm, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmseq_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmseq_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmseq_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfeq_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfeq_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsne_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsne_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsne_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfle_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfle_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsltu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsltu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmford_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmford_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmslt_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmslt_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmflt_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmflt_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsleu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsleu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsleu_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfne_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfne_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsle_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsle_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsle_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfgt_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsgtu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsgtu_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsgt_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmsgt_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmfge_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsaddu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsaddu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsaddu_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vdivu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vdivu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfdiv_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfdiv_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsadd_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsadd_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vdiv_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vdiv_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfrdiv_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssubu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssubu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vremu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vremu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssub_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrem_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vrem_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vaadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vaadd_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vaadd_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmulhu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmulhu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmul_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmul_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsll_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsll_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsll_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmul_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmul_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vasub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vasub_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmulhsu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmulhsu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsmul_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsmul_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmulh_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmulh_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfrsub_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsrl_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsrl_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsrl_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmadd_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsra_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsra_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vsra_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmadd_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmadd_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssrl_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssrl_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssrl_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmsub_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssra_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssra_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vssra_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnmsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnmsub_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmsub_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnsrl_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnsrl_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnsrl_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmacc_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnsra_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnsra_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnsra_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vmacc_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmacc_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnclipu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnclipu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnclipu_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmsac_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfmsac_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnclip_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnclip_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnclip_vi, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnmsac_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vnmsac_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmsac_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfnmsac_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwredsumu_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwaddu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwaddu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwadd_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwredsum_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwadd_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwadd_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwredsum_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsubu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsubu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwsub_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsub_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsub_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwredosum_vs, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwaddu_wv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwaddu_wx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwadd_wv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwadd_wf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwadd_wv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwadd_wx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsubu_wv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsubu_wx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwsub_wv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwsub_wf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsub_wv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsub_wx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmulu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmulu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwmul_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwmul_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmulsu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmulsu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmul_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmul_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmaccu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmaccu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmaccu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmaccu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwmacc_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmacc_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmacc_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwnmacc_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwnmacc_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmaccsu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmaccsu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmaccsu_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmaccsu_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwmsac_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwmsac_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwsmaccus_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vwmaccus_vx, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwnmsac_vv, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_5(vector_vfwnmsac_vf, void, env, i32, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vsetvli, void, env, i32, i32, i32)
>>> +DEF_HELPER_4(vector_vsetvl, void, env, i32, i32, i32)
>>> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
>>> index 77f794e..d125ff9 100644
>>> --- a/target/riscv/insn32.decode
>>> +++ b/target/riscv/insn32.decode
>>> @@ -25,7 +25,7 @@
>>>  %sh10    20:10
>>>  %csr    20:12
>>>  %rm     12:3
>>> -
>>> +%nf     29:3
>>>  # immediates:
>>>  %imm_i    20:s12
>>>  %imm_s    25:s7 7:5
>>> @@ -43,7 +43,6 @@
>>>  &u    imm rd
>>>  &shift     shamt rs1 rd
>>>  &atomic    aq rl rs2 rs1 rd
>>> -
>>>  # Formats 32:
>>>  @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
>>>  @i       ............    ..... ... ..... ....... &i      imm=%imm_i     %rs1 %rd
>>> @@ -62,11 +61,17 @@
>>>  @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
>>>  @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
>>>  @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
>>> +@r_vm    ...... vm:1 ..... ..... ... ..... ....... %rs2 %rs1 %rd
>>> +@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... %rs2 %rs1 %rd
>>> +@r_nfvm  nf:3 ... vm:1 ..... ..... ... ..... ....... %rs2 %rs1 %rd
>>> +@r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... %rs1 %rd
>>> +@r2_vm   ...... vm:1 ..... ..... ... ..... ....... %rs2 %rd
>>> +@r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
>>> +@r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
>>>
>>>  @sfence_vma ....... ..... .....   ... ..... ....... %rs2 %rs1
>>>  @sfence_vm  ....... ..... .....   ... ..... ....... %rs1
>>>
>>> -
>>>  # *** Privileged Instructions ***
>>>  ecall      000000000000     00000 000 00000 1110011
>>>  ebreak     000000000001     00000 000 00000 1110011
>>> @@ -203,3 +208,366 @@ fcvt_w_d   1100001  00000 ..... ... ..... 1010011 @r2_rm
>>>  fcvt_wu_d  1100001  00001 ..... ... ..... 1010011 @r2_rm
>>>  fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
>>>  fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
>>> +
>>> +# *** RV32V Standard Extension ***
>>> +
>>> +# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
>>> +vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>> +vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>> +vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>>> +vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
>>> +vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
>>> +vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
>>> +vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
>>> +vlbff_v    ... 100 . 10000 ..... 000 ..... 0000111 @r2_nfvm
>>> +vlhff_v    ... 100 . 10000 ..... 101 ..... 0000111 @r2_nfvm
>>> +vlwff_v    ... 100 . 10000 ..... 110 ..... 0000111 @r2_nfvm
>>> +vleff_v    ... 000 . 10000 ..... 111 ..... 0000111 @r2_nfvm
>>> +vlbuff_v   ... 000 . 10000 ..... 000 ..... 0000111 @r2_nfvm
>>> +vlhuff_v   ... 000 . 10000 ..... 101 ..... 0000111 @r2_nfvm
>>> +vlwuff_v   ... 000 . 10000 ..... 110 ..... 0000111 @r2_nfvm
>>> +vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
>>> +vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
>>> +vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
>>> +vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
>>> +
>>> +vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
>>> +vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
>>> +vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
>>> +vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
>>> +vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
>>> +vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
>>> +vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
>>> +vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
>>> +vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
>>> +vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
>>> +vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
>>> +
>>> +vlxb_v     ... 111 . ..... ..... 000 ..... 0000111 @r_nfvm
>>> +vlxh_v     ... 111 . ..... ..... 101 ..... 0000111 @r_nfvm
>>> +vlxw_v     ... 111 . ..... ..... 110 ..... 0000111 @r_nfvm
>>> +vlxe_v     ... 011 . ..... ..... 111 ..... 0000111 @r_nfvm
>>> +vlxbu_v    ... 011 . ..... ..... 000 ..... 0000111 @r_nfvm
>>> +vlxhu_v    ... 011 . ..... ..... 101 ..... 0000111 @r_nfvm
>>> +vlxwu_v    ... 011 . ..... ..... 110 ..... 0000111 @r_nfvm
>>> +vsxb_v     ... 011 . ..... ..... 000 ..... 0100111 @r_nfvm
>>> +vsxh_v     ... 011 . ..... ..... 101 ..... 0100111 @r_nfvm
>>> +vsxw_v     ... 011 . ..... ..... 110 ..... 0100111 @r_nfvm
>>> +vsxe_v     ... 011 . ..... ..... 111 ..... 0100111 @r_nfvm
>>> +vsuxb_v    ... 111 . ..... ..... 000 ..... 0100111 @r_nfvm
>>> +vsuxh_v    ... 111 . ..... ..... 101 ..... 0100111 @r_nfvm
>>> +vsuxw_v    ... 111 . ..... ..... 110 ..... 0100111 @r_nfvm
>>> +vsuxe_v    ... 111 . ..... ..... 111 ..... 0100111 @r_nfvm
>>> +
>>> +#*** Vector AMO operations are encoded under the standard AMO major opcode.***
>>> +vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
>>> +vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
>>> +
>>> +#*** new major opcode OP-V ***
>>> +vadd_vv         000000 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vadd_vx         000000 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vadd_vi         000000 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vredsum_vs      000000 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfadd_vv        000000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfadd_vf        000000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vredand_vs      000001 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfredsum_vs     000001 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vsub_vv         000010 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsub_vx         000010 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vredor_vs       000010 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfsub_vv        000010 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfsub_vf        000010 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vrsub_vx        000011 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vrsub_vi        000011 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vredxor_vs      000011 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfredosum_vs    000011 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vminu_vv        000100 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vminu_vx        000100 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vredminu_vs     000100 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfmin_vv        000100 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmin_vf        000100 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmin_vv         000101 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmin_vx         000101 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vredmin_vs      000101 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfredmin_vs     000101 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vmaxu_vv        000110 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmaxu_vx        000110 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vredmaxu_vs     000110 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfmax_vv        000110 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmax_vf        000110 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmax_vv         000111 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmax_vx         000111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vredmax_vs      000111 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vfredmax_vs     000111 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfsgnj_vv       001000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfsgnj_vf       001000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vand_vv         001001 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vand_vx         001001 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vand_vi         001001 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vfsgnjn_vv      001001 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfsgnjn_vf      001001 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vor_vv          001010 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vor_vx          001010 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vor_vi          001010 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vfsgnjx_vv      001010 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfsgnjx_vf      001010 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vxor_vv         001011 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vxor_vx         001011 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vxor_vi         001011 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vrgather_vv     001100 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vrgather_vx     001100 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vrgather_vi     001100 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vext_x_v        001100 1 ..... ..... 010 ..... 1010111 @r
>>> +vfmv_f_s        001100 1 ..... ..... 001 ..... 1010111 @r
>>> +vmv_s_x         001101 1 ..... ..... 110 ..... 1010111 @r
>>> +vfmv_s_f        001101 1 ..... ..... 101 ..... 1010111 @r
>>> +vslideup_vx     001110 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vslideup_vi     001110 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vslide1up_vx    001110 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vslidedown_vx   001111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vslidedown_vi   001111 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vslide1down_vx  001111 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vadc_vvm        010000 1 ..... ..... 000 ..... 1010111 @r
>>> +vadc_vxm        010000 1 ..... ..... 100 ..... 1010111 @r
>>> +vadc_vim        010000 1 ..... ..... 011 ..... 1010111 @r
>>> +vmadc_vvm       010001 1 ..... ..... 000 ..... 1010111 @r
>>> +vmadc_vxm       010001 1 ..... ..... 100 ..... 1010111 @r
>>> +vmadc_vim       010001 1 ..... ..... 011 ..... 1010111 @r
>>> +vsbc_vvm        010010 1 ..... ..... 000 ..... 1010111 @r
>>> +vsbc_vxm        010010 1 ..... ..... 100 ..... 1010111 @r
>>> +vmsbc_vvm       010011 1 ..... ..... 000 ..... 1010111 @r
>>> +vmsbc_vxm       010011 1 ..... ..... 100 ..... 1010111 @r
>>> +vmpopc_m        010100 . ..... ----- 010 ..... 1010111 @r2_vm
>>> +vmfirst_m       010101 . ..... ----- 010 ..... 1010111 @r2_vm
>>> +vmsbf_m         010110 . ..... 00001 010 ..... 1010111 @r2_vm
>>> +vmsof_m         010110 . ..... 00010 010 ..... 1010111 @r2_vm
>>> +vmsif_m         010110 . ..... 00011 010 ..... 1010111 @r2_vm
>>> +viota_m         010110 . ..... 10000 010 ..... 1010111 @r2_vm
>>> +vid_v           010110 . 00000 10001 010 ..... 1010111 @r1_vm
>>> +vmerge_vvm      010111 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmerge_vxm      010111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmerge_vim      010111 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vcompress_vm    010111 - ..... ..... 010 ..... 1010111 @r
>>> +vfmerge_vfm     010111 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmseq_vv        011000 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmseq_vx        011000 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmseq_vi        011000 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmandnot_mm     011000 - ..... ..... 010 ..... 1010111 @r
>>> +vmfeq_vv        011000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vmfeq_vf        011000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmsne_vv        011001 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmsne_vx        011001 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmsne_vi        011001 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmand_mm        011001 - ..... ..... 010 ..... 1010111 @r
>>> +vmfle_vv        011001 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vmfle_vf        011001 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmsltu_vv       011010 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmsltu_vx       011010 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmor_mm         011010 - ..... ..... 010 ..... 1010111 @r
>>> +vmford_vv       011010 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vmford_vf       011010 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmslt_vv        011011 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmslt_vx        011011 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmxor_mm        011011 - ..... ..... 010 ..... 1010111 @r
>>> +vmflt_vv        011011 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vmflt_vf        011011 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmsleu_vv       011100 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmsleu_vx       011100 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmsleu_vi       011100 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmornot_mm      011100 - ..... ..... 010 ..... 1010111 @r
>>> +vmfne_vv        011100 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vmfne_vf        011100 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmsle_vv        011101 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vmsle_vx        011101 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmsle_vi        011101 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmnand_mm       011101 - ..... ..... 010 ..... 1010111 @r
>>> +vmfgt_vf        011101 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vmsgtu_vx       011110 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmsgtu_vi       011110 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmnor_mm        011110 - ..... ..... 010 ..... 1010111 @r
>>> +vmsgt_vx        011111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmsgt_vi        011111 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmxnor_mm       011111 - ..... ..... 010 ..... 1010111 @r
>>> +vmfge_vf        011111 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vsaddu_vv       100000 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsaddu_vx       100000 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vsaddu_vi       100000 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vdivu_vv        100000 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vdivu_vx        100000 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfdiv_vv        100000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfdiv_vf        100000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vsadd_vv        100001 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsadd_vx        100001 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vsadd_vi        100001 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vdiv_vv         100001 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vdiv_vx         100001 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfrdiv_vf       100001 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vssubu_vv       100010 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vssubu_vx       100010 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vremu_vv        100010 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vremu_vx        100010 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfcvt_xu_f_v    100010 . ..... 00000 001 ..... 1010111 @r2_vm
>>> +vfcvt_x_f_v     100010 . ..... 00001 001 ..... 1010111 @r2_vm
>>> +vfcvt_f_xu_v    100010 . ..... 00010 001 ..... 1010111 @r2_vm
>>> +vfcvt_f_x_v     100010 . ..... 00011 001 ..... 1010111 @r2_vm
>>> +vfwcvt_xu_f_v   100010 . ..... 01000 001 ..... 1010111 @r2_vm
>>> +vfwcvt_x_f_v    100010 . ..... 01001 001 ..... 1010111 @r2_vm
>>> +vfwcvt_f_xu_v   100010 . ..... 01010 001 ..... 1010111 @r2_vm
>>> +vfwcvt_f_x_v    100010 . ..... 01011 001 ..... 1010111 @r2_vm
>>> +vfwcvt_f_f_v    100010 . ..... 01100 001 ..... 1010111 @r2_vm
>>> +vfncvt_xu_f_v   100010 . ..... 10000 001 ..... 1010111 @r2_vm
>>> +vfncvt_x_f_v    100010 . ..... 10001 001 ..... 1010111 @r2_vm
>>> +vfncvt_f_xu_v   100010 . ..... 10010 001 ..... 1010111 @r2_vm
>>> +vfncvt_f_x_v    100010 . ..... 10011 001 ..... 1010111 @r2_vm
>>> +vfncvt_f_f_v    100010 . ..... 10100 001 ..... 1010111 @r2_vm
>>> +vssub_vv        100011 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vssub_vx        100011 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vrem_vv         100011 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vrem_vx         100011 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfsqrt_v        100011 . ..... 00000 001 ..... 1010111 @r2_vm
>>> +vfclass_v       100011 . ..... 10000 001 ..... 1010111 @r2_vm
>>> +vaadd_vv        100100 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vaadd_vx        100100 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vaadd_vi        100100 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmulhu_vv       100100 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vmulhu_vx       100100 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfmul_vv        100100 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmul_vf        100100 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vsll_vv         100101 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsll_vx         100101 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vsll_vi         100101 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmul_vv         100101 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vmul_vx         100101 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vasub_vv        100110 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vasub_vx        100110 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmulhsu_vv      100110 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vmulhsu_vx      100110 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vsmul_vv        100111 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsmul_vx        100111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vmulh_vv        100111 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vmulh_vx        100111 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfrsub_vf       100111 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vsrl_vv         101000 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsrl_vx         101000 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vsrl_vi         101000 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vfmadd_vv       101000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmadd_vf       101000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vsra_vv         101001 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vsra_vx         101001 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vsra_vi         101001 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmadd_vv        101001 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vmadd_vx        101001 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfnmadd_vv      101001 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfnmadd_vf      101001 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vssrl_vv        101010 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vssrl_vx        101010 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vssrl_vi        101010 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vfmsub_vv       101010 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmsub_vf       101010 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vssra_vv        101011 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vssra_vx        101011 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vssra_vi        101011 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vnmsub_vv       101011 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vnmsub_vx       101011 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfnmsub_vv      101011 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfnmsub_vf      101011 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vnsrl_vv        101100 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vnsrl_vx        101100 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vnsrl_vi        101100 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vfmacc_vv       101100 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmacc_vf       101100 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vnsra_vv        101101 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vnsra_vx        101101 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vnsra_vi        101101 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vmacc_vv        101101 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vmacc_vx        101101 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfnmacc_vv      101101 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfnmacc_vf      101101 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vnclipu_vv      101110 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vnclipu_vx      101110 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vnclipu_vi      101110 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vfmsac_vv       101110 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfmsac_vf       101110 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vnclip_vv       101111 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vnclip_vx       101111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vnclip_vi       101111 . ..... ..... 011 ..... 1010111 @r_vm
>>> +vnmsac_vv       101111 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vnmsac_vx       101111 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfnmsac_vv      101111 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfnmsac_vf      101111 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwredsumu_vs    110000 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vwaddu_vv       110000 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwaddu_vx       110000 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwadd_vv       110000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwadd_vf       110000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwredsum_vs     110001 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vwadd_vv        110001 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwadd_vx        110001 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwredsum_vs    110001 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vwsubu_vv       110010 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwsubu_vx       110010 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwsub_vv       110010 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwsub_vf       110010 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwsub_vv        110011 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwsub_vx        110011 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwredosum_vs   110011 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vwaddu_wv       110100 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwaddu_wx       110100 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwadd_wv       110100 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwadd_wf       110100 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwadd_wv        110101 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwadd_wx        110101 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vwsubu_wv       110110 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwsubu_wx       110110 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwsub_wv       110110 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwsub_wf       110110 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwsub_wv        110111 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwsub_wx        110111 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vwmulu_vv       111000 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwmulu_vx       111000 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwmul_vv       111000 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwmul_vf       111000 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwmulsu_vv      111010 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwmulsu_vx      111010 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vwmul_vv        111011 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwmul_vx        111011 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vwsmaccu_vv     111100 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vwsmaccu_vx     111100 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vwmaccu_vv      111100 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwmaccu_vx      111100 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwmacc_vv      111100 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwmacc_vf      111100 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwsmacc_vv      111101 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vwsmacc_vx      111101 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vwmacc_vv       111101 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwmacc_vx       111101 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwnmacc_vv     111101 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwnmacc_vf     111101 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwsmaccsu_vv    111110 . ..... ..... 000 ..... 1010111 @r_vm
>>> +vwsmaccsu_vx    111110 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vwmaccsu_vv     111110 . ..... ..... 010 ..... 1010111 @r_vm
>>> +vwmaccsu_vx     111110 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwmsac_vv      111110 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwmsac_vf      111110 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vwsmaccus_vx    111111 . ..... ..... 100 ..... 1010111 @r_vm
>>> +vwmaccus_vx     111111 . ..... ..... 110 ..... 1010111 @r_vm
>>> +vfwnmsac_vv     111111 . ..... ..... 001 ..... 1010111 @r_vm
>>> +vfwnmsac_vf     111111 . ..... ..... 101 ..... 1010111 @r_vm
>>> +vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
>>> +vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
>>> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
>>> new file mode 100644
>>> index 0000000..dc8e6ce
>>> --- /dev/null
>>> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
>>> @@ -0,0 +1,484 @@
>>> +/*
>>> + * RISC-V translation routines for the RVV Standard Extension.
>>> + *
>>> + * Copyright (c) 2011-2019 C-SKY Limited. All rights reserved.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2 or later, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#define GEN_VECTOR_R2_NFVM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s1 = tcg_const_i32(a->rs1);               \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    TCGv_i32 nf  = tcg_const_i32(a->nf);               \
>>> +    TCGv_i32 vm = tcg_const_i32(a->vm);                \
>>> +    gen_helper_vector_##INSN(cpu_env, nf, vm, s1, d);    \
>>> +    tcg_temp_free_i32(s1);                             \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    tcg_temp_free_i32(nf);                             \
>>> +    tcg_temp_free_i32(vm);                             \
>>> +    return true;                                       \
>>> +}
>>> +#define GEN_VECTOR_R_NFVM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s1 = tcg_const_i32(a->rs1);               \
>>> +    TCGv_i32 s2 = tcg_const_i32(a->rs2);               \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    TCGv_i32 nf  = tcg_const_i32(a->nf);               \
>>> +    TCGv_i32 vm = tcg_const_i32(a->vm);                \
>>> +    gen_helper_vector_##INSN(cpu_env, nf, vm, s1, s2, d);\
>>> +    tcg_temp_free_i32(s1);                             \
>>> +    tcg_temp_free_i32(s2);                             \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    tcg_temp_free_i32(nf);                             \
>>> +    tcg_temp_free_i32(vm);                             \
>>> +    return true;                                       \
>>> +}
>>> +
>>> +#define GEN_VECTOR_R_WDVM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s1 = tcg_const_i32(a->rs1);               \
>>> +    TCGv_i32 s2 = tcg_const_i32(a->rs2);               \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    TCGv_i32 wd  = tcg_const_i32(a->wd);               \
>>> +    TCGv_i32 vm = tcg_const_i32(a->vm);                \
>>> +    gen_helper_vector_##INSN(cpu_env, wd, vm, s1, s2, d);\
>>> +    tcg_temp_free_i32(s1);                             \
>>> +    tcg_temp_free_i32(s2);                             \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    tcg_temp_free_i32(wd);                             \
>>> +    tcg_temp_free_i32(vm);                             \
>>> +    return true;                                       \
>>> +}
>>> +#define GEN_VECTOR_R(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s1 = tcg_const_i32(a->rs1);               \
>>> +    TCGv_i32 s2 = tcg_const_i32(a->rs2);               \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    gen_helper_vector_##INSN(cpu_env, s1, s2, d);    \
>>> +    tcg_temp_free_i32(s1);                             \
>>> +    tcg_temp_free_i32(s2);                             \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    return true;                                       \
>>> +}
>>> +#define GEN_VECTOR_R2_VM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s2 = tcg_const_i32(a->rs2);               \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    TCGv_i32 vm = tcg_const_i32(a->vm);                \
>>> +    gen_helper_vector_##INSN(cpu_env, vm, s2, d);        \
>>> +    tcg_temp_free_i32(s2);                             \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    tcg_temp_free_i32(vm);                             \
>>> +    return true;                                       \
>>> +}
>>> +
>>> +#define GEN_VECTOR_R1_VM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    TCGv_i32 vm = tcg_const_i32(a->vm);                \
>>> +    gen_helper_vector_##INSN(cpu_env, vm, d);        \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    tcg_temp_free_i32(vm);                             \
>>> +    return true;                                       \
>>> +}
>>> +#define GEN_VECTOR_R_VM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s1 = tcg_const_i32(a->rs1);               \
>>> +    TCGv_i32 s2 = tcg_const_i32(a->rs2);               \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    TCGv_i32 vm = tcg_const_i32(a->vm);                \
>>> +    gen_helper_vector_##INSN(cpu_env, vm, s1, s2, d);    \
>>> +    tcg_temp_free_i32(s1);                             \
>>> +    tcg_temp_free_i32(s2);                             \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    tcg_temp_free_i32(vm);                             \
>>> +    return true;                                       \
>>> +}
>>> +#define GEN_VECTOR_R2_ZIMM(INSN) \
>>> +static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
>>> +{                                                      \
>>> +    TCGv_i32 s1 = tcg_const_i32(a->rs1);               \
>>> +    TCGv_i32 zimm = tcg_const_i32(a->zimm);            \
>>> +    TCGv_i32 d  = tcg_const_i32(a->rd);                \
>>> +    gen_helper_vector_##INSN(cpu_env, s1, zimm, d);      \
>>> +    tcg_temp_free_i32(s1);                             \
>>> +    tcg_temp_free_i32(zimm);                           \
>>> +    tcg_temp_free_i32(d);                              \
>>> +    return true;                                       \
>>> +}
>>> +
>>> +GEN_VECTOR_R2_NFVM(vlb_v)
>>> +GEN_VECTOR_R2_NFVM(vlh_v)
>
> ...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 12:00   ` liuzhiwei
  2019-08-29 15:14     ` Richard Henderson
@ 2019-08-29 21:50     ` Alistair Francis
  2019-08-30  9:06       ` Alex Bennée
  2019-09-02  6:36       ` liuzhiwei
  1 sibling, 2 replies; 27+ messages in thread
From: Alistair Francis @ 2019-08-29 21:50 UTC (permalink / raw)
  To: liuzhiwei
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, Alex Bennée, Aurelien Jarno

On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>
> On 2019/8/29 上午5:34, Alistair Francis wrote:
> > On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
> >> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> >> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
> >> ---
> >>   fpu/softfloat.c                         |   119 +
> >>   include/fpu/softfloat.h                 |     4 +
> >>   linux-user/riscv/cpu_loop.c             |     8 +-
> >>   target/riscv/Makefile.objs              |     2 +-
> >>   target/riscv/cpu.h                      |    30 +
> >>   target/riscv/cpu_bits.h                 |    15 +
> >>   target/riscv/cpu_helper.c               |     7 +
> >>   target/riscv/csr.c                      |    65 +-
> >>   target/riscv/helper.h                   |   354 +
> >>   target/riscv/insn32.decode              |   374 +-
> >>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
> >>   target/riscv/translate.c                |     1 +
> >>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
> >>   13 files changed, 28017 insertions(+), 9 deletions(-)
> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
> >>   create mode 100644 target/riscv/vector_helper.c
> >>
> > Hello,
> >
> > Thanks for the patch!
> >
> > As others have pointed out you will need to split the patch up into
> > multiple smaller patches, otherwise it is too hard to review almost
> > 30,000 lines of code.
>
> Hi, Alistair
>
> I'm so sorry for the inconvenience. It will be a patch set with a cover
> letter in V2.

No worries.

>
> > Can you also include a cover letter with your patch series describing
> > how you are testing this? AFAIK vector extension support isn't in any
> > compiler so I'm assuming you are handwriting the assembly or have
> > toolchain patches. Either way it will help if you can share that so
> > others can test your implementation.
>
> Yes, it's handwriting assembly. The assembler in Binutils has support
> Vector extension.  First define an function test_vadd_vv_8 in assembly
> and then it can be called from a C program.
>
> The function is something like
>
> /* vadd.vv */
> TEST_FUNC(test_vadd_vv_8)
>          vsetvli        t1, x0, e8, m2
>          vlb.v           v6, (a4)
>          vsb.v           v6, (a3)
>          vsetvli        t1, a0, e8, m2
>          vlb.v           v0, (a1)
>          vlb.v           v2, (a2)
>          vadd.vv     v4, v0, v2
>          vsb.v          v4, (a3)
> ret
>          .size   test_vadd_vv_8, .-test_vadd_vv_8

If possible it might be worth releasing the code that you are using for testing.

>
> It takes more time to test than to implement the instructions. Maybe
> there is some better test method or some forced test cases in QEMU.
> Could you give me some advice for testing?

Richard's idea of risu seems like a good option.

Thinking about it a bit more we are going to have other extensions in
the future that will need assembly testing so setting up a test
framework seems like a good idea. I am happy to help try and get this
going as well.

Alistair

>
> Best Regards,
>
> Zhiwei
>
> > Alex and Richard have kindly started the review. Once you have
> > addressed their comments and split this patch up into smaller patches
> > you can send a v2 and we can go from there.
> >
> > Once again thanks for doing this implementation for QEMU!
> >
> > Alistair
> >


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 21:50     ` Alistair Francis
@ 2019-08-30  9:06       ` Alex Bennée
  2019-08-30 18:39         ` Alistair Francis
  2019-09-02  6:36       ` liuzhiwei
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2019-08-30  9:06 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, liuzhiwei, Aurelien Jarno


Alistair Francis <alistair23@gmail.com> writes:

> On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>>
>> On 2019/8/29 上午5:34, Alistair Francis wrote:
>> > On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>> >> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>> >> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>> >> ---
>> >>   fpu/softfloat.c                         |   119 +
>> >>   include/fpu/softfloat.h                 |     4 +
>> >>   linux-user/riscv/cpu_loop.c             |     8 +-
>> >>   target/riscv/Makefile.objs              |     2 +-
>> >>   target/riscv/cpu.h                      |    30 +
>> >>   target/riscv/cpu_bits.h                 |    15 +
>> >>   target/riscv/cpu_helper.c               |     7 +
>> >>   target/riscv/csr.c                      |    65 +-
>> >>   target/riscv/helper.h                   |   354 +
>> >>   target/riscv/insn32.decode              |   374 +-
>> >>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>> >>   target/riscv/translate.c                |     1 +
>> >>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>> >>   13 files changed, 28017 insertions(+), 9 deletions(-)
>> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>> >>   create mode 100644 target/riscv/vector_helper.c
>> >>
>> > Hello,
>> >
>> > Thanks for the patch!
>> >
>> > As others have pointed out you will need to split the patch up into
>> > multiple smaller patches, otherwise it is too hard to review almost
>> > 30,000 lines of code.
>>
>> Hi, Alistair
>>
>> I'm so sorry for the inconvenience. It will be a patch set with a cover
>> letter in V2.
>
> No worries.
>
>>
>> > Can you also include a cover letter with your patch series describing
>> > how you are testing this? AFAIK vector extension support isn't in any
>> > compiler so I'm assuming you are handwriting the assembly or have
>> > toolchain patches. Either way it will help if you can share that so
>> > others can test your implementation.
>>
>> Yes, it's handwriting assembly. The assembler in Binutils has support
>> Vector extension.  First define an function test_vadd_vv_8 in assembly
>> and then it can be called from a C program.
>>
>> The function is something like
>>
>> /* vadd.vv */
>> TEST_FUNC(test_vadd_vv_8)
>>          vsetvli        t1, x0, e8, m2
>>          vlb.v           v6, (a4)
>>          vsb.v           v6, (a3)
>>          vsetvli        t1, a0, e8, m2
>>          vlb.v           v0, (a1)
>>          vlb.v           v2, (a2)
>>          vadd.vv     v4, v0, v2
>>          vsb.v          v4, (a3)
>> ret
>>          .size   test_vadd_vv_8, .-test_vadd_vv_8
>
> If possible it might be worth releasing the code that you are using for testing.
>
>>
>> It takes more time to test than to implement the instructions. Maybe
>> there is some better test method or some forced test cases in QEMU.
>> Could you give me some advice for testing?
>
> Richard's idea of risu seems like a good option.
>
> Thinking about it a bit more we are going to have other extensions in
> the future that will need assembly testing so setting up a test
> framework seems like a good idea. I am happy to help try and get this
> going as well.

tests/tcg already has the bits you need for both linux-user and system
based testing. The main problem is getting a version of gcc that is new
enough to emit the newer instructions. I recently updated the images to
buster so gcc is pretty recent now (8.3).

I did start down the road of a general "op" test framework, which tried
to provide common boilerplate so that all you needed to do was supply a
new function (possibly with a hex-encoded instruction) and a list of
expected inputs and outputs:

  https://github.com/stsquad/qemu/commits/testing/generic-op-tester

I suspect it was over-engineered, but perhaps it would be worth reviving
it (or something like it) to make adding a simple single-instruction
test case possible with minimal additional verbiage?
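
To illustrate the shape of such a table-driven tester, here is a minimal
sketch in C. It is not taken from the generic-op-tester branch: the names
op_case, op_fn, add8 and run_op_cases are hypothetical, and add8 is a plain
C stand-in where a real test would wrap the instruction under test (for
example through inline assembly).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical test vector: two inputs and the expected result. */
typedef struct {
    uint8_t a;
    uint8_t b;
    uint8_t expected;
} op_case;

/* Hypothetical signature for the operation under test. */
typedef uint8_t (*op_fn)(uint8_t a, uint8_t b);

/* Stand-in operation: an 8-bit wrapping add written in plain C. */
static uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Run every case, report each mismatch, and return the failure count. */
static int run_op_cases(const char *name, op_fn fn,
                        const op_case *cases, size_t n)
{
    int failures = 0;

    assert(fn != NULL);
    for (size_t i = 0; i < n; i++) {
        uint8_t got = fn(cases[i].a, cases[i].b);

        if (got != cases[i].expected) {
            printf("%s: case %zu: got 0x%02x, expected 0x%02x\n",
                   name, i, got, cases[i].expected);
            failures++;
        }
    }
    return failures;
}
```

Adding a test for a new instruction then amounts to one wrapper function
plus one table of cases.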

>
> Alistair
>
>>
>> Best Regards,
>>
>> Zhiwei
>>
>> > Alex and Richard have kindly started the review. Once you have
>> > addressed their comments and split this patch up into smaller patches
>> > you can send a v2 and we can go from there.
>> >
>> > Once again thanks for doing this implementation for QEMU!
>> >
>> > Alistair
>> >


--
Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-30  9:06       ` Alex Bennée
@ 2019-08-30 18:39         ` Alistair Francis
  0 siblings, 0 replies; 27+ messages in thread
From: Alistair Francis @ 2019-08-30 18:39 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, liuzhiwei, Aurelien Jarno

On Fri, Aug 30, 2019 at 2:06 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Alistair Francis <alistair23@gmail.com> writes:
>
> > On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
> >>
> >> On 2019/8/29 上午5:34, Alistair Francis wrote:
> >> > On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
> >> >> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> >> >> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
> >> >> ---
> >> >>   fpu/softfloat.c                         |   119 +
> >> >>   include/fpu/softfloat.h                 |     4 +
> >> >>   linux-user/riscv/cpu_loop.c             |     8 +-
> >> >>   target/riscv/Makefile.objs              |     2 +-
> >> >>   target/riscv/cpu.h                      |    30 +
> >> >>   target/riscv/cpu_bits.h                 |    15 +
> >> >>   target/riscv/cpu_helper.c               |     7 +
> >> >>   target/riscv/csr.c                      |    65 +-
> >> >>   target/riscv/helper.h                   |   354 +
> >> >>   target/riscv/insn32.decode              |   374 +-
> >> >>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
> >> >>   target/riscv/translate.c                |     1 +
> >> >>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
> >> >>   13 files changed, 28017 insertions(+), 9 deletions(-)
> >> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
> >> >>   create mode 100644 target/riscv/vector_helper.c
> >> >>
> >> > Hello,
> >> >
> >> > Thanks for the patch!
> >> >
> >> > As others have pointed out you will need to split the patch up into
> >> > multiple smaller patches, otherwise it is too hard to review almost
> >> > 30,000 lines of code.
> >>
> >> Hi, Alistair
> >>
> >> I'm so sorry for the inconvenience. It will be a patch set with a cover
> >> letter in V2.
> >
> > No worries.
> >
> >>
> >> > Can you also include a cover letter with your patch series describing
> >> > how you are testing this? AFAIK vector extension support isn't in any
> >> > compiler so I'm assuming you are handwriting the assembly or have
> >> > toolchain patches. Either way it will help if you can share that so
> >> > others can test your implementation.
> >>
> >> Yes, it's handwriting assembly. The assembler in Binutils has support
> >> Vector extension.  First define an function test_vadd_vv_8 in assembly
> >> and then it can be called from a C program.
> >>
> >> The function is something like
> >>
> >> /* vadd.vv */
> >> TEST_FUNC(test_vadd_vv_8)
> >>          vsetvli        t1, x0, e8, m2
> >>          vlb.v           v6, (a4)
> >>          vsb.v           v6, (a3)
> >>          vsetvli        t1, a0, e8, m2
> >>          vlb.v           v0, (a1)
> >>          vlb.v           v2, (a2)
> >>          vadd.vv     v4, v0, v2
> >>          vsb.v          v4, (a3)
> >> ret
> >>          .size   test_vadd_vv_8, .-test_vadd_vv_8
> >
> > If possible it might be worth releasing the code that you are using for testing.
> >
> >>
> >> It takes more time to test than to implement the instructions. Maybe
> >> there is some better test method or some forced test cases in QEMU.
> >> Could you give me some advice for testing?
> >
> > Richard's idea of risu seems like a good option.
> >
> > Thinking about it a bit more we are going to have other extensions in
> > the future that will need assembly testing so setting up a test
> > framework seems like a good idea. I am happy to help try and get this
> > going as well.

Ah, I looked into this more: risu compares the results to hardware running
the same binary. In this case there is no hardware, so that doesn't work
too well.

What we could do though, is compare it to Spike (which I think has the
vector instructions?) which would have the same effect.

>
> tests/tcg already has the bits you need for both linux-user and system
> based testing. The main problem is getting a version of gcc that is new
> enough to emit the newer instructions. I recently updated the images to
> buster so gcc is pretty recent now (8.3).

In this case there is no GCC with the new instructions.

>
> I did start down the road of a general "op" test frame work which tried
> to come up with a common framework/boilerplate so all you needed to do
> was supply a new function (possible with a hex encoded instruction) and
> a list of expected inputs and outputs:
>
>   https://github.com/stsquad/qemu/commits/testing/generic-op-tester
>
> I suspect it was over engineered but perhaps it would be worth reviving
> it (or something like it) to make adding a simple single instruction
> test case with minimal additional verbiage?

That would be interesting, I'll take a look.

Alistair

>
> >
> > Alistair
> >
> >>
> >> Best Regards,
> >>
> >> Zhiwei
> >>
> >> > Alex and Richard have kindly started the review. Once you have
> >> > addressed their comments and split this patch up into smaller patches
> >> > you can send a v2 and we can go from there.
> >> >
> >> > Once again thanks for doing this implementation for QEMU!
> >> >
> >> > Alistair
> >> >
>
>
> --
> Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 21:50     ` Alistair Francis
  2019-08-30  9:06       ` Alex Bennée
@ 2019-09-02  6:36       ` liuzhiwei
  1 sibling, 0 replies; 27+ messages in thread
From: liuzhiwei @ 2019-09-02  6:36 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, Alex Bennée, Aurelien Jarno


On 2019/8/30 上午5:50, Alistair Francis wrote:
> On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>> On 2019/8/29 上午5:34, Alistair Francis wrote:
>>> On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei <zhiwei_liu@c-sky.com> wrote:
>>>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>>>> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>>>> ---
>>>>    fpu/softfloat.c                         |   119 +
>>>>    include/fpu/softfloat.h                 |     4 +
>>>>    linux-user/riscv/cpu_loop.c             |     8 +-
>>>>    target/riscv/Makefile.objs              |     2 +-
>>>>    target/riscv/cpu.h                      |    30 +
>>>>    target/riscv/cpu_bits.h                 |    15 +
>>>>    target/riscv/cpu_helper.c               |     7 +
>>>>    target/riscv/csr.c                      |    65 +-
>>>>    target/riscv/helper.h                   |   354 +
>>>>    target/riscv/insn32.decode              |   374 +-
>>>>    target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>>>    target/riscv/translate.c                |     1 +
>>>>    target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>>>>    13 files changed, 28017 insertions(+), 9 deletions(-)
>>>>    create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>>>>    create mode 100644 target/riscv/vector_helper.c
>>>>
>>> Hello,
>>>
>>> Thanks for the patch!
>>>
>>> As others have pointed out you will need to split the patch up into
>>> multiple smaller patches, otherwise it is too hard to review almost
>>> 30,000 lines of code.
>> Hi, Alistair
>>
>> I'm so sorry for the inconvenience. It will be a patch set with a cover
>> letter in V2.
> No worries.
>
>>> Can you also include a cover letter with your patch series describing
>>> how you are testing this? AFAIK vector extension support isn't in any
>>> compiler so I'm assuming you are handwriting the assembly or have
>>> toolchain patches. Either way it will help if you can share that so
>>> others can test your implementation.
>> Yes, it's handwriting assembly. The assembler in Binutils has support
>> Vector extension.  First define an function test_vadd_vv_8 in assembly
>> and then it can be called from a C program.
>>
>> The function is something like
>>
>> /* vadd.vv */
>> TEST_FUNC(test_vadd_vv_8)
>>           vsetvli        t1, x0, e8, m2
>>           vlb.v           v6, (a4)
>>           vsb.v           v6, (a3)
>>           vsetvli        t1, a0, e8, m2
>>           vlb.v           v0, (a1)
>>           vlb.v           v2, (a2)
>>           vadd.vv     v4, v0, v2
>>           vsb.v          v4, (a3)
>> ret
>>           .size   test_vadd_vv_8, .-test_vadd_vv_8
> If possible it might be worth releasing the code that you are using for testing.
Yes, but I have not yet found a good place to publish the test code.
>
>> It takes more time to test than to implement the instructions. Maybe
>> there is some better test method or some forced test cases in QEMU.
>> Could you give me some advice for testing?
> Richard's idea of risu seems like a good option.
All the test cases will be validated against Spike, which supports the
same vector specification. This cross-validation work may be delayed
until V3, though.
I will split the patch and address the comments as soon as possible, so
that V2 can be sent next week.
Would that be all right?
>
> Thinking about it a bit more we are going to have other extensions in
> the future that will need assembly testing so setting up a test
> framework seems like a good idea. I am happy to help try and get this
> going as well.
>
> Alistair
A new ISA extension usually differs a lot from the existing ones, so I
doubt a general framework is possible. A very light framework that covers
building, assisted input generation, result validation, and reporting may
be OK.
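
As a rough illustration of the input generation, result validation, and
reporting pieces, here is a minimal sketch in C. All names in it
(fill_pattern, ref_vadd_vv_8, first_mismatch) are hypothetical. The
reference model assumes, from my reading of the quoted test_vadd_vv_8, that
the output buffer is first initialised with VLMAX elements from a third
buffer and that only the first vl elements are then overwritten with the
sums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Input generation: fill buf with a reproducible pseudo-random pattern. */
static void fill_pattern(uint8_t *buf, size_t n, uint32_t seed)
{
    uint32_t state = seed;

    for (size_t i = 0; i < n; i++) {
        state = state * 1103515245u + 12345u;   /* simple LCG step */
        buf[i] = (uint8_t)(state >> 16);
    }
}

/*
 * Scalar reference for an 8-bit element-wise add: the first vl elements
 * are a[i] + b[i] (wrapping), the rest keep the initial buffer contents.
 */
static void ref_vadd_vv_8(uint8_t *dst, const uint8_t *init,
                          const uint8_t *a, const uint8_t *b,
                          size_t vl, size_t vlmax)
{
    assert(vl <= vlmax);
    for (size_t i = 0; i < vlmax; i++) {
        dst[i] = (i < vl) ? (uint8_t)(a[i] + b[i]) : init[i];
    }
}

/* Validation and report: index of the first mismatch, or -1 if equal. */
static long first_mismatch(const uint8_t *got, const uint8_t *want, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (got[i] != want[i]) {
            printf("mismatch at element %zu: got 0x%02x, want 0x%02x\n",
                   i, got[i], want[i]);
            return (long)i;
        }
    }
    return -1;
}
```

The buffer written by the assembly routine would be passed as "got" and the
output of ref_vadd_vv_8 as "want".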

Best Regards,
Zhiwei
>> Best Regards,
>>
>> Zhiwei
>>
>>> Alex and Richard have kindly started the review. Once you have
>>> addressed their comments and split this patch up into smaller patches
>>> you can send a v2 and we can go from there.
>>>
>>> Once again thanks for doing this implementation for QEMU!
>>>
>>> Alistair
>>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 15:14     ` Richard Henderson
@ 2019-09-02  6:54       ` liuzhiwei
  0 siblings, 0 replies; 27+ messages in thread
From: liuzhiwei @ 2019-09-02  6:54 UTC (permalink / raw)
  To: Richard Henderson, Alistair Francis
  Cc: Peter Maydell, Riku Voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, Laurent Vivier,
	Alistair Francis, Alex Bennée, Aurelien Jarno


On 2019/8/29 下午11:14, Richard Henderson wrote:
> On 8/29/19 5:00 AM, liuzhiwei wrote:
>> Maybe there is some better test method or some forced test cases in QEMU. Could
>> you give me some advice for testing?
> If you have hardware, or another simulator, RISU is very good
> for testing these sorts of things.
>
> See https://git.linaro.org/people/pmaydell/risu.git
>
> You'll need to write new support for RISC-V, but it's not hard
> and we can help out with that.
>
>
> r~
>
Hi, Richard

Thank you for your advice. I will first run the test cases in Spike for
cross-validation.

Best Regards,
Zhiwei




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-29 15:09       ` Richard Henderson
@ 2019-09-02  7:45         ` liuzhiwei
  2019-09-03 14:38           ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: liuzhiwei @ 2019-09-02  7:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien


On 2019/8/29 下午11:09, Richard Henderson wrote:
> On 8/29/19 5:45 AM, liuzhiwei wrote:
>> Even in qemu,  it may be some situations that VSTART != 0. For example, a load
>> instruction leads to a page fault exception in a middle position. If VSTART ==
>> 0,  some elements that had been loaded before the exception will be loaded once
>> again.
> Alternately, you can validate all of the pages before performing any memory
> operations.  At which point there will never be an exception in the middle.

As a vector instruction may access memory across many pages, is there
any way to validate the pages? A page table walk? Or some TLB APIs?
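
The "validate first, then access" idea can be sketched independently of any
QEMU API. The C below is only an illustration: PAGE_SIZE, page_probe_fn,
validate_range and probe_below_limit are hypothetical names, the 4 KiB page
size is an assumption, and the callback stands in for whatever TLB probe
the target would really use. It also assumes addr + len does not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical probe: returns true if the page can be accessed. */
typedef bool (*page_probe_fn)(uint64_t page_addr, void *opaque);

/*
 * Probe every page touched by [addr, addr + len) before any element is
 * accessed. On failure, store the first bad page in *bad_page (if given)
 * and return false, so the caller can fault with no partial side effects.
 */
static bool validate_range(uint64_t addr, size_t len,
                           page_probe_fn probe, void *opaque,
                           uint64_t *bad_page)
{
    assert(probe != NULL);
    if (len == 0) {
        return true;
    }

    uint64_t mask = ~(uint64_t)(PAGE_SIZE - 1);
    uint64_t first = addr & mask;
    uint64_t last = (addr + len - 1) & mask;

    for (uint64_t page = first; ; page += PAGE_SIZE) {
        if (!probe(page, opaque)) {
            if (bad_page != NULL) {
                *bad_page = page;
            }
            return false;
        }
        if (page == last) {
            break;
        }
    }
    return true;
}

/* Demo probe: pages below the limit pointed to by opaque are "mapped". */
static bool probe_below_limit(uint64_t page_addr, void *opaque)
{
    return page_addr < *(const uint64_t *)opaque;
}
```

Only after validate_range() succeeds would the helper start loading or
storing elements, so an exception can never occur in the middle.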

> As it turns out, you *must* do this in order to allow watchpoints to work
> correctly.  David Hildenbrand and I are at this moment fixing this aspect of
> watchpoints for s390x.
>
> See https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05979.html

I am interested in the watchpoint implementation, and I once implemented
user-mode watchpoints in the wild.

A backtrace of a watchpoint hit looks like this:

#0  cpu_watchpoint_address_matches (wp=0x555556228110, addr=536871072, len=1) at qemu/exec.c:1094
#1  0x000055555567204f in check_watchpoint (offset=160, len=1, attrs=..., flags=2) at qemu/exec.c:2803
#2  0x0000555555672379 in watch_mem_write (opaque=0x0, addr=536871072, val=165, size=1, attrs=...) at qemu/exec.c:2878
#3  0x00005555556d44bb in memory_region_write_with_attrs_accessor (mr=0x5555561292e0 <io_mem_watch>, addr=536871072, value=0x7fffedffe2c8, size=1, shift=0, mask=255, attrs=...) at qemu/memory.c:553
#4  0x00005555556d45de in access_with_adjusted_size (addr=536871072, value=0x7fffedffe2c8, size=1, access_size_min=1, access_size_max=8, access_fn=0x5555556d43cd <memory_region_write_with_attrs_accessor>, mr=0x5555561292e0 <io_mem_watch>, attrs=...) at qemu/memory.c:594
#5  0x00005555556d7247 in memory_region_dispatch_write (mr=0x5555561292e0 <io_mem_watch>, addr=536871072, data=165, size=1, attrs=...) at qemu/memory.c:1480
#6  0x00005555556f0d13 in io_writex (env=0x5555561efb58, iotlbentry=0x5555561f5398, mmu_idx=1, val=165, addr=536871072, retaddr=0, recheck=false, size=1) at qemu/accel/tcg/cputlb.c:909
#7  0x00005555556f19a6 in io_writeb (env=0x5555561efb58, mmu_idx=1, index=0, val=165 '\245', addr=536871072, retaddr=0, recheck=false) at qemu/accel/tcg/softmmu_template.h:268
#8  0x00005555556f1b54 in helper_ret_stb_mmu (env=0x5555561efb58, addr=536871072, val=165 '\245', oi=1, retaddr=0) at qemu/accel/tcg/softmmu_template.h:304
#9  0x0000555555769f06 in cpu_stb_data_ra (env=0x5555561efb58, ptr=536871072, v=165, retaddr=0) at qemu/include/exec/cpu_ldst_template.h:182
#10 0x0000555555769f80 in cpu_stb_data (env=0x5555561efb58, ptr=536871072, v=165) at /qemu/include/exec/cpu_ldst_template.h:194
#11 0x000055555576a913 in csky_cpu_stb_data (env=0x5555561efb58, vaddr=536871072, data=165 '\245') at qemu/target/csky/csky_ldst.c:48
#12 0x000055555580ba7d in helper_vdsp2_vstru_n (env=0x5555561efb58, insn=4167183360) at qemu/target/csky/op_vdsp2.c:1317

This path is not related to probe_write() in the patch.

Could you give more details, or a test case where the watchpoint doesn't
work correctly?

>
> r~
>



* Re: [Qemu-devel] [Qemu-riscv] [PATCH] RISCV: support riscv vector extension 0.7.1
       [not found] ` <CAEiOBXXofjrY2=sjuMDb9dTV2fk9yUVKnr+qmf+7mg9vki6OCw@mail.gmail.com>
@ 2019-09-02  8:17   ` liuzhiwei
  0 siblings, 0 replies; 27+ messages in thread
From: liuzhiwei @ 2019-09-02  8:17 UTC (permalink / raw)
  To: Chih-Min Chao
  Cc: Peter Maydell, riku.voipio, open list:RISC-V, Sagar Karandikar,
	Bastian Koppelmann, Palmer Dabbelt,
	qemu-devel@nongnu.org Developers, laurent, Alistair Francis,
	Alex Bennée, aurelien


On 2019/8/29 10:06 PM, Chih-Min Chao wrote:
> Hi Liuzhiwei,
>
> Some comments:
>      1. vector extension allows flexible implementation. It is better 
> to describe the limitation of current implementation (such as 
> vlen/elen/slen) , supported sections and unsupported features.
Thanks!  Everything you mentioned will be in patch V2.
>      2. there should be cfg.ext_v  to turn on  vector extension from 
> command line
I will add the vector extension to cpu "any".  Is that all right?
>      3. from license
>            It should be   "Copyright  (c) 2019 C-SKY Limited, All 
> rights reserved."  but not  "2011 ~ 2019"
>
> It is huge work wait and thanks for your contribution.
>
> chihmin
>
> On Wed, Aug 28, 2019 at 3:06 PM liuzhiwei <zhiwei_liu@c-sky.com 
> <mailto:zhiwei_liu@c-sky.com>> wrote:
>
>     Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>     Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com
>     <mailto:zhiwei_liu@c-sky.com>>
>     ---
>      fpu/softfloat.c                         |   119 +
>      include/fpu/softfloat.h                 |     4 +
>      linux-user/riscv/cpu_loop.c             |     8 +-
>      target/riscv/Makefile.objs              |     2 +-
>      target/riscv/cpu.h                      |    30 +
>      target/riscv/cpu_bits.h                 |    15 +
>      target/riscv/cpu_helper.c               |     7 +
>      target/riscv/csr.c                      |    65 +-
>      target/riscv/helper.h                   |   354 +
>      target/riscv/insn32.decode              |   374 +-
>      target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>      target/riscv/translate.c                |     1 +
>      target/riscv/vector_helper.c            | 26563
>     ++++++++++++++++++++++++++++++
>      13 files changed, 28017 insertions(+), 9 deletions(-)
>      create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>      create mode 100644 target/riscv/vector_helper.c
>
>


* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28 18:54 ` Richard Henderson
  2019-08-28 20:43   ` Richard Henderson
@ 2019-09-02  9:43   ` liuzhiwei
  2019-09-03 14:21     ` Richard Henderson
  2019-12-19  9:11   ` LIU Zhiwei
  2 siblings, 1 reply; 27+ messages in thread
From: liuzhiwei @ 2019-09-02  9:43 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien


On 2019/8/29 2:54 AM, Richard Henderson wrote:
> On 8/27/19 7:36 PM, liuzhiwei wrote:
>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   fpu/softfloat.c                         |   119 +
>>   include/fpu/softfloat.h                 |     4 +
>>   linux-user/riscv/cpu_loop.c             |     8 +-
>>   target/riscv/Makefile.objs              |     2 +-
>>   target/riscv/cpu.h                      |    30 +
>>   target/riscv/cpu_bits.h                 |    15 +
>>   target/riscv/cpu_helper.c               |     7 +
>>   target/riscv/csr.c                      |    65 +-
>>   target/riscv/helper.h                   |   354 +
>>   target/riscv/insn32.decode              |   374 +-
>>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>   target/riscv/translate.c                |     1 +
>>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>>   13 files changed, 28017 insertions(+), 9 deletions(-)
> As Alex mentioned, this is *far* too big to be presented as a single patch.
OK, I will split it into a patch set in V2.
>
>> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
>> index 3ff3fa5..3b0754c 100644
>> --- a/include/fpu/softfloat.h
>> +++ b/include/fpu/softfloat.h
>> @@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status *status);
>>   float16 float16_sqrt(float16, float_status *status);
>>   int float16_compare(float16, float16, float_status *status);
>>   int float16_compare_quiet(float16, float16, float_status *status);
>> +int float16_unordered_quiet(float16, float16, float_status *status);
>> +int float16_le(float16, float16, float_status *status);
>> +int float16_lt(float16, float16, float_status *status);
>> +int float16_eq_quiet(float16, float16, float_status *status);
> As Alex mentioned, none of these changes are required, as all
> functionality is provided by float16_compare{,_quiet}.
Yes, I will use float16_compare instead.
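
For reference, a minimal sketch of how the four predicates fall out of a
single compare result.  The relation values follow QEMU's float_relation_*
numbering; toy_compare() and the cmp_* names are hypothetical, and host
doubles stand in for float16 only to keep the example self-contained.  It
does not model the difference between the signaling and quiet variants.

```c
#include <math.h>
#include <stdbool.h>

/* Same numbering as QEMU's float_relation_* constants. */
enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
};

/* Hypothetical stand-in for float16_compare_quiet(), using host doubles. */
static int toy_compare(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return REL_UNORDERED;
    }
    if (a < b) {
        return REL_LESS;
    }
    return a > b ? REL_GREATER : REL_EQUAL;
}

/* Each predicate is just a test on the single compare result. */
static bool cmp_lt(double a, double b)
{
    return toy_compare(a, b) == REL_LESS;
}

static bool cmp_le(double a, double b)
{
    int r = toy_compare(a, b);
    return r == REL_LESS || r == REL_EQUAL;
}

static bool cmp_eq(double a, double b)
{
    return toy_compare(a, b) == REL_EQUAL;
}

static bool cmp_unordered(double a, double b)
{
    return toy_compare(a, b) == REL_UNORDERED;
}
```

With this shape the extra softfloat entry points are not needed; the
helpers only interpret the relation returned by one call.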
>> diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
>> index 12aa3c0..b01548a 100644
>> --- a/linux-user/riscv/cpu_loop.c
>> +++ b/linux-user/riscv/cpu_loop.c
>> @@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
>>           signum = 0;
>>           sigcode = 0;
>>           sigaddr = 0;
>> -
>> +        if (env->foflag) {
>> +            if (env->vfp.vl != 0) {
>> +                env->foflag = false;
>> +                env->pc += 4;
>> +                continue;
>> +            }
> This is most definitely not the correct way to implement first-fault.
>
> You need to have a look at target/arm/sve_helper.c, e.g. sve_ldff1_r,
> where we test pages for validity with tlb_vaddr_to_host.
Why should we test pages for validity? If there is a page fault at run
time, that is exactly the case in which the fault-only-first instruction
must be used.
>> +    /* vector coprocessor state.  */
>> +    struct {
>> +        union VECTOR {
>> +            float64  f64[VUNIT(64)];
>> +            float32  f32[VUNIT(32)];
>> +            float16  f16[VUNIT(16)];
>> +            target_ulong ul[VUNIT(sizeof(target_ulong))];
>> +            uint64_t u64[VUNIT(64)];
>> +            int64_t  s64[VUNIT(64)];
>> +            uint32_t u32[VUNIT(32)];
>> +            int32_t  s32[VUNIT(32)];
>> +            uint16_t u16[VUNIT(16)];
>> +            int16_t  s16[VUNIT(16)];
>> +            uint8_t  u8[VUNIT(8)];
>> +            int8_t   s8[VUNIT(8)];
>> +        } vreg[32];
>> +        target_ulong vxrm;
>> +        target_ulong vxsat;
>> +        target_ulong vl;
>> +        target_ulong vstart;
>> +        target_ulong vtype;
>> +        float_status fp_status;
>> +    } vfp;
> You've obviously copied "vfp" from target/arm.  Drop that.  It makes no sense
> in the context of risc-v.
> I'm not sure that vreg[].element[] really makes the most sense in the context
> of how risc-v rearranges its elements.  It will almost certainly fail clang
> validators, if enabled, since you'll be indexing beyond the end of vreg[n] into
> vreg[n+1].
>
> It might be best to have a single array:
>
>      union {
>          uint64_t u64[32 * VLEN / 64];
>          ...
>          uint8_t u8[32 * VLEN / 8];
>      } velt;
>
> This is clearer to the compiler that this is a single block of memory that we
> can index as we please.

A single array is a good idea. But vreg[] will be better for understanding, as it preserves the register concept.

> Note that float64/float32/float16 are legacy.  They will always be equivalent
> to the unsigned integer types of the same size.
>
> Is there really any vector operation at all that is dependent on XLEN?  If not,
> then there is no reason to confuse things by including target_ulong.
>
OK.
>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>> index e32b612..405caf6 100644
>> --- a/target/riscv/cpu_helper.c
>> +++ b/target/riscv/cpu_helper.c
>> @@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>>           [PRV_H] = RISCV_EXCP_H_ECALL,
>>           [PRV_M] = RISCV_EXCP_M_ECALL
>>       };
>> +    if (env->foflag) {
>> +        if (env->vfp.vl != 0) {
>> +            env->foflag = false;
>> +            env->pc += 4;
>> +            return;
>> +        }
>> +    }
> Again, not the way to implement first-fault.
>
> In particular, you haven't even verified that do_interrupt has been called on
> behalf of a RISCV_EXCP_LOAD_PAGE_FAULT.  This could be a timer tick.

I don't think this could be a timer tick. A timer tick cannot
interrupt an instruction partway through in QEMU.

According to the specification, if a RISCV_EXCP_LOAD_PAGE_FAULT occurs
in the instruction and some elements have already been loaded or stored,
the remaining elements will not be processed again after returning from
the exception.

If a RISCV_EXCP_LOAD_PAGE_FAULT occurs in the instruction and no
elements have been loaded or stored, the remaining elements will be
processed again after returning from the exception.

>
>> +#define MAX_U8      ((uint8_t)0xff)
>> +#define MIN_U8      ((uint8_t)0x0)
>> +#define MAX_S8      ((int8_t)0x7f)
>> +#define MIN_S8      ((int8_t)0x80)
>> +#define SIGNBIT16   (1 << 15)
>> +#define MAX_U16     ((uint16_t)0xffff)
>> +#define MIN_U16     ((uint16_t)0x0)
>> +#define MAX_S16     ((int16_t)0x7fff)
>> +#define MIN_S16     ((int16_t)0x8000)
>> +#define SIGNBIT32   (1 << 31)
>> +#define MAX_U32     ((uint32_t)0xffffffff)
>> +#define MIN_U32     ((uint32_t)0x0)
>> +#define MAX_S32     ((int32_t)0x7fffffff)
>> +#define MIN_S32     ((int32_t)0x80000000)
>> +#define SIGNBIT64   ((uint64_t)1 << 63)
>> +#define MAX_U64     ((uint64_t)0xffffffffffffffff)
>> +#define MIN_U64     ((uint64_t)0x0)
>> +#define MAX_S64     ((int64_t)0x7fffffffffffffff)
>> +#define MIN_S64     ((int64_t)0x8000000000000000)
> Why are you replicating INT8_MIN et al?
Thanks, they will be removed.
>
>
>> +static target_ulong vector_get_index(CPURISCVState *env, int rs1, int rs2,
>> +    int index, int mem, int width, int nf)
>> +{
>> +    target_ulong abs_off, base = env->gpr[rs1];
>> +    target_long offset;
>> +    switch (width) {
>> +    case 8:
>> +        offset = sign_extend(env->vfp.vreg[rs2].s8[index], 8) + nf * mem;
>> +        break;
>> +    case 16:
>> +        offset = sign_extend(env->vfp.vreg[rs2].s16[index], 16) + nf * mem;
>> +        break;
>> +    case 32:
>> +        offset = sign_extend(env->vfp.vreg[rs2].s32[index], 32) + nf * mem;
>> +        break;
>> +    case 64:
>> +        offset = env->vfp.vreg[rs2].s64[index] + nf * mem;
>> +        break;
>> +    default:
>> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> This is broken.  You cannot use GETPC() anywhere except in the outermost
> HELPER().  Otherwise you're not computing the return address back into the
> code_gen_buffer, which is what is required to properly unwind the guest state.

Yes, I will fix it.

>
>> +static inline bool vector_vtype_ill(CPURISCVState *env)
>> +{
>> +    if ((env->vfp.vtype >> (sizeof(target_ulong) - 1)) & 0x1) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +
>> +static inline void vector_vtype_set_ill(CPURISCVState *env)
>> +{
>> +    env->vfp.vtype = ((target_ulong)1) << (sizeof(target_ulong) - 1);
>> +    return;
>> +}
>> +
>> +static inline int vector_vtype_get_sew(CPURISCVState *env)
>> +{
>> +    return (env->vfp.vtype >> 2) & 0x7;
>> +}
>> +
>> +static inline int vector_get_width(CPURISCVState *env)
>> +{
>> +    return  8 * (1 << vector_vtype_get_sew(env));
>> +}
>> +
>> +static inline int vector_get_lmul(CPURISCVState *env)
>> +{
>> +    return 1 << (env->vfp.vtype & 0x3);
>> +}
>> +
>> +static inline int vector_get_vlmax(CPURISCVState *env)
>> +{
>> +    return vector_get_lmul(env) * VLEN / vector_get_width(env);
>> +}
>> +
>> +static inline int vector_elem_mask(CPURISCVState *env, uint32_t vm, int width,
>> +    int lmul, int index)
>> +{
>> +    int mlen = width / lmul;
>> +    int idx = (index * mlen) / 8;
>> +    int pos = (index * mlen) % 8;
>> +
>> +    return vm || ((env->vfp.vreg[0].u8[idx] >> pos) & 0x1);
>> +}
> I would strongly encourage you place the components of vtype within tb_flags
> via cpu_get_tb_cpu_state.  This would allow you to move quite a few checks from
> run-time to translation-time.
Good idea, though somewhat difficult.
> Recall that translation happens once (per configuration), whereas execution
> happens many times.  Obviously, the more configurations that we create, the
> more translation that must happen.
>
> But the vtypei argument to vsetvli is a good choice, because it is constant,
> relates directly to the compiled code, and is unrelated to the length of the
> data being processed.
A good choice for what? I do not quite understand.
> With that, you can verify at translation:
>
> (1) vill
> (2) v[n], for (n % lmul) != 0
> (3) v[n] overlapping v[0] for masked/carry operations, with lmul > 1
>
> and
>
> (4) you can arrange the helpers so that instead of 1 helper that has to
>      handle all SEW, you have N helpers, each handling a different SEW.
For all vector instructions or just vsetvli?
> And with all of this done, I believe you no longer need to pass the register
> number to the helper.  You can pass the address of v[n], which is much more
> like how the tcg generic vector support works.
>
> Whether or not to include VL in tb_flags is a harder choice.  Certainly not the
> exact value of VL, as that would lead to different translations for every loop
> tail.  But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
> single bit.  Knowing that this condition is true would allow some use of the
> tcg generic vector support.
>
> E.g. vadd.vv could be
>
>      if (masked) {
>          switch (SEW) {
>          case MO_8:
>              gen_helper_vadd8_mask(...);
>              break;
>          ...
>          }
>      } else if (vl_eq_vlmax) {
>          tcg_gen_gvec_add(SEW, vreg_ofs(vd), vreg_ofs(vs2), vreg_ofs(vs1),
>                           VLEN * LMUL, VLEN * LMUL);
>      } else {
>          switch (SEW) {
>          case MO_8:
>              gen_helper_vadd8(...);
>              break;
>          ...
>          }
>      }
>
> Or, equivalently, pack pointers to the actual generator functions into a
> structure so that this code structure can be shared between many instructions.
>
> Bear in mind that all tcg gvec operations operate strictly upon lanes.  I.e.
>
>     vd[x] = vs1[x] op vs2[x]
>
> thus the actual arrangement of the elements in storage is irrelevant and SLEN
> need not be considered here.
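
The "structure of function pointers" idea can be sketched in plain C as
follows.  This is only an illustration of the shape: in QEMU the choice
would be made once at translation time among gen_helper_* generators,
whereas this stand-alone version picks a lane-wise routine at run time.
All names here (vop_fn, vadd_by_sew, do_vadd) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane-wise operation over `bytes` bytes of register storage. */
typedef void (*vop_fn)(void *vd, const void *vs1, const void *vs2,
                       size_t bytes);

/* Generate one add routine per element type: vd[x] = vs1[x] + vs2[x]. */
#define DEFINE_VADD(NAME, TYPE)                                   \
static void NAME(void *vd, const void *vs1, const void *vs2,      \
                 size_t bytes)                                    \
{                                                                 \
    TYPE *d = vd;                                                 \
    const TYPE *a = vs1, *b = vs2;                                \
    for (size_t i = 0; i < bytes / sizeof(TYPE); i++) {           \
        d[i] = (TYPE)(a[i] + b[i]);                               \
    }                                                             \
}

DEFINE_VADD(vadd8, uint8_t)
DEFINE_VADD(vadd16, uint16_t)
DEFINE_VADD(vadd32, uint32_t)
DEFINE_VADD(vadd64, uint64_t)

/* Indexed by the encoded SEW: 0 -> 8 bits, 1 -> 16, 2 -> 32, 3 -> 64. */
static const vop_fn vadd_by_sew[4] = { vadd8, vadd16, vadd32, vadd64 };

static void do_vadd(unsigned sew, void *vd, const void *vs1,
                    const void *vs2, size_t bytes)
{
    assert(sew < 4);
    vadd_by_sew[sew](vd, vs1, vs2, bytes);
}
```

Because each routine touches only lane x of every operand, the same table
layout can be reused for other lane-wise instructions by swapping the
operator in the macro.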

Thank you very much.  Although it is somewhat difficult for me to address
your comments, they are very helpful.

Best Regards,

Zhiwei

>
>
> r~
>



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-09-02  9:43   ` liuzhiwei
@ 2019-09-03 14:21     ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2019-09-03 14:21 UTC (permalink / raw)
  To: liuzhiwei, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien

On 9/2/19 2:43 AM, liuzhiwei wrote:
>> This is most definitely not the correct way to implement first-fault.
>>
>> You need to have a look at target/arm/sve_helper.c, e.g. sve_ldff1_r,
>> where we test pages for validity with tlb_vaddr_to_host.
> Why should  test pages for validity? If there is a page fault in running time,
> it just the case why it must use the fault-only-first instruction.

So that the helper does not fault for the Nth access, N > 1.

You test to see if the page has a mapping, and if it doesn't,
you end the instruction, without going through the exception
path that I have objections to.

Except for gather loads, you don't have to test for every
access, only at page boundaries.  And then you may also arrange
to use direct host access to the pages that you've validated.

Again, have a look at sve_ldff1_r.
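
A rough sketch of that approach for a unit-stride fault-only-first load,
not taken from sve_ldff1_r itself.  The probe callback is a hypothetical
stand-in for a non-faulting lookup such as tlb_vaddr_to_host; the page
size, the function names and the demo probe are assumptions.  Element 0 is
left to the normal exception path; later elements are checked only when
they touch a page that has not been validated yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical non-faulting lookup: true if the page holding addr is mapped. */
typedef bool (*page_probe_fn)(uint64_t addr);

/*
 * Return the new vl for a load of `vl` elements of `esz` bytes at `base`.
 * The caller accesses element 0 normally, so its last page is known good.
 */
static size_t ldff_new_vl(uint64_t base, size_t esz, size_t vl,
                          page_probe_fn probe)
{
    uint64_t ok_page;

    assert(esz != 0 && esz <= SKETCH_PAGE_SIZE);
    ok_page = (base + esz - 1) / SKETCH_PAGE_SIZE;

    for (size_t i = 1; i < vl; i++) {
        uint64_t first = base + (uint64_t)i * esz;
        uint64_t last = first + esz - 1;
        uint64_t pages[2] = { first / SKETCH_PAGE_SIZE,
                              last / SKETCH_PAGE_SIZE };

        for (int k = 0; k < 2; k++) {
            if (pages[k] == ok_page) {
                continue;       /* same page as before: already validated */
            }
            if (!probe(pages[k] * SKETCH_PAGE_SIZE)) {
                return i;       /* elements 0..i-1 are safe; stop here */
            }
            ok_page = pages[k];
        }
    }
    return vl;
}

/* Demo probe: pretend only the first two pages are mapped; count calls. */
static unsigned demo_probe_calls;

static bool demo_probe(uint64_t addr)
{
    demo_probe_calls++;
    return addr < 2 * SKETCH_PAGE_SIZE;
}
```

The helper then loads exactly the returned number of elements and writes
that value to vl, so no exception is raised for any element after the
first.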

> A single array is a good idea. But vreg[] will be better for understanding as it preserve the register concepts. 

An accessor function for the registers would be just as good for that.
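
As a sketch of what such an accessor could look like over a single flat
array (the type and function names and the VLEN value are hypothetical,
and this ignores any SLEN-dependent element arrangement):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLEN 128   /* bits per vector register; an assumed value */

typedef struct {
    /* All 32 registers as one block of memory, viewed at several widths. */
    union {
        uint64_t u64[32 * SKETCH_VLEN / 64];
        uint32_t u32[32 * SKETCH_VLEN / 32];
        uint16_t u16[32 * SKETCH_VLEN / 16];
        uint8_t  u8[32 * SKETCH_VLEN / 8];
    } velt;
} SketchVectorState;

/*
 * Pointer to 32-bit element `idx` of register `reg`.  The index may run
 * past one register into the next (as with LMUL > 1); it only has to
 * stay inside the whole block.
 */
static uint32_t *vreg_u32(SketchVectorState *s, unsigned reg, unsigned idx)
{
    unsigned flat = reg * (SKETCH_VLEN / 32) + idx;

    assert(flat < 32 * SKETCH_VLEN / 32);
    return &s->velt.u32[flat];
}
```

The register concept survives in the accessor's signature, while the
storage stays a single array that is always indexed in bounds.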


r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-09-02  7:45         ` liuzhiwei
@ 2019-09-03 14:38           ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2019-09-03 14:38 UTC (permalink / raw)
  To: liuzhiwei, qemu-devel, qemu-riscv
  Cc: peter.maydell, palmer, sagark, kbastian, riku.voipio, laurent,
	Alistair.Francis, alex.bennee, aurelien

On 9/2/19 12:45 AM, liuzhiwei wrote:
> 
> On 2019/8/29 下午11:09, Richard Henderson wrote:
>> On 8/29/19 5:45 AM, liuzhiwei wrote:
>>> Even in qemu,  it may be some situations that VSTART != 0. For example, a load
>>> instruction leads to a page fault exception in a middle position. If VSTART ==
>>> 0,  some elements that had been loaded before the exception will be loaded once
>>> again.
>> Alternately, you can validate all of the pages before performing any memory
>> operations.  At which point there will never be an exception in the middle.
> 
> As a vector instruction may access memory  across many pages,  is there any way
> to validate the pages? Page table walk ?Or some TLB APIs?

Yes, there are TLB APIs.  Several of them, depending on what is needed.

> #0  cpu_watchpoint_address_matches (wp=0x555556228110, addr=536871072, len=1)
> at qemu/exec.c:1094
> #1  0x000055555567204f in check_watchpoint (offset=160, len=1, attrs=...,
> flags=2) at qemu/exec.c:2803
> #2  0x0000555555672379 in watch_mem_write (opaque=0x0, addr=536871072, val=165,
> size=1, attrs=...) at qemu/exec.c:2878
> #3  0x00005555556d44bb in memory_region_write_with_attrs_accessor
> (mr=0x5555561292e0 <io_mem_watch>, addr=536871072, value=0x7fffedffe2c8,
> size=1, shift=0, mask=255, attrs=...)
>     at qemu/memory.c:553
> #4  0x00005555556d45de in access_with_adjusted_size (addr=536871072,
> value=0x7fffedffe2c8, size=1, access_size_min=1, access_size_max=8,
> access_fn=0x5555556d43cd <memory_region_write_with_attrs_accessor>,
>     mr=0x5555561292e0 <io_mem_watch>, attrs=...) at qemu/memory.c:594
> #5  0x00005555556d7247 in memory_region_dispatch_write (mr=0x5555561292e0
> <io_mem_watch>, addr=536871072, data=165, size=1, attrs=...) at qemu/memory.c:1480
> #6  0x00005555556f0d13 in io_writex (env=0x5555561efb58,
> iotlbentry=0x5555561f5398, mmu_idx=1, val=165, addr=536871072, retaddr=0,
> recheck=false, size=1) at qemu/accel/tcg/cputlb.c:909
> #7  0x00005555556f19a6 in io_writeb (env=0x5555561efb58, mmu_idx=1, index=0,
> val=165 '\245', addr=536871072, retaddr=0, recheck=false) at
> qemu/accel/tcg/softmmu_template.h:268
> #8  0x00005555556f1b54 in helper_ret_stb_mmu (env=0x5555561efb58,
> addr=536871072, val=165 '\245', oi=1, retaddr=0) at
> qemu/accel/tcg/softmmu_template.h:304
> #9  0x0000555555769f06 in cpu_stb_data_ra (env=0x5555561efb58, ptr=536871072,
> v=165, retaddr=0) at qemu/include/exec/cpu_ldst_template.h:182
> #10 0x0000555555769f80 in cpu_stb_data (env=0x5555561efb58, ptr=536871072,
> v=165) at /qemu/include/exec/cpu_ldst_template.h:194
> #11 0x000055555576a913 in csky_cpu_stb_data (env=0x5555561efb58,
> vaddr=536871072, data=165 '\245') at qemu/target/csky/csky_ldst.c:48
> #12 0x000055555580ba7d in helper_vdsp2_vstru_n (env=0x5555561efb58,
> insn=4167183360) at qemu/target/csky/op_vdsp2.c:1317
> 
> The path is not related to probe_write in the patch().

Of course.  It wasn't supposed to be.

> Could you give more details or a test case where watchpoint doesn't work
> correctly?

If the store partially, but not completely, overlaps the watchpoint.  This is
obviously much easier to do with large vector operations than with normal
integer operations.

In this case, we may have completed some of the stores before encountering the
watchpoint.  Which, inside check_watchpoint(), will longjmp back to the cpu
main loop.  Now we have a problem: the store is partially complete and it
should not be.

Therefore, we now have patches queued in tcg-next that adjust probe_write to
perform both access and watchpoint tests.  There is still target-specific code
that must be adjusted to match, so there are not currently any examples in the
tree to show.

However, the idea is:
  (1) Instructions that perform more than one host store must probe
      the entire range to be stored before performing any stores.

  (2) Instructions that perform more than one host load must either
      probe the entire range to be loaded, or collect the data in
      temporary storage.  If not using probes, writeback to the
      register file must be delayed until after all loads are done.

  (3) Any one probe may not cross a page boundary; splitting of the
      access across pages must be done by the helper.
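
A minimal sketch of points (1) and (3), under stated assumptions: the
per-page probe is a hypothetical callback returning a bool, whereas a real
probe in QEMU would raise the exception itself rather than return false.
The page size, ProbeLog and log_probe exist only for demonstration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical probe of a range that lies within a single page. */
typedef bool (*probe_fn)(uint64_t addr, uint64_t len, void *opaque);

/*
 * Probe [addr, addr + len) one page-bounded piece at a time.  Returns
 * true only if every piece passes, so the caller can perform all of its
 * stores afterwards without stopping in the middle.
 */
static bool probe_range(uint64_t addr, uint64_t len, probe_fn probe,
                        void *opaque)
{
    while (len > 0) {
        uint64_t in_page = SKETCH_PAGE_SIZE - (addr % SKETCH_PAGE_SIZE);
        uint64_t chunk = len < in_page ? len : in_page;

        if (!probe(addr, chunk, opaque)) {
            return false;
        }
        addr += chunk;
        len -= chunk;
    }
    return true;
}

/* Demo probe: records each piece it is asked about and accepts it. */
typedef struct {
    unsigned count;
    uint64_t addr[8];
    uint64_t len[8];
} ProbeLog;

static bool log_probe(uint64_t addr, uint64_t len, void *opaque)
{
    ProbeLog *pl = opaque;

    assert(pl->count < 8);
    /* No single probe may cross a page boundary. */
    assert(addr / SKETCH_PAGE_SIZE == (addr + len - 1) / SKETCH_PAGE_SIZE);
    pl->addr[pl->count] = addr;
    pl->len[pl->count] = len;
    pl->count++;
    return true;
}
```

For a 20-byte store starting 6 bytes before a page boundary, the helper
issues two probes (6 bytes, then 14 bytes) and only then begins storing.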


r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-08-28 18:54 ` Richard Henderson
  2019-08-28 20:43   ` Richard Henderson
  2019-09-02  9:43   ` liuzhiwei
@ 2019-12-19  9:11   ` LIU Zhiwei
  2019-12-19 20:38     ` Richard Henderson
  2 siblings, 1 reply; 27+ messages in thread
From: LIU Zhiwei @ 2019-12-19  9:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Chih-Min Chao, palmer, Alistair.Francis, qemu-devel

Hi Richard,

Sorry to reply so late.

Upstreaming is really difficult. I was really frustrated to receive so
many difficult comments.

It is hard for me to absorb them, and it will take a lot of time to fix
things up. Now I will move on.

On 2019/8/29 2:54, Richard Henderson wrote:
> On 8/27/19 7:36 PM, liuzhiwei wrote:
>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>> Signed-off-by: liuzhiwei <zhiwei_liu@c-sky.com>
>> ---
>>   fpu/softfloat.c                         |   119 +
>>   include/fpu/softfloat.h                 |     4 +
>>   linux-user/riscv/cpu_loop.c             |     8 +-
>>   target/riscv/Makefile.objs              |     2 +-
>>   target/riscv/cpu.h                      |    30 +
>>   target/riscv/cpu_bits.h                 |    15 +
>>   target/riscv/cpu_helper.c               |     7 +
>>   target/riscv/csr.c                      |    65 +-
>>   target/riscv/helper.h                   |   354 +
>>   target/riscv/insn32.decode              |   374 +-
>>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>   target/riscv/translate.c                |     1 +
>>   target/riscv/vector_helper.c            | 26563 ++++++++++++++++++++++++++++++
>>   13 files changed, 28017 insertions(+), 9 deletions(-)
>> +    /* vector coprocessor state.  */
>> +    struct {
>> +        union VECTOR {
>> +            float64  f64[VUNIT(64)];
>> +            float32  f32[VUNIT(32)];
>> +            float16  f16[VUNIT(16)];
>> +            target_ulong ul[VUNIT(sizeof(target_ulong))];
>> +            uint64_t u64[VUNIT(64)];
>> +            int64_t  s64[VUNIT(64)];
>> +            uint32_t u32[VUNIT(32)];
>> +            int32_t  s32[VUNIT(32)];
>> +            uint16_t u16[VUNIT(16)];
>> +            int16_t  s16[VUNIT(16)];
>> +            uint8_t  u8[VUNIT(8)];
>> +            int8_t   s8[VUNIT(8)];
>> +        } vreg[32];
>> +        target_ulong vxrm;
>> +        target_ulong vxsat;
>> +        target_ulong vl;
>> +        target_ulong vstart;
>> +        target_ulong vtype;
>> +        float_status fp_status;
>> +    } vfp;
> You've obviously copied "vfp" from target/arm.  Drop that.  It makes no sense
> in the context of risc-v.
>
> I'm not sure that vreg[].element[] really makes the most sense in the context
> of how risc-v rearranges its elements.

Using vreg[].element[] was my gut feeling.  It makes the code easiest to
understand.

As you said, viewing all the vector registers as a single block of memory
is good for programming.

> It will almost certainly fail clang
> validators, if enabled, since you'll be indexing beyond the end of vreg[n] into
> vreg[n+1].

I'm sorry, but it's really hard for me to absorb this point. I don't know
why clang would fail when indexing beyond the end of vreg[n] into vreg[n+1].

> It might be best to have a single array:
>
>      union {
>          uint64_t u64[32 * VLEN / 64];
>          ...
>          uint8_t u8[32 * VLEN / 8];
>      } velt;
>
> This is clearer to the compiler that this is a single block of memory that we
> can index as we please.

As Chih-Min Chao said in another part of the PATCH V2 thread, VLEN will be
a property which can be specified from the command line.  So the sub-struct
may be defined as

struct {
     union{
         uint64_t *u64 ;
         int64_t  *s64;
         uint32_t *u32;
         int32_t  *s32;
         uint16_t *u16;
         int16_t  *s16;
         uint8_t  *u8;
         int8_t   *s8;
     } mem;
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext;

Will that be OK?

>> +static inline bool vector_vtype_ill(CPURISCVState *env)
>> +{
>> +    if ((env->vfp.vtype >> (sizeof(target_ulong) - 1)) & 0x1) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +
>> +static inline void vector_vtype_set_ill(CPURISCVState *env)
>> +{
>> +    env->vfp.vtype = ((target_ulong)1) << (sizeof(target_ulong) - 1);
>> +    return;
>> +}
>> +
>> +static inline int vector_vtype_get_sew(CPURISCVState *env)
>> +{
>> +    return (env->vfp.vtype >> 2) & 0x7;
>> +}
>> +
>> +static inline int vector_get_width(CPURISCVState *env)
>> +{
>> +    return  8 * (1 << vector_vtype_get_sew(env));
>> +}
>> +
>> +static inline int vector_get_lmul(CPURISCVState *env)
>> +{
>> +    return 1 << (env->vfp.vtype & 0x3);
>> +}
>> +
>> +static inline int vector_get_vlmax(CPURISCVState *env)
>> +{
>> +    return vector_get_lmul(env) * VLEN / vector_get_width(env);
>> +}
>> +
>> +static inline int vector_elem_mask(CPURISCVState *env, uint32_t vm, int width,
>> +    int lmul, int index)
>> +{
>> +    int mlen = width / lmul;
>> +    int idx = (index * mlen) / 8;
>> +    int pos = (index * mlen) % 8;
>> +
>> +    return vm || ((env->vfp.vreg[0].u8[idx] >> pos) & 0x1);
>> +}
> I would strongly encourage you place the components of vtype within tb_flags
> via cpu_get_tb_cpu_state.  This would allow you to move quite a few checks from
> run-time to translation-time.
>
> Recall that translation happens once (per configuration), whereas execution
> happens many times.  Obviously, the more configurations that we create, the
> more translation that must happen.
All of the check code will be moved from execution time to translation time.
> But the vtypei argument to vsetvli is a good choice, because it is constant,
> relates directly to the compiled code, and is unrelated to the length of the
> data being processed.
>
> With that, you can verify at translation:
>
> (1) vill
> (2) v[n], for (n % lmul) != 0
> (3) v[n] overlapping v[0] for masked/carry operations, with lmul > 1
>
> and
>
> (4) you can arrange the helpers so that instead of 1 helper that has to
>      handle all SEW, you have N helpers, each handling a different SEW.
>
> And with all of this done, I believe you no longer need to pass the register
> number to the helper.  You can pass the address of v[n], which is much more
> like how the tcg generic vector support works.
>
> Whether or not to include VL in tb_flags is a harder choice.  Certainly not the
> exact value of VL, as that would lead to different translations for every loop
> tail.  But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
> single bit.  Knowing that this condition is true would allow some use of the
> tcg generic vector support.

The (ill, lmul, sew) fields of vtype will be placed within tb_flags, along
with the bit for (VSTART == 0 && VL == VLMAX).

So the vector extension will take at least 8 bits of tb_flags.
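
One possible packing of those fields, as a sketch only.  The bit positions
and the names pack_vflags and vflags_sew_bits are hypothetical; the vtype
field extraction follows the helpers quoted above (vlmul in bits 1:0, vsew
in bits 4:2, vill in the top bit).  This particular layout uses seven
bits; a real layout may carry more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for the vector part of tb_flags. */
#define VFLAG_LMUL_SHIFT         0   /* 2 bits, copied from vtype[1:0] */
#define VFLAG_SEW_SHIFT          2   /* 3 bits, copied from vtype[4:2] */
#define VFLAG_VILL_SHIFT         5   /* 1 bit */
#define VFLAG_VL_EQ_VLMAX_SHIFT  6   /* 1 bit: VSTART == 0 && VL == VLMAX */

static uint32_t pack_vflags(uint64_t vtype, unsigned xlen,
                            uint64_t vstart, uint64_t vl, unsigned vlen_bits)
{
    assert(xlen == 32 || xlen == 64);

    uint32_t lmul = vtype & 0x3;
    uint32_t sew = (vtype >> 2) & 0x7;
    uint32_t vill = (vtype >> (xlen - 1)) & 0x1;
    /* VLMAX = LMUL * VLEN / SEW, with LMUL = 1 << lmul and SEW = 8 << sew. */
    uint64_t vlmax = ((uint64_t)vlen_bits << lmul) / (8u << sew);
    bool full = !vill && vstart == 0 && vl == vlmax;

    return (lmul << VFLAG_LMUL_SHIFT) | (sew << VFLAG_SEW_SHIFT) |
           (vill << VFLAG_VILL_SHIFT) |
           ((uint32_t)full << VFLAG_VL_EQ_VLMAX_SHIFT);
}

/* Recover the element width in bits from the packed flags. */
static unsigned vflags_sew_bits(uint32_t flags)
{
    return 8u << ((flags >> VFLAG_SEW_SHIFT) & 0x7);
}
```

Only the single "full vector" bit depends on VL, so a loop tail with a
different VL changes at most that one bit of the flags.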

> E.g. vadd.vv could be
>
>      if (masked) {
>          switch (SEW) {
>          case MO_8:
>              gen_helper_vadd8_mask(...);
>              break;
>          ...
>          }
>      } else if (vl_eq_vlmax) {
>          tcg_gen_gvec_add(SEW, vreg_ofs(vd), vreg_ofs(vs2), vreg_ofs(vs1),
>                           VLEN * LMUL, VLEN * LMUL);
>      } else {
>          switch (SEW) {
>          case MO_8:
>              gen_helper_vadd8(...);
>              break;
>          ...
>          }
>      }
>
> Or, equivalently, pack pointers to the actual generator functions into a
> structure so that this code structure can be shared between many instructions.

It's quicker to use TCG's generic vector support.

However, I have one problem supporting both a command-line VLEN and vreg_ofs.

As in SVE, vreg_ofs is the offset from cpu_env. If the structure of the
vector extension (to support a command-line VLEN) is

struct {
     union{
         uint64_t *u64 ;
         int64_t  *s64;
         uint32_t *u32;
         int32_t  *s32;
         uint16_t *u16;
         int16_t  *s16;
         uint8_t  *u8;
         int8_t   *s8;
     } mem;
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext

I can't find a way to get the direct offset of a vreg from cpu_env.

Maybe I should specify a maximum VLEN, the way SVE does?
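
To make the question concrete, here is a sketch of the "maximum VLEN"
variant.  The struct, the SKETCH_VLEN_MAX bound and sketch_vreg_ofs are
hypothetical: storage is allocated inline for the largest supported VLEN,
the run-time VLEN is an ordinary field, and the offset of a register is
computed with offsetof so it is still relative to the start of the state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VLEN_MAX 512   /* assumed compile-time upper bound, in bits */

typedef struct {
    uint64_t gpr[32];
    /* Inline storage sized for the maximum VLEN; no pointer involved. */
    uint64_t vreg[32 * SKETCH_VLEN_MAX / 64];
    uint32_t vlen;            /* run-time VLEN in bits, set from a property */
} SketchCPUState;

/*
 * Byte offset of vector register `reg` from the start of the state.
 * Registers are packed at the run-time stride so that a group of
 * consecutive registers forms one contiguous block.
 */
static size_t sketch_vreg_ofs(const SketchCPUState *env, unsigned reg)
{
    assert(reg < 32);
    assert(env->vlen % 64 == 0 && env->vlen <= SKETCH_VLEN_MAX);
    return offsetof(SketchCPUState, vreg) + reg * (env->vlen / 8);
}
```

With VLEN smaller than the maximum, the tail of the array is simply
unused, which trades some memory for offsets that are known at
translation time.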

Best Regards,

LIU Zhiwei

> Bear in mind that all tcg gvec operations operate strictly upon lanes.  I.e.
>
>     vd[x] = vs1[x] op vs2[x]
>
> thus the actual arrangement of the elements in storage is irrelevant and SLEN
> need not be considered here.
>
>
> r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-12-19  9:11   ` LIU Zhiwei
@ 2019-12-19 20:38     ` Richard Henderson
  2019-12-25  9:36       ` LIU Zhiwei
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2019-12-19 20:38 UTC (permalink / raw)
  To: LIU Zhiwei; +Cc: Chih-Min Chao, palmer, Alistair.Francis, qemu-devel

On 12/18/19 11:11 PM, LIU Zhiwei wrote:
> I'm sorry that it's really hard to absorb your opinion. I don't know why clang
> will fail
> 
> when index beyond the end of vreg[n] into vreg[n+1].

I thought for sure one of the address sanitizer checks would detect the array
bounds overrun.  But it becomes irrelevant.

> As Chih-Min Chao said in another part of PATCH V2 thread,  VLEN will be a
> property which can be
> 
> specified from command line.  So the sub-struct maybe defined as
> 
> struct {
>     union{
>         uint64_t *u64 ;
>         int64_t  *s64;
>         uint32_t *u32;
>         int32_t  *s32;
>         uint16_t *u16;
>         int16_t  *s16;
>         uint8_t  *u8;
>         int8_t   *s8;
>     } mem;
>     target_ulong vxrm;
>     target_ulong vxsat;
>     target_ulong vl;
>     target_ulong vstart;
>     target_ulong vtype;
> } vext;
> 
> Will that be OK?

Pointers have consequences.  It can be done, but I don't think it is ideal.

> The (ill, lmul, sew ) of vtype  will be placed within tb_flags, also the bit of
> (VSTART == 0 && VL == VLMAX).
> 
> So it will take 8 bits of tb flags for vector extension at least.

Good.

> However, I have one problem to support both command line VLEN and vreg_ofs.
> 
> As in SVE,  vreg ofs is the offset from cpu_env. If the structure of vector
> extension (to support command line VLEN) is
> 
> struct {
>     union{
>         uint64_t *u64 ;
>         int64_t  *s64;
>         uint32_t *u32;
>         int32_t  *s32;
>         uint16_t *u16;
>         int16_t  *s16;
>         uint8_t  *u8;
>         int8_t   *s8;
>     } mem;
>     target_ulong vxrm;
>     target_ulong vxsat;
>     target_ulong vl;
>     target_ulong vstart;
>     target_ulong vtype;
> } vext
> 
> I can't find the way to get the direct offset of vreg from cpu_env.
> 
> Maybe I should specify a max VLEN like the way of SVE?

I think a maximum vlen is best.  A command-line option to adjust vlen is all
well and good, but there's no reason to have to support vlen=(1<<29).

Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you
will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to
be aligned mod 16.

I think that all you need is

    uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16);

which gives us

uint32_t vreg_ofs(DisasContext *ctx, int reg)
{
    return offsetof(CPURISCVState, vreg) + reg * ctx->vlen;
}

I don't see the point of a union for vreg.  I don't think you'll find that you
actually use it at all.
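
As a rough illustration of the flat layout, the sketch below uses a stand-in state structure. DemoCPUState, DEMO_MAX_VLEN and demo_vreg_ofs are hypothetical names, the 64-byte maximum is an arbitrary assumption, and vlen is taken in bytes as in the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_VLEN 64   /* assumed maximum vlen, in bytes */

/* Stand-in for CPURISCVState; only the layout idea matters here. */
typedef struct {
    uint64_t gpr[32];
    /* All 32 vector registers stored back to back, 16-byte aligned. */
    _Alignas(16) uint64_t vreg[32 * DEMO_MAX_VLEN / 8];
    uint64_t vl, vstart, vtype;
} DemoCPUState;

/* Byte offset of vector register 'reg' for the configured vlen. */
static inline uint32_t demo_vreg_ofs(uint32_t vlen_bytes, int reg)
{
    assert(reg >= 0 && reg < 32);
    /* A multiple of 16 keeps every register offset aligned mod 16. */
    assert(vlen_bytes >= 16 && vlen_bytes <= DEMO_MAX_VLEN);
    assert(vlen_bytes % 16 == 0);
    return (uint32_t)(offsetof(DemoCPUState, vreg) + (size_t)reg * vlen_bytes);
}
```

With a smaller configured vlen the registers are simply packed closer together and the end of the array goes unused.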

You do need to document the element ordering that you're going to use for vreg.
 I.e. the mapping between the architectural vector register state and the
emulation state.  You have two choices:

(1) all bytes in host endianness (e.g. target/ppc)
(2) bytes within each uint64_t in host endianness,
    but each uint64_t is little-endian (e.g. target/arm).

Both require some fixup when running on a big-endian host.
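
For option (2), the fixup can be sketched as an index adjustment, similar in spirit to the index-adjust macros used by the target/arm helpers. The function name and the explicit host_big_endian flag are assumptions made so that the sketch can be exercised on any host; real code would select the variant at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Map architectural element index 'i' (element size 1 << esz_log2 bytes)
 * to the index to use on the host array of that element type, assuming
 * the uint64_t units are stored in little-endian order and the bytes
 * within each unit are in host order.
 */
static inline size_t demo_host_index(size_t i, unsigned esz_log2,
                                     bool host_big_endian)
{
    assert(esz_log2 <= 3);
    if (!host_big_endian) {
        return i;                        /* no fixup on little-endian hosts */
    }
    /* Reverse the element position within its 8-byte unit. */
    return i ^ ((8u >> esz_log2) - 1);   /* ^7, ^3, ^1, ^0 */
}
```

For 64-bit elements the adjustment is zero on both kinds of host.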


r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-12-19 20:38     ` Richard Henderson
@ 2019-12-25  9:36       ` LIU Zhiwei
  2019-12-28  1:14         ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: LIU Zhiwei @ 2019-12-25  9:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: palmer, Alistair.Francis, qemu-devel, Chih-Min Chao



On 2019/12/20 4:38, Richard Henderson wrote:
> On 12/18/19 11:11 PM, LIU Zhiwei wrote:
>> I'm sorry that it's really hard to absorb your opinion. I don't know why clang
>> will fail
>>
>> when index beyond the end of vreg[n] into vreg[n+1].
> I thought sure one of the address sanitizer checks would detect array bounds
> overrun.  But it becomes irrelevant
>
>> As Chih-Min Chao said in another part of PATCH V2 thread,  VLEN will be a
>> property which can be
>>
>> specified from command line.  So the sub-struct maybe defined as
>>
>> struct {
>>      union{
>>          uint64_t *u64 ;
>>          int64_t  *s64;
>>          uint32_t *u32;
>>          int32_t  *s32;
>>          uint16_t *u16;
>>          int16_t  *s16;
>>          uint8_t  *u8;
>>          int8_t   *s8;
>>      } mem;
>>      target_ulong vxrm;
>>      target_ulong vxsat;
>>      target_ulong vl;
>>      target_ulong vstart;
>>      target_ulong vtype;
>> } vext;
>>
>> Will that be OK?
> Pointers have consequences.  It can be done, but I don't think it is ideal.
>
>> The (ill, lmul, sew ) of vtype  will be placed within tb_flags, also the bit of
>> (VSTART == 0 && VL == VLMAX).
>>
>> So it will take 8 bits of tb flags for vector extension at least.
> Good.
>> However, I have one problem to support both command line VLEN and vreg_ofs.
>>
>> As in SVE,  vreg ofs is the offset from cpu_env. If the structure of vector
>> extension (to support command line VLEN) is
>>
>> struct {
>>      union{
>>          uint64_t *u64 ;
>>          int64_t  *s64;
>>          uint32_t *u32;
>>          int32_t  *s32;
>>          uint16_t *u16;
>>          int16_t  *s16;
>>          uint8_t  *u8;
>>          int8_t   *s8;
>>      } mem;
>>      target_ulong vxrm;
>>      target_ulong vxsat;
>>      target_ulong vl;
>>      target_ulong vstart;
>>      target_ulong vtype;
>> } vext
>>
>> I can't find the way to get the direct offset of vreg from cpu_env.
>>
>> Maybe I should specify a max VLEN like the way of SVE?
> I think a maximum vlen is best.  A command-line option to adjust vlen is all
> well and good, but there's no reason to have to support vlen=(1<<29).
>
> Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you
> will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to
> be aligned mod 16.
>
> I think that all you need is
>
>      uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16);
>
> which gives us
>
> uint32_t vreg_ofs(DisasContext *ctx, int reg)
> {
>      return offsetof(CPURISCVState, vreg) + reg * ctx->vlen;
> }

struct {
     uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext;

Is it OK?

> I don't see the point of a union for vreg.  I don't think you'll find that you
> actually use it at all.

I think I can now move most of the execution checks to translate time, as
SVE does. However, there are still some differences from SVE.

1) cpu_env must be passed as a parameter to the helper functions.

     The helpers need to use env->vext.vl and env->vext.vstart.  Thus it
will be difficult to use the out-of-line tcg_gen_gvec_ool.

     void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
                         uint32_t oprsz, uint32_t maxsz, int32_t data,
                         gen_helper_gvec_2 *fn)
     {
         ......
         fn(a0, a1, desc);
         ......
     }

     Maybe I have to write something similar to tcg_gen_gvec_ool in
trans_rvv.inc.c.  But that would be redundant.

2) simd_desc is not suitable.

     I also need to pass some members of DisasContext to the helpers.

     (Data, Vlmax, Mlen) is my current choice. Vlmax is the number of
elements of this operation, so it will be defined as ctx->lmul * ctx->vlen
/ ctx->sew.

Data is reserved for expansion.  Mlen is the mask length for one element, so
it will be defined as ctx->sew / ctx->lmul. With Mlen, an active element will
be selected by

    static inline int vext_elem_mask(void *v0, int mlen, int index)
    {
         int idx = (index * mlen) / 8;
         int pos = (index * mlen) % 8;

         /* v0 is a void pointer, so index it as bytes */
         return (((uint8_t *)v0)[idx] >> pos) & 0x1;
    }

     So I may have to implement vext_desc instead of using simd_desc,
which would be another redundancy. Maybe there is a better way to mask
elements?

> You do need to document the element ordering that you're going to use for vreg.
>   I.e. the mapping between the architectural vector register state and the
> emulation state.  You have two choices:
>
> (1) all bytes in host endianness (e.g. target/ppc)
> (2) bytes within each uint64_t in host endianness,
>      but each uint64_t is little-endian (e.g. target/arm).
>
> Both require some fixup when running on a big-endian host.

Yes, I will take (2).


Best Regards,

Zhiwei

>
> r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-12-25  9:36       ` LIU Zhiwei
@ 2019-12-28  1:14         ` Richard Henderson
  2019-12-30  8:11           ` LIU Zhiwei
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2019-12-28  1:14 UTC (permalink / raw)
  To: LIU Zhiwei; +Cc: palmer, Alistair.Francis, qemu-devel, Chih-Min Chao

On 12/25/19 8:36 PM, LIU Zhiwei wrote:
> struct {
> 
>         uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
>         target_ulong vxrm;
>         target_ulong vxsat;
>         target_ulong vl;
>         target_ulong vstart;
>         target_ulong vtype;
>     } vext;
> 
> Is it OK?
I don't think there's a good reason for the vext structure -- I would drop
that.  Otherwise it looks good.

> However, there are still some differences from SVE.
> 
> 1)cpu_env must be used as a parameter for helper function.
> 
>     The helpers need  use env->vext.vl and env->vext.vstart.  Thus it will be
> difficult to use out of line tcg_gen_gvec_ool.

Sure.  That's also true of any of the fp operations, which will want to
accumulate ieee exceptions.

See tcg_gen_gvec_*_ptr(), which allows you to pass in cpu_env.

> 2)simd_desc is not proper.
> 
>     I also need to transfer some members of DisasContext to helpers. 
> 
>     (Data, Vlmax, Mlen) is my current choice. Vlmax is the num of elements of
> this operation, so it will defined as ctx->lmul * ctx->vlen / ctx->sew;

The oprsz & maxsz parameters to tcg_gen_gvec_* should be given (ctx->lmul *
ctx->vlen).  The sew parameter should be implied by the helper function called,
each helper function using a different type.  Therefore vlmax can be trivially
computed within the helper from oprsz / sizeof(type).
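
A minimal sketch of that idea, assuming a stand-alone helper with hypothetical names (demo_vadd16) and plain pointers in place of the real helper arguments:

```c
#include <assert.h>
#include <stdint.h>

/*
 * One helper per element type: SEW is implied by uint16_t, so vlmax
 * falls out of the operation size in bytes.
 */
static void demo_vadd16(uint16_t *vd, const uint16_t *vs1,
                        const uint16_t *vs2, uint32_t oprsz)
{
    uint32_t vlmax = oprsz / sizeof(uint16_t);

    assert(oprsz % sizeof(uint16_t) == 0);
    for (uint32_t i = 0; i < vlmax; i++) {
        vd[i] = (uint16_t)(vs1[i] + vs2[i]);   /* wraps modulo 2^16 */
    }
}
```

The 8, 32 and 64-bit variants would differ only in the element type.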

> Data is reserved to expand.  Mlen is mask length for one elment, so it will
> defined as ctx->sew/ctx->lmul. As with Mlen, a active element will
> 
> be selected by
> 
>     static inline int vext_elem_mask(void *v0, int mlen, int index)
>     {
>         int idx = (index * mlen) / 8;
>         int pos = (index * mlen) % 8;
> 
>         return (v0[idx] >> pos) & 0x1;
>     }
> 
>     So I may have to implement vext_desc instead of use the simd_desc, which
> will be another redundant. Maybe a better way to mask elements?

I think you will want to define your own vext_desc, building upon simd_desc,
such that lg2(mlen) is passed in the first N bits of simd_data.
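
One possible shape for this, as a sketch only: the 3-bit field width, the function names and the treatment of the remaining data bits are assumptions, and the real simd_desc layout is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MLEN_BITS 3   /* assumed: lg2(mlen) in 0..6, i.e. mlen <= 64 */

/* Pack lg2(mlen) into the low bits of the data word; 'extra' goes above. */
static inline uint32_t demo_vext_data(uint32_t mlen, uint32_t extra)
{
    uint32_t lg2 = 0;

    assert(mlen != 0 && (mlen & (mlen - 1)) == 0 && mlen <= 64);
    while ((1u << lg2) < mlen) {
        lg2++;
    }
    return (extra << DEMO_MLEN_BITS) | lg2;
}

/* Recover mlen inside the helper. */
static inline uint32_t demo_vext_mlen(uint32_t data)
{
    return 1u << (data & ((1u << DEMO_MLEN_BITS) - 1));
}

/* Test the mask bit of element 'index'; v0 is read as a byte array. */
static inline int demo_elem_mask(const void *v0, uint32_t mlen,
                                 uint32_t index)
{
    const uint8_t *m = v0;
    uint32_t bit = index * mlen;

    return (m[bit / 8] >> (bit % 8)) & 1;
}
```

The data word built this way would then be handed to simd_desc as its data argument, provided it fits in the bits that simd_desc reserves for data.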


r~



* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-12-28  1:14         ` Richard Henderson
@ 2019-12-30  8:11           ` LIU Zhiwei
  2020-01-05 20:19             ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: LIU Zhiwei @ 2019-12-30  8:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: palmer, Alistair.Francis, qemu-devel, Chih-Min Chao



On 2019/12/28 9:14, Richard Henderson wrote:
> On 12/25/19 8:36 PM, LIU Zhiwei wrote:
>> struct {
>>
>>          uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
>>          target_ulong vxrm;
>>          target_ulong vxsat;
>>          target_ulong vl;
>>          target_ulong vstart;
>>          target_ulong vtype;
>>      } vext;
>>
>> Is it OK?
> I don't think there's a good reason for the vext structure -- I would drop
> that.  Otherwise it looks good.
>
>> However, there are still some differences from SVE.
>>
>> 1)cpu_env must be used as a parameter for helper function.
>>
>>      The helpers need  use env->vext.vl and env->vext.vstart.  Thus it will be
>> difficult to use out of line tcg_gen_gvec_ool.
> Sure.  That's also true of any of the fp operations, which will want to
> accumulate ieee exceptions.
>
> See tcg_gen_gvec_*_ptr(), which allows you to pass in cpu_env.
Thanks. The tcg_gen_gvec_*_ptr is good.
>
>> 2)simd_desc is not proper.
>>
>>      I also need to transfer some members of DisasContext to helpers.
>>
>>      (Data, Vlmax, Mlen) is my current choice. Vlmax is the num of elements of
>> this operation, so it will defined as ctx->lmul * ctx->vlen / ctx->sew;
> The oprsz & maxsz parameters to tcg_gen_gvec_* should be given (ctx->lmul *
> ctx->vlen).  The sew parameter should be implied by the helper function called,
> each helper function using a different type.  Therefore vlmax can be trivially
> computed within the helper from oprsz / sizeof(type).
It's clear that the oprsz & maxsz parameters should be given (ctx->lmul
* ctx->vlen) for tcg_gen_gvec_add.

However, it's not clear when using tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool.
I think the meaning of oprsz is the number of bits of the active elements.
Therefore, oprsz is 8 * env->vext.vl in RISC-V, and it can't be fetched from
TB_FLAGS as in SVE.

Probably the oprsz field will not be used in the RISC-V vector extension.
>> Data is reserved to expand.  Mlen is mask length for one elment, so it will
>> defined as ctx->sew/ctx->lmul. As with Mlen, a active element will
>>
>> be selected by
>>
>>      static inline int vext_elem_mask(void *v0, int mlen, int index)
>>      {
>>          int idx = (index * mlen) / 8;
>>          int pos = (index * mlen) % 8;
>>
>>          return (v0[idx] >> pos) & 0x1;
>>      }
>>
>>      So I may have to implement vext_desc instead of use the simd_desc, which
>> will be another redundant. Maybe a better way to mask elements?
> I think you will want to define your own vext_desc, building upon simd_desc,
> such that lg2(mlen) is passed in the first N bits of simd_data.
Good. It's a good way to use the tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool API.

Best Regards,
Zhiwei
>
> r~




* Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1
  2019-12-30  8:11           ` LIU Zhiwei
@ 2020-01-05 20:19             ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-01-05 20:19 UTC (permalink / raw)
  To: LIU Zhiwei; +Cc: palmer, Alistair.Francis, qemu-devel, Chih-Min Chao

On 12/30/19 6:11 PM, LIU Zhiwei wrote:
> 
> However It's not clear when use tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool. I think
> the meaning of oprsz is the
> the bits of active elements.  Therefore , oprsz is  8 * env->vext.vl in RISC-V
> and it can't be fetched  from
> TB_FLAGS like SVE.
> 
> Probably oprsz field will be not be used in RISC-V vector extension.

Correct.  For those risc-v helpers that are called when VL != VLMAX, you would
ignore the oprsz field and fetch it from env.

It may still be handy to pass in vlmax as maxsz, even if you leave the oprsz
field 0.  You'll find that out as you do the coding, I suppose.
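
Roughly, such a helper might look like the sketch below. DemoEnv and demo_vadd8_vl are hypothetical names, the tail elements past vl are simply left untouched (the tail behaviour the spec requires is not modelled), and masking is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the vector fields of the CPU state. */
typedef struct {
    uint32_t vl;
    uint32_t vstart;
} DemoEnv;

/*
 * Helper for the VL != VLMAX case: the active length comes from env,
 * while maxsz (in bytes) only provides vlmax as a bound.
 */
static void demo_vadd8_vl(uint8_t *vd, const uint8_t *vs1,
                          const uint8_t *vs2, const DemoEnv *env,
                          uint32_t maxsz)
{
    uint32_t vlmax = maxsz / sizeof(uint8_t);

    assert(env->vl <= vlmax);
    for (uint32_t i = env->vstart; i < env->vl; i++) {
        vd[i] = (uint8_t)(vs1[i] + vs2[i]);
    }
}
```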


r~



Thread overview: 27+ messages
     [not found] <1566959818-38369-1-git-send-email-zhiwei_liu@c-sky.com>
2019-08-28  9:08 ` [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1 Alex Bennée
2019-08-28 16:39   ` Richard Henderson
2019-08-29 13:35   ` liuzhiwei
2019-08-28 18:54 ` Richard Henderson
2019-08-28 20:43   ` Richard Henderson
2019-08-29 12:45     ` liuzhiwei
2019-08-29 15:09       ` Richard Henderson
2019-09-02  7:45         ` liuzhiwei
2019-09-03 14:38           ` Richard Henderson
2019-09-02  9:43   ` liuzhiwei
2019-09-03 14:21     ` Richard Henderson
2019-12-19  9:11   ` LIU Zhiwei
2019-12-19 20:38     ` Richard Henderson
2019-12-25  9:36       ` LIU Zhiwei
2019-12-28  1:14         ` Richard Henderson
2019-12-30  8:11           ` LIU Zhiwei
2020-01-05 20:19             ` Richard Henderson
2019-08-28 21:34 ` Alistair Francis
2019-08-29 12:00   ` liuzhiwei
2019-08-29 15:14     ` Richard Henderson
2019-09-02  6:54       ` liuzhiwei
2019-08-29 21:50     ` Alistair Francis
2019-08-30  9:06       ` Alex Bennée
2019-08-30 18:39         ` Alistair Francis
2019-09-02  6:36       ` liuzhiwei
     [not found] ` <CAL1e-=iHangj7w+HgJ+FM=iqRLmaY-_CYeUv0gx+c8bpScb9RQ@mail.gmail.com>
     [not found]   ` <46ade3da-d642-bd19-7975-7dc228d401e4@c-sky.com>
2019-08-29 18:32     ` Aleksandar Markovic
     [not found] ` <CAEiOBXXofjrY2=sjuMDb9dTV2fk9yUVKnr+qmf+7mg9vki6OCw@mail.gmail.com>
2019-09-02  8:17   ` [Qemu-devel] [Qemu-riscv] " liuzhiwei
