From: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
To: Max Chou <max.chou@sifive.com>,
qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
Alistair Francis <alistair.francis@wdc.com>,
Bin Meng <bin.meng@windriver.com>,
Weiwei Li <liwei1518@gmail.com>,
Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Subject: Re: [RFC PATCH 3/6] target/riscv: Inline vext_ldst_us and coressponding function for performance
Date: Thu, 15 Feb 2024 18:11:36 -0300
Message-ID: <2701c3a3-d9ab-4058-99f6-d542baf293ec@ventanamicro.com>
In-Reply-To: <20240215192823.729209-4-max.chou@sifive.com>
On 2/15/24 16:28, Max Chou wrote:
> In the vector unit-stride load/store helper functions, the vext_ldst_us
> function accounts for most of the execution time. Inlining these functions
> avoids the function call overhead and improves the helper function
> performance.
>
> Signed-off-by: Max Chou <max.chou@sifive.com>
> ---
Inlining is a good idea, but I think we can do better. In a thread last
year [1] I mentioned how much time we're spending in single-byte loads/stores,
even for strided instructions.
E.g. in vext_ldst_stride():

    for (i = env->vstart; i < env->vl; i++, env->vstart++) {
        k = 0;
        while (k < nf) {
            if (!vm && !vext_elem_mask(v0, i)) {
                /* set masked-off elements to 1s */
                vext_set_elems_1s(vd, vma, (i + k * max_elems) * esz,
                                  (i + k * max_elems + 1) * esz);
                k++;
                continue;
            }
            target_ulong addr = base + stride * i + (k << log2_esz);
            ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
            k++;
        }
    }

We're doing single-byte loads/stores in ldst_elem() when, in this case, we could
handle the whole block in one go. ARM does something similar in SVE.
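
To illustrate the difference, here is a toy sketch in plain C, not QEMU code. ToyMem, toy_lookup(), toy_ld_per_elem() and toy_ld_block() are hypothetical names made up for this example, and toy_lookup() only stands in for whatever per-access address lookup the real helpers do. It ignores masking, segments (nf), page crossings, faults and endianness, so treat it as a rough picture of the idea rather than a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not a QEMU type. */
typedef struct {
    uint8_t *ram;
    size_t size;
    unsigned lookups;   /* number of simulated address lookups */
} ToyMem;

/* Simulated lookup: counts one lookup, returns NULL if out of range. */
static uint8_t *toy_lookup(ToyMem *m, size_t addr, size_t len)
{
    m->lookups++;
    if (len > m->size || addr > m->size - len) {
        return NULL;
    }
    return m->ram + addr;
}

/* Per-element path: one lookup for every element. */
static int toy_ld_per_elem(ToyMem *m, uint8_t *vd, size_t base,
                           size_t stride, size_t esz, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        uint8_t *host = toy_lookup(m, base + stride * i, esz);
        if (!host) {
            return -1;
        }
        memcpy(vd + i * esz, host, esz);
    }
    return 0;
}

/* Block path: one lookup for the whole span, then plain host copies. */
static int toy_ld_block(ToyMem *m, uint8_t *vd, size_t base,
                        size_t stride, size_t esz, size_t vl)
{
    if (vl == 0) {
        return 0;
    }
    assert(stride >= esz);  /* assumption: elements do not overlap */
    size_t span = stride * (vl - 1) + esz;
    uint8_t *host = toy_lookup(m, base, span);
    if (!host) {
        return -1;
    }
    if (stride == esz) {
        memcpy(vd, host, span);         /* contiguous: a single copy */
    } else {
        for (size_t i = 0; i < vl; i++) {
            memcpy(vd + i * esz, host + i * stride, esz);
        }
    }
    return 0;
}
```

With 4 elements the per-element path does 4 lookups and the block path does 1, and both fill vd with the same bytes. A real version would still need a per-element fallback whenever the span cannot be resolved in one piece.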
I updated the GitLab bug https://gitlab.com/qemu-project/qemu/-/issues/2137 with this
additional info too.
Thanks,
Daniel
[1] https://lore.kernel.org/qemu-riscv/0e54c6c1-2903-7942-eff2-2b8c5e21187e@ventanamicro.com/
> target/riscv/vector_helper.c | 30 ++++++++++++++++--------------
> 1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index e8fbb921449..866f77d321d 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -149,25 +149,27 @@ static inline void vext_set_elem_mask(void *v0, int index,
> typedef void vext_ldst_elem_fn(CPURISCVState *env, abi_ptr addr,
> uint32_t idx, void *vd, uintptr_t retaddr);
>
> -#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF) \
> -static void NAME(CPURISCVState *env, abi_ptr addr, \
> - uint32_t idx, void *vd, uintptr_t retaddr)\
> -{ \
> - ETYPE *cur = ((ETYPE *)vd + H(idx)); \
> - *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr); \
> -} \
> +#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF) \
> +static inline QEMU_ALWAYS_INLINE \
> +void NAME(CPURISCVState *env, abi_ptr addr, \
> + uint32_t idx, void *vd, uintptr_t retaddr) \
> +{ \
> + ETYPE *cur = ((ETYPE *)vd + H(idx)); \
> + *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr); \
> +} \
>
> GEN_VEXT_LD_ELEM(lde_b, int8_t, H1, ldsb)
> GEN_VEXT_LD_ELEM(lde_h, int16_t, H2, ldsw)
> GEN_VEXT_LD_ELEM(lde_w, int32_t, H4, ldl)
> GEN_VEXT_LD_ELEM(lde_d, int64_t, H8, ldq)
>
> -#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF) \
> -static void NAME(CPURISCVState *env, abi_ptr addr, \
> - uint32_t idx, void *vd, uintptr_t retaddr)\
> -{ \
> - ETYPE data = *((ETYPE *)vd + H(idx)); \
> - cpu_##STSUF##_data_ra(env, addr, data, retaddr); \
> +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF) \
> +static inline QEMU_ALWAYS_INLINE \
> +void NAME(CPURISCVState *env, abi_ptr addr, \
> + uint32_t idx, void *vd, uintptr_t retaddr) \
> +{ \
> + ETYPE data = *((ETYPE *)vd + H(idx)); \
> + cpu_##STSUF##_data_ra(env, addr, data, retaddr); \
> }
>
> GEN_VEXT_ST_ELEM(ste_b, int8_t, H1, stb)
> @@ -289,7 +291,7 @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d)
> */
>
> /* unmasked unit-stride load and store operation */
> -static void
> +static inline QEMU_ALWAYS_INLINE void
> vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
> vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uint32_t evl,
> uintptr_t ra)