qemu-devel.nongnu.org archive mirror
From: Max Chou <max.chou@sifive.com>
To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: dbarboza@ventanamicro.com, Max Chou <max.chou@sifive.com>
Subject: [RFC PATCH 0/6] Improve the performance of RISC-V vector unit-stride ld/st instructions
Date: Fri, 16 Feb 2024 03:28:11 +0800
Message-ID: <20240215192823.729209-1-max.chou@sifive.com>

Hi all,

When glibc is built with RVV support [1], the memcpy benchmark runs 2x
to 60x slower than its scalar equivalent on QEMU, which hurts developer
productivity.

The profile below shows that the glibc memcpy spends most of its time
in the vector unit-stride load/store helper functions.

Samples: 465K of event 'cycles:u', Event count (approx.): 1707645730664
  Children      Self  Command       Shared Object            Symbol
+   28.46%    27.85%  qemu-riscv64  qemu-riscv64             [.] vext_ldst_us
+   26.92%     0.00%  qemu-riscv64  [unknown]                [.] 0x00000000000000ff
+   14.41%    14.41%  qemu-riscv64  qemu-riscv64             [.] qemu_plugin_vcpu_mem_cb
+   13.85%    13.85%  qemu-riscv64  qemu-riscv64             [.] lde_b
+   13.64%    13.64%  qemu-riscv64  qemu-riscv64             [.] cpu_stb_mmu
+    9.25%     9.19%  qemu-riscv64  qemu-riscv64             [.] cpu_ldb_mmu
+    7.81%     7.81%  qemu-riscv64  qemu-riscv64             [.] cpu_mmu_lookup
+    7.70%     7.70%  qemu-riscv64  qemu-riscv64             [.] ste_b
+    5.53%     0.00%  qemu-riscv64  qemu-riscv64             [.] adjust_addr (inlined)   


So this patchset tries to improve the performance of the RVV version of
glibc memcpy on QEMU by optimizing the corresponding helper functions.
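
To make the cost visible, here is a minimal C sketch of the general
pattern. It is not QEMU code. The names elem_fn, load_byte,
unit_stride_load_indirect and unit_stride_load_inlined are hypothetical
and only stand in for the idea of a generic loop that calls a per-byte
helper (such as lde_b in the profile above) once per element. The real
helpers also perform guest address translation, which this sketch
leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-element accessor type. It stands in for the idea of
 * a per-byte helper; it is NOT the signature QEMU uses. */
typedef void elem_fn(uint8_t *vreg, size_t idx, const uint8_t *mem);

/* Out-of-line per-byte load: copies exactly one element. */
static void load_byte(uint8_t *vreg, size_t idx, const uint8_t *mem)
{
    vreg[idx] = mem[idx];
}

/* Variant A: generic loop that makes one indirect call per element.
 * The call overhead is paid vl times. */
static void unit_stride_load_indirect(uint8_t *vreg, const uint8_t *mem,
                                      size_t vl, elem_fn *fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < vl; i++) {
        fn(vreg, i, mem);
    }
}

/* Variant B: the element access is visible to the compiler, so the
 * loop body can be inlined and optimized as a whole. */
static inline void unit_stride_load_inlined(uint8_t *vreg,
                                            const uint8_t *mem, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        vreg[i] = mem[i];
    }
}
```

Both variants produce the same register contents. The difference is
only in how many calls are made per vector instruction, which is the
kind of overhead the inlining patches in this series target.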

The overall performance improvement reaches the following numbers
(depending on the size):
Average: 2.86X / Smallest: 1.15X / Largest: 4.49X

PS: This RFC patchset only focuses on the vle8.v & vse8.v instructions;
the next version or a follow-up series will cover the remaining vector
ld/st instructions.
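
For readers less familiar with RVV, the following scalar C model
sketches how a memcpy built on vle8.v/vse8.v typically strip-mines its
input. It is not the glibc implementation from [1]. MODEL_VLMAX,
model_vsetvl and model_rvv_memcpy are hypothetical names, and the value
16 assumes VLEN=128, SEW=8 and LMUL=1. Returning min(avl, VLMAX) is one
behaviour the vector spec permits for vsetvli, not the only one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed maximum vector length in bytes (VLEN=128, SEW=8, LMUL=1). */
enum { MODEL_VLMAX = 16 };

/* Models vsetvli: pick how many elements this iteration handles. */
static size_t model_vsetvl(size_t avl)
{
    return avl < MODEL_VLMAX ? avl : MODEL_VLMAX;
}

/* Scalar model of a strip-mined byte copy. Returns the number of loop
 * iterations, i.e. the number of vle8.v/vse8.v pairs executed. */
static size_t model_rvv_memcpy(uint8_t *dst, const uint8_t *src, size_t n)
{
    uint8_t vreg[MODEL_VLMAX];   /* stands in for one vector register */
    size_t iterations = 0;

    while (n > 0) {
        size_t vl = model_vsetvl(n);
        memcpy(vreg, src, vl);   /* models vle8.v: unit-stride load  */
        memcpy(dst, vreg, vl);   /* models vse8.v: unit-stride store */
        src += vl;
        dst += vl;
        n -= vl;
        iterations++;
    }
    return iterations;
}
```

Under these assumptions a 100-byte copy takes 7 iterations, so every
iteration enters the unit-stride load and store helpers once each, and
their per-byte cost is multiplied by the vector length.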

Regards,
Max.

[1] https://inbox.sourceware.org/libc-alpha/20230504074851.38763-1-hau.hsu@sifive.com

Max Chou (6):
  target/riscv: Seperate vector segment ld/st instructions
  accel/tcg: Avoid uncessary call overhead from qemu_plugin_vcpu_mem_cb
  target/riscv: Inline vext_ldst_us and coressponding function for
    performance
  accel/tcg: Inline cpu_mmu_lookup function
  accel/tcg: Inline do_ld1_mmu function
  accel/tcg: Inline do_st1_mmu function

 accel/tcg/ldst_common.c.inc             |  40 ++++++--
 accel/tcg/user-exec.c                   |  17 ++--
 target/riscv/helper.h                   |   4 +
 target/riscv/insn32.decode              |  11 +-
 target/riscv/insn_trans/trans_rvv.c.inc |  61 +++++++++++
 target/riscv/vector_helper.c            | 130 +++++++++++++++++++-----
 6 files changed, 221 insertions(+), 42 deletions(-)

-- 
2.34.1


