From: Robert Hoo <robert.hu@linux.intel.com>
To: qemu-devel@nongnu.org, pbonzini@redhat.com, richard.henderson@linaro.org
Cc: robert.hu@intel.com, Robert Hoo <robert.hu@linux.intel.com>
Subject: [PATCH 2/2] util/bufferiszero: improve avx2 accelerator
Date: Wed, 25 Mar 2020 14:50:21 +0800
Message-ID: <1585119021-46593-2-git-send-email-robert.hu@linux.intel.com>
In-Reply-To: <1585119021-46593-1-git-send-email-robert.hu@linux.intel.com>
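Increasing the AVX2 length_to_accel to 128 lets us simplify the logic and
remove a branch: with at least 128 bytes, the four trailing unaligned
32-byte loads always stay inside the buffer, so the short-buffer special
case is no longer needed.
This patch was originally written by Richard Henderson <richard.henderson@linaro.org>;
I only fixed a boundary case in his original patch.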
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
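For readers without an AVX2 machine, the control flow after this change can
be modelled in portable C. This is only a rough sketch, not the QEMU code:
or32(), model_zero_accel(), model_is_zero() and the BLOCK/LENGTH_TO_ACCEL
constants are hypothetical names, and a 4 x uint64_t OR stands in for a
32-byte vector load. It is meant to show why a 128-byte minimum makes the
four tail loads safe without the old "last2" shortcut.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 32, LENGTH_TO_ACCEL = 4 * BLOCK };

/* Hypothetical stand-in for one 32-byte vector load: OR the bytes together. */
static uint64_t or32(const unsigned char *p)
{
    uint64_t w[4];
    memcpy(w, p, sizeof(w));
    return w[0] | w[1] | w[2] | w[3];
}

/* Models the simplified accelerator; the caller guarantees len >= 128. */
bool model_zero_accel(const void *buf, size_t len)
{
    const unsigned char *b = buf;
    assert(len >= LENGTH_TO_ACCEL);

    uint64_t t = or32(b);                       /* first 32 bytes, unaligned */
    uintptr_t p = ((uintptr_t)b + 5 * BLOCK) & ~(uintptr_t)(BLOCK - 1);
    uintptr_t e = ((uintptr_t)b + len) & ~(uintptr_t)(BLOCK - 1);

    /* Loop over 32-byte aligned blocks, 128 bytes per iteration. */
    while (p <= e) {
        const unsigned char *q = (const unsigned char *)p;
        if (t) {
            return false;
        }
        t = or32(q - 4 * BLOCK) | or32(q - 3 * BLOCK)
          | or32(q - 2 * BLOCK) | or32(q - 1 * BLOCK);
        p += 4 * BLOCK;
    }

    /* Last 128 bytes, unaligned: in bounds because len >= 128. */
    t |= or32(b + len - 4 * BLOCK);
    t |= or32(b + len - 3 * BLOCK);
    t |= or32(b + len - 2 * BLOCK);
    t |= or32(b + len - 1 * BLOCK);
    return t == 0;
}

/* Dispatch: buffers shorter than the threshold take a plain byte loop. */
bool model_is_zero(const void *buf, size_t len)
{
    const unsigned char *b = buf;
    if (len >= LENGTH_TO_ACCEL) {
        return model_zero_accel(buf, len);
    }
    for (size_t i = 0; i < len; i++) {
        if (b[i]) {
            return false;
        }
    }
    return true;
}
```

If the loop never runs, len is below 160, so the first 32 bytes plus the
last 128 still cover the whole buffer.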
util/bufferiszero.c | 26 +++++++++-----------------
1 file changed, 9 insertions(+), 17 deletions(-)
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index b801253..695bb4c 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -158,27 +158,19 @@ buffer_zero_avx2(const void *buf, size_t len)
__m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32);
__m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32);
- if (likely(p <= e)) {
- /* Loop over 32-byte aligned blocks of 128. */
- do {
- __builtin_prefetch(p);
- if (unlikely(!_mm256_testz_si256(t, t))) {
- return false;
- }
- t = p[-4] | p[-3] | p[-2] | p[-1];
- p += 4;
- } while (p <= e);
- } else {
- t |= _mm256_loadu_si256(buf + 32);
- if (len <= 128) {
- goto last2;
+ /* Loop over 32-byte aligned blocks of 128. */
+ while (p <= e) {
+ __builtin_prefetch(p);
+ if (unlikely(!_mm256_testz_si256(t, t))) {
+ return false;
}
- }
+ t = p[-4] | p[-3] | p[-2] | p[-1];
+ p += 4;
+ }
/* Finish the last block of 128 unaligned. */
t |= _mm256_loadu_si256(buf + len - 4 * 32);
t |= _mm256_loadu_si256(buf + len - 3 * 32);
- last2:
t |= _mm256_loadu_si256(buf + len - 2 * 32);
t |= _mm256_loadu_si256(buf + len - 1 * 32);
@@ -263,7 +255,7 @@ static void init_accel(unsigned cache)
}
if (cache & CACHE_AVX2) {
fn = buffer_zero_avx2;
- length_to_accel = 64;
+ length_to_accel = 128;
}
#endif
#ifdef CONFIG_AVX512F_OPT
--
1.8.3.1
Thread overview: 7+ messages
2020-03-25 6:50 [PATCH 1/2] util/bufferiszero: assign length_to_accel value for each accelerator case Robert Hoo
2020-03-25 6:50 ` Robert Hoo [this message]
2020-03-25 12:54 ` [PATCH 2/2] util/bufferiszero: improve avx2 accelerator Eric Blake
2020-03-26 2:09 ` Hu, Robert
2020-03-26 9:43 ` Paolo Bonzini
2020-03-26 13:26 ` Eric Blake
2020-03-26 13:51 ` Robert Hoo