From: David Laight <David.Laight@ACULAB.COM>
To: 'Palmer Dabbelt' <firstname.lastname@example.org>,
Paul Walmsley <email@example.com>,
Atish Patra <Atish.Patra@wdc.com>,
"Christoph Hellwig" <firstname.lastname@example.org>
Subject: RE: [PATCH] riscv: use the generic string routines
Date: Sat, 11 Sep 2021 17:26:12 +0000
Message-ID: <241c29b27c4c4acbbf893516bfa6f5aa@AcuMS.aculab.com>
> These ended up getting rejected by Linus, so I'm going to hold off on
> this for now. If they're really out of lib/ then I'll take the C
> routines in arch/riscv, but either way it's an issue for the next
I've been half following this.
I've not seen any comparisons between the C functions proposed
here and the riscv asm ones that had the fix for misaligned
IIRC there is a comment in the asm ones that the unrolled
'read lots' - 'write lots' loop is faster than the older
(asm) read-write loop.
But I've not seen any architectural discussions at all.
A simple in-order single-issue cpu will execute the
unrolled loop faster just because it has fewer instructions.
The read-lots - write-lots almost certainly helps
avoid read-latency delaying things if multiple reads
can be pipelined.
The writes are almost certainly 'posted' and pipelined,
but a simple cpu could easily require all writes finish
before doing a read.
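
As a rough illustration of the 'read lots' - 'write lots' shape, here is
a minimal C sketch. The function name and the unroll factor of four are
made up for this example; it is not the riscv asm routine and not the
proposed generic C one. It assumes both pointers are 8-byte aligned and
the buffers don't overlap. A compiler is free to reschedule the loads
and stores, so only hand-written asm really pins the order down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy 'words' 64-bit words between aligned,
 * non-overlapping buffers. */
static void copy_read_lots_write_lots(uint64_t *dst, const uint64_t *src,
                                      size_t words)
{
    assert(dst != NULL && src != NULL);

    while (words >= 4) {
        /* Issue all the loads first so their latencies can overlap. */
        uint64_t a = src[0];
        uint64_t b = src[1];
        uint64_t c = src[2];
        uint64_t d = src[3];

        /* Then all the stores, which are normally posted. */
        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;

        /* One set of loop-control instructions per four words. */
        src += 4;
        dst += 4;
        words -= 4;
    }

    /* Tail: fewer than four words left. */
    while (words--)
        *dst++ = *src++;
}
```

On an in-order single-issue cpu the gain is mostly the loop overhead
being paid once per four words rather than once per word.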
A superscalar (multi-issue) cpu gives you the ability
to get the loop control instructions 'for free' with
carefully written assembler.
At which point a copy for 'life cache' data should be
limited only by the cpu's cache memory bandwidth.
If reads and writes can interleave then a loop that
alternates reads and writes (read each register
just after writing it) may mean that you always
keep the cpu-cache interface busy.
This would be especially true if the cpu can execute
both a cache read and write in the same cycle.
(Which many moderate performance cpus can.)
None of this requires out-of-order execution, just
execution to continue while a read is in progress.
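
The alternating form can be sketched in C as a software-pipelined loop:
each register is stored and then immediately reloaded with the word
four places further on. Again the name and unroll factor are
hypothetical, the buffers are assumed aligned and non-overlapping, and
a compiler may well reorder it; it only shows the intended
instruction order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave stores and loads so the cpu-cache
 * interface sees a write followed by a read for each register. */
static void copy_interleaved(uint64_t *dst, const uint64_t *src,
                             size_t words)
{
    assert(dst != NULL && src != NULL);

    if (words >= 4) {
        /* Prologue: fill the four registers. */
        uint64_t a = src[0], b = src[1], c = src[2], d = src[3];

        src += 4;
        words -= 4;

        while (words >= 4) {
            /* Write each register, then read it again straight away. */
            dst[0] = a; a = src[0];
            dst[1] = b; b = src[1];
            dst[2] = c; c = src[2];
            dst[3] = d; d = src[3];
            dst += 4;
            src += 4;
            words -= 4;
        }

        /* Epilogue: drain the registers still holding data. */
        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;
        dst += 4;
    }

    /* Tail: fewer than four words left. */
    while (words--)
        *dst++ = *src++;
}
```

Whether this beats the read-lots - write-lots loop depends entirely on
whether the cpu can overlap the reads and writes, which is the
question that needs answering per implementation.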
I'm also guessing that any performance testing has been
done with the (relatively) cheap boards that are readily
But I've also seen references in the press to much faster
riscv cpus that are definitely multi-issue and may have
some simple out-of-order execution.
Any changes ought to be tested on these faster systems.
I also recall that some of the performance measurements
were made with long buffers - they will be dominated by the
cache to DRAM (and maybe TLB lookup) timings, not the copy
For a simple cpu you ought to be able to measure the
number of cpu cycles used for a copy - and account for
all of them.
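
A minimal userspace sketch of that kind of measurement follows. It is
an assumption-laden stand-in: the function name is invented, it times
memcpy() with POSIX clock_gettime() in nanoseconds rather than reading
a cycle counter, and it calls through a volatile function pointer so
the compiler can't drop the copies. On real hardware you would want
the cpu's cycle counter and the routine actually under test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average nanoseconds per byte for 'reps' copies
 * of 'len' bytes. Returns -1.0 if allocation fails. */
static double copy_ns_per_byte(size_t len, unsigned reps)
{
    /* Volatile pointer stops the calls being optimised away. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;
    unsigned char *src, *dst;
    struct timespec t0, t1;
    double ns;

    assert(len > 0 && reps > 0);

    src = malloc(len);
    dst = malloc(len);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    memset(src, 0x5a, len);
    copy_fn(dst, src, len);     /* warm the cache before timing */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < reps; i++)
        copy_fn(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
         (double)(t1.tv_nsec - t0.tv_nsec);

    free(src);
    free(dst);
    return ns / ((double)len * reps);
}
```

Comparing, say, copy_ns_per_byte(4096, 100000) against
copy_ns_per_byte(64u << 20, 10) should show how much of the long-buffer
figure comes from outside the copy loop itself.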
For something like x86 you can show that the copy is
being limited by the cpu-cache bandwidth.
(FWIW measurements of the inet checksum code on x86
show it runs at half the expected speed on a lot of
Intel cpus - no one ever measured it.)
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Thread overview: 10+ messages
2021-07-19 11:43 [PATCH] riscv: use the generic string routines Matteo Croce
2021-08-03 16:54 ` Matteo Croce
2021-08-04 20:40 ` Palmer Dabbelt
2021-08-05 8:20 ` David Laight
2021-08-05 10:31 ` Matteo Croce
2021-09-11 3:49 ` Palmer Dabbelt
2021-09-11 17:26 ` David Laight [this message]
2021-09-12 0:10 ` Guo Ren
2021-09-13 11:35 ` David Laight
2021-09-19 19:13 ` Matteo Croce