From: Stefan Roese <sr@denx.de>
To: Tom Rini <trini@konsulko.com>
Cc: u-boot@lists.denx.de,
Rasmus Villemoes <rasmus.villemoes@prevas.dk>,
sjg@chromium.org, Wolfgang Denk <wd@denx.de>
Subject: Re: [PATCH v3 0/3] arm64: Add optimized memset/memcpy/memove functions
Date: Thu, 12 Aug 2021 10:43:56 +0200
Message-ID: <b7eb3113-418f-b668-6f95-30e46cdbb5bf@denx.de>
In-Reply-To: <bf556e0e-158f-c23e-ea40-7bf9f0b370d6@denx.de>
On 11.08.21 16:28, Stefan Roese wrote:
> On 11.08.21 16:25, Tom Rini wrote:
>> On Wed, Aug 11, 2021 at 04:02:39PM +0200, Stefan Roese wrote:
>>>
>>> On an NXP LX2160 based platform it has been noticed, that the currently
>>> implemented memset/memcpy functions for aarch64 are suboptimal.
>>> Especially the memset() for clearing the NXP MC firmware memory is very
>>> expensive (time-wise).
>>>
>>> By using optimized functions, a speedup of ~ factor 6 has been measured.
>>
>> To be clear, you re-measured with the cache check code added, and this
>> is the speed up?
>
> I forgot doing this. BTW: I was wrong with factor ~6. From my notices,
> it is ~ factor 4 using the optimized memset() version.
>
> I'll follow-up on this mail with some measurements for all affected
> functions, using small and large sizes. Hopefully tomorrow.
Here are the numbers:
Current original version:
-------------------------
memset() 32 Bytes, 16M times:
time: 0.446 seconds
memset() 16MiB, 256 times:
time: 1.076 seconds
memcpy() 512MiB:
time: 0.224 seconds
New optimized version:
----------------------
memset() 32 Bytes, 16M times:
time: 0.287 seconds
memset() 16MiB, 256 times:
time: 0.292 seconds
memcpy() 512MiB:
time: 0.222 seconds
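For illustration only: timings like the memset() ones above can be taken with a simple loop around the call. The code actually used for these measurements is not part of this mail, so the following is a hosted-C sketch under assumptions. The helper name bench_memset() and the use of clock_gettime() are hypothetical; in U-Boot itself a timer API would be used instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch series): time `iterations`
 * calls of memset() on a buffer of `size` bytes.
 * Returns the elapsed time in seconds, or a negative value if the
 * buffer could not be allocated.
 */
double bench_memset(size_t size, unsigned long iterations)
{
	/* Call through a volatile pointer so the compiler cannot drop the calls. */
	void *(*volatile do_memset)(void *, int, size_t) = memset;
	struct timespec start, end;
	unsigned char *buf;
	unsigned long i;

	assert(size > 0);
	buf = malloc(size);
	if (!buf)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		do_memset(buf, 0, size);
	clock_gettime(CLOCK_MONOTONIC, &end);

	free(buf);
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling bench_memset() once with a small size and many iterations, and once with a large size and few iterations, mirrors the two memset() cases listed above. The absolute numbers depend on the machine and cache state, so they will not match the board results.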
Summary:
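The optimized memcpy performs nearly identically to the original one
(0.222 vs. 0.224 seconds). But the optimized memset is much faster, for
both small and big sizes: a factor of ~1.6 for small sizes (0.446 vs.
0.287 seconds) and a factor of ~3.7 for big sizes (1.076 vs. 0.292
seconds).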
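The factors in the summary are simply the ratio of old time to new time. As a small sketch of that arithmetic (the function name speedup() is just an illustrative choice, not something from the patches):

```c
#include <assert.h>

/*
 * Speedup factor of a new implementation over an old one:
 * the old elapsed time divided by the new elapsed time.
 */
double speedup(double t_old, double t_new)
{
	assert(t_new > 0.0);
	return t_old / t_new;
}
```

With the timings above, speedup(0.446, 0.287) is about 1.55 and speedup(1.076, 0.292) is about 3.68, which round to the ~1.6 and ~3.7 quoted in the summary. For memcpy, speedup(0.224, 0.222) is about 1.01.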
Note: These measurements were done on the NXP LX2160ARDB board.
Thanks,
Stefan