From: Stefan Roese <sr@denx.de>
To: Tom Rini <trini@konsulko.com>
Cc: u-boot@lists.denx.de,
	Rasmus Villemoes <rasmus.villemoes@prevas.dk>,
	sjg@chromium.org, Wolfgang Denk <wd@denx.de>
Subject: Re: [PATCH v3 0/3] arm64: Add optimized memset/memcpy/memove functions
Date: Thu, 12 Aug 2021 10:43:56 +0200
Message-ID: <b7eb3113-418f-b668-6f95-30e46cdbb5bf@denx.de>
In-Reply-To: <bf556e0e-158f-c23e-ea40-7bf9f0b370d6@denx.de>

On 11.08.21 16:28, Stefan Roese wrote:
> On 11.08.21 16:25, Tom Rini wrote:
>> On Wed, Aug 11, 2021 at 04:02:39PM +0200, Stefan Roese wrote:
>>>
>>> On an NXP LX2160 based platform it has been noticed, that the currently
>>> implemented memset/memcpy functions for aarch64 are suboptimal.
>>> Especially the memset() for clearing the NXP MC firmware memory is very
>>> expensive (time-wise).
>>>
>>> By using optimized functions, a speedup of ~ factor 6 has been measured.
>>
>> To be clear, you re-measured with the cache check code added, and this
>> is the speed up?
> 
> I forgot to do this. BTW: I was wrong with factor ~6. From my notes,
> it is ~ factor 4 using the optimized memset() version.
> 
> I'll follow-up on this mail with some measurements for all affected
> functions, using small and large sizes. Hopefully tomorrow.

Here are the numbers:

Current original version:
-------------------------
memset() 32 Bytes, 16M times:
time: 0.446 seconds

memset() 16MiB, 256 times:
time: 1.076 seconds

memcpy() 512MiB:
time: 0.224 seconds

New optimized version:
----------------------
memset() 32 Bytes, 16M times:
time: 0.287 seconds

memset() 16MiB, 256 times:
time: 0.292 seconds

memcpy() 512MiB:
time: 0.222 seconds
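
For a rough sense of scale, the timings above can be converted into
throughput. This is only a sketch and not part of the measurements
themselves: it assumes that "16M times" means 16 * 2^20 calls and that
the times cover nothing but the calls. The names runs and
throughput_mib_s are made up for illustration.

```python
MIB = 1024 * 1024

# (label, bytes per call, number of calls, original seconds, optimized seconds)
# Timings are copied from the numbers in this mail.
# Assumption: "16M times" means 16 * 2**20 calls.
runs = [
    ("memset 32 B x 16M", 32, 16 * MIB, 0.446, 0.287),
    ("memset 16 MiB x 256", 16 * MIB, 256, 1.076, 0.292),
    ("memcpy 512 MiB x 1", 512 * MIB, 1, 0.224, 0.222),
]


def throughput_mib_s(bytes_per_call, calls, seconds):
    """Return the total amount of data processed per second, in MiB/s."""
    return bytes_per_call * calls / MIB / seconds


for label, size, calls, t_old, t_new in runs:
    old = throughput_mib_s(size, calls, t_old)
    new = throughput_mib_s(size, calls, t_new)
    print(f"{label:<20} {old:8.0f} MiB/s -> {new:8.0f} MiB/s")
```

Under these assumptions both memset() runs gain throughput, while
memcpy() stays essentially where it was.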

Summary:
The optimized memcpy() performs nearly identically to the original one
(0.222 s vs. 0.224 s). The optimized memset(), however, is much faster
for both small and large sizes: about a factor of 1.6 for small sizes
and about a factor of 3.7 for large sizes.
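
The factors in this summary are simply the ratio of the original time
to the optimized time. A small sketch to reproduce them from the
numbers in this mail (the names timings and speedup are only
illustrative):

```python
# Timings in seconds (original, optimized), copied from this mail.
timings = {
    "memset 32 B x 16M": (0.446, 0.287),
    "memset 16 MiB x 256": (1.076, 0.292),
    "memcpy 512 MiB": (0.224, 0.222),
}


def speedup(t_old, t_new):
    """Factor by which the optimized version is faster than the original."""
    return t_old / t_new


for label, (t_old, t_new) in timings.items():
    print(f"{label:<20} factor ~{speedup(t_old, t_new):.2f}")
```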

Note: These measurements were done on the NXP LX2160ARDB board.

Thanks,
Stefan
