From: Olivier Matz <olivier.matz@6wind.com>
To: Sarosh Arif <sarosh.arif@emumba.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy() code semantics with optimized rte_memcpy()
Date: Tue, 28 Jul 2020 15:50:31 +0200
Message-ID: <20200728135031.GX5869@platinum>
In-Reply-To: <CABoZmYNfz0oTwMw3CE3whsERUMhU9i4krsSo3O7C76u_TRDbDw@mail.gmail.com>

Hi Sarosh,

On Tue, Jul 28, 2020 at 06:30:46PM +0500, Sarosh Arif wrote:
> Hello,
> The following things made me think that rte_memcpy() is more optimized
> than memcpy():
> 1. The DPDK documentation recommends using rte_memcpy() instead of memcpy():
>     https://doc.dpdk.org/guides/prog_guide/writing_efficient_code.html
> 2. Some benchmarks are available here:
>     https://software.intel.com/content/www/us/en/develop/articles/performance-optimization-of-memcpy-in-dpdk.html
> 3. rte_memcpy() is declared with __attribute__((always_inline)),
> so the compiler also tries to inline it.
> 
> Using rte_memcpy() everywhere ensures consistency in the codebase.
> Here are the performance numbers measured with "perf":
> 
> rte_memcpy()
> 
>  Performance counter stats
>           1.573864      task-clock (msec)         #    0.898 CPUs utilized
>                  0      context-switches          #    0.000 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                342      page-faults               #    0.217 M/sec
>          5,483,016      cycles                    #    3.484 GHz
>          5,554,017      instructions              #    1.01  insn per cycle
>          1,114,593      branches                  #  708.189 M/sec
>             33,796      branch-misses             #    3.03% of all branches
>          1,369,247      L1-dcache-loads           #  869.991 M/sec
>      <not counted>      L1-dcache-load-misses                                      (0.00%)
>      <not counted>      LLC-loads                                                  (0.00%)
>      <not counted>      LLC-load-misses                                            (0.00%)
> 
>        0.001753373 seconds time elapsed
> 
> 
> 
> memcpy()
> 
>  Performance counter stats
>           1.631135      task-clock (msec)         #    0.902 CPUs utilized
>                  0      context-switches          #    0.000 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                342      page-faults               #    0.210 M/sec
>          5,676,549      cycles                    #    3.480 GHz                       (73.99%)
>          5,739,593      instructions              #    1.01  insn per cycle
>          1,141,121      branches                  #  699.587 M/sec
>             34,553      branch-misses             #    3.03% of all branches
>          1,417,494      L1-dcache-loads           #  869.023 M/sec
>             67,312      L1-dcache-load-misses     #    4.75% of all L1-dcache hits     (26.01%)
>      <not counted>      LLC-loads                                                  (0.00%)
>      <not counted>      LLC-load-misses                                            (0.00%)
> 
>        0.001808500 seconds time elapsed
> 
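Regarding point 3 in the quoted message: `always_inline` is a GCC/Clang function attribute that asks the compiler to inline every call to the function, and DPDK's `__rte_always_inline` macro wraps it. A minimal sketch with a hypothetical helper `copy_small()` follows. Inlining removes only the call overhead; the body is still whatever code `memcpy()` expands to, so the attribute alone does not make a copy faster than a `memcpy()` the compiler already inlines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only. always_inline asks GCC/Clang
 * to inline every call; __rte_always_inline in DPDK expands to the same
 * attribute. The copy itself is still plain memcpy(). */
static inline __attribute__((always_inline)) void
copy_small(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}
```

Called as `copy_small(dst, src, sizeof(src))`, the constant size reaches `memcpy()` after inlining, so the compiler sees the same information as with a direct `memcpy()` call.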

Can you give more details about your use case, i.e. what code
are you running for this benchmark?
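One way such a benchmark could isolate the copy from the rest of the program is to time many iterations of a fixed-size copy with `clock_gettime(CLOCK_MONOTONIC)`, instead of running `perf stat` over a whole process that, in the figures above, finishes in under 2 ms. The sketch below assumes a hypothetical 64-byte copy size and a hypothetical function name `bench_memcpy_const()`; it relies on a GCC/Clang empty `asm` statement to keep the repeated copies from being optimized away.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

/* Hypothetical fixed copy size; a compile-time constant on purpose. */
#define COPY_LEN 64

/* Returns the average nanoseconds per memcpy() of COPY_LEN bytes over
 * `iters` runs. Because the size is a literal constant, the compiler
 * is free to inline the copy, which is exactly what is being measured. */
static double bench_memcpy_const(void *dst, const void *src,
				 unsigned long iters)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned long i = 0; i < iters; i++) {
		memcpy(dst, src, COPY_LEN);
		/* Empty asm with a memory clobber (GCC/Clang) so the
		 * repeated copies are not optimized away. */
		__asm__ __volatile__("" : : "r"(dst), "r"(src) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		    (double)(t1.tv_nsec - t0.tv_nsec);
	return iters ? ns / (double)iters : 0.0;
}
```

A second copy of the loop calling `rte_memcpy(dst, src, COPY_LEN)`, built against DPDK, would give the comparison figure. Passing the copy routine through a function pointer instead would hide the constant size from the compiler and prevent the very inlining under discussion.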

I tend to agree with Stephen: memcpy() with a constant (small) size
should be replaced directly by the compiler with the optimal code for
the target architecture.

rte_memcpy() uses vector instructions and is probably better than
libc's memcpy() for larger copies.
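The trade-off described above can be sketched as a size-based dispatch: when the length is a compile-time constant, hand it to `memcpy()` so the compiler can emit inline moves; otherwise copy the bulk with 16-byte vector loads and stores. This is an illustration under stated assumptions only: `copy_bytes()` and `vec_copy()` are hypothetical names, `__builtin_constant_p` is a GCC/Clang builtin, the vector path uses SSE2 intrinsics only when `__SSE2__` is defined (falling back to plain `memcpy()` elsewhere), and it is not claimed to match how `rte_memcpy()` is actually implemented.

```c
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical runtime-size path: copy 16 bytes at a time with SSE2
 * unaligned loads/stores, then finish the remaining tail with memcpy().
 * Buffers must not overlap (same contract as memcpy()). */
static inline void *vec_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
#ifdef __SSE2__
	for (; n >= 16; n -= 16, d += 16, s += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_storeu_si128((__m128i *)d, v);
	}
#endif
	memcpy(d, s, n); /* tail, or the whole buffer without SSE2 */
	return dst;
}

/* Constant sizes go to memcpy() so the compiler can inline them; runtime
 * sizes take the vector path. __builtin_constant_p does not evaluate its
 * argument, so n is still evaluated only once. */
#define copy_bytes(dst, src, n)                               \
	(__builtin_constant_p(n) ? memcpy((dst), (src), (n))  \
				 : vec_copy((dst), (src), (n)))
```

Whether the vector loop actually beats libc's `memcpy()` for a given size depends on the CPU and the libc version, which is why a per-size measurement like the one asked for above is needed before changing call sites.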

Thanks,
Olivier


> 
> 
> On Thu, Jul 23, 2020 at 8:47 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Thu, 23 Jul 2020 12:02:40 +0500
> > Sarosh Arif <sarosh.arif@emumba.com> wrote:
> >
> > > Since rte_memcpy() is more optimized, it should be used instead of memcpy().
> > >
> > > Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
> >
> > Really? Did you measure this?
> > For fixed-size structures, the compiler can inline memcpy() as a small set of instructions.
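To make Stephen's point concrete: for a fixed-size structure the length passed to `memcpy()` is `sizeof`, a compile-time constant, so an optimizing compiler is free to replace the call with a few load/store instructions, as it would for plain struct assignment. The 16-byte `struct flow_key` below is hypothetical and is not an actual `rte_mbuf` field; inspecting the generated assembly (for example with `gcc -O2 -S`) is the way to confirm what a given compiler emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size record (16 bytes, no padding), for illustration. */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t proto;
};

/* sizeof(*dst) is known at compile time, so with optimization enabled the
 * compiler may emit a couple of moves here instead of calling memcpy(). */
static void copy_key_memcpy(struct flow_key *dst, const struct flow_key *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Equivalent struct assignment; typically compiles to the same moves. */
static void copy_key_assign(struct flow_key *dst, const struct flow_key *src)
{
	*dst = *src;
}
```

Replacing the `memcpy()` in the first function with `rte_memcpy()` would not give the compiler any extra information here, since the size is already constant.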


Thread overview: 5+ messages
2020-07-23  7:02 [dpdk-dev] [PATCH] mbuf: replace c memcpy() code semantics with optimized rte_memcpy() Sarosh Arif
2020-07-23 15:47 ` Stephen Hemminger
2020-07-28 13:30   ` Sarosh Arif
2020-07-28 13:50     ` Olivier Matz [this message]
2020-07-28 17:46 ` Stephen Hemminger
