All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Li, Xiaoyun" <xiaoyun.li@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	Thomas Monjalon <thomas@monjalon.net>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Lu, Wenzhuo" <wenzhuo.lu@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"ophirmu@mellanox.com" <ophirmu@mellanox.com>
Subject: Re: [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy
Date: Wed, 25 Oct 2017 08:54:10 +0000	[thread overview]
Message-ID: <B9E724F4CB7543449049E7AE7669D82F48176B@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAAEB8E@IRSMSX103.ger.corp.intel.com>



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, October 25, 2017 16:51
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Helin
> <helin.zhang@intel.com>; ophirmu@mellanox.com
> Subject: RE: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over
> memcpy
> 
> 
> 
> > -----Original Message-----
> > From: Li, Xiaoyun
> > Sent: Wednesday, October 25, 2017 7:55 AM
> > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; Thomas Monjalon <thomas@monjalon.net>
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> > Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Helin
> > <helin.zhang@intel.com>; ophirmu@mellanox.com
> > Subject: RE: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over
> > memcpy
> >
> > Hi
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Li, Xiaoyun
> > > Sent: Friday, October 20, 2017 09:03
> > > To: Richardson, Bruce <bruce.richardson@intel.com>; Thomas Monjalon
> > > <thomas@monjalon.net>
> > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > > dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Helin
> > > <helin.zhang@intel.com>; ophirmu@mellanox.com
> > > Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch
> > > over memcpy
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richardson, Bruce
> > > > Sent: Thursday, October 19, 2017 17:30
> > > > To: Thomas Monjalon <thomas@monjalon.net>
> > > > Cc: Li, Xiaoyun <xiaoyun.li@intel.com>; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>; dev@dpdk.org; Lu, Wenzhuo
> > > > <wenzhuo.lu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> > > > ophirmu@mellanox.com
> > > > Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch
> > > > over memcpy
> > > >
> > > > On Thu, Oct 19, 2017 at 11:00:33AM +0200, Thomas Monjalon wrote:
> > > > > 19/10/2017 10:50, Li, Xiaoyun:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > Sent: Thursday, October 19, 2017 16:34
> > > > > > > To: Li, Xiaoyun <xiaoyun.li@intel.com>
> > > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > > > > > > Richardson, Bruce <bruce.richardson@intel.com>;
> > > > > > > dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang,
> > > > > > > Helin <helin.zhang@intel.com>; ophirmu@mellanox.com
> > > > > > > Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time
> > > > > > > dispatch over memcpy
> > > > > > >
> > > > > > > 19/10/2017 09:51, Li, Xiaoyun:
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > 19/10/2017 04:45, Li, Xiaoyun:
> > > > > > > > > > Hi
> > > > > > > > > > > > >
> > > > > > > > > > > > > The significant change of this patch is to call
> > > > > > > > > > > > > a function pointer for packet size > 128
> > > > (RTE_X86_MEMCPY_THRESH).
> > > > > > > > > > > > The perf drop is due to function call replacing inline.
> > > > > > > > > > > >
> > > > > > > > > > > > > Please could you provide some benchmark numbers?
> > > > > > > > > > > > I ran memcpy_perf_test which would show the time
> > > > > > > > > > > > cost of memcpy. I ran it on broadwell with sse and avx2.
> > > > > > > > > > > > But I just draw pictures and looked at the trend
> > > > > > > > > > > > not computed the exact percentage. Sorry about that.
> > > > > > > > > > > > The picture shows results of copy size of 2, 4, 6,
> > > > > > > > > > > > 8, 9, 12, 16, 32, 64, 128, 192, 256, 320, 384,
> > > > > > > > > > > > 448, 512, 768, 1024, 1518, 1522, 1536, 1600, 2048,
> > > > > > > > > > > > 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144,
> > > > > > > > > > > > 6656, 7168,
> > > > > > > > > > > 7680, 8192.
> > > > > > > > > > > > In my test, the size grows, the drop degrades.
> > > > > > > > > > > > (Using copy time indicates the
> > > > > > > > > > > > perf.) From the trend picture, when the size is
> > > > > > > > > > > > smaller than
> > > > > > > > > > > > 128 bytes, the perf drops a lot, almost 50%. And
> > > > > > > > > > > > above
> > > > > > > > > > > > 128 bytes, it approaches the original dpdk.
> > > > > > > > > > > > I computed it right now, it shows that when
> > > > > > > > > > > > greater than
> > > > > > > > > > > > 128 bytes and smaller than 1024 bytes, the perf
> > > > > > > > > > > > drops about
> > > > 15%.
> > > > > > > > > > > > When above
> > > > > > > > > > > > 1024 bytes, the perf drops about 4%.
> > > > > > > > > > > >
> > > > > > > > > > > > > From a test done at Mellanox, there might be a
> > > > > > > > > > > > > performance degradation of about 15% in testpmd
> > > > > > > > > > > > > txonly
> > > > with AVX2.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I did tests on X710, XXV710, X540 and MT27710 but
> > > > > > > > > > didn't see
> > > > > > > > > performance degradation.
> > > > > > > > > >
> > > > > > > > > > I used command
> > > > > > > > > > "./x86_64-native-linuxapp-gcc/app/testpmd
> > > > > > > > > > -c 0xf -n
> > > > > > > > > > 4 -- -
> > > > > > > > > I" and set fwd txonly.
> > > > > > > > > > I tested it on v17.11-rc1, then revert my patch and
> > > > > > > > > > tested it
> > > again.
> > > > > > > > > > Show port stats all and see the throughput pps. But
> > > > > > > > > > the results are similar
> > > > > > > > > and no drop.
> > > > > > > > > >
> > > > > > > > > > Did I miss something?
> > > > > > > > >
> > > > > > > > > I do not understand. Yesterday you confirmed a 15% drop
> > > > > > > > > with buffers between
> > > > > > > > > 128 and 1024 bytes.
> > > > > > > > > But you do not see this drop in your txonly tests, right?
> > > > > > > > >
> > > > > > > > Yes. The drop is using test.
> > > > > > > > Using command "make test -j" and then " ./build/app/test -c f -n
> 4 "
> > > > > > > > Then run "memcpy_perf_autotest"
> > > > > > > > The results are the cycles that memory copy costs.
> > > > > > > > But I just use it to show the trend because I heard that
> > > > > > > > it's not
> > > > > > > recommended to use micro benchmarks like test_memcpy_perf
> > > > > > > for memcpy performance report as they aren't likely able to
> > > > > > > reflect performance of real world applications.
> > > > > > >
> > > > > > > Yes real applications can hide the memcpy cost.
> > > > > > > Sometimes, the cost appear for real :)
> > > > > > >
> > > > > > > > Details can be seen at
> > > > > > > > https://software.intel.com/en-us/articles/performance-opti
> > > > > > > > miza
> > > > > > > > ti
> > > > > > > > on-of-
> > > > > > > > memcpy-in-dpdk
> > > > > > > >
> > > > > > > > And I didn't see drop in testpmd txonly test. Maybe it's
> > > > > > > > because not a lot
> > > > > > > memcpy calls.
> > > > > > >
> > > > > > > It has been seen in a mlx4 use-case using more memcpy.
> > > > > > > I think 15% in micro-benchmark is too much.
> > > > > > > What can we do? Raise the threshold?
> > > > > > >
> > > > > > I think so. If there is big drop, can try raise the threshold. Maybe
> 1024?
> > > > but not sure.
> > > > > > But I didn't reproduce the 15% drop on mellanox and not sure
> > > > > > how to
> > > > verify it.
> > > > >
> > > > > I think we should focus on micro-benchmark and find a
> > > > > reasonnable threshold for a reasonnable drop tradeoff.
> > > > >
> > > > Sadly, it may not be that simple. What shows best performance for
> > > > micro- benchmarks may not show the same effect in a real application.
> > > >
> > > > /Bruce
> > >
> > > Then how to measure the performance?
> > >
> > > And I cannot reproduce 15% drop on mellanox.
> > > Could the person who tested 15% drop help to do test again with 1024
> > > threshold and see if there is any improvement?
> >
> > As Bruce said, best performance on micro-benchmark may not show the
> same effect in real applications.
> > And I cannot reproduce the 15% drop.
> > And I don't know if raising the threshold can improve the perf or not.
> > Could the person who tested 15% drop help to do test again with 1024
> threshold and see if there is any improvement?
> 
> As I already asked before - why not to make that threshold dynamic?
> Konstantin
> 
I want to confirm that raising threshold is useful. Then can make it dynamic and set it very large as default.

> >
> > Best Regards
> > Xiaoyun Li
> >
> >

  reply	other threads:[~2017-10-25  8:55 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-26  7:41 [PATCH v3 0/3] dynamic linking support Xiaoyun Li
2017-09-26  7:41 ` [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-01 23:41   ` Ananyev, Konstantin
2017-10-02  0:12     ` Li, Xiaoyun
2017-09-26  7:41 ` [PATCH v3 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-09-26  7:41 ` [PATCH v3 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-02  0:08   ` Ananyev, Konstantin
2017-10-02  0:09     ` Li, Xiaoyun
2017-10-02  9:35     ` Ananyev, Konstantin
2017-10-02 16:13 ` [PATCH v4 0/3] run-time Linking support Xiaoyun Li
2017-10-02 16:13   ` [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-02 16:39     ` Ananyev, Konstantin
2017-10-02 23:10       ` Li, Xiaoyun
2017-10-03 11:15         ` Ananyev, Konstantin
2017-10-03 11:39           ` Li, Xiaoyun
2017-10-03 12:12             ` Ananyev, Konstantin
2017-10-03 12:23               ` Li, Xiaoyun
2017-10-02 16:13   ` [PATCH v4 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-02 16:13   ` [PATCH v4 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-02 16:52     ` Ananyev, Konstantin
2017-10-03  8:15       ` Li, Xiaoyun
2017-10-03 11:23         ` Ananyev, Konstantin
2017-10-03 11:27           ` Li, Xiaoyun
2017-10-03 14:59   ` [PATCH v5 0/3] run-time Linking support Xiaoyun Li
2017-10-03 14:59     ` [PATCH v5 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-03 14:59     ` [PATCH v5 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-03 14:59     ` [PATCH v5 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-04 17:56     ` [PATCH v5 0/3] run-time Linking support Ananyev, Konstantin
2017-10-04 22:33       ` Li, Xiaoyun
2017-10-04 22:58     ` [PATCH v6 " Xiaoyun Li
2017-10-04 22:58       ` [PATCH v6 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-05  9:37         ` Ananyev, Konstantin
2017-10-05  9:38           ` Ananyev, Konstantin
2017-10-05 11:19           ` Li, Xiaoyun
2017-10-05 11:26             ` Richardson, Bruce
2017-10-05 11:26             ` Li, Xiaoyun
2017-10-05 12:12               ` Ananyev, Konstantin
2017-10-04 22:58       ` [PATCH v6 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-04 22:58       ` [PATCH v6 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-05  9:40         ` Ananyev, Konstantin
2017-10-05 10:23           ` Li, Xiaoyun
2017-10-05 12:33       ` [PATCH v7 0/3] run-time Linking support Xiaoyun Li
2017-10-05 12:33         ` [PATCH v7 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-09 17:47           ` Thomas Monjalon
2017-10-13  1:06             ` Li, Xiaoyun
2017-10-13  7:21               ` Thomas Monjalon
2017-10-13  7:30                 ` Li, Xiaoyun
2017-10-13  7:31                 ` Ananyev, Konstantin
2017-10-13  7:36                   ` Thomas Monjalon
2017-10-13  7:41                     ` Li, Xiaoyun
2017-10-05 12:33         ` [PATCH v7 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-05 12:33         ` [PATCH v7 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-05 13:24         ` [PATCH v7 0/3] run-time Linking support Ananyev, Konstantin
2017-10-09 17:40         ` Thomas Monjalon
2017-10-13  0:58           ` Li, Xiaoyun
2017-10-13  9:01         ` [PATCH v8 " Xiaoyun Li
2017-10-13  9:01           ` [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-13  9:28             ` Thomas Monjalon
2017-10-13 10:26               ` Ananyev, Konstantin
2017-10-17 21:24             ` Thomas Monjalon
2017-10-18  2:21               ` Li, Xiaoyun
2017-10-18  6:22                 ` Li, Xiaoyun
2017-10-19  2:45                   ` Li, Xiaoyun
2017-10-19  6:58                     ` Thomas Monjalon
2017-10-19  7:51                       ` Li, Xiaoyun
2017-10-19  8:33                         ` Thomas Monjalon
2017-10-19  8:50                           ` Li, Xiaoyun
2017-10-19  8:59                             ` Ananyev, Konstantin
2017-10-19  9:00                             ` Thomas Monjalon
2017-10-19  9:29                               ` Bruce Richardson
2017-10-20  1:02                                 ` Li, Xiaoyun
2017-10-25  6:55                                   ` Li, Xiaoyun
2017-10-25  7:25                                     ` Thomas Monjalon
2017-10-29  8:49                                       ` Thomas Monjalon
2017-11-02 10:22                                         ` Wang, Zhihong
2017-11-02 10:44                                           ` Thomas Monjalon
2017-11-02 10:58                                             ` Li, Xiaoyun
2017-11-02 12:15                                               ` Thomas Monjalon
2017-11-03  7:47                                             ` Yao, Lei A
2017-10-25  8:50                                     ` Ananyev, Konstantin
2017-10-25  8:54                                       ` Li, Xiaoyun [this message]
2017-10-25  9:00                                         ` Thomas Monjalon
2017-10-25 10:32                                           ` Li, Xiaoyun
2017-10-25  9:14                                         ` Ananyev, Konstantin
2017-10-13  9:01           ` [PATCH v8 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-13  9:01           ` [PATCH v8 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-13 13:13           ` [PATCH v8 0/3] run-time Linking support Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B9E724F4CB7543449049E7AE7669D82F48176B@SHSMSX101.ccr.corp.intel.com \
    --to=xiaoyun.li@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=helin.zhang@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=ophirmu@mellanox.com \
    --cc=thomas@monjalon.net \
    --cc=wenzhuo.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.