All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Jia He <hejianet@gmail.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Zhao, Bing" <ilovethull@163.com>,
	Olivier MATZ <olivier.matz@6wind.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"jia.he@hxt-semitech.com" <jia.he@hxt-semitech.com>,
	"jie2.liu@hxt-semitech.com" <jie2.liu@hxt-semitech.com>,
	"bing.zhao@hxt-semitech.com" <bing.zhao@hxt-semitech.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	jianbo.liu@arm.com, hemant.agrawal@nxp.com
Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
Date: Tue, 31 Oct 2017 16:44:35 +0530	[thread overview]
Message-ID: <20171031111433.GA21742@jerin> (raw)
In-Reply-To: <b54ce545-b75d-9813-209f-4125bd76c7db@gmail.com>

-----Original Message-----
> Date: Tue, 31 Oct 2017 10:55:15 +0800
> From: Jia He <hejianet@gmail.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "Zhao, Bing"
>  <ilovethull@163.com>, Olivier MATZ <olivier.matz@6wind.com>,
>  "dev@dpdk.org" <dev@dpdk.org>, "jia.he@hxt-semitech.com"
>  <jia.he@hxt-semitech.com>, "jie2.liu@hxt-semitech.com"
>  <jie2.liu@hxt-semitech.com>, "bing.zhao@hxt-semitech.com"
>  <bing.zhao@hxt-semitech.com>, "Richardson, Bruce"
>  <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
>  loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.4.0
> 
> Hi Jerin

Hi Jia,

> 
> Do you think  next step whether I need to implement the load_acquire half
> barrier as per freebsd

I did a quick prototype using C11 memory model(ACQUIRE/RELEASE) schematics
and tested on two arm64 platform in Cavium(Platform A: Non arm64 OOO machine)
and Platform B: arm64 OOO machine)

smp_rmb() performs better in Platform A:
acquire/release semantics perform better in platform B:

Here is the patch:
https://github.com/jerinjacobk/mytests/blob/master/ring/0001-ring-using-c11-memory-model.patch

In terms of next step:
- I am not sure the cost associated with acquire/release semantics on x86 or ppc.
IMO, We need to have both options under conditional compilation
flags and let the target platform choose the best one.

Thoughts?

Here is the performance numbers:
- Both platforms are running at different frequency, So absolute numbers does not
  matter, Just check the relative numbers.

Platform A: Performance numbers:
================================
no patch(Non arm64 OOO machine)
-------------------------------

SP/SC single enq/dequeue: 40
MP/MC single enq/dequeue: 282
SP/SC burst enq/dequeue (size: 8): 11
MP/MC burst enq/dequeue (size: 8): 42
SP/SC burst enq/dequeue (size: 32): 8
MP/MC burst enq/dequeue (size: 32): 16

### Testing empty dequeue ###
SC empty dequeue: 8.01
MC empty dequeue: 11.01

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 11.30
MP/MC bulk enq/dequeue (size: 8): 42.85
SP/SC bulk enq/dequeue (size: 32): 8.25
MP/MC bulk enq/dequeue (size: 32): 16.46

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 20.62
MP/MC bulk enq/dequeue (size: 8): 56.30
SP/SC bulk enq/dequeue (size: 32): 10.94
MP/MC bulk enq/dequeue (size: 32): 18.66
Test OK

# smp_rmb() patch((Non OOO arm64 machine)
http://dpdk.org/dev/patchwork/patch/30029/
-----------------------------------------

SP/SC single enq/dequeue: 42
MP/MC single enq/dequeue: 291
SP/SC burst enq/dequeue (size: 8): 12
MP/MC burst enq/dequeue (size: 8): 44
SP/SC burst enq/dequeue (size: 32): 8
MP/MC burst enq/dequeue (size: 32): 16

### Testing empty dequeue ###
SC empty dequeue: 13.01
MC empty dequeue: 15.01

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 11.60
MP/MC bulk enq/dequeue (size: 8): 44.32
SP/SC bulk enq/dequeue (size: 32): 8.60
MP/MC bulk enq/dequeue (size: 32): 16.50

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 20.95
MP/MC bulk enq/dequeue (size: 8): 56.90
SP/SC bulk enq/dequeue (size: 32): 10.90
MP/MC bulk enq/dequeue (size: 32): 18.78
Test OK
RTE>>

# c11 memory model patch((Non OOO arm64 machine)
https://github.com/jerinjacobk/mytests/blob/master/ring/0001-ring-using-c11-memory-model.patch
---------------------------------------------------------------------------------------------
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 197
MP/MC single enq/dequeue: 328
SP/SC burst enq/dequeue (size: 8): 31
MP/MC burst enq/dequeue (size: 8): 50
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 13.01
MC empty dequeue: 18.02

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 30.95
MP/MC bulk enq/dequeue (size: 8): 50.30
SP/SC bulk enq/dequeue (size: 32): 13.27
MP/MC bulk enq/dequeue (size: 32): 18.11

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 43.38
MP/MC bulk enq/dequeue (size: 8): 64.42
SP/SC bulk enq/dequeue (size: 32): 16.71
MP/MC bulk enq/dequeue (size: 32): 22.21


Platform B: Performance numbers:
==============================
#no patch(OOO arm64 machine)
----------------------------

### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 81
MP/MC single enq/dequeue: 207
SP/SC burst enq/dequeue (size: 8): 15
MP/MC burst enq/dequeue (size: 8): 31
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 11

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 5.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 15.38
MP/MC bulk enq/dequeue (size: 8): 30.64
SP/SC bulk enq/dequeue (size: 32): 7.25
MP/MC bulk enq/dequeue (size: 32): 11.06

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.51
MP/MC bulk enq/dequeue (size: 8): 49.38
SP/SC bulk enq/dequeue (size: 32): 14.32
MP/MC bulk enq/dequeue (size: 32): 15.89

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 72.66
MP/MC bulk enq/dequeue (size: 8): 121.89
SP/SC bulk enq/dequeue (size: 32): 16.88
MP/MC bulk enq/dequeue (size: 32): 24.23
Test OK
RTE>>


# smp_rmb() patch((OOO arm64 machine)
http://dpdk.org/dev/patchwork/patch/30029/
-------------------------------------------

### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 152
MP/MC single enq/dequeue: 265
SP/SC burst enq/dequeue (size: 8): 24
MP/MC burst enq/dequeue (size: 8): 39
SP/SC burst enq/dequeue (size: 32): 9
MP/MC burst enq/dequeue (size: 32): 13

### Testing empty dequeue ###
SC empty dequeue: 31.01
MC empty dequeue: 32.01

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 24.26
MP/MC bulk enq/dequeue (size: 8): 39.52
SP/SC bulk enq/dequeue (size: 32): 9.47
MP/MC bulk enq/dequeue (size: 32): 13.31

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 40.29
MP/MC bulk enq/dequeue (size: 8): 59.57
SP/SC bulk enq/dequeue (size: 32): 17.34
MP/MC bulk enq/dequeue (size: 32): 21.58

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 79.05
MP/MC bulk enq/dequeue (size: 8): 153.46
SP/SC bulk enq/dequeue (size: 32): 26.41
MP/MC bulk enq/dequeue (size: 32): 38.37
Test OK
RTE>>


# c11 memory model patch((OOO arm64 machine)
https://github.com/jerinjacobk/mytests/blob/master/ring/0001-ring-using-c11-memory-model.patch
----------------------------------------------------------------------------------------------
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 98
MP/MC single enq/dequeue: 130
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 22
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 9

### Testing empty dequeue ###
SC empty dequeue: 4.00
MC empty dequeue: 5.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.40
MP/MC bulk enq/dequeue (size: 8): 22.88
SP/SC bulk enq/dequeue (size: 32): 7.62
MP/MC bulk enq/dequeue (size: 32): 8.96

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 20.24
MP/MC bulk enq/dequeue (size: 8): 25.83
SP/SC bulk enq/dequeue (size: 32): 12.21
MP/MC bulk enq/dequeue (size: 32): 13.20

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 67.54
MP/MC bulk enq/dequeue (size: 8): 124.63
SP/SC bulk enq/dequeue (size: 32): 21.13
MP/MC bulk enq/dequeue (size: 32): 28.44
Test OK
RTE>>quit


> or find any other performance test case to compare the performance impact?

As far as I know, ring_perf_autotest is the better performance test.
If you have trouble in using "High-resolution cycle counter" in your platform then also
you can use ring_perf_auto test to compare the performance(as relative
number matters)

Jerin

> Thanks for any suggestions.
> 
> Cheers,
> Jia

  reply	other threads:[~2017-10-31 11:15 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-10  9:56 [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue Jia He
2017-10-12 15:53 ` Olivier MATZ
2017-10-12 16:15   ` Stephen Hemminger
2017-10-12 17:05   ` Ananyev, Konstantin
2017-10-12 17:23     ` Jerin Jacob
2017-10-13  1:02       ` Jia He
2017-10-13  1:15         ` Jia He
2017-10-13  1:16         ` Jia He
2017-10-13  1:49           ` Jerin Jacob
2017-10-13  3:23             ` Jia He
2017-10-13  5:57               ` Zhao, Bing
2017-10-13  7:33             ` Jianbo Liu
2017-10-13  8:20               ` Jia He
2017-10-19 10:02           ` Ananyev, Konstantin
2017-10-19 11:18             ` Zhao, Bing
2017-10-19 14:15               ` Ananyev, Konstantin
2017-10-19 20:02                 ` Ananyev, Konstantin
2017-10-20  1:57                   ` Jia He
2017-10-20  5:43                     ` Jerin Jacob
2017-10-23  8:49                       ` Jia He
2017-10-23  9:05                         ` Kuusisaari, Juhamatti
2017-10-23  9:10                           ` Bruce Richardson
2017-10-23 10:06                         ` Jerin Jacob
2017-10-24  2:04                           ` Jia He
2017-10-25 13:26                             ` Jerin Jacob
2017-10-26  2:27                               ` Jia He
2017-10-31  2:55                               ` Jia He
2017-10-31 11:14                                 ` Jerin Jacob [this message]
2017-11-01  2:53                                   ` Jia He
2017-11-01 19:04                                     ` Jerin Jacob
2017-11-02  1:09                                       ` Jia He
2017-11-02  8:57                                       ` Jia He
2017-11-03  2:55                                         ` Jia He
2017-11-03 12:47                                           ` Jerin Jacob
2017-11-01  4:48                                   ` Jia He
2017-11-01 19:10                                     ` Jerin Jacob
2017-10-20  7:03                     ` Ananyev, Konstantin
2017-10-13  0:24     ` Liu, Jie2
2017-10-13  2:12       ` Zhao, Bing
2017-10-13  2:34         ` Jerin Jacob
2017-10-16 10:51       ` Kuusisaari, Juhamatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171031111433.GA21742@jerin \
    --to=jerin.jacob@caviumnetworks.com \
    --cc=bing.zhao@hxt-semitech.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=hejianet@gmail.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=ilovethull@163.com \
    --cc=jia.he@hxt-semitech.com \
    --cc=jianbo.liu@arm.com \
    --cc=jie2.liu@hxt-semitech.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.