From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
To: Gavin Hu <gavin.hu@arm.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "nd@arm.com" <nd@arm.com>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
	"nipun.gupta@nxp.com" <nipun.gupta@nxp.com>,
	"Honnappa.Nagarahalli@arm.com" <Honnappa.Nagarahalli@arm.com>,
	"i.maximets@samsung.com" <i.maximets@samsung.com>,
	"chaozhu@linux.vnet.ibm.com" <chaozhu@linux.vnet.ibm.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [EXT] [PATCH v8 3/3] spinlock: reimplement with atomic one-way barrier builtins
Date: Thu, 14 Mar 2019 14:22:31 +0000	[thread overview]
Message-ID: <20190314142224.GA10102@dc5-eodlnx05.marvell.com> (raw)
In-Reply-To: <1552031797-146710-4-git-send-email-gavin.hu@arm.com>

On Fri, Mar 08, 2019 at 03:56:37PM +0800, Gavin Hu wrote:
> The __sync builtin based implementation generates full memory barriers
> ('dmb ish') on Arm platforms. Use C11 atomic builtins instead to generate
> one-way barriers.
> 
> Here is the assembly code of the __sync_bool_compare_and_swap builtin:
> __sync_bool_compare_and_swap(dst, exp, src);
>    0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
>    0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
>    0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
>    0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
>    0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
>    0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
>    0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
>    0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
>    0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
>    0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
>    0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none
> 
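
For reference, the one-way barrier approach described above can be sketched
roughly as follows. This is an illustrative sketch only, not the code from
the patch: the demo_spinlock_* names are hypothetical, and it assumes a GCC-
or Clang-compatible compiler that provides the __atomic builtins. With
acquire ordering on lock and release ordering on unlock, the compiler is
typically free to emit load-acquire/store-release instructions on Arm rather
than a trailing full 'dmb ish'.

```c
#include <stdbool.h>

/* Hypothetical lock type for illustration; not DPDK's rte_spinlock_t. */
typedef struct {
	volatile int locked; /* 0 = free, 1 = held */
} demo_spinlock_t;

static inline void
demo_spinlock_init(demo_spinlock_t *sl)
{
	sl->locked = 0;
}

static inline void
demo_spinlock_lock(demo_spinlock_t *sl)
{
	int exp = 0;

	/* Acquire on success: later accesses cannot be hoisted above the lock. */
	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, false,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* Wait with relaxed loads until the lock looks free again. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
		exp = 0; /* a failed exchange wrote the observed value into exp */
	}
}

static inline bool
demo_spinlock_trylock(demo_spinlock_t *sl)
{
	int exp = 0;

	/* Strong compare-exchange (weak = false): no spurious failure. */
	return __atomic_compare_exchange_n(&sl->locked, &exp, 1, false,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
demo_spinlock_unlock(demo_spinlock_t *sl)
{
	/* Release: earlier accesses cannot sink below the unlock. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```

A caller would pair demo_spinlock_lock() and demo_spinlock_unlock() around
the critical section; a real implementation would probably also add a
pause/yield hint inside the inner wait loop.
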
> The benchmarking results showed consistent improvements on all available
> platforms:
> 1. Cavium ThunderX2: 126% improvement;
> 2. Hisilicon 1616: 30%;
> 3. Qualcomm Falkor: 13%;
> 4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%.
> 
> Here is the example test result on TX2:
> $ sudo ./build/app/test -l 16-27 -- i
> RTE>>spinlock_autotest
> 
> *** spinlock_autotest without this patch ***
> Test with lock on 12 cores...
> Core [16] Cost Time = 53886 us
> Core [17] Cost Time = 53605 us
> Core [18] Cost Time = 53163 us
> Core [19] Cost Time = 49419 us
> Core [20] Cost Time = 34317 us
> Core [21] Cost Time = 53408 us
> Core [22] Cost Time = 53970 us
> Core [23] Cost Time = 53930 us
> Core [24] Cost Time = 53283 us
> Core [25] Cost Time = 51504 us
> Core [26] Cost Time = 50718 us
> Core [27] Cost Time = 51730 us
> Total Cost Time = 612933 us
> 
> *** spinlock_autotest with this patch ***
> Test with lock on 12 cores...
> Core [16] Cost Time = 18808 us
> Core [17] Cost Time = 29497 us
> Core [18] Cost Time = 29132 us
> Core [19] Cost Time = 26150 us
> Core [20] Cost Time = 21892 us
> Core [21] Cost Time = 24377 us
> Core [22] Cost Time = 27211 us
> Core [23] Cost Time = 11070 us
> Core [24] Cost Time = 29802 us
> Core [25] Cost Time = 15793 us
> Core [26] Cost Time = 7474 us
> Core [27] Cost Time = 29550 us
> Total Cost Time = 270756 us
> 
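
The two "Total Cost Time" figures above appear consistent with the ThunderX2
number quoted earlier. A small sketch of that arithmetic, assuming the
percentage is computed as the ratio of total times minus one (the helper name
improvement_pct is hypothetical, not part of the test suite):

```c
/*
 * Percentage improvement when the total time drops from before_us to
 * after_us, taken as (before / after - 1) * 100. Assumed formula; the
 * patch description does not state how its percentages were derived.
 */
static double
improvement_pct(double before_us, double after_us)
{
	return (before_us / after_us - 1.0) * 100.0;
}
```

Calling improvement_pct(612933.0, 270756.0) with the totals from the two runs
gives a value a little above 126, in line with the ThunderX2 entry.
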
> In the tests on ThunderX2, with more cores contending, the performance gain
> was even higher, indicating the __atomic implementation scales up better
> than __sync.
> 
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Reviewed-by: Steve Capper <Steve.Capper@arm.com>

Reviewed-by: Jerin Jacob <jerinj@marvell.com>
