From: Gavin Hu <gavin.hu@arm.com>
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
	bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
	nd@arm.com, Honnappa.Nagarahalli@arm.com,
	Gavin Hu <gavin.hu@arm.com>
Subject: [PATCH v1 5/5] spinlock: reimplement with atomic one-way barrier builtins
Date: Thu, 20 Dec 2018 18:42:46 +0800
Message-ID: <20181220104246.5590-6-gavin.hu@arm.com>
In-Reply-To: <20181220104246.5590-1-gavin.hu@arm.com>

The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use C11 atomic builtins instead to generate
one-way barriers.

Here is the assembly code generated for the __sync_bool_compare_and_swap
builtin:
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none
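
For contrast with the full barrier above, the following is a minimal,
standalone sketch of a spinlock built on the GCC/Clang __atomic builtins with
acquire/release ordering only. The demo_* names and the demo_spinlock_t type
are hypothetical stand-ins for illustration, not DPDK identifiers, and the
busy-wait omits rte_pause() so that the sketch compiles outside DPDK.

```c
#include <assert.h>

/* Hypothetical stand-in for rte_spinlock_t; not the DPDK type. */
typedef struct {
	int locked; /* 0 = free, 1 = held */
} demo_spinlock_t;

static inline void demo_lock(demo_spinlock_t *sl)
{
	int exp = 0;

	/* Weak CAS (4th argument = 1) may fail spuriously, hence the loop.
	 * Success uses ACQUIRE ordering, failure uses RELAXED. */
	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 1,
				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* Spin on a relaxed load until the lock looks free. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
		exp = 0; /* a failed CAS overwrote exp with the seen value */
	}
}

static inline void demo_unlock(demo_spinlock_t *sl)
{
	/* RELEASE store: earlier accesses cannot move below the unlock. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}

static inline int demo_trylock(demo_spinlock_t *sl)
{
	int exp = 0;

	/* Strong CAS (4th argument = 0): no spurious failure. */
	return __atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Single-threaded walk through the lock states; returns the final state. */
static int demo_selfcheck(void)
{
	demo_spinlock_t sl = { 0 };

	demo_lock(&sl);
	assert(demo_trylock(&sl) == 0); /* already held */
	demo_unlock(&sl);
	assert(demo_trylock(&sl) == 1); /* free again */
	demo_unlock(&sl);
	return __atomic_load_n(&sl.locked, __ATOMIC_RELAXED);
}
```

The self-check only exercises the state transitions on one thread; it does not
measure or demonstrate the barrier cost discussed in this message.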

Benchmarking showed a 3X performance gain on Cavium ThunderX2, 13% on
Qualcomm Falkor, and 3.7% on the 4-core A72 Marvell MACCHIATObin.
Here is an example test result on ThunderX2 (TX2):

*** spinlock_autotest without this patch ***
Core [123] Cost Time = 639822 us
Core [124] Cost Time = 633253 us
Core [125] Cost Time = 646030 us
Core [126] Cost Time = 643189 us
Core [127] Cost Time = 647039 us
Total Cost Time = 95433298 us

*** spinlock_autotest with this patch ***
Core [123] Cost Time = 163615 us
Core [124] Cost Time = 166471 us
Core [125] Cost Time = 189044 us
Core [126] Cost Time = 195745 us
Core [127] Cost Time = 78423 us
Total Cost Time = 27339656 us
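
As a rough check of the figures above, the ratio of the two "Total Cost Time"
values can be computed directly. The helper name demo_speedup is hypothetical
and used only for this illustration; the two constants are copied from the
test output above.

```c
#include <stdint.h>

/* Total cost times (in us) copied from the TX2 results above. */
#define DEMO_TOTAL_BEFORE_US UINT64_C(95433298)
#define DEMO_TOTAL_AFTER_US  UINT64_C(27339656)

/* Ratio of total time without the patch to total time with it. */
static double demo_speedup(uint64_t before_us, uint64_t after_us)
{
	return (double)before_us / (double)after_us;
}
```

With these totals, demo_speedup(DEMO_TOTAL_BEFORE_US, DEMO_TOTAL_AFTER_US)
comes to roughly 3.49.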

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Reviewed-by: Steve Capper <Steve.Capper@arm.com>
---
 lib/librte_eal/common/include/arch/arm/rte_spinlock.h | 16 ++++++++++++----
 lib/librte_eal/common/include/generic/rte_spinlock.h  |  2 +-
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
index 25d22fd1e..33a6d382f 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
@@ -19,21 +19,29 @@ extern "C" {
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-	while (__sync_lock_test_and_set(&sl->locked, 1))
-		while (sl->locked)
+	int exp = 0;
+
+	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 1,
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
 			rte_pause();
+		exp = 0;
+	}
 }
 
 static inline void
 rte_spinlock_unlock(rte_spinlock_t *sl)
 {
-	__sync_lock_release(&sl->locked);
+	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
 }
 
 static inline int
 rte_spinlock_trylock(rte_spinlock_t *sl)
 {
-	return __sync_lock_test_and_set(&sl->locked, 1) == 0;
+	int exp = 0;
+	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+				0, /* disallow spurious failure */
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
 }
 
 static inline int rte_tm_supported(void)
diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index e555ecb95..a8cc75adc 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -87,7 +87,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
  */
 static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
 {
-	return sl->locked;
+	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
 }
 
 /**
-- 
2.11.0
