From: Gavin Hu <gavin.hu@arm.com>
To: dev@dpdk.org
Cc: nd@arm.com, thomas@monjalon.net, jerinj@marvell.com,
	hemant.agrawal@nxp.com, nipun.gupta@nxp.com,
	Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com,
	i.maximets@samsung.com, chaozhu@linux.vnet.ibm.com,
	stable@dpdk.org
Subject: [PATCH v8 3/3] spinlock: reimplement with atomic one-way barrier builtins
Date: Fri,  8 Mar 2019 15:56:37 +0800	[thread overview]
Message-ID: <1552031797-146710-4-git-send-email-gavin.hu@arm.com> (raw)
In-Reply-To: <1552031797-146710-1-git-send-email-gavin.hu@arm.com>
In-Reply-To: <20181220104246.5590-1-gavin.hu@arm.com>

The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use C11 atomic builtins instead, which
generate one-way (acquire/release) barriers.

Here is the assembly code generated for the __sync_bool_compare_and_swap
builtin; note the trailing 'dmb ish':
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none
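
For contrast with the full-barrier sequence above, here is a minimal
standalone sketch of a lock built only on the GCC/Clang __atomic builtins
with acquire/release ordering. The sketch_* names are hypothetical and
exist only for illustration; this is not the rte_spinlock code itself, and
it omits the pause hint a real spin loop would normally use.

```c
#include <assert.h>

/* Hypothetical type for illustration only; 0 = free, 1 = held. */
typedef struct {
	volatile int locked;
} sketch_lock_t;

static inline void
sketch_lock(sketch_lock_t *sl)
{
	int exp = 0;

	/* Acquire on success: later accesses cannot be hoisted above it. */
	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* Wait with relaxed loads until the lock looks free. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
		/* A failed exchange wrote the observed value into exp. */
		exp = 0;
	}
}

static inline int
sketch_trylock(sketch_lock_t *sl)
{
	int exp = 0;

	/* Strong exchange (weak = 0): fails only if the lock is held. */
	return __atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
sketch_unlock(sketch_lock_t *sl)
{
	/* Release: earlier accesses cannot sink below this store. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}

/* Single-threaded walk through the lock states; returns 1 on success. */
static int
sketch_demo(void)
{
	sketch_lock_t sl = { 0 };

	assert(sketch_trylock(&sl) == 1);	/* free, so it is taken */
	assert(sketch_trylock(&sl) == 0);	/* already held */
	sketch_unlock(&sl);
	sketch_lock(&sl);			/* free again, returns at once */
	assert(sl.locked == 1);
	sketch_unlock(&sl);
	assert(sl.locked == 0);
	return 1;
}
```

Calling sketch_demo() only checks the state transitions on one thread; it
says nothing about ordering under contention, which is what the benchmark
in this message measures.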

The benchmarking results showed consistent improvements on all available
platforms:
1. Cavium ThunderX2: 126%;
2. HiSilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%.

Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest

*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us

*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us
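
As a cross-check of the two totals above, the short sketch below turns a
before/after pair into a percentage. It assumes the improvement is defined
as (before / after - 1) * 100 on Total Cost Time, which is an assumption
about how the platform figures were derived; improvement_pct is a
hypothetical helper, not part of the test application.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage improvement of 'after' over 'before',
 * assuming improvement = (before / after - 1) * 100.
 */
static double
improvement_pct(double before_us, double after_us)
{
	assert(after_us > 0.0);
	return (before_us / after_us - 1.0) * 100.0;
}
```

Under that assumption, improvement_pct(612933.0, 270756.0) lands between
126 and 127, in line with the ThunderX2 figure quoted earlier.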

In the tests on ThunderX2, with more cores contending, the performance gain
was even higher, indicating the __atomic implementation scales up better
than __sync.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Reviewed-by: Steve Capper <Steve.Capper@arm.com>
---
 lib/librte_eal/common/include/generic/rte_spinlock.h | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index c4c3fc3..87ae7a4 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -61,9 +61,14 @@ rte_spinlock_lock(rte_spinlock_t *sl);
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-	while (__sync_lock_test_and_set(&sl->locked, 1))
-		while(sl->locked)
+	int exp = 0;
+
+	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
 			rte_pause();
+		exp = 0;
+	}
 }
 #endif
 
@@ -80,7 +85,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl);
 static inline void
 rte_spinlock_unlock (rte_spinlock_t *sl)
 {
-	__sync_lock_release(&sl->locked);
+	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
 }
 #endif
 
@@ -99,7 +104,10 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
 static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl)
 {
-	return __sync_lock_test_and_set(&sl->locked,1) == 0;
+	int exp = 0;
+	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+				0, /* disallow spurious failure */
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
 }
 #endif
 
@@ -113,7 +121,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
  */
 static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
 {
-	return sl->locked;
+	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
 }
 
 /**
-- 
2.7.4
