* [PATCH v2 00/12] use compiler atomic builtins for app modules
@ 2021-11-16  9:41 Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong

Since DPDK has adopted the C11 memory model [1], convert
rte_atomicNN_xxx API usages to the compiler's atomic built-ins
in app modules [2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
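
As a rough illustration of the kind of conversion this series performs, the sketch below maps a few rte_atomic32 calls onto GCC/Clang __atomic built-ins. The counter and helper names are hypothetical and are not taken from the patches; relaxed ordering is assumed to be enough because the counter does not guard any other data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter, standing in for a former rte_atomic32_t. */
static uint32_t obj_count;

/* Was: rte_atomic32_set(&obj_count, 0); */
static void counter_reset(void)
{
	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
}

/* Was: rte_atomic32_inc(&obj_count); */
static void counter_inc(void)
{
	__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
}

/* Was: rte_atomic32_read(&obj_count); */
static uint32_t counter_read(void)
{
	return __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
}

/* Usage walk-through: two increments after a reset read back as 2. */
static void counter_demo(void)
{
	counter_reset();
	counter_inc();
	counter_inc();
	assert(counter_read() == 2);
}
```

The variable becomes a plain integer type, and each access site names its memory ordering explicitly instead of relying on the ordering implied by the rte_atomic wrapper.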

v2:
  By Honnappa Nagarahalli:
  1. Replace the RELAXED barriers with suitable ones for shared
     data sync in pmd_perf and timer test cases.
  2. Avoid unnecessary atomic operations in compress and testpmd
     modules.
  3. Fix some typos.

Joyce Kong (12):
  test/pmd_perf: use compiler atomic builtins for polling sync
  test/ring_perf: use compiler atomic builtins for lcores sync
  test/timer: use compiler atomic builtins for sync
  test/stack_perf: use compiler atomics for lcore sync
  test/bpf: use compiler atomics for calculation
  test/func_reentrancy: use compiler atomics for data sync
  app/eventdev: use compiler atomics for shared data sync
  app/crypto: use compiler atomic builtins for display sync
  app/compress: use compiler atomic builtins for display sync
  app/testpmd: remove atomic operations for port status
  app/bbdev: use compiler atomics for shared data sync
  app: remove unnecessary include of atomic header file

 app/proc-info/main.c                          |   1 -
 app/test-bbdev/test_bbdev_perf.c              | 135 ++++++++----------
 .../comp_perf_test_common.h                   |   2 +-
 .../comp_perf_test_cyclecount.c               |  15 +-
 .../comp_perf_test_throughput.c               |  10 +-
 .../comp_perf_test_verify.c                   |   6 +-
 app/test-crypto-perf/cperf_test_latency.c     |   6 +-
 .../cperf_test_pmd_cyclecount.c               |   9 +-
 app/test-crypto-perf/cperf_test_throughput.c  |   9 +-
 app/test-crypto-perf/cperf_test_verify.c      |   9 +-
 app/test-eventdev/evt_main.c                  |   1 -
 app/test-eventdev/test_order_atq.c            |   4 +-
 app/test-eventdev/test_order_common.c         |   4 +-
 app/test-eventdev/test_order_common.h         |   8 +-
 app/test-eventdev/test_order_queue.c          |   4 +-
 app/test-pipeline/config.c                    |   1 -
 app/test-pipeline/init.c                      |   1 -
 app/test-pipeline/main.c                      |   1 -
 app/test-pipeline/runtime.c                   |   1 -
 app/test-pmd/cmdline.c                        |   1 -
 app/test-pmd/config.c                         |   1 -
 app/test-pmd/csumonly.c                       |   1 -
 app/test-pmd/flowgen.c                        |   1 -
 app/test-pmd/icmpecho.c                       |   1 -
 app/test-pmd/iofwd.c                          |   1 -
 app/test-pmd/macfwd.c                         |   1 -
 app/test-pmd/macswap.c                        |   1 -
 app/test-pmd/parameters.c                     |   1 -
 app/test-pmd/rxonly.c                         |   1 -
 app/test-pmd/testpmd.c                        |  58 ++++----
 app/test-pmd/txonly.c                         |   1 -
 app/test/test_barrier.c                       |   1 -
 app/test/test_bpf.c                           |  28 ++--
 app/test/test_func_reentrancy.c               |  27 ++--
 app/test/test_mbuf.c                          |   1 -
 app/test/test_mp_secondary.c                  |   1 -
 app/test/test_pmd_perf.c                      |  14 +-
 app/test/test_ring.c                          |   1 -
 app/test/test_ring_perf.c                     |   9 +-
 app/test/test_stack_perf.c                    |  14 +-
 app/test/test_timer.c                         |  30 ++--
 app/test/test_timer_secondary.c               |   1 -
 42 files changed, 197 insertions(+), 226 deletions(-)

-- 
2.25.1



* [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16 21:30   ` Honnappa Nagarahalli
  2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for polling sync in pmd_perf test cases.
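
The polling sync can be pictured as a release/acquire handshake on a start flag. The following is a minimal sketch, not the test code itself: the payload variable and function names are hypothetical, and a plain spin loop stands in for the wait that rte_wait_until_equal_64() performs in the patch.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t start;       /* flag polled by the consumer */
static uint32_t pkt_budget;  /* hypothetical payload published with the flag */

/* Producer side: write the payload, then publish it with a release store. */
static void signal_start(uint32_t budget)
{
	pkt_budget = budget;
	__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
}

/* Consumer side: spin until the flag is 1; the acquire load pairs with
 * the release store, so the payload written before it is visible here. */
static uint32_t wait_for_start(void)
{
	while (__atomic_load_n(&start, __ATOMIC_ACQUIRE) != 1)
		;
	return pkt_budget;
}

/* Single-threaded walk-through; in a real test the two halves would run
 * on different lcores. */
static void start_flag_demo(void)
{
	signal_start(32);
	assert(wait_for_start() == 32);
}
```

A relaxed store to the flag would still wake the poller, but it would not order the earlier writes before the flag, which is why the final signalling store in the diff uses __ATOMIC_RELEASE.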

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_pmd_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1df86ce080..546384a50d 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -563,8 +562,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -616,15 +614,15 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
-
 	/* start polling thread, but not actually poll yet */
 	rte_eal_remote_launch(poll_burst,
 			      (void *)&pkt_per_port, lcore);
 
 	/* Only when polling first */
 	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
+	else
+		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
 	/* start xmit */
 	i = 0;
@@ -641,7 +639,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second  */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
-- 
2.25.1



* [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcores sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
-- 
2.25.1



* [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16 19:52   ` Honnappa Nagarahalli
  2021-11-16 20:20   ` David Marchand
  2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  To: Robert Sanford, Erik Gabriel Carrillo
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic
built-ins for lcore_state and collisions sync.

Also, move 'main_init_workers' out of
'timer_stress2_main_loop' to guarantee that lcore_state
is initialized correctly before the threads are launched.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_timer.c           | 30 +++++++++++++-----------------
 app/test/test_timer_secondary.c |  1 -
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..c97e5c891c 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
 }
 
 
@@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
-		main_init_workers();
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
 			printf("Test Failed\n");
@@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
@@ -573,6 +568,7 @@ test_timer(void)
 	/* run a second, slightly different set of stress tests */
 	printf("\nStart timer stress tests 2\n");
 	test_failed = 0;
+	main_init_workers();
 	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
 	rte_eal_mp_wait_lcore();
 	if (test_failed)
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
-- 
2.25.1



* [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (2 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcore sync in stack_perf test cases.
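
The lcore sync here is a countdown barrier: each participant decrements a shared counter and then waits for it to reach zero. A minimal sketch under that reading follows; the helper names are hypothetical, and a plain spin loop is used where the patch calls rte_wait_until_equal_32().

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lcore_barrier;

/* Main lcore: arm the barrier for n participants before launching them. */
static void barrier_init(uint32_t n)
{
	__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
}

/* Each participant: announce arrival, then wait until everyone arrived. */
static void barrier_arrive_and_wait(void)
{
	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
	while (__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) != 0)
		;
}

/* Single-participant walk-through: the only arrival releases the barrier. */
static void barrier_demo(void)
{
	barrier_init(1);
	barrier_arrive_and_wait();
	assert(__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) == 0);
}
```

Relaxed ordering mirrors the diff; it only aligns the start of the timed region and is not meant to publish other data between lcores.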

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 		int cnt = 0;
 		double avg;
 
-		rte_atomic32_set(&lcore_barrier, n);
+		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 		RTE_LCORE_FOREACH_WORKER(lcore_id) {
 			if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
-- 
2.25.1



* [PATCH v2 05/12] test/bpf: use compiler atomics for calculation
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (3 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for calculation in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e3e9a1b0b5..b8be1e3d30 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1569,32 +1569,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
-- 
2.25.1



* [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (4 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC (permalink / raw)
  To: Olivier Matz, Andrew Rybchenko, Bruce Richardson,
	Vladimir Medvedkin, Yipeng Wang, Sameh Gobriel, Anatoly Burakov,
	Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in func_reentrancy test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 838ab6f0f9..7825c6cb86 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
@@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
 
 #define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
 
-static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
-static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
+static uint32_t obj_count;
+static uint32_t synchro;
 
 #define WAIT_SYNCHRO_FOR_WORKERS()   do { \
 	if (lcore_self != rte_get_main_lcore())                  \
-		while (rte_atomic32_read(&synchro) == 0);        \
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
 } while(0)
 
 /*
@@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
 
 	WAIT_SYNCHRO_FOR_WORKERS();
 
-	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
+	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */
 	if (rte_eal_init(0, NULL) != -1)
 		return -1;
 
@@ -116,7 +115,7 @@ ring_create_lookup(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
 		if (rp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -183,7 +182,7 @@ mempool_create_lookup(__rte_unused void *arg)
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
 		if (mp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -250,7 +249,7 @@ hash_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_hash_create(&hash_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple times simultaneously */
@@ -318,7 +317,7 @@ fbk_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_fbk_hash_create(&fbk_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -384,7 +383,7 @@ lpm_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, &config);
 		if (lpm != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -445,8 +444,8 @@ launch_test(struct test_case *pt_case)
 	if (pt_case->func == NULL)
 		return -1;
 
-	rte_atomic32_set(&obj_count, 0);
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	cores = RTE_MIN(rte_lcore_count(), MAX_LCORES);
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
@@ -456,7 +455,7 @@ launch_test(struct test_case *pt_case)
 		rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
 	}
 
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	if (pt_case->func(pt_case->arg) < 0)
 		ret = -1;
@@ -471,7 +470,7 @@ launch_test(struct test_case *pt_case)
 			pt_case->clean(lcore_id);
 	}
 
-	count = rte_atomic32_read(&obj_count);
+	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
 	if (count != 1) {
 		printf("%s: common object allocated %d times (should be 1)\n",
 			pt_case->name, count);
-- 
2.25.1



* [PATCH v2 07/12] app/eventdev: use compiler atomics for shared data sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (5 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in eventdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/evt_main.c          | 1 -
 app/test-eventdev/test_order_atq.c    | 4 ++--
 app/test-eventdev/test_order_common.c | 4 ++--
 app/test-eventdev/test_order_common.h | 8 ++++----
 app/test-eventdev/test_order_queue.c  | 4 ++--
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index 3534aabca7..194c980c7a 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -6,7 +6,6 @@
 #include <unistd.h>
 #include <signal.h>
 
-#include <rte_atomic.h>
 #include <rte_debug.h>
 #include <rte_eal.h>
 #include <rte_eventdev.h>
diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c
index 71215a07b6..2fee4b4daa 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c
index d7760061ba..ff7813f9c2 100644
--- a/app/test-eventdev/test_order_common.c
+++ b/app/test-eventdev/test_order_common.c
@@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt)
 		evt_err("failed to allocate t->expected_flow_seq memory");
 		goto exp_nomem;
 	}
-	rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts);
+	__atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED);
 	t->err = false;
 	t->nb_pkts = opt->nb_pkts;
 	t->nb_flows = opt->nb_flows;
@@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt,
 
 	while (t->err == false) {
 		uint64_t new_cycles = rte_get_timer_cycles();
-		int64_t remaining = rte_atomic64_read(&t->outstand_pkts);
+		int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED);
 
 		if (remaining <= 0) {
 			t->result = EVT_TEST_SUCCESS;
diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index cd9d6009ec..92781d9587 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -48,7 +48,7 @@ struct test_order {
 	 * The atomic_* is an expensive operation,Since it is a functional test,
 	 * We are using the atomic_ operation to reduce the code complexity.
 	 */
-	rte_atomic64_t outstand_pkts;
+	uint64_t outstand_pkts;
 	enum evt_test_result result;
 	uint32_t nb_flows;
 	uint64_t nb_pkts;
@@ -95,7 +95,7 @@ static __rte_always_inline void
 order_process_stage_1(struct test_order *const t,
 		struct rte_event *const ev, const uint32_t nb_flows,
 		uint32_t *const expected_flow_seq,
-		rte_atomic64_t *const outstand_pkts)
+		uint64_t *const outstand_pkts)
 {
 	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
 	/* compare the seqn against expected value */
@@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t,
 	 */
 	expected_flow_seq[flow]++;
 	rte_pktmbuf_free(ev->mbuf);
-	rte_atomic64_sub(outstand_pkts, 1);
+	__atomic_sub_fetch(outstand_pkts, 1, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline void
@@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t,
 	const uint8_t port = w->port_id;\
 	const uint32_t nb_flows = t->nb_flows;\
 	uint32_t *expected_flow_seq = t->expected_flow_seq;\
-	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
+	uint64_t *outstand_pkts = &t->outstand_pkts;\
 	if (opt->verbose_level > 1)\
 		printf("%s(): lcore %d dev_id %d port=%d\n",\
 			__func__, rte_lcore_id(), dev_id, port)
diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c
index 621367805a..80eaea5cf5 100644
--- a/app/test-eventdev/test_order_queue.c
+++ b/app/test-eventdev/test_order_queue.c
@@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
-- 
2.25.1



* [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (6 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Declan Doherty, Ciara Power
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert the rte_atomic16_test_and_set() usage to the compiler's
atomic CAS built-in for display sync in the crypto test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
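Note for readers less familiar with the built-in: the print-once pattern
this patch applies in each runner can be sketched on its own as below.
This is a minimal standalone sketch, not code from the patch. The helper
name claim_once() is hypothetical; only the __atomic_compare_exchange_n()
call and its arguments mirror the hunks that follow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: returns true for exactly one
 * caller, the one that flips *flag from 0 to 1.
 */
static bool
claim_once(uint16_t *flag)
{
	uint16_t exp = 0;	/* value the flag must hold for the CAS to succeed */

	/*
	 * Strong CAS (weak == 0) with relaxed ordering on both the success
	 * and the failure path, matching the arguments used in the patch.
	 */
	return __atomic_compare_exchange_n(flag, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

A runner would guard its header printf() with such a call, so that only
the first lcore to reach it prints the header. Relaxed ordering should be
sufficient on the assumption that the flag protects nothing except the
decision to print.
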
 app/test-crypto-perf/cperf_test_latency.c        | 6 ++++--
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++---
 app/test-crypto-perf/cperf_test_throughput.c     | 9 ++++++---
 app/test-crypto-perf/cperf_test_verify.c         | 9 ++++++---
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c
index 69f55de50a..ce49feaba9 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	if (ctx == NULL)
 		return 0;
@@ -307,8 +307,10 @@ cperf_latency_test_runner(void *arg)
 		time_max = tunit*(double)(tsc_max) / tsc_hz;
 		time_min = tunit*(double)(tsc_min) / tsc_hz;
 
+		uint16_t exp = 0;
 		if (ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("\n# lcore, Buffer Size, Burst Size, Pakt Seq #, "
 						"cycles, time (us)");
 
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
index fda97e8ab9..ba1f104f72 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
@@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 	state.lcore = rte_lcore_id();
 	state.linearize = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static bool warmup = true;
 
 	/*
@@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 			continue;
 		}
 
+		uint16_t exp = 0;
 		if (!opts->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(PRETTY_HDR_FMT, "lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
@@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 					state.cycles_per_enq,
 					state.cycles_per_deq);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(CSV_HDR_FMT, "# lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index 739ed9e573..51512af2ad 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -113,7 +113,7 @@ cperf_throughput_test_runner(void *test_ctx)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	struct rte_crypto_op *ops[ctx->options->max_burst_size];
 	struct rte_crypto_op *ops_processed[ctx->options->max_burst_size];
@@ -281,8 +281,10 @@ cperf_throughput_test_runner(void *test_ctx)
 		double cycles_per_packet = ((double)tsc_duration /
 				ctx->options->total_ops);
 
+		uint16_t exp = 0;
 		if (!ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 					"lcore id", "Buf Size", "Burst Size",
 					"Enqueued", "Dequeued", "Failed Enq",
@@ -302,7 +304,8 @@ cperf_throughput_test_runner(void *test_ctx)
 					throughput_gbps,
 					cycles_per_packet);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("#lcore id,Buffer Size(B),"
 					"Burst Size,Enqueued,Dequeued,Failed Enq,"
 					"Failed Deq,Ops(Millions),Throughput(Gbps),"
diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c
index 1962438034..496eb0de00 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx)
 	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;
 	uint64_t ops_failed = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	uint64_t i;
 	uint16_t ops_unused = 0;
@@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx)
 		ops_deqd_total += ops_deqd;
 	}
 
+	uint16_t exp = 0;
 	if (!ctx->options->csv) {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 				"lcore id", "Buf Size", "Burst size",
 				"Enqueued", "Dequeued", "Failed Enq",
@@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx)
 				ops_deqd_failed,
 				ops_failed);
 	} else {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("\n# lcore id, Buffer Size(B), "
 				"Burst Size,Enqueued,Dequeued,Failed Enq,"
 				"Failed Deq,Failed Ops\n");
-- 
2.25.1



* [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (7 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16 20:15   ` Honnappa Nagarahalli
  2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

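Convert the rte_atomic16_test_and_set() usage to the compiler's
atomic CAS built-in for display sync. In the cyclecount test,
where the report is already serialized by print_spinlock, use a
plain flag under the lock instead.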

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
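Note on the "exp = 0" reset in comp_perf_test_throughput.c: when
__atomic_compare_exchange_n() fails, it writes the value it observed into
the "expected" argument, so a variable reused for a second CAS has to be
reset first. The sketch below illustrates this in isolation; the function
claim_second() and its parameters are hypothetical and exist only for the
demonstration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical demo, not part of DPDK: try to claim two independent
 * once-flags while sharing a single 'exp' variable, and report whether
 * the second claim succeeded.
 */
static bool
claim_second(uint16_t *first, uint16_t *second, bool reset_exp)
{
	uint16_t exp = 0;

	/*
	 * If *first is already non-zero, this CAS fails and stores the
	 * observed value of *first into exp.
	 */
	(void)__atomic_compare_exchange_n(first, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);

	/* Without this reset, a stale exp makes the next CAS fail as well. */
	if (reset_exp)
		exp = 0;

	return __atomic_compare_exchange_n(second, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

With the first flag already taken and the second still zero, the second
claim only succeeds when exp is reset in between.
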
 app/test-compress-perf/comp_perf_test_common.h    |  2 +-
 .../comp_perf_test_cyclecount.c                   | 15 +++++++--------
 .../comp_perf_test_throughput.c                   | 10 +++++++---
 app/test-compress-perf/comp_perf_test_verify.c    |  6 ++++--
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
index 72705c6a2b..d039e5a29a 100644
--- a/app/test-compress-perf/comp_perf_test_common.h
+++ b/app/test-compress-perf/comp_perf_test_common.h
@@ -14,7 +14,7 @@ struct cperf_mem_resources {
 	uint16_t qp_id;
 	uint8_t lcore_id;
 
-	rte_atomic16_t print_info_once;
+	uint16_t print_info_once;
 
 	uint32_t total_bufs;
 	uint8_t *compressed_data;
diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
index c875ddbdac..da55b02b74 100644
--- a/app/test-compress-perf/comp_perf_test_cyclecount.c
+++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
@@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
 	struct cperf_cyclecount_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static rte_spinlock_t print_spinlock;
 	int i;
 
@@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			(ctx->ver.mem.total_bufs * test_data->num_iter);
 
 	/* R E P O R T processing */
-	if (rte_atomic16_test_and_set(&display_once)) {
+	rte_spinlock_lock(&print_spinlock);
 
-		rte_spinlock_lock(&print_spinlock);
+	if (display_once == 0) {
+		display_once = 1;
 
 		printf("\nLegend for the table\n"
 		"  - Retries section: number of retries for the following operations:\n"
@@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			"setup/op",
 			"[C-e]", "[C-d]",
 			"[D-e]", "[D-d]");
-
-		rte_spinlock_unlock(&print_spinlock);
 	}
 
-	rte_spinlock_lock(&print_spinlock);
-
 	printf("%12u"
 	       "%6u"
 	       "%12zu"
diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
index 13922b658c..d3dff070b0 100644
--- a/app/test-compress-perf/comp_perf_test_throughput.c
+++ b/app/test-compress-perf/comp_perf_test_throughput.c
@@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx)
 	struct cperf_benchmark_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	int i, ret = EXIT_SUCCESS;
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx)
 	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
 			1000000000;
 
-	if (rte_atomic16_test_and_set(&display_once)) {
+	exp = 0;
+	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 		printf("\n%12s%6s%12s%17s%15s%16s\n",
 			"lcore id", "Level", "Comp size", "Comp ratio [%]",
 			"Comp [Gbps]", "Decomp [Gbps]");
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index 5e13257b79..f6e21368e8 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
 	struct cperf_verify_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->options;
 	int ret = EXIT_SUCCESS;
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	uint32_t lcore = rte_lcore_id();
 
 	ctx->mem.lcore_id = lcore;
@@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx)
 	ctx->ratio = (double) ctx->comp_data_sz /
 			test_data->input_data_sz * 100;
 
+	uint16_t exp = 0;
 	if (!ctx->silent) {
-		if (rte_atomic16_test_and_set(&display_once)) {
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 			printf("%12s%6s%12s%17s\n",
 			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
 		}
-- 
2.25.1



* [PATCH v2 10/12] app/testpmd: remove atomic operations for port status
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (8 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16 21:34   ` Honnappa Nagarahalli
  2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Xiaoyun Li; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

The port_status field does not need to be updated atomically,
because it is only modified during initialization or from the
testpmd prompt, never concurrently by multiple threads.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
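Note: the check-then-set sequence that replaces rte_atomic16_cmpset() is
repeated at every call site in this patch. As an illustration only, the
same transition could be expressed as a small helper. port_transition()
and the enum values below are hypothetical stand-ins (testpmd uses its
RTE_PORT_* constants), and the sketch is safe only under the
single-writer assumption stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state values standing in for testpmd's RTE_PORT_* constants. */
enum port_state {
	PORT_STOPPED,
	PORT_HANDLING,
	PORT_STARTED,
	PORT_CLOSED,
};

/*
 * Hypothetical helper: move *status from 'from' to 'to' and report
 * whether the transition happened. This is a plain, non-atomic
 * read-modify-write, so it must not be called concurrently on the
 * same port.
 */
static bool
port_transition(uint16_t *status, uint16_t from, uint16_t to)
{
	if (*status != from)
		return false;

	*status = to;
	return true;
}
```

A caller would print the "can not be set back to stopped" message when the
helper returns false, which is what the else branches in the hunks below
do.
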
 app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a66dfb297c..ed472cacd2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -36,7 +36,6 @@
 #include <rte_alarm.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
@@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2729,8 +2728,9 @@ start_port(portid_t pid)
 
 		need_check_link_status = 0;
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED,
-						 RTE_PORT_HANDLING) == 0) {
+		if (port->port_status == RTE_PORT_STOPPED)
+			port->port_status = RTE_PORT_HANDLING;
+		else {
 			fprintf(stderr, "Port %d is now not stopped\n", pi);
 			continue;
 		}
@@ -2766,8 +2766,9 @@ start_port(portid_t pid)
 						     nb_txq + nb_hairpinq,
 						     &(port->dev_conf));
 			if (diag != 0) {
-				if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2828,9 +2829,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup tx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2880,9 +2881,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup rx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2917,16 +2918,18 @@ start_port(portid_t pid)
 				pi, rte_strerror(-diag));
 
 			/* Fail to setup rx queue, return */
-			if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+			if (port->port_status == RTE_PORT_HANDLING)
+				port->port_status = RTE_PORT_STOPPED;
+			else
 				fprintf(stderr,
 					"Port %d can not be set back to stopped\n",
 					pi);
 			continue;
 		}
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STARTED;
+		else
 			fprintf(stderr, "Port %d can not be set into started\n",
 				pi);
 
@@ -3028,8 +3031,9 @@ stop_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED,
-						RTE_PORT_HANDLING) == 0)
+		if (port->port_status == RTE_PORT_STARTED)
+			port->port_status = RTE_PORT_HANDLING;
+		else
 			continue;
 
 		if (hairpin_mode & 0xf) {
@@ -3055,8 +3059,9 @@ stop_port(portid_t pid)
 			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
 				pi);
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr, "Port %d can not be set into stopped\n",
 				pi);
 		need_check_link_status = 1;
@@ -3119,8 +3124,7 @@ close_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
+		if (port->port_status == RTE_PORT_CLOSED) {
 			fprintf(stderr, "Port %d is already closed\n", pi);
 			continue;
 		}
-- 
2.25.1



* [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (9 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Nicolas Chautru; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert the rte_atomic usages to compiler atomic built-ins for
shared data sync in the bbdev test cases, and replace the
rte_pause() polling loops with rte_wait_until_equal_16().

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
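Note on the polling loops: the old code spun while sync == SYNC_WAIT,
whereas rte_wait_until_equal_16() spins until sync == SYNC_START. The two
are equivalent only if sync never holds any other value, which the
visible hunks suggest but which is an assumption here. The sketch below
is a generic stand-in for that kind of wait, written with the compiler
built-in only. spin_until_equal_16() is a hypothetical name, not the DPDK
implementation, and it returns the number of spins purely so that the
behaviour can be observed.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a "wait until equal" helper: poll *addr with
 * relaxed loads until it equals 'expected'. Returns how many times the
 * loop had to spin (0 if the value already matched).
 */
static uint64_t
spin_until_equal_16(volatile uint16_t *addr, uint16_t expected)
{
	uint64_t spins = 0;

	while (__atomic_load_n(addr, __ATOMIC_RELAXED) != expected)
		spins++;	/* DPDK code would call rte_pause() here */

	return spins;
}
```

Relaxed ordering only guarantees that the waiter eventually observes the
new value of the flag itself. If data written before the flag must also
be visible to the waiter, an acquire load paired with a release store
would be needed instead.
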
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 7b4529789b..0fa119a502 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -133,7 +133,7 @@ struct test_op_params {
 	uint16_t num_to_process;
 	uint16_t num_lcores;
 	int vector_mask;
-	rte_atomic16_t sync;
+	uint16_t sync;
 	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
 };
 
@@ -148,9 +148,9 @@ struct thread_params {
 	uint8_t iter_count;
 	double iter_average;
 	double bler;
-	rte_atomic16_t nb_dequeued;
-	rte_atomic16_t processing_status;
-	rte_atomic16_t burst_sz;
+	uint16_t nb_dequeued;
+	int16_t processing_status;
+	uint16_t burst_sz;
 	struct test_op_params *op_params;
 	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
 	struct rte_bbdev_enc_op *enc_ops[MAX_BURST];
@@ -2637,46 +2637,46 @@ dequeue_event_callback(uint16_t dev_id,
 	}
 
 	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		printf(
 			"Dequeue interrupt handler called for incorrect event!\n");
 		return;
 	}
 
-	burst_sz = rte_atomic16_read(&tp->burst_sz);
+	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
 	num_ops = tp->op_params->num_to_process;
 
 	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
 		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
 		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
 		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else /*RTE_BBDEV_OP_TURBO_ENC*/
 		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 
 	if (deq < burst_sz) {
 		printf(
 			"After receiving the interrupt all operations should be dequeued. Expected: %u, got: %u\n",
 			burst_sz, deq);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
-	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
-		rte_atomic16_add(&tp->nb_dequeued, deq);
+	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) {
+		__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2713,7 +2713,7 @@ dequeue_event_callback(uint16_t dev_id,
 
 	if (ret) {
 		printf("Buffers validation failed\n");
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 	}
 
 	switch (test_vector.op_type) {
@@ -2734,7 +2734,7 @@ dequeue_event_callback(uint16_t dev_id,
 		break;
 	default:
 		printf("Unknown op type: %d\n", test_vector.op_type);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2743,7 +2743,7 @@ dequeue_event_callback(uint16_t dev_id,
 	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
 			((double)total_time / (double)rte_get_tsc_hz());
 
-	rte_atomic16_add(&tp->nb_dequeued, deq);
+	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 }
 
 static int
@@ -2781,11 +2781,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2833,17 +2832,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2878,11 +2875,10 @@ throughput_intr_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2923,17 +2919,15 @@ throughput_intr_lcore_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2968,11 +2962,10 @@ throughput_intr_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3012,17 +3005,15 @@ throughput_intr_lcore_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3058,11 +3049,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3104,17 +3094,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3148,8 +3136,7 @@ throughput_pmd_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3252,8 +3239,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3382,8 +3368,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3499,8 +3484,7 @@ throughput_pmd_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3590,8 +3574,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3774,7 +3757,7 @@ bler_test(struct active_device *ad,
 	else
 		return TEST_SKIPPED;
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3797,7 +3780,7 @@ bler_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = bler_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3892,7 +3875,7 @@ throughput_test(struct active_device *ad,
 			throughput_function = throughput_pmd_lcore_enc;
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3915,7 +3898,7 @@ throughput_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = throughput_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3945,29 +3928,29 @@ throughput_test(struct active_device *ad,
 	 * Wait for main lcore operations.
 	 */
 	tp = &t_params[0];
-	while ((rte_atomic16_read(&tp->nb_dequeued) <
-			op_params->num_to_process) &&
-			(rte_atomic16_read(&tp->processing_status) !=
-			TEST_FAILED))
+	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+		op_params->num_to_process) &&
+		(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+		TEST_FAILED))
 		rte_pause();
 
 	tp->ops_per_sec /= TEST_REPETITIONS;
 	tp->mbps /= TEST_REPETITIONS;
-	ret |= (int)rte_atomic16_read(&tp->processing_status);
+	ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 
 	/* Wait for worker lcores operations */
 	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
 		tp = &t_params[used_cores];
 
-		while ((rte_atomic16_read(&tp->nb_dequeued) <
-				op_params->num_to_process) &&
-				(rte_atomic16_read(&tp->processing_status) !=
-				TEST_FAILED))
+		while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+			op_params->num_to_process) &&
+			(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+			TEST_FAILED))
 			rte_pause();
 
 		tp->ops_per_sec /= TEST_REPETITIONS;
 		tp->mbps /= TEST_REPETITIONS;
-		ret |= (int)rte_atomic16_read(&tp->processing_status);
+		ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 	}
 
 	/* Print throughput if test passed */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 12/12] app: remove unnecessary include of atomic header file
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (10 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16 20:23   ` David Marchand
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Remove the unnecessary rte_atomic.h included in app modules.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c         | 1 -
 app/test-pipeline/config.c   | 1 -
 app/test-pipeline/init.c     | 1 -
 app/test-pipeline/main.c     | 1 -
 app/test-pipeline/runtime.c  | 1 -
 app/test-pmd/cmdline.c       | 1 -
 app/test-pmd/config.c        | 1 -
 app/test-pmd/csumonly.c      | 1 -
 app/test-pmd/flowgen.c       | 1 -
 app/test-pmd/icmpecho.c      | 1 -
 app/test-pmd/iofwd.c         | 1 -
 app/test-pmd/macfwd.c        | 1 -
 app/test-pmd/macswap.c       | 1 -
 app/test-pmd/parameters.c    | 1 -
 app/test-pmd/rxonly.c        | 1 -
 app/test-pmd/txonly.c        | 1 -
 app/test/test_barrier.c      | 1 -
 app/test/test_mbuf.c         | 1 -
 app/test/test_mp_secondary.c | 1 -
 app/test/test_ring.c         | 1 -
 20 files changed, 20 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c
index a4271047e6..ebe2d77264 100644
--- a/app/proc-info/main.c
+++ b/app/proc-info/main.c
@@ -27,7 +27,6 @@
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
 #include <rte_metrics.h>
diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644
--- a/app/test-pipeline/config.c
+++ b/app/test-pipeline/config.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index c738019041..eee0719b67 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c
index 72e4797ff2..1e16794183 100644
--- a/app/test-pipeline/main.c
+++ b/app/test-pipeline/main.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 159192bcd8..d939a85d7e 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f51b259fe..4e93f535ff 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 26cadf39f7..d8b5032b58 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -27,7 +27,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8526d9158a..e0b00abe8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 5737eaa105..9ceef3b54a 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8f1d68a83a..3a85ec3dd1 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ac50d0b9f8..812a0c721f 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..2f4f944efa 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 34bb538379..b8497e733d 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f93bcef8a9..d53126710f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-16 19:52   ` Honnappa Nagarahalli
  2021-11-16 20:20   ` David Marchand
  1 sibling, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 19:52 UTC (permalink / raw)
  To: Joyce Kong, Robert Sanford, Erik Gabriel Carrillo
  Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for lcore_state and
> collisions sync.
> 
> Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> guarantee lcore_state initialized correctly before the threads launched.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test/test_timer.c           | 30 +++++++++++++-----------------
>  app/test/test_timer_secondary.c |  1 -
>  2 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/app/test/test_timer.c b/app/test/test_timer.c
> index a10b2fe9da..c97e5c891c 100644
> --- a/app/test/test_timer.c
> +++ b/app/test/test_timer.c
> @@ -102,7 +102,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_timer.h>
>  #include <rte_random.h>
>  #include <rte_malloc.h>
> @@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
> 
>  /* Need to synchronize worker lcores through multiple steps. */
>  enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
> -static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
> +static uint16_t lcore_state[RTE_MAX_LCORE];
> 
>  static void
>  main_init_workers(void)
> @@ -211,7 +210,7 @@ main_init_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
> +		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
>  	}
>  }
> 
> @@ -221,11 +220,10 @@ main_start_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
> +		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
>  	}
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
> -			rte_pause();
> +		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
>  	}
>  }
> 
> @@ -235,8 +233,7 @@ main_wait_for_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
> -			rte_pause();
> +		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
>  	}
>  }
> 
> @@ -245,9 +242,8 @@ worker_wait_to_start(void)
>  {
>  	unsigned lcore_id = rte_lcore_id();
> 
> -	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
> -		rte_pause();
> -	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
> +	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
> +	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
>  }
> 
>  static void
> @@ -255,7 +251,7 @@ worker_finish(void)
>  {
>  	unsigned lcore_id = rte_lcore_id();
> 
> -	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
> +	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
>  }
> 
> 
> @@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
>  	unsigned int lcore_id = rte_lcore_id();
>  	unsigned int main_lcore = rte_get_main_lcore();
>  	int32_t my_collisions = 0;
> -	static rte_atomic32_t collisions;
> +	static uint32_t collisions;
> 
>  	if (lcore_id == main_lcore) {
>  		cb_count = 0;
>  		test_failed = 0;
> -		rte_atomic32_set(&collisions, 0);
> -		main_init_workers();
> +		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
>  		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
>  		if (timers == NULL) {
>  			printf("Test Failed\n");
> @@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
>  			my_collisions++;
>  	}
>  	if (my_collisions != 0)
> -		rte_atomic32_add(&collisions, my_collisions);
> +		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
> 
>  	/* wait long enough for timers to expire */
>  	rte_delay_ms(100);
> @@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
> 
>  	/* now check that we get the right number of callbacks */
>  	if (lcore_id == main_lcore) {
> -		my_collisions = rte_atomic32_read(&collisions);
> +		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
>  		if (my_collisions != 0)
>  			printf("- %d timer reset collisions (OK)\n", my_collisions);
>  		rte_timer_manage();
> @@ -573,6 +568,7 @@ test_timer(void)
>  	/* run a second, slightly different set of stress tests */
>  	printf("\nStart timer stress tests 2\n");
>  	test_failed = 0;
> +	main_init_workers();
>  	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
>  	rte_eal_mp_wait_lcore();
>  	if (test_failed)
> diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
> index 16a9f1878b..5795c97f07 100644
> --- a/app/test/test_timer_secondary.c
> +++ b/app/test/test_timer_secondary.c
> @@ -9,7 +9,6 @@
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_memzone.h>
> -#include <rte_atomic.h>
>  #include <rte_timer.h>
>  #include <rte_cycles.h>
>  #include <rte_mempool.h>
> --
> 2.25.1
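
For readers following the lcore_state conversion above, here is a minimal
sketch of the release/acquire handshake, assuming a single worker and no DPDK
dependency. The names main_signal_run, worker_poll_start, state and payload
are hypothetical and not part of the patch; the blocking
rte_wait_until_equal_16() call is replaced here by a single non-blocking poll
step built on __atomic_load_n().

```c
#include <stdbool.h>
#include <stdint.h>

enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };

static uint16_t state = WORKER_WAITING;
static uint32_t payload; /* hypothetical data published together with the signal */

/* Main side: write the data first, then publish the signal with RELEASE. */
static void main_signal_run(uint32_t value)
{
	payload = value;
	__atomic_store_n(&state, WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
}

/*
 * Worker side: one polling step. The ACQUIRE load pairs with the RELEASE
 * store above, so once the signal is observed the payload write is visible.
 * Returns true once the signal has been seen and acknowledged.
 */
static bool worker_poll_start(uint32_t *out)
{
	if (__atomic_load_n(&state, __ATOMIC_ACQUIRE) != WORKER_RUN_SIGNAL)
		return false;
	*out = payload;
	__atomic_store_n(&state, WORKER_RUNNING, __ATOMIC_RELEASE);
	return true;
}
```

In a real worker the poll step would sit in a loop (or be replaced by the
wait helper the patch uses); the memory orders are the part this sketch is
meant to show.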


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
@ 2021-11-16 20:15   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 20:15 UTC (permalink / raw)
  To: Joyce Kong; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

> 
> Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for
> display sync.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test-compress-perf/comp_perf_test_common.h    |  2 +-
>  .../comp_perf_test_cyclecount.c                   | 15 +++++++--------
>  .../comp_perf_test_throughput.c                   | 10 +++++++---
>  app/test-compress-perf/comp_perf_test_verify.c    |  6 ++++--
>  4 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
> index 72705c6a2b..d039e5a29a 100644
> --- a/app/test-compress-perf/comp_perf_test_common.h
> +++ b/app/test-compress-perf/comp_perf_test_common.h
> @@ -14,7 +14,7 @@ struct cperf_mem_resources {
>  	uint16_t qp_id;
>  	uint8_t lcore_id;
> 
> -	rte_atomic16_t print_info_once;
> +	uint16_t print_info_once;
> 
>  	uint32_t total_bufs;
>  	uint8_t *compressed_data;
> diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
> index c875ddbdac..da55b02b74 100644
> --- a/app/test-compress-perf/comp_perf_test_cyclecount.c
> +++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
> @@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  	struct cperf_cyclecount_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->ver.options;
>  	uint32_t lcore = rte_lcore_id();
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	static rte_spinlock_t print_spinlock;
>  	int i;
> 
> @@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
> 
>  	ctx->ver.mem.lcore_id = lcore;
> 
> +	uint16_t exp = 0;
>  	/*
>  	 * printing information about current compression thread
>  	 */
> -	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> +	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
> +				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
>  		printf("    lcore: %u,"
>  				" driver name: %s,"
>  				" device name: %s,"
> @@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  			(ctx->ver.mem.total_bufs * test_data->num_iter);
> 
>  	/* R E P O R T processing */
> -	if (rte_atomic16_test_and_set(&display_once)) {
> +	rte_spinlock_lock(&print_spinlock);
> 
> -		rte_spinlock_lock(&print_spinlock);
> +	if (display_once == 0) {
> +		display_once = 1;
> 
>  		printf("\nLegend for the table\n"
>  		"  - Retries section: number of retries for the following
> operations:\n"
> @@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  			"setup/op",
>  			"[C-e]", "[C-d]",
>  			"[D-e]", "[D-d]");
> -
> -		rte_spinlock_unlock(&print_spinlock);
>  	}
> 
> -	rte_spinlock_lock(&print_spinlock);
> -
>  	printf("%12u"
>  	       "%6u"
>  	       "%12zu"
> diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
> index 13922b658c..d3dff070b0 100644
> --- a/app/test-compress-perf/comp_perf_test_throughput.c
> +++ b/app/test-compress-perf/comp_perf_test_throughput.c
> @@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx)
>  	struct cperf_benchmark_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->ver.options;
>  	uint32_t lcore = rte_lcore_id();
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	int i, ret = EXIT_SUCCESS;
> 
>  	ctx->ver.mem.lcore_id = lcore;
> 
> +	uint16_t exp = 0;
>  	/*
>  	 * printing information about current compression thread
>  	 */
> -	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> +	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
> +				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
>  		printf("    lcore: %u,"
>  				" driver name: %s,"
>  				" device name: %s,"
> @@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx)
>  	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
>  			1000000000;
> 
> -	if (rte_atomic16_test_and_set(&display_once)) {
> +	exp = 0;
> +	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> +			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  		printf("\n%12s%6s%12s%17s%15s%16s\n",
>  			"lcore id", "Level", "Comp size", "Comp ratio [%]",
>  			"Comp [Gbps]", "Decomp [Gbps]");
> diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
> index 5e13257b79..f6e21368e8 100644
> --- a/app/test-compress-perf/comp_perf_test_verify.c
> +++ b/app/test-compress-perf/comp_perf_test_verify.c
> @@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
>  	struct cperf_verify_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->options;
>  	int ret = EXIT_SUCCESS;
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	uint32_t lcore = rte_lcore_id();
> 
>  	ctx->mem.lcore_id = lcore;
> @@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx)
>  	ctx->ratio = (double) ctx->comp_data_sz /
>  			test_data->input_data_sz * 100;
> 
> +	uint16_t exp = 0;
>  	if (!ctx->silent) {
> -		if (rte_atomic16_test_and_set(&display_once)) {
> +		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  			printf("%12s%6s%12s%17s\n",
>  			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
>  		}
> --
> 2.25.1
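
The "print once" conversion quoted above can be sketched in isolation as
follows. This is only an illustration under stated assumptions: the helper
name claim_display_once is hypothetical, and the sketch uses a file-scope
flag rather than the per-context print_info_once field.

```c
#include <stdbool.h>
#include <stdint.h>

static uint16_t display_once;

/*
 * Returns true for exactly one caller: the one whose compare-and-swap
 * moves the flag from 0 to 1. This mirrors what
 * rte_atomic16_test_and_set() reported on success.
 */
static bool claim_display_once(void)
{
	/*
	 * 'exp' must be a fresh 0 for every attempt: when the CAS fails,
	 * the builtin overwrites it with the value currently in the flag.
	 */
	uint16_t exp = 0;

	return __atomic_compare_exchange_n(&display_once, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

The overwrite-on-failure behaviour is why the throughput runner in the quoted
patch resets exp to 0 before reusing it for the second compare-exchange.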


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
  2021-11-16 19:52   ` Honnappa Nagarahalli
@ 2021-11-16 20:20   ` David Marchand
  2021-11-16 21:21     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 36+ messages in thread
From: David Marchand @ 2021-11-16 20:20 UTC (permalink / raw)
  To: Joyce Kong, Honnappa Nagarahalli
  Cc: Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang

Joyce, Honnappa,

On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Convert rte_atomic usages to compiler atomic
> built-ins for lcore_state and collisions sync.
>
> Also, move 'main_init_workers' outside of
> 'timer_stress2_main_loop' to guarantee lcore_state
> initialized correctly before the threads launched.

Is this "also" part actually related to the change?
Or is it a separate fix?


>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>



-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 12/12] app: remove unnecessary include of atomic header file
  2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
@ 2021-11-16 20:23   ` David Marchand
  2021-11-17  7:05     ` Joyce Kong
  0 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2021-11-16 20:23 UTC (permalink / raw)
  To: Joyce Kong
  Cc: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev, dev, nd, Ruifeng Wang

On Tue, Nov 16, 2021 at 10:44 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Remove the unnecessary rte_atomic.h included in app modules.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

After patch, I still see:

$ git grep rte_atomic.h app/
app/test/commands.c:#include <rte_atomic.h>
app/test/test_atomic.c:#include <rte_atomic.h>
app/test/test_event_timer_adapter.c:#include <rte_atomic.h>

I can understand why test_atomic would depend on rte_atomic.h :-)
but not the rest.
Is there a reason, or is it just a miss?


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16 20:20   ` David Marchand
@ 2021-11-16 21:21     ` Honnappa Nagarahalli
  2021-11-17  9:29       ` David Marchand
  0 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 21:21 UTC (permalink / raw)
  To: David Marchand, Joyce Kong
  Cc: Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang, nd

<snip>

> 
> Joyce, Honnappa,
> 
> On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > Convert rte_atomic usages to compiler atomic built-ins for lcore_state
> > and collisions sync.
> >
> > Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> > guarantee lcore_state initialized correctly before the threads
> > launched.
> 
> Is this "also" part actually related to the change?
> Or is it a separate fix?
The 'also' part does not fix a separate problem (i.e. the earlier code did not have any issues). It just helps keep the code simple.

> 
> 
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-16 21:30   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 21:30 UTC (permalink / raw)
  To: Joyce Kong; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for polling sync in
> pmd_perf test cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_pmd_perf.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
> index 1df86ce080..546384a50d 100644
> --- a/app/test/test_pmd_perf.c
> +++ b/app/test/test_pmd_perf.c
> @@ -10,7 +10,6 @@
>  #include <rte_cycles.h>
>  #include <rte_ethdev.h>
>  #include <rte_byteorder.h>
> -#include <rte_atomic.h>
>  #include <rte_malloc.h>
>  #include "packet_burst_generator.h"
>  #include "test.h"
> @@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
>  	return 0;
>  }
> 
> -static rte_atomic64_t start;
> +static uint64_t start;
> 
>  static inline int
>  poll_burst(void *args)
> @@ -563,8 +562,7 @@ poll_burst(void *args)
>  		num[portid] = pkt_per_port;
>  	}
> 
> -	while (!rte_atomic64_read(&start))
> -		;
> +	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
> 
>  	cur_tsc = rte_rdtsc();
>  	while (total) {
> @@ -616,15 +614,15 @@ exec_burst(uint32_t flags, int lcore)
>  	pkt_per_port = MAX_TRAFFIC_BURST;
>  	num = pkt_per_port * conf->nb_ports;
> 
> -	rte_atomic64_init(&start);
> -
>  	/* start polling thread, but not actually poll yet */
>  	rte_eal_remote_launch(poll_burst,
>  			      (void *)&pkt_per_port, lcore);
> 
>  	/* Only when polling first */
>  	if (flags == SC_BURST_POLL_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
> +	else
> +		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
These lines need to be moved up, before the call to rte_eal_remote_launch, so that the update to 'start' is visible to the worker threads.

> 
>  	/* start xmit */
>  	i = 0;
> @@ -641,7 +639,7 @@ exec_burst(uint32_t flags, int lcore)
> 
>  	/* only when polling second  */
>  	if (flags == SC_BURST_XMIT_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
> 
>  	/* wait for polling finished */
>  	diff_tsc = rte_eal_wait_lcore(lcore);
> --
> 2.25.1
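
The ordering requested in the review comment above can be sketched with plain
pthreads. This is a rough illustration, not the test code: it assumes that
launching the thread orders earlier writes before the thread body (true for
pthread_create, and assumed here for the lcore launch as well). The names
exec_burst_sketch, poll_burst_sketch and packets_ready are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

static uint64_t start;
static uint64_t packets_ready; /* hypothetical data written before the signal */

/* Polling thread: spin until 'start' becomes 1, then read the data. */
static void *poll_burst_sketch(void *arg)
{
	uint64_t *seen = arg;

	while (__atomic_load_n(&start, __ATOMIC_ACQUIRE) != 1)
		;
	*seen = packets_ready;
	return NULL;
}

static uint64_t exec_burst_sketch(int poll_first)
{
	pthread_t tid;
	uint64_t seen = 0;

	/*
	 * Set the initial value BEFORE the thread exists, so the poller can
	 * never observe a stale 1 left over from a previous run.
	 */
	__atomic_store_n(&start, poll_first ? 1 : 0, __ATOMIC_RELAXED);

	if (pthread_create(&tid, NULL, poll_burst_sketch, &seen) != 0)
		return 0;

	if (!poll_first) {
		/* "xmit" first, then release the poller. */
		packets_ready = 32;
		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
	}

	pthread_join(tid, NULL);
	return seen;
}
```

If the initial store were issued after the launch instead, a second run in
"xmit first" mode could let the poller proceed on the 1 left behind by the
previous run.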


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 10/12] app/testpmd: remove atomic operations for port status
  2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
@ 2021-11-16 21:34   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 21:34 UTC (permalink / raw)
  To: Joyce Kong, Xiaoyun Li; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>
> 
> The port_status changes do not need to be handled atomically, as they are
> modified during initialization or through the testpmd prompt instead of
> multiple threads.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
>  1 file changed, 31 insertions(+), 27 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index a66dfb297c..ed472cacd2 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -36,7 +36,6 @@
>  #include <rte_alarm.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_malloc.h>
> @@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
>  			continue;
> 
>  		/* Fail to setup rx queue, return */
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -					RTE_PORT_HANDLING,
> -					RTE_PORT_STOPPED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STOPPED;
> +		else
>  			fprintf(stderr,
>  				"Port %d can not be set back to stopped\n", pi);
>  		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
> @@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
>  			continue;
> 
>  		/* Fail to setup rx queue, return */
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -					RTE_PORT_HANDLING,
> -					RTE_PORT_STOPPED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STOPPED;
> +		else
>  			fprintf(stderr,
>  				"Port %d can not be set back to stopped\n", pi);
>  		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
> @@ -2729,8 +2728,9 @@ start_port(portid_t pid)
> 
>  		need_check_link_status = 0;
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED,
> -						 RTE_PORT_HANDLING) == 0) {
> +		if (port->port_status == RTE_PORT_STOPPED)
> +			port->port_status = RTE_PORT_HANDLING;
> +		else {
>  			fprintf(stderr, "Port %d is now not stopped\n", pi);
>  			continue;
>  		}
> @@ -2766,8 +2766,9 @@ start_port(portid_t pid)
>  						     nb_txq + nb_hairpinq,
>  						     &(port->dev_conf));
>  			if (diag != 0) {
> -				if (rte_atomic16_cmpset(&(port->port_status),
> -				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> +				if (port->port_status == RTE_PORT_HANDLING)
> +					port->port_status = RTE_PORT_STOPPED;
> +				else
>  					fprintf(stderr,
>  						"Port %d can not be set back to stopped\n",
>  						pi);
> @@ -2828,9 +2829,9 @@ start_port(portid_t pid)
>  					continue;
> 
>  				/* Fail to setup tx queue, return */
> -				if (rte_atomic16_cmpset(&(port->port_status),
> -							RTE_PORT_HANDLING,
> -							RTE_PORT_STOPPED) == 0)
> +				if (port->port_status == RTE_PORT_HANDLING)
> +					port->port_status = RTE_PORT_STOPPED;
> +				else
>  					fprintf(stderr,
>  						"Port %d can not be set back to stopped\n",
>  						pi);
> @@ -2880,9 +2881,9 @@ start_port(portid_t pid)
>  					continue;
> 
>  				/* Fail to setup rx queue, return */
> -				if (rte_atomic16_cmpset(&(port->port_status),
> -							RTE_PORT_HANDLING,
> -							RTE_PORT_STOPPED) == 0)
> +				if (port->port_status == RTE_PORT_HANDLING)
> +					port->port_status = RTE_PORT_STOPPED;
> +				else
>  					fprintf(stderr,
>  						"Port %d can not be set back to stopped\n",
>  						pi);
> @@ -2917,16 +2918,18 @@ start_port(portid_t pid)
>  				pi, rte_strerror(-diag));
> 
>  			/* Fail to setup rx queue, return */
> -			if (rte_atomic16_cmpset(&(port->port_status),
> -				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> +			if (port->port_status == RTE_PORT_HANDLING)
> +				port->port_status = RTE_PORT_STOPPED;
> +			else
>  				fprintf(stderr,
>  					"Port %d can not be set back to stopped\n",
>  					pi);
>  			continue;
>  		}
> 
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STARTED;
> +		else
>  			fprintf(stderr, "Port %d can not be set into started\n",
>  				pi);
> 
> @@ -3028,8 +3031,9 @@ stop_port(portid_t pid)
>  		}
> 
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED,
> -						RTE_PORT_HANDLING) == 0)
> +		if (port->port_status == RTE_PORT_STARTED)
> +			port->port_status = RTE_PORT_HANDLING;
> +		else
>  			continue;
> 
>  		if (hairpin_mode & 0xf) {
> @@ -3055,8 +3059,9 @@ stop_port(portid_t pid)
>  			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
>  				pi);
> 
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STOPPED;
> +		else
>  			fprintf(stderr, "Port %d can not be set into stopped\n",
>  				pi);
>  		need_check_link_status = 1;
> @@ -3119,8 +3124,7 @@ close_port(portid_t pid)
>  		}
> 
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
> +		if (port->port_status == RTE_PORT_CLOSED) {
>  			fprintf(stderr, "Port %d is already closed\n", pi);
>  			continue;
>  		}
> --
> 2.25.1
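
For reference, the compare-and-set calls removed in the quoted hunks
correspond to the compiler built-in __atomic_compare_exchange_n. The
sketch below is only an illustration and is not part of the patch:
port_status_cas and port_status_plain are hypothetical helper names, and
the state values are placeholders rather than the testpmd definitions.
It contrasts the atomic transition with the plain check-then-assign the
patch switches to, which gives the same result only when a single thread
changes the status.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder state values for illustration; not the testpmd definitions. */
enum { PORT_STOPPED = 0, PORT_STARTED = 1, PORT_HANDLING = 2 };

/* Atomic transition: succeeds only if *status still equals 'from'. */
static bool
port_status_cas(uint16_t *status, uint16_t from, uint16_t to)
{
	return __atomic_compare_exchange_n(status, &from, to, false,
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Plain transition: equivalent only when one thread touches *status. */
static bool
port_status_plain(uint16_t *status, uint16_t from, uint16_t to)
{
	if (*status != from)
		return false;
	*status = to;
	return true;
}
```

Both helpers return true when the transition was applied, so the error
message in the patched code is printed on the false branch.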


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 12/12] app: remove unnecessary include of atomic header file
  2021-11-16 20:23   ` David Marchand
@ 2021-11-17  7:05     ` Joyce Kong
  0 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  7:05 UTC (permalink / raw)
  To: David Marchand
  Cc: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev, dev, nd, Ruifeng Wang

<snip>

> Subject: Re: [PATCH v2 12/12] app: remove unnecessary include of atomic
> header file
> 
> On Tue, Nov 16, 2021 at 10:44 AM Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > Remove the unnecessary rte_atomic.h included in app modules.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> After patch, I still see:
> 
> $ git grep rte_atomic.h app/
> app/test/commands.c:#include <rte_atomic.h>
> app/test/test_atomic.c:#include <rte_atomic.h>
> app/test/test_event_timer_adapter.c:#include <rte_atomic.h>
> 
> I can understand why the test_atomic would depend on rte_atomic.h :-) but
> not the rest.
> Is there a reason? or is it just a miss?
> 
> --
> David Marchand

Hi David, I checked the rest and they were simply missed. Thanks for the reminder, I will update this in v3.

Joyce

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 00/12] use compiler atomic builtins for app modules
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (11 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
@ 2021-11-17  8:21 ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
                     ` (12 more replies)
  12 siblings, 13 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong

Since DPDK has now adopted the C11 memory model[1],
replace the rte_atomicNN_xxx APIs with compiler atomic
built-ins in app modules[2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html

v3:
  1. In the pmd_perf test case, move the initialization of the
     polling start flag before the call to rte_eal_remote_launch,
     so the update is visible to the worker threads.
     (Honnappa Nagarahalli)
  2. Remove the remaining rte_atomic.h includes that were missed
     in v2. (David Marchand)

v2:
  By Honnappa Nagarahalli:
  1. Replace the RELAXED memory orderings with suitable ones for
     shared data sync in pmd_perf and timer test cases.
  2. Avoid unnecessary atomic operations in compress and testpmd
     modules.
  3. Fix some typos.

Joyce Kong (12):
  test/pmd_perf: use compiler atomic builtins for polling sync
  test/ring_perf: use compiler atomic builtins for lcores sync
  test/timer: use compiler atomic builtins for sync
  test/stack_perf: use compiler atomics for lcore sync
  test/bpf: use compiler atomics for calculation
  test/func_reentrancy: use compiler atomics for data sync
  app/eventdev: use compiler atomics for shared data sync
  app/crypto: use compiler atomic builtins for display sync
  app/compress: use compiler atomic builtins for display sync
  app/testpmd: remove atomic operations for port status
  app/bbdev: use compiler atomics for shared data sync
  app: remove unnecessary include of atomic header file

 app/proc-info/main.c                          |   1 -
 app/test-bbdev/test_bbdev_perf.c              | 135 ++++++++----------
 .../comp_perf_test_common.h                   |   2 +-
 .../comp_perf_test_cyclecount.c               |  15 +-
 .../comp_perf_test_throughput.c               |  10 +-
 .../comp_perf_test_verify.c                   |   6 +-
 app/test-crypto-perf/cperf_test_latency.c     |   6 +-
 .../cperf_test_pmd_cyclecount.c               |   9 +-
 app/test-crypto-perf/cperf_test_throughput.c  |   9 +-
 app/test-crypto-perf/cperf_test_verify.c      |   9 +-
 app/test-eventdev/evt_main.c                  |   1 -
 app/test-eventdev/test_order_atq.c            |   4 +-
 app/test-eventdev/test_order_common.c         |   4 +-
 app/test-eventdev/test_order_common.h         |   8 +-
 app/test-eventdev/test_order_queue.c          |   4 +-
 app/test-pipeline/config.c                    |   1 -
 app/test-pipeline/init.c                      |   1 -
 app/test-pipeline/main.c                      |   1 -
 app/test-pipeline/runtime.c                   |   1 -
 app/test-pmd/cmdline.c                        |   1 -
 app/test-pmd/config.c                         |   1 -
 app/test-pmd/csumonly.c                       |   1 -
 app/test-pmd/flowgen.c                        |   1 -
 app/test-pmd/icmpecho.c                       |   1 -
 app/test-pmd/iofwd.c                          |   1 -
 app/test-pmd/macfwd.c                         |   1 -
 app/test-pmd/macswap.c                        |   1 -
 app/test-pmd/parameters.c                     |   1 -
 app/test-pmd/rxonly.c                         |   1 -
 app/test-pmd/testpmd.c                        |  58 ++++----
 app/test-pmd/txonly.c                         |   1 -
 app/test/commands.c                           |   1 -
 app/test/test_barrier.c                       |   1 -
 app/test/test_bpf.c                           |  28 ++--
 app/test/test_event_timer_adapter.c           |   1 -
 app/test/test_func_reentrancy.c               |  27 ++--
 app/test/test_mbuf.c                          |   1 -
 app/test/test_mp_secondary.c                  |   1 -
 app/test/test_pmd_perf.c                      |  23 +--
 app/test/test_ring.c                          |   1 -
 app/test/test_ring_perf.c                     |   9 +-
 app/test/test_stack_perf.c                    |  14 +-
 app/test/test_timer.c                         |  30 ++--
 app/test/test_timer_secondary.c               |   1 -
 44 files changed, 203 insertions(+), 231 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for polling sync in pmd_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_pmd_perf.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1df86ce080..a6bac9d45e 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -563,8 +562,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -616,16 +614,19 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
+	/* only when polling first */
+	if (flags == SC_BURST_POLL_FIRST)
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
+	else
+		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
-	/* start polling thread, but not actually poll yet */
+	/* start polling thread
+	 * if in POLL_FIRST mode, poll once launched;
+	 * otherwise, not actually poll yet
+	 */
 	rte_eal_remote_launch(poll_burst,
 			      (void *)&pkt_per_port, lcore);
 
-	/* Only when polling first */
-	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
-
 	/* start xmit */
 	i = 0;
 	while (num) {
@@ -641,7 +642,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second  */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
-- 
2.25.1
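
The release/acquire pairing used for the start flag can be sketched
outside DPDK as follows. This is a minimal illustration under stated
assumptions, not the test code: a pthread stands in for the polling
lcore, wait_until_equal_64 is a hypothetical spin-loop stand-in for
rte_wait_until_equal_64, and payload and run_poll_demo are invented
names.

```c
#include <pthread.h>
#include <stdint.h>

static uint64_t start;    /* flag polled by the worker */
static uint64_t payload;  /* plain data published before the flag */

/* Hypothetical stand-in for rte_wait_until_equal_64(): spin on a load. */
static void
wait_until_equal_64(uint64_t *addr, uint64_t expected, int memorder)
{
	while (__atomic_load_n(addr, memorder) != expected)
		;
}

static void *
poller(void *arg)
{
	uint64_t *seen = arg;

	/* ACQUIRE pairs with the RELEASE store in run_poll_demo(). */
	wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
	*seen = payload;
	return NULL;
}

/* Returns the payload value observed by the polling thread. */
static uint64_t
run_poll_demo(void)
{
	pthread_t tid;
	uint64_t seen = 0;

	/* Initialise the flag before the thread is launched. */
	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
	if (pthread_create(&tid, NULL, poller, &seen) != 0)
		return 0;

	payload = 42;
	/* RELEASE makes the payload write visible before the flag flips. */
	__atomic_store_n(&start, 1, __ATOMIC_RELEASE);

	pthread_join(tid, NULL);
	return seen;
}
```

Because the flag is written before the thread is created, a RELAXED
store is enough for the initial value. Only the later store, which
publishes work done by the transmitting side, needs RELEASE.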


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcores sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Robert Sanford, Erik Gabriel Carrillo
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic
built-ins for lcore_state and collisions sync.

Also, move 'main_init_workers' outside of
'timer_stress2_main_loop' to guarantee that lcore_state
is initialized correctly before the threads are launched.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_timer.c           | 30 +++++++++++++-----------------
 app/test/test_timer_secondary.c |  1 -
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..c97e5c891c 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
 }
 
 
@@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
-		main_init_workers();
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
 			printf("Test Failed\n");
@@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
@@ -573,6 +568,7 @@ test_timer(void)
 	/* run a second, slightly different set of stress tests */
 	printf("\nStart timer stress tests 2\n");
 	test_failed = 0;
+	main_init_workers();
 	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
 	rte_eal_mp_wait_lcore();
 	if (test_failed)
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (2 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcore sync in stack_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 		int cnt = 0;
 		double avg;
 
-		rte_atomic32_set(&lcore_barrier, n);
+		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 		RTE_LCORE_FOREACH_WORKER(lcore_id) {
 			if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
-- 
2.25.1
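
The countdown barrier converted above can be sketched with plain
pthreads. This is a hedged illustration rather than the test itself:
barrier_worker, run_barrier_demo, NB_THREADS and the passed counter are
invented for the example, and a bare spin loop replaces
rte_wait_until_equal_32.

```c
#include <pthread.h>
#include <stdint.h>

#define NB_THREADS 2

static uint32_t lcore_barrier;
static uint32_t passed;   /* number of threads that got through */

static void *
barrier_worker(void *arg)
{
	(void)arg;

	/* Announce arrival, then spin until every thread has arrived. */
	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
	while (__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) != 0)
		;

	__atomic_fetch_add(&passed, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Returns how many threads passed the barrier. */
static int
run_barrier_demo(void)
{
	pthread_t tid[NB_THREADS];
	int i, created = 0;

	__atomic_store_n(&passed, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&lcore_barrier, NB_THREADS, __ATOMIC_RELAXED);

	for (i = 0; i < NB_THREADS; i++) {
		if (pthread_create(&tid[i], NULL, barrier_worker, NULL) != 0)
			break;
		created++;
	}

	/* Stand in for any thread that could not be created. */
	if (created != NB_THREADS)
		__atomic_fetch_sub(&lcore_barrier, NB_THREADS - created,
				__ATOMIC_RELAXED);

	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	return (int)__atomic_load_n(&passed, __ATOMIC_RELAXED);
}
```

RELAXED ordering is sufficient in this sketch because the barrier only
lines the threads up at a common starting point and publishes no other
data through the counter.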


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 05/12] test/bpf: use compiler atomics for calculation
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (3 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for calculation in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e3e9a1b0b5..b8be1e3d30 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1569,32 +1569,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
-- 
2.25.1
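
As a small illustration of why the casts to rte_atomic32_t and
rte_atomic64_t can be dropped, the built-in operates directly on
ordinary integer fields. The sketch below makes its own assumptions:
struct dummy_counters and xadd_both are hypothetical names, not the
types used in test_bpf.c.

```c
#include <stdint.h>

/* Hypothetical struct with one 64-bit and one 32-bit plain integer field. */
struct dummy_counters {
	uint64_t u64;
	uint32_t u32;
};

/* Add rv to both fields; the value is converted to each field's type. */
static void
xadd_both(struct dummy_counters *c, int64_t rv)
{
	__atomic_fetch_add(&c->u32, rv, __ATOMIC_RELAXED);
	__atomic_fetch_add(&c->u64, rv, __ATOMIC_RELAXED);
}
```

Adding a negative value wraps modulo the field width, so adding 1 and
then -1 leaves both fields unchanged.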


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (4 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Olivier Matz, Andrew Rybchenko, Bruce Richardson,
	Vladimir Medvedkin, Honnappa Nagarahalli, Konstantin Ananyev,
	Anatoly Burakov, Yipeng Wang, Sameh Gobriel
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in func_reentrancy test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 838ab6f0f9..7825c6cb86 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
@@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
 
 #define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
 
-static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
-static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
+static uint32_t obj_count;
+static uint32_t synchro;
 
 #define WAIT_SYNCHRO_FOR_WORKERS()   do { \
 	if (lcore_self != rte_get_main_lcore())                  \
-		while (rte_atomic32_read(&synchro) == 0);        \
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
 } while(0)
 
 /*
@@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
 
 	WAIT_SYNCHRO_FOR_WORKERS();
 
-	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
+	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */
 	if (rte_eal_init(0, NULL) != -1)
 		return -1;
 
@@ -116,7 +115,7 @@ ring_create_lookup(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
 		if (rp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -183,7 +182,7 @@ mempool_create_lookup(__rte_unused void *arg)
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
 		if (mp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -250,7 +249,7 @@ hash_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_hash_create(&hash_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple times simultaneously */
@@ -318,7 +317,7 @@ fbk_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_fbk_hash_create(&fbk_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -384,7 +383,7 @@ lpm_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, &config);
 		if (lpm != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -445,8 +444,8 @@ launch_test(struct test_case *pt_case)
 	if (pt_case->func == NULL)
 		return -1;
 
-	rte_atomic32_set(&obj_count, 0);
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	cores = RTE_MIN(rte_lcore_count(), MAX_LCORES);
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
@@ -456,7 +455,7 @@ launch_test(struct test_case *pt_case)
 		rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
 	}
 
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	if (pt_case->func(pt_case->arg) < 0)
 		ret = -1;
@@ -471,7 +470,7 @@ launch_test(struct test_case *pt_case)
 			pt_case->clean(lcore_id);
 	}
 
-	count = rte_atomic32_read(&obj_count);
+	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
 	if (count != 1) {
 		printf("%s: common object allocated %d times (should be 1)\n",
 			pt_case->name, count);
-- 
2.25.1
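
The shared counter pattern above can be sketched as follows. It is an
illustration only: pthreads stand in for lcores, and count_worker,
run_count_demo, NB_WORKERS and NB_ITER are invented names. It assumes
the counter is read only after all workers have been joined, which is
what makes RELAXED ordering sufficient here.

```c
#include <pthread.h>
#include <stdint.h>

#define NB_WORKERS 4
#define NB_ITER 1000

static uint32_t obj_count;

static void *
count_worker(void *arg)
{
	int i;

	(void)arg;
	/* Each increment is atomic, so no update is lost between threads. */
	for (i = 0; i < NB_ITER; i++)
		__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Returns the final counter value after all created workers are joined. */
static uint32_t
run_count_demo(void)
{
	pthread_t tid[NB_WORKERS];
	int i, created = 0;

	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);

	for (i = 0; i < NB_WORKERS; i++) {
		if (pthread_create(&tid[i], NULL, count_worker, NULL) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	return __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
}
```

The join provides the ordering between the workers' increments and the
final read, so the load itself does not need ACQUIRE in this sketch.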


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 07/12] app/eventdev: use compiler atomics for shared data sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (5 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in eventdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/evt_main.c          | 1 -
 app/test-eventdev/test_order_atq.c    | 4 ++--
 app/test-eventdev/test_order_common.c | 4 ++--
 app/test-eventdev/test_order_common.h | 8 ++++----
 app/test-eventdev/test_order_queue.c  | 4 ++--
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index 3534aabca7..194c980c7a 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -6,7 +6,6 @@
 #include <unistd.h>
 #include <signal.h>
 
-#include <rte_atomic.h>
 #include <rte_debug.h>
 #include <rte_eal.h>
 #include <rte_eventdev.h>
diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c
index 71215a07b6..2fee4b4daa 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c
index d7760061ba..ff7813f9c2 100644
--- a/app/test-eventdev/test_order_common.c
+++ b/app/test-eventdev/test_order_common.c
@@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt)
 		evt_err("failed to allocate t->expected_flow_seq memory");
 		goto exp_nomem;
 	}
-	rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts);
+	__atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED);
 	t->err = false;
 	t->nb_pkts = opt->nb_pkts;
 	t->nb_flows = opt->nb_flows;
@@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt,
 
 	while (t->err == false) {
 		uint64_t new_cycles = rte_get_timer_cycles();
-		int64_t remaining = rte_atomic64_read(&t->outstand_pkts);
+		int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED);
 
 		if (remaining <= 0) {
 			t->result = EVT_TEST_SUCCESS;
diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index cd9d6009ec..92781d9587 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -48,7 +48,7 @@ struct test_order {
 	 * The atomic_* is an expensive operation,Since it is a functional test,
 	 * We are using the atomic_ operation to reduce the code complexity.
 	 */
-	rte_atomic64_t outstand_pkts;
+	uint64_t outstand_pkts;
 	enum evt_test_result result;
 	uint32_t nb_flows;
 	uint64_t nb_pkts;
@@ -95,7 +95,7 @@ static __rte_always_inline void
 order_process_stage_1(struct test_order *const t,
 		struct rte_event *const ev, const uint32_t nb_flows,
 		uint32_t *const expected_flow_seq,
-		rte_atomic64_t *const outstand_pkts)
+		uint64_t *const outstand_pkts)
 {
 	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
 	/* compare the seqn against expected value */
@@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t,
 	 */
 	expected_flow_seq[flow]++;
 	rte_pktmbuf_free(ev->mbuf);
-	rte_atomic64_sub(outstand_pkts, 1);
+	__atomic_sub_fetch(outstand_pkts, 1, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline void
@@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t,
 	const uint8_t port = w->port_id;\
 	const uint32_t nb_flows = t->nb_flows;\
 	uint32_t *expected_flow_seq = t->expected_flow_seq;\
-	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
+	uint64_t *outstand_pkts = &t->outstand_pkts;\
 	if (opt->verbose_level > 1)\
 		printf("%s(): lcore %d dev_id %d port=%d\n",\
 			__func__, rte_lcore_id(), dev_id, port)
diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c
index 621367805a..80eaea5cf5 100644
--- a/app/test-eventdev/test_order_queue.c
+++ b/app/test-eventdev/test_order_queue.c
@@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
-- 
2.25.1



* [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (6 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 09/12] app/compress: " Joyce Kong
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Declan Doherty, Ciara Power
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic16_test_and_set usage to a compiler atomic
CAS operation for display sync in crypto cases.
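
As a rough illustration of the conversion, the print-once pattern can be
sketched as below. This is a standalone sketch, not code from the patch:
the helper name claim_display_once() is hypothetical, and only the
builtin call mirrors what the diff does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag; 0 means the header has not been printed yet. */
static uint16_t display_once;

/*
 * Returns true for exactly one caller: the one that flips the flag
 * from 0 to 1. Every later caller sees 1 and gets false.
 */
static bool
claim_display_once(void)
{
	/* On failure the builtin overwrites exp with the current value,
	 * so exp has to be reset to 0 before each new attempt.
	 */
	uint16_t exp = 0;

	return __atomic_compare_exchange_n(&display_once, &exp, 1,
			0 /* strong CAS */,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

A caller would write "if (claim_display_once()) printf(header);". Relaxed
ordering looks sufficient here because the flag only elects a printer and
does not publish any other data.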

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-crypto-perf/cperf_test_latency.c        | 6 ++++--
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++---
 app/test-crypto-perf/cperf_test_throughput.c     | 9 ++++++---
 app/test-crypto-perf/cperf_test_verify.c         | 9 ++++++---
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c
index 69f55de50a..ce49feaba9 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	if (ctx == NULL)
 		return 0;
@@ -307,8 +307,10 @@ cperf_latency_test_runner(void *arg)
 		time_max = tunit*(double)(tsc_max) / tsc_hz;
 		time_min = tunit*(double)(tsc_min) / tsc_hz;
 
+		uint16_t exp = 0;
 		if (ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("\n# lcore, Buffer Size, Burst Size, Pakt Seq #, "
 						"cycles, time (us)");
 
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
index fda97e8ab9..ba1f104f72 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
@@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 	state.lcore = rte_lcore_id();
 	state.linearize = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static bool warmup = true;
 
 	/*
@@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 			continue;
 		}
 
+		uint16_t exp = 0;
 		if (!opts->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(PRETTY_HDR_FMT, "lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
@@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 					state.cycles_per_enq,
 					state.cycles_per_deq);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(CSV_HDR_FMT, "# lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index 739ed9e573..51512af2ad 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -113,7 +113,7 @@ cperf_throughput_test_runner(void *test_ctx)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	struct rte_crypto_op *ops[ctx->options->max_burst_size];
 	struct rte_crypto_op *ops_processed[ctx->options->max_burst_size];
@@ -281,8 +281,10 @@ cperf_throughput_test_runner(void *test_ctx)
 		double cycles_per_packet = ((double)tsc_duration /
 				ctx->options->total_ops);
 
+		uint16_t exp = 0;
 		if (!ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 					"lcore id", "Buf Size", "Burst Size",
 					"Enqueued", "Dequeued", "Failed Enq",
@@ -302,7 +304,8 @@ cperf_throughput_test_runner(void *test_ctx)
 					throughput_gbps,
 					cycles_per_packet);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("#lcore id,Buffer Size(B),"
 					"Burst Size,Enqueued,Dequeued,Failed Enq,"
 					"Failed Deq,Ops(Millions),Throughput(Gbps),"
diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c
index 1962438034..496eb0de00 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx)
 	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;
 	uint64_t ops_failed = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	uint64_t i;
 	uint16_t ops_unused = 0;
@@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx)
 		ops_deqd_total += ops_deqd;
 	}
 
+	uint16_t exp = 0;
 	if (!ctx->options->csv) {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 				"lcore id", "Buf Size", "Burst size",
 				"Enqueued", "Dequeued", "Failed Enq",
@@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx)
 				ops_deqd_failed,
 				ops_failed);
 	} else {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("\n# lcore id, Buffer Size(B), "
 				"Burst Size,Enqueued,Dequeued,Failed Enq,"
 				"Failed Deq,Failed Ops\n");
-- 
2.25.1



* [PATCH v3 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (7 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

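Convert rte_atomic16_test_and_set usage to a compiler atomic
CAS operation for display sync. In the cyclecount test, the
report header flag is checked under the existing print
spinlock instead, so no atomic operation is needed there.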
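
The cyclecount runner in the diff below moves the header check inside the
region already protected by print_spinlock, so display_once becomes a
plain variable. A minimal sketch of that shape follows, assuming a toy
spinlock built from compiler builtins in place of rte_spinlock_t; the
names lock(), unlock() and report() are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

static bool print_lock;       /* stand-in for rte_spinlock_t */
static unsigned display_once; /* plain flag, only touched under the lock */

static void
lock(void)
{
	/* Spin until the previous value was false, i.e. we took the lock. */
	while (__atomic_test_and_set(&print_lock, __ATOMIC_ACQUIRE))
		;
}

static void
unlock(void)
{
	__atomic_clear(&print_lock, __ATOMIC_RELEASE);
}

/* Prints one report line; returns true if this call also printed the header. */
static bool
report(unsigned lcore)
{
	bool printed_header = false;

	lock();
	if (display_once == 0) {
		display_once = 1;
		printf("lcore id\n");
		printed_header = true;
	}
	printf("%u\n", lcore);
	unlock();

	return printed_header;
}
```

Holding the lock across both the header and the row also keeps the header
from being interleaved with another lcore's output.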

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-compress-perf/comp_perf_test_common.h    |  2 +-
 .../comp_perf_test_cyclecount.c                   | 15 +++++++--------
 .../comp_perf_test_throughput.c                   | 10 +++++++---
 app/test-compress-perf/comp_perf_test_verify.c    |  6 ++++--
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
index 72705c6a2b..d039e5a29a 100644
--- a/app/test-compress-perf/comp_perf_test_common.h
+++ b/app/test-compress-perf/comp_perf_test_common.h
@@ -14,7 +14,7 @@ struct cperf_mem_resources {
 	uint16_t qp_id;
 	uint8_t lcore_id;
 
-	rte_atomic16_t print_info_once;
+	uint16_t print_info_once;
 
 	uint32_t total_bufs;
 	uint8_t *compressed_data;
diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
index c875ddbdac..da55b02b74 100644
--- a/app/test-compress-perf/comp_perf_test_cyclecount.c
+++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
@@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
 	struct cperf_cyclecount_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static rte_spinlock_t print_spinlock;
 	int i;
 
@@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED,  __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			(ctx->ver.mem.total_bufs * test_data->num_iter);
 
 	/* R E P O R T processing */
-	if (rte_atomic16_test_and_set(&display_once)) {
+	rte_spinlock_lock(&print_spinlock);
 
-		rte_spinlock_lock(&print_spinlock);
+	if (display_once == 0) {
+		display_once = 1;
 
 		printf("\nLegend for the table\n"
 		"  - Retries section: number of retries for the following operations:\n"
@@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			"setup/op",
 			"[C-e]", "[C-d]",
 			"[D-e]", "[D-d]");
-
-		rte_spinlock_unlock(&print_spinlock);
 	}
 
-	rte_spinlock_lock(&print_spinlock);
-
 	printf("%12u"
 	       "%6u"
 	       "%12zu"
diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
index 13922b658c..d3dff070b0 100644
--- a/app/test-compress-perf/comp_perf_test_throughput.c
+++ b/app/test-compress-perf/comp_perf_test_throughput.c
@@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx)
 	struct cperf_benchmark_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	int i, ret = EXIT_SUCCESS;
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx)
 	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
 			1000000000;
 
-	if (rte_atomic16_test_and_set(&display_once)) {
+	exp = 0;
+	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 		printf("\n%12s%6s%12s%17s%15s%16s\n",
 			"lcore id", "Level", "Comp size", "Comp ratio [%]",
 			"Comp [Gbps]", "Decomp [Gbps]");
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index 5e13257b79..f6e21368e8 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
 	struct cperf_verify_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->options;
 	int ret = EXIT_SUCCESS;
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	uint32_t lcore = rte_lcore_id();
 
 	ctx->mem.lcore_id = lcore;
@@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx)
 	ctx->ratio = (double) ctx->comp_data_sz /
 			test_data->input_data_sz * 100;
 
+	uint16_t exp = 0;
 	if (!ctx->silent) {
-		if (rte_atomic16_test_and_set(&display_once)) {
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 			printf("%12s%6s%12s%17s\n",
 			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
 		}
-- 
2.25.1



* [PATCH v3 10/12] app/testpmd: remove atomic operations for port status
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (8 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 09/12] app/compress: " Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Xiaoyun Li; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

The port_status changes do not need to be handled
atomically, as the status is modified only during
initialization or from the testpmd prompt, never by
multiple threads concurrently.
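
The replacement is a plain check-then-set. A small sketch of the idea,
assuming a single writer as described above; the enum values, struct and
helper name are hypothetical stand-ins for testpmd's RTE_PORT_*
definitions and are not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for testpmd's RTE_PORT_* states. */
enum { PORT_STOPPED, PORT_STARTED, PORT_CLOSED, PORT_HANDLING };

struct port {
	uint16_t port_status;
};

/*
 * Move the port from one state to another. Not atomic: only valid when
 * a single thread changes port_status, which is the assumption here.
 */
static bool
port_transition(struct port *p, uint16_t from, uint16_t to)
{
	if (p->port_status != from)
		return false;
	p->port_status = to;
	return true;
}
```

The diff open-codes this as an if/else at each call site instead of
introducing a helper.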

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a66dfb297c..ed472cacd2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -36,7 +36,6 @@
 #include <rte_alarm.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
@@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2729,8 +2728,9 @@ start_port(portid_t pid)
 
 		need_check_link_status = 0;
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED,
-						 RTE_PORT_HANDLING) == 0) {
+		if (port->port_status == RTE_PORT_STOPPED)
+			port->port_status = RTE_PORT_HANDLING;
+		else {
 			fprintf(stderr, "Port %d is now not stopped\n", pi);
 			continue;
 		}
@@ -2766,8 +2766,9 @@ start_port(portid_t pid)
 						     nb_txq + nb_hairpinq,
 						     &(port->dev_conf));
 			if (diag != 0) {
-				if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2828,9 +2829,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup tx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2880,9 +2881,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup rx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2917,16 +2918,18 @@ start_port(portid_t pid)
 				pi, rte_strerror(-diag));
 
 			/* Fail to setup rx queue, return */
-			if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+			if (port->port_status == RTE_PORT_HANDLING)
+				port->port_status = RTE_PORT_STOPPED;
+			else
 				fprintf(stderr,
 					"Port %d can not be set back to stopped\n",
 					pi);
 			continue;
 		}
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STARTED;
+		else
 			fprintf(stderr, "Port %d can not be set into started\n",
 				pi);
 
@@ -3028,8 +3031,9 @@ stop_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED,
-						RTE_PORT_HANDLING) == 0)
+		if (port->port_status == RTE_PORT_STARTED)
+			port->port_status = RTE_PORT_HANDLING;
+		else
 			continue;
 
 		if (hairpin_mode & 0xf) {
@@ -3055,8 +3059,9 @@ stop_port(portid_t pid)
 			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
 				pi);
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr, "Port %d can not be set into stopped\n",
 				pi);
 		need_check_link_status = 1;
@@ -3119,8 +3124,7 @@ close_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
+		if (port->port_status == RTE_PORT_CLOSED) {
 			fprintf(stderr, "Port %d is already closed\n", pi);
 			continue;
 		}
-- 
2.25.1



* [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (9 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:22   ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
  2021-11-17 10:02   ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Nicolas Chautru; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in bbdev cases.
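
The polling loops on the sync flag are replaced with
rte_wait_until_equal_16(). Roughly, the generic form of such a wait can
be sketched as below. This is an illustration only: wait_until_equal_16()
is a hypothetical local function, the SYNC_* values are illustrative, and
the real DPDK helper calls rte_pause() in the loop and may use
architecture-specific wait instructions.

```c
#include <stdint.h>

enum { SYNC_WAIT, SYNC_START }; /* illustrative values */

/* Spin until *addr equals expected, loading with the given memory order. */
static void
wait_until_equal_16(volatile uint16_t *addr, uint16_t expected, int memorder)
{
	while (__atomic_load_n(addr, memorder) != expected)
		; /* DPDK would call rte_pause() here */
}
```

Note that the old loops spun while sync == SYNC_WAIT, whereas the new
call waits until sync == SYNC_START. The two agree as long as the flag
only ever holds those two values.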

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 7b4529789b..0fa119a502 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -133,7 +133,7 @@ struct test_op_params {
 	uint16_t num_to_process;
 	uint16_t num_lcores;
 	int vector_mask;
-	rte_atomic16_t sync;
+	uint16_t sync;
 	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
 };
 
@@ -148,9 +148,9 @@ struct thread_params {
 	uint8_t iter_count;
 	double iter_average;
 	double bler;
-	rte_atomic16_t nb_dequeued;
-	rte_atomic16_t processing_status;
-	rte_atomic16_t burst_sz;
+	uint16_t nb_dequeued;
+	int16_t processing_status;
+	uint16_t burst_sz;
 	struct test_op_params *op_params;
 	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
 	struct rte_bbdev_enc_op *enc_ops[MAX_BURST];
@@ -2637,46 +2637,46 @@ dequeue_event_callback(uint16_t dev_id,
 	}
 
 	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		printf(
 			"Dequeue interrupt handler called for incorrect event!\n");
 		return;
 	}
 
-	burst_sz = rte_atomic16_read(&tp->burst_sz);
+	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
 	num_ops = tp->op_params->num_to_process;
 
 	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
 		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
 		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
 		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else /*RTE_BBDEV_OP_TURBO_ENC*/
 		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 
 	if (deq < burst_sz) {
 		printf(
 			"After receiving the interrupt all operations should be dequeued. Expected: %u, got: %u\n",
 			burst_sz, deq);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
-	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
-		rte_atomic16_add(&tp->nb_dequeued, deq);
+	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) {
+		__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2713,7 +2713,7 @@ dequeue_event_callback(uint16_t dev_id,
 
 	if (ret) {
 		printf("Buffers validation failed\n");
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 	}
 
 	switch (test_vector.op_type) {
@@ -2734,7 +2734,7 @@ dequeue_event_callback(uint16_t dev_id,
 		break;
 	default:
 		printf("Unknown op type: %d\n", test_vector.op_type);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2743,7 +2743,7 @@ dequeue_event_callback(uint16_t dev_id,
 	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
 			((double)total_time / (double)rte_get_tsc_hz());
 
-	rte_atomic16_add(&tp->nb_dequeued, deq);
+	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 }
 
 static int
@@ -2781,11 +2781,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2833,17 +2832,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2878,11 +2875,10 @@ throughput_intr_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2923,17 +2919,15 @@ throughput_intr_lcore_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2968,11 +2962,10 @@ throughput_intr_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3012,17 +3005,15 @@ throughput_intr_lcore_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3058,11 +3049,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3104,17 +3094,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3148,8 +3136,7 @@ throughput_pmd_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3252,8 +3239,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3382,8 +3368,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3499,8 +3484,7 @@ throughput_pmd_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3590,8 +3574,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3774,7 +3757,7 @@ bler_test(struct active_device *ad,
 	else
 		return TEST_SKIPPED;
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3797,7 +3780,7 @@ bler_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = bler_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3892,7 +3875,7 @@ throughput_test(struct active_device *ad,
 			throughput_function = throughput_pmd_lcore_enc;
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3915,7 +3898,7 @@ throughput_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = throughput_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3945,29 +3928,29 @@ throughput_test(struct active_device *ad,
 	 * Wait for main lcore operations.
 	 */
 	tp = &t_params[0];
-	while ((rte_atomic16_read(&tp->nb_dequeued) <
-			op_params->num_to_process) &&
-			(rte_atomic16_read(&tp->processing_status) !=
-			TEST_FAILED))
+	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+		op_params->num_to_process) &&
+		(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+		TEST_FAILED))
 		rte_pause();
 
 	tp->ops_per_sec /= TEST_REPETITIONS;
 	tp->mbps /= TEST_REPETITIONS;
-	ret |= (int)rte_atomic16_read(&tp->processing_status);
+	ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 
 	/* Wait for worker lcores operations */
 	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
 		tp = &t_params[used_cores];
 
-		while ((rte_atomic16_read(&tp->nb_dequeued) <
-				op_params->num_to_process) &&
-				(rte_atomic16_read(&tp->processing_status) !=
-				TEST_FAILED))
+		while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+			op_params->num_to_process) &&
+			(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+			TEST_FAILED))
 			rte_pause();
 
 		tp->ops_per_sec /= TEST_REPETITIONS;
 		tp->mbps /= TEST_REPETITIONS;
-		ret |= (int)rte_atomic16_read(&tp->processing_status);
+		ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 	}
 
 	/* Print throughput if test passed */
-- 
2.25.1
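
For readers who want to try the start-flag pattern from this patch outside
DPDK, here is a minimal sketch using only the GCC/Clang __atomic builtins and
pthreads. It is an illustration under stated assumptions, not code from the
patch: wait_until_equal_16() is a hypothetical stand-in for
rte_wait_until_equal_16(), and struct shared, worker() and run_sync_demo() are
names made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { SYNC_WAIT = 0, SYNC_START = 1 };

#define MAX_WORKERS 8

/* Hypothetical shared state, loosely modelled on op_params->sync. */
struct shared {
	uint16_t sync;       /* start flag polled by the workers */
	uint16_t nb_started; /* number of workers that saw SYNC_START */
};

/* Hypothetical stand-in for rte_wait_until_equal_16(): a plain spin loop. */
static void
wait_until_equal_16(uint16_t *addr, uint16_t expected, int memorder)
{
	while (__atomic_load_n(addr, memorder) != expected)
		; /* DPDK would pause or wait for an event here */
}

static void *
worker(void *arg)
{
	struct shared *s = arg;

	/* Replaces: while (rte_atomic16_read(&sync) == SYNC_WAIT) rte_pause(); */
	wait_until_equal_16(&s->sync, SYNC_START, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->nb_started, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Launch the workers, release them together, return how many started. */
static int
run_sync_demo(int nb_workers)
{
	pthread_t tid[MAX_WORKERS];
	struct shared s = { .sync = SYNC_WAIT, .nb_started = 0 };
	int i;

	assert(nb_workers >= 0 && nb_workers <= MAX_WORKERS);

	/* Replaces: rte_atomic16_set(&sync, SYNC_WAIT); */
	__atomic_store_n(&s.sync, SYNC_WAIT, __ATOMIC_RELAXED);
	for (i = 0; i < nb_workers; i++) {
		int rc = pthread_create(&tid[i], NULL, worker, &s);

		assert(rc == 0);
		(void)rc;
	}

	/* Replaces: rte_atomic16_set(&sync, SYNC_START); */
	__atomic_store_n(&s.sync, SYNC_START, __ATOMIC_RELAXED);
	for (i = 0; i < nb_workers; i++)
		pthread_join(tid[i], NULL);

	return __atomic_load_n(&s.nb_started, __ATOMIC_RELAXED);
}
```

Relaxed ordering is enough in this sketch because the flag only gates when the
workers start; the counter is read after pthread_join(), which already orders
the workers' updates before the read.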



* [PATCH v3 12/12] app: remove unnecessary include of atomic header file
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (10 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
@ 2021-11-17  8:22   ` Joyce Kong
  2021-11-17 10:02   ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:22 UTC (permalink / raw)
  To: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Erik Gabriel Carrillo, Olivier Matz, Anatoly Burakov,
	Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Remove unnecessary inclusions of rte_atomic.h from app modules.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c                | 1 -
 app/test-pipeline/config.c          | 1 -
 app/test-pipeline/init.c            | 1 -
 app/test-pipeline/main.c            | 1 -
 app/test-pipeline/runtime.c         | 1 -
 app/test-pmd/cmdline.c              | 1 -
 app/test-pmd/config.c               | 1 -
 app/test-pmd/csumonly.c             | 1 -
 app/test-pmd/flowgen.c              | 1 -
 app/test-pmd/icmpecho.c             | 1 -
 app/test-pmd/iofwd.c                | 1 -
 app/test-pmd/macfwd.c               | 1 -
 app/test-pmd/macswap.c              | 1 -
 app/test-pmd/parameters.c           | 1 -
 app/test-pmd/rxonly.c               | 1 -
 app/test-pmd/txonly.c               | 1 -
 app/test/commands.c                 | 1 -
 app/test/test_barrier.c             | 1 -
 app/test/test_event_timer_adapter.c | 1 -
 app/test/test_mbuf.c                | 1 -
 app/test/test_mp_secondary.c        | 1 -
 app/test/test_ring.c                | 1 -
 22 files changed, 22 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c
index a4271047e6..ebe2d77264 100644
--- a/app/proc-info/main.c
+++ b/app/proc-info/main.c
@@ -27,7 +27,6 @@
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
 #include <rte_metrics.h>
diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644
--- a/app/test-pipeline/config.c
+++ b/app/test-pipeline/config.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index c738019041..eee0719b67 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c
index 72e4797ff2..1e16794183 100644
--- a/app/test-pipeline/main.c
+++ b/app/test-pipeline/main.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 159192bcd8..d939a85d7e 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f51b259fe..4e93f535ff 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 26cadf39f7..d8b5032b58 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -27,7 +27,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8526d9158a..e0b00abe8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 5737eaa105..9ceef3b54a 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8f1d68a83a..3a85ec3dd1 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ac50d0b9f8..812a0c721f 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..2f4f944efa 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 34bb538379..b8497e733d 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/commands.c b/app/test/commands.c
index 76f6ee5d23..2dced3bc44 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -25,7 +25,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_malloc.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_event_timer_adapter.c b/app/test/test_event_timer_adapter.c
index 12c00e678e..25bac2d155 100644
--- a/app/test/test_event_timer_adapter.c
+++ b/app/test/test_event_timer_adapter.c
@@ -5,7 +5,6 @@
 
 #include <math.h>
 
-#include <rte_atomic.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_debug.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f93bcef8a9..d53126710f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.25.1



* Re: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16 21:21     ` Honnappa Nagarahalli
@ 2021-11-17  9:29       ` David Marchand
  0 siblings, 0 replies; 36+ messages in thread
From: David Marchand @ 2021-11-17  9:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Joyce Kong, Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang

On Tue, Nov 16, 2021 at 10:21 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > Joyce, Honnappa,
> >
> > On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
> > >
> > > Convert rte_atomic usages to compiler atomic built-ins for lcore_state
> > > and collisions sync.
> > >
> > > Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> > > guarantee lcore_state initialized correctly before the threads
> > > launched.
> >
> > Is this "also" part actually related to the change?
> > Or is it a separate fix?
> The 'also' part does not fix a separate problem (i.e. the earlier code had no issues). It just helps to keep the code simple.

This is indeed better this way.
Thanks.

-- 
David Marchand



* Re: [PATCH v3 00/12] use compiler atomic builtins for app modules
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (11 preceding siblings ...)
  2021-11-17  8:22   ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
@ 2021-11-17 10:02   ` David Marchand
  12 siblings, 0 replies; 36+ messages in thread
From: David Marchand @ 2021-11-17 10:02 UTC (permalink / raw)
  To: Joyce Kong; +Cc: dev, Honnappa Nagarahalli, nd

On Wed, Nov 17, 2021 at 9:22 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Since atomic operations have been adopted in DPDK now[1],
> change rte_atomicNN_xxx APIs to compiler atomic built-ins
> in app modules[2].
>
> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
> [2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
>
> v3:
>   1. In the pmd_perf test case, move the initialization of polling
>      start before the call to rte_eal_remote_launch, so the update
>      is visible to the worker threads. (Honnappa Nagarahalli)
>   2. Remove the remaining rte_atomic.h includes that were missed
>      in v2. (David Marchand)
>
> v2:
>   By Honnappa Nagarahalli:
>   1. Replace the RELAXED barriers with suitable ones for shared
>      data sync in pmd_perf and timer test cases.
>   2. Avoid unnecessary atomic operations in compress and testpmd
>      modules.
>   3. Fix some typos.
>
> Joyce Kong (12):
>   test/pmd_perf: use compiler atomic builtins for polling sync
>   test/ring_perf: use compiler atomic builtins for lcores sync
>   test/timer: use compiler atomic builtins for sync
>   test/stack_perf: use compiler atomics for lcore sync
>   test/bpf: use compiler atomics for calculation
>   test/func_reentrancy: use compiler atomics for data sync
>   app/eventdev: use compiler atomics for shared data sync
>   app/crypto: use compiler atomic builtins for display sync
>   app/compress: use compiler atomic builtins for display sync
>   app/testpmd: remove atomic operations for port status
>   app/bbdev: use compiler atomics for shared data sync
>   app: remove unnecessary include of atomic header file

There were cleanups of unneeded rte_atomic.h inclusion along the series:
I moved all of them to the last patch so that patches focus on what
their commitlog describes.

Series applied, thanks.
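
As a side note for readers of the changelog item about replacing RELAXED
orderings where shared data is synchronized: the sketch below shows the general
release/acquire idea with plain pthreads and the compiler __atomic builtins. It
is a hedged illustration, not code from the series; struct result, producer()
and run_publish_demo() are hypothetical names, and the payload value is
arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical payload published through a flag. */
struct result {
	uint64_t cycles; /* plain, non-atomic data */
	uint16_t done;   /* flag that publishes the data */
};

static void *
producer(void *arg)
{
	struct result *r = arg;

	r->cycles = 12345; /* arbitrary value for the example */
	/* RELEASE: the write to cycles is ordered before the flag store. */
	__atomic_store_n(&r->done, 1, __ATOMIC_RELEASE);
	return NULL;
}

/* Wait for the producer's flag, then read the data it published. */
static uint64_t
run_publish_demo(void)
{
	pthread_t tid;
	struct result r = { .cycles = 0, .done = 0 };
	uint64_t seen;
	int rc;

	rc = pthread_create(&tid, NULL, producer, &r);
	assert(rc == 0);
	(void)rc;

	/* ACQUIRE: once done == 1 is observed, cycles is visible too. */
	while (__atomic_load_n(&r.done, __ATOMIC_ACQUIRE) == 0)
		;
	seen = r.cycles;

	pthread_join(tid, NULL);
	return seen;
}
```

With RELAXED on both sides the flag itself would still be read atomically, but
nothing would order the payload write before the flag, so the reader could
legally observe done == 1 and a stale cycles value.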


-- 
David Marchand



end of thread, other threads:[~2021-11-17 10:02 UTC | newest]

Thread overview: 36+ messages
2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-16 21:30   ` Honnappa Nagarahalli
2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-16 19:52   ` Honnappa Nagarahalli
2021-11-16 20:20   ` David Marchand
2021-11-16 21:21     ` Honnappa Nagarahalli
2021-11-17  9:29       ` David Marchand
2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
2021-11-16 20:15   ` Honnappa Nagarahalli
2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
2021-11-16 21:34   ` Honnappa Nagarahalli
2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
2021-11-16 20:23   ` David Marchand
2021-11-17  7:05     ` Joyce Kong
2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-11-17  8:21   ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
2021-11-17  8:21   ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 09/12] app/compress: " Joyce Kong
2021-11-17  8:21   ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
2021-11-17  8:21   ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
2021-11-17  8:22   ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
2021-11-17 10:02   ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand
