* [PATCH 0/1] Minor mistake in ring (en|de)queueing
@ 2022-01-03 14:22 Andrzej Ostruszka
  2022-01-03 14:22 ` [PATCH 1/1] ring: fix off by 1 mistake Andrzej Ostruszka
  0 siblings, 1 reply; 11+ messages in thread
From: Andrzej Ostruszka @ 2022-01-03 14:22 UTC (permalink / raw)
  To: dev; +Cc: upstream, Andrzej Ostruszka

Hi all,

Recently I was going through the ring implementation and I believe I've
found a small mistake.  It is not a functional error, just slightly
suboptimal behaviour in the specific case where we want to enqueue
exactly as many elements as fit before wrapping to the beginning of the
ring (the same goes for dequeueing).

Imagine we have a ring of size 16, depicted below with indices
underneath and the consumer/producer positions marked.

|  ******        |
 0123456789ABCDEF
   ^     ^
   c     p

Now if we try to enqueue 8 elements we will end up at the check:

	if (likely(idx + n < size)) {

where idx (=8) is the producer head and n=8.  The check fails even
though there is room for exactly 8 elements without wrapping to the
beginning of the ring.
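
For illustration, here is a minimal standalone sketch (not part of the
patch; the names size, idx and n simply mirror the snippet above) that
plugs these numbers into both forms of the check:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		const uint32_t size = 16;	/* ring size from the picture */
		const uint32_t idx = 8;		/* producer head 'p' */
		const uint32_t n = 8;		/* elements we want to enqueue */

		/* current check: rejects the fast path even though slots 8..15 fit */
		printf("idx + n <  size: %s\n", idx + n < size ? "fast path" : "fallback");
		/* proposed check: accepts the fast path, no wrap is needed */
		printf("idx + n <= size: %s\n", idx + n <= size ? "fast path" : "fallback");
		return 0;
	}

The first line prints "fallback" and the second "fast path", which is
exactly the case the current comparison misses.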

I hope I'm not completely off base here :).  If I'm not, the subsequent
patch attempts to fix this.

With regards
Andrzej Ostruszka

Andrzej Ostruszka (1):
  ring: fix off by 1 mistake

 lib/ring/rte_ring_elem_pvt.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/1] ring: fix off by 1 mistake
  2022-01-03 14:22 [PATCH 0/1] Minor mistake in ring (en|de)queueing Andrzej Ostruszka
@ 2022-01-03 14:22 ` Andrzej Ostruszka
  2022-01-03 14:56   ` Morten Brørup
                     ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrzej Ostruszka @ 2022-01-03 14:22 UTC (permalink / raw)
  To: dev; +Cc: upstream, Andrzej Ostruszka

When enqueueing/dequeueing to/from the ring we try to optimize by manual
loop unrolling.  The check for this optimization looks like:

	if (likely(idx + n < size)) {

where 'idx' points to the first usable element (empty slot for enqueue,
data for dequeue).  The correct comparison here should be '<=' instead
of '<'.

This is not a functional error since we fall back to the loop with
correct checks on indexes.  Just a minor suboptimal behaviour for the
case when we want to enqueue/dequeue exactly the number of elements that
we have in the ring before wrapping to its beginning.

Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
---
 lib/ring/rte_ring_elem_pvt.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 275ec55393..83788c56e6 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -17,7 +17,7 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
 	unsigned int i;
 	uint32_t *ring = (uint32_t *)&r[1];
 	const uint32_t *obj = (const uint32_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
 			ring[idx] = obj[i];
 			ring[idx + 1] = obj[i + 1];
@@ -62,7 +62,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	uint64_t *ring = (uint64_t *)&r[1];
 	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			ring[idx] = obj[i];
 			ring[idx + 1] = obj[i + 1];
@@ -95,7 +95,7 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	rte_int128_t *ring = (rte_int128_t *)&r[1];
 	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
 			memcpy((void *)(ring + idx),
 				(const void *)(obj + i), 32);
@@ -151,7 +151,7 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size,
 	unsigned int i;
 	uint32_t *ring = (uint32_t *)&r[1];
 	uint32_t *obj = (uint32_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
 			obj[i] = ring[idx];
 			obj[i + 1] = ring[idx + 1];
@@ -196,7 +196,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	uint64_t *ring = (uint64_t *)&r[1];
 	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			obj[i] = ring[idx];
 			obj[i + 1] = ring[idx + 1];
@@ -229,7 +229,7 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	rte_int128_t *ring = (rte_int128_t *)&r[1];
 	rte_int128_t *obj = (rte_int128_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
 			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
 		switch (n & 0x1) {
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/1] ring: fix off by 1 mistake
  2022-01-03 14:22 ` [PATCH 1/1] ring: fix off by 1 mistake Andrzej Ostruszka
@ 2022-01-03 14:56   ` Morten Brørup
  2022-01-06 10:45   ` Olivier Matz
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Morten Brørup @ 2022-01-03 14:56 UTC (permalink / raw)
  To: Andrzej Ostruszka, Honnappa Nagarahalli, Konstantin Ananyev; +Cc: upstream, dev

+Ring queue maintainers: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>, Konstantin Ananyev <konstantin.ananyev@intel.com>

> From: Andrzej Ostruszka [mailto:amo@semihalf.com]
> Sent: Monday, 3 January 2022 15.22
> 
> When enqueueing/dequeueing to/from the ring we try to optimize by
> manual
> loop unrolling.  The check for this optimization looks like:
> 
> 	if (likely(idx + n < size)) {
> 
> where 'idx' points to the first usable element (empty slot for enqueue,
> data for dequeue).  The correct comparison here should be '<=' instead
> of '<'.
> 
> This is not a functional error since we fall back to the loop with
> correct checks on indexes.  Just a minor suboptimal behaviour for the
> case when we want to enqueue/dequeue exactly the number of elements
> that
> we have in the ring before wrapping to its beginning.
> 
> Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
> ---
>  lib/ring/rte_ring_elem_pvt.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/ring/rte_ring_elem_pvt.h
> b/lib/ring/rte_ring_elem_pvt.h
> index 275ec55393..83788c56e6 100644
> --- a/lib/ring/rte_ring_elem_pvt.h
> +++ b/lib/ring/rte_ring_elem_pvt.h
> @@ -17,7 +17,7 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const
> uint32_t size,
>  	unsigned int i;
>  	uint32_t *ring = (uint32_t *)&r[1];
>  	const uint32_t *obj = (const uint32_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>  			ring[idx] = obj[i];
>  			ring[idx + 1] = obj[i + 1];
> @@ -62,7 +62,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
>  	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			ring[idx] = obj[i];
>  			ring[idx + 1] = obj[i + 1];
> @@ -95,7 +95,7 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	rte_int128_t *ring = (rte_int128_t *)&r[1];
>  	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>  			memcpy((void *)(ring + idx),
>  				(const void *)(obj + i), 32);
> @@ -151,7 +151,7 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r,
> const uint32_t size,
>  	unsigned int i;
>  	uint32_t *ring = (uint32_t *)&r[1];
>  	uint32_t *obj = (uint32_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>  			obj[i] = ring[idx];
>  			obj[i + 1] = ring[idx + 1];
> @@ -196,7 +196,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
>  	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			obj[i] = ring[idx];
>  			obj[i + 1] = ring[idx + 1];
> @@ -229,7 +229,7 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	rte_int128_t *ring = (rte_int128_t *)&r[1];
>  	rte_int128_t *obj = (rte_int128_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>  			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
>  		switch (n & 0x1) {
> --
> 2.34.1.448.ga2b2bfdf31-goog
> 

Well spotted! I took a very good look at it and came to the same conclusion: it is not a functional bug; the only consequence is that the optimized code path may not be taken in a situation where it could be. But it should be fixed as suggested in your patch.
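
To double-check that, here is a small standalone sketch (hypothetical, not part of the patch) that enumerates every (idx, n) pair for a ring of size 16 and confirms that the old and new conditions disagree only when the copy ends exactly at the ring boundary:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		const uint32_t size = 16;
		unsigned int differ = 0;

		for (uint32_t idx = 0; idx < size; idx++)
			for (uint32_t n = 0; n <= size; n++)
				if ((idx + n < size) != (idx + n <= size)) {
					differ++;
					if (idx + n != size)
						printf("unexpected: idx=%u n=%u\n", idx, n);
				}

		/* prints 16: one case per idx, always with idx + n == size */
		printf("cases where the checks differ: %u\n", differ);
		return 0;
	}

So the change only widens the fast path to include those boundary cases; every other decision is unchanged.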

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ring: fix off by 1 mistake
  2022-01-03 14:22 ` [PATCH 1/1] ring: fix off by 1 mistake Andrzej Ostruszka
  2022-01-03 14:56   ` Morten Brørup
@ 2022-01-06 10:45   ` Olivier Matz
  2022-01-10 15:09   ` Ananyev, Konstantin
  2022-01-11 11:37   ` [PATCH] ring: optimize corner case for enqueue/dequeue Andrzej Ostruszka
  3 siblings, 0 replies; 11+ messages in thread
From: Olivier Matz @ 2022-01-06 10:45 UTC (permalink / raw)
  To: Andrzej Ostruszka; +Cc: dev, upstream

Hi Andrzej,

On Mon, Jan 03, 2022 at 03:22:01PM +0100, Andrzej Ostruszka wrote:
> ring: fix off by 1 mistake

I suggest something less scary for the title:

  ring: optimize corner case for enqueue/dequeue

> When enqueueing/dequeueing to/from the ring we try to optimize by manual
> loop unrolling.  The check for this optimization looks like:
> 
> 	if (likely(idx + n < size)) {
> 
> where 'idx' points to the first usable element (empty slot for enqueue,
> data for dequeue).  The correct comparison here should be '<=' instead
> of '<'.
> 
> This is not a functional error since we fall back to the loop with
> correct checks on indexes.  Just a minor suboptimal behaviour for the
> case when we want to enqueue/dequeue exactly the number of elements that
> we have in the ring before wrapping to its beginning.
> 
> Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>

Reviewed-by: Olivier Matz <olivier.matz@6wind.com>

I would tend to add:
Fixes: cc4b218790f6 ("ring: support configurable element size")

But the same error had been in the ENQUEUE_PTRS()/DEQUEUE_PTRS() macros
since the beginning, so we may also add:
Fixes: 286bd05bf70d ("ring: optimisations")

Those macros were removed in commit 2d6ed071a8b9 ("ring: use custom
element for fixed size API")

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/1] ring: fix off by 1 mistake
  2022-01-03 14:22 ` [PATCH 1/1] ring: fix off by 1 mistake Andrzej Ostruszka
  2022-01-03 14:56   ` Morten Brørup
  2022-01-06 10:45   ` Olivier Matz
@ 2022-01-10 15:09   ` Ananyev, Konstantin
  2022-01-11 11:49     ` Andrzej Ostruszka
  2022-01-11 11:37   ` [PATCH] ring: optimize corner case for enqueue/dequeue Andrzej Ostruszka
  3 siblings, 1 reply; 11+ messages in thread
From: Ananyev, Konstantin @ 2022-01-10 15:09 UTC (permalink / raw)
  To: Andrzej Ostruszka, dev; +Cc: upstream

> When enqueueing/dequeueing to/from the ring we try to optimize by manual
> loop unrolling.  The check for this optimization looks like:
> 
> 	if (likely(idx + n < size)) {
> 
> where 'idx' points to the first usable element (empty slot for enqueue,
> data for dequeue).  The correct comparison here should be '<=' instead
> of '<'.
> 
> This is not a functional error since we fall back to the loop with
> correct checks on indexes.  Just a minor suboptimal behaviour for the
> case when we want to enqueue/dequeue exactly the number of elements that
> we have in the ring before wrapping to its beginning.
> 
> Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
> ---
>  lib/ring/rte_ring_elem_pvt.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
> index 275ec55393..83788c56e6 100644
> --- a/lib/ring/rte_ring_elem_pvt.h
> +++ b/lib/ring/rte_ring_elem_pvt.h
> @@ -17,7 +17,7 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
>  	unsigned int i;
>  	uint32_t *ring = (uint32_t *)&r[1];
>  	const uint32_t *obj = (const uint32_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>  			ring[idx] = obj[i];
>  			ring[idx + 1] = obj[i + 1];
> @@ -62,7 +62,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
>  	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			ring[idx] = obj[i];
>  			ring[idx + 1] = obj[i + 1];
> @@ -95,7 +95,7 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	rte_int128_t *ring = (rte_int128_t *)&r[1];
>  	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>  			memcpy((void *)(ring + idx),
>  				(const void *)(obj + i), 32);
> @@ -151,7 +151,7 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size,
>  	unsigned int i;
>  	uint32_t *ring = (uint32_t *)&r[1];
>  	uint32_t *obj = (uint32_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>  			obj[i] = ring[idx];
>  			obj[i + 1] = ring[idx + 1];
> @@ -196,7 +196,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
>  	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			obj[i] = ring[idx];
>  			obj[i + 1] = ring[idx + 1];
> @@ -229,7 +229,7 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	rte_int128_t *ring = (rte_int128_t *)&r[1];
>  	rte_int128_t *obj = (rte_int128_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>  			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
>  		switch (n & 0x1) {
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] ring: optimize corner case for enqueue/dequeue
  2022-01-03 14:22 ` [PATCH 1/1] ring: fix off by 1 mistake Andrzej Ostruszka
                     ` (2 preceding siblings ...)
  2022-01-10 15:09   ` Ananyev, Konstantin
@ 2022-01-11 11:37   ` Andrzej Ostruszka
  2022-01-11 12:00     ` Morten Brørup
  2022-02-05 16:48     ` David Marchand
  3 siblings, 2 replies; 11+ messages in thread
From: Andrzej Ostruszka @ 2022-01-11 11:37 UTC (permalink / raw)
  To: dev; +Cc: Andrzej Ostruszka, Olivier Matz, Konstantin Ananyev

When enqueueing/dequeueing to/from the ring we try to optimize by manual
loop unrolling.  The check for this optimization looks like:

	if (likely(idx + n < size)) {

where 'idx' points to the first usable element (empty slot for enqueue,
data for dequeue).  The correct comparison here should be '<=' instead
of '<'.

This is not a functional error since we fall back to the loop with
correct checks on indexes.  Just a minor suboptimal behaviour for the
case when we want to enqueue/dequeue exactly the number of elements that
we have in the ring before wrapping to its beginning.

Fixes: cc4b218790f6 ("ring: support configurable element size")
Fixes: 286bd05bf70d ("ring: optimisations")

Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/ring/rte_ring_elem_pvt.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 275ec55393..83788c56e6 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -17,7 +17,7 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
 	unsigned int i;
 	uint32_t *ring = (uint32_t *)&r[1];
 	const uint32_t *obj = (const uint32_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
 			ring[idx] = obj[i];
 			ring[idx + 1] = obj[i + 1];
@@ -62,7 +62,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	uint64_t *ring = (uint64_t *)&r[1];
 	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			ring[idx] = obj[i];
 			ring[idx + 1] = obj[i + 1];
@@ -95,7 +95,7 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	rte_int128_t *ring = (rte_int128_t *)&r[1];
 	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
 			memcpy((void *)(ring + idx),
 				(const void *)(obj + i), 32);
@@ -151,7 +151,7 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size,
 	unsigned int i;
 	uint32_t *ring = (uint32_t *)&r[1];
 	uint32_t *obj = (uint32_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
 			obj[i] = ring[idx];
 			obj[i + 1] = ring[idx + 1];
@@ -196,7 +196,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	uint64_t *ring = (uint64_t *)&r[1];
 	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			obj[i] = ring[idx];
 			obj[i + 1] = ring[idx + 1];
@@ -229,7 +229,7 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
 	uint32_t idx = prod_head & r->mask;
 	rte_int128_t *ring = (rte_int128_t *)&r[1];
 	rte_int128_t *obj = (rte_int128_t *)obj_table;
-	if (likely(idx + n < size)) {
+	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
 			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
 		switch (n & 0x1) {
-- 
2.34.1.575.g55b058a8bb-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ring: fix off by 1 mistake
  2022-01-10 15:09   ` Ananyev, Konstantin
@ 2022-01-11 11:49     ` Andrzej Ostruszka
  0 siblings, 0 replies; 11+ messages in thread
From: Andrzej Ostruszka @ 2022-01-11 11:49 UTC (permalink / raw)
  To: dev; +Cc: Morten Brørup, Olivier Matz, Konstantin Ananyev

Thank you Morten, Olivier and Konstantin for taking a look at it.
I've just sent another version with the updates to the commit message
suggested by Olivier.

With regards
Andrzej Ostruszka

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] ring: optimize corner case for enqueue/dequeue
  2022-01-11 11:37   ` [PATCH] ring: optimize corner case for enqueue/dequeue Andrzej Ostruszka
@ 2022-01-11 12:00     ` Morten Brørup
  2022-01-11 13:46       ` Andrzej Ostruszka
  2022-02-05 16:48     ` David Marchand
  1 sibling, 1 reply; 11+ messages in thread
From: Morten Brørup @ 2022-01-11 12:00 UTC (permalink / raw)
  To: Andrzej Ostruszka, dev; +Cc: Olivier Matz, Konstantin Ananyev

> From: Andrzej Ostruszka [mailto:amo@semihalf.com]
> Sent: Tuesday, 11 January 2022 12.38
> 
> When enqueueing/dequeueing to/from the ring we try to optimize by
> manual
> loop unrolling.  The check for this optimization looks like:
> 
> 	if (likely(idx + n < size)) {
> 
> where 'idx' points to the first usable element (empty slot for enqueue,
> data for dequeue).  The correct comparison here should be '<=' instead
> of '<'.
> 
> This is not a functional error since we fall back to the loop with
> correct checks on indexes.  Just a minor suboptimal behaviour for the
> case when we want to enqueue/dequeue exactly the number of elements
> that
> we have in the ring before wrapping to its beginning.
> 
> Fixes: cc4b218790f6 ("ring: support configurable element size")
> Fixes: 286bd05bf70d ("ring: optimisations")
> 
> Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/ring/rte_ring_elem_pvt.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/ring/rte_ring_elem_pvt.h
> b/lib/ring/rte_ring_elem_pvt.h
> index 275ec55393..83788c56e6 100644
> --- a/lib/ring/rte_ring_elem_pvt.h
> +++ b/lib/ring/rte_ring_elem_pvt.h
> @@ -17,7 +17,7 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const
> uint32_t size,
>  	unsigned int i;
>  	uint32_t *ring = (uint32_t *)&r[1];
>  	const uint32_t *obj = (const uint32_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>  			ring[idx] = obj[i];
>  			ring[idx + 1] = obj[i + 1];
> @@ -62,7 +62,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
>  	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			ring[idx] = obj[i];
>  			ring[idx + 1] = obj[i + 1];
> @@ -95,7 +95,7 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	rte_int128_t *ring = (rte_int128_t *)&r[1];
>  	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>  			memcpy((void *)(ring + idx),
>  				(const void *)(obj + i), 32);
> @@ -151,7 +151,7 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r,
> const uint32_t size,
>  	unsigned int i;
>  	uint32_t *ring = (uint32_t *)&r[1];
>  	uint32_t *obj = (uint32_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>  			obj[i] = ring[idx];
>  			obj[i + 1] = ring[idx + 1];
> @@ -196,7 +196,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
>  	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			obj[i] = ring[idx];
>  			obj[i + 1] = ring[idx + 1];
> @@ -229,7 +229,7 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r,
> uint32_t prod_head,
>  	uint32_t idx = prod_head & r->mask;
>  	rte_int128_t *ring = (rte_int128_t *)&r[1];
>  	rte_int128_t *obj = (rte_int128_t *)obj_table;
> -	if (likely(idx + n < size)) {
> +	if (likely(idx + n <= size)) {
>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>  			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
>  		switch (n & 0x1) {
> --
> 2.34.1.575.g55b058a8bb-goog
> 

Also this version of the patch:

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] ring: optimize corner case for enqueue/dequeue
  2022-01-11 12:00     ` Morten Brørup
@ 2022-01-11 13:46       ` Andrzej Ostruszka
  2022-01-11 13:53         ` Morten Brørup
  0 siblings, 1 reply; 11+ messages in thread
From: Andrzej Ostruszka @ 2022-01-11 13:46 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Tue, Jan 11, 2022 at 01:00:25PM +0100, Morten Brørup wrote:
[...]
> Also this version of the patch:
> 
> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>

My apologies, Morten; I don't know how I missed your tag.

With regards
Andrzej Ostruszka

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] ring: optimize corner case for enqueue/dequeue
  2022-01-11 13:46       ` Andrzej Ostruszka
@ 2022-01-11 13:53         ` Morten Brørup
  0 siblings, 0 replies; 11+ messages in thread
From: Morten Brørup @ 2022-01-11 13:53 UTC (permalink / raw)
  To: Andrzej Ostruszka; +Cc: dev

> From: Andrzej Ostruszka [mailto:amo@semihalf.com]
> Sent: Tuesday, 11 January 2022 14.46
> 
> On Tue, Jan 11, 2022 at 01:00:25PM +0100, Morten Brørup wrote:
> [...]
> > Also this version of the patch:
> >
> > Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> 
> My apologies Morten, don't know how I missed your tag.
> 
> With regards
> Andrzej Ostruszka

No worries. :-)

I added the tag again, so that Patchwork shows a sufficient level of acceptance when the maintainer considers the patch for merging.

-Morten


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] ring: optimize corner case for enqueue/dequeue
  2022-01-11 11:37   ` [PATCH] ring: optimize corner case for enqueue/dequeue Andrzej Ostruszka
  2022-01-11 12:00     ` Morten Brørup
@ 2022-02-05 16:48     ` David Marchand
  1 sibling, 0 replies; 11+ messages in thread
From: David Marchand @ 2022-02-05 16:48 UTC (permalink / raw)
  To: Andrzej Ostruszka; +Cc: dev, Olivier Matz, Konstantin Ananyev

On Tue, Jan 11, 2022 at 12:38 PM Andrzej Ostruszka <amo@semihalf.com> wrote:
> When enqueueing/dequeueing to/from the ring we try to optimize by manual
> loop unrolling.  The check for this optimization looks like:
>
>         if (likely(idx + n < size)) {
>
> where 'idx' points to the first usable element (empty slot for enqueue,
> data for dequeue).  The correct comparison here should be '<=' instead
> of '<'.
>
> This is not a functional error since we fall back to the loop with
> correct checks on indexes.  Just a minor suboptimal behaviour for the
> case when we want to enqueue/dequeue exactly the number of elements that
> we have in the ring before wrapping to its beginning.
>
> Fixes: cc4b218790f6 ("ring: support configurable element size")
> Fixes: 286bd05bf70d ("ring: optimisations")
>
> Signed-off-by: Andrzej Ostruszka <amo@semihalf.com>
> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>

Applied, thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread

Thread overview: 11+ messages
2022-01-03 14:22 [PATCH 0/1] Minor mistake in ring (en|de)queueing Andrzej Ostruszka
2022-01-03 14:22 ` [PATCH 1/1] ring: fix off by 1 mistake Andrzej Ostruszka
2022-01-03 14:56   ` Morten Brørup
2022-01-06 10:45   ` Olivier Matz
2022-01-10 15:09   ` Ananyev, Konstantin
2022-01-11 11:49     ` Andrzej Ostruszka
2022-01-11 11:37   ` [PATCH] ring: optimize corner case for enqueue/dequeue Andrzej Ostruszka
2022-01-11 12:00     ` Morten Brørup
2022-01-11 13:46       ` Andrzej Ostruszka
2022-01-11 13:53         ` Morten Brørup
2022-02-05 16:48     ` David Marchand
