All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
@ 2021-05-26 12:29 Yunsheng Lin
  2021-05-27  4:57 ` Jason Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2021-05-26 12:29 UTC (permalink / raw)
  To: davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer, jasowang

Currently r->queue[] is cleared after r->consumer_head is moved
forward, which makes the __ptr_ring_empty() checking called in
page_pool_refill_alloc_cache() unreliable if the checking is done
after the r->queue clearing and before the consumer_head moving
forward.

Move the r->queue[] clearing after consumer_head moving forward
to make __ptr_ring_empty() checking more reliable.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/ptr_ring.h | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 808f9d3..f32f052 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -261,8 +261,7 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 	/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
 	 * to work correctly.
 	 */
-	int consumer_head = r->consumer_head;
-	int head = consumer_head++;
+	int consumer_head = r->consumer_head + 1;
 
 	/* Once we have processed enough entries invalidate them in
 	 * the ring all at once so producer can reuse their space in the ring.
@@ -271,19 +270,28 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 	 */
 	if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
 		     consumer_head >= r->size)) {
+		int tail = r->consumer_tail;
+		int head = consumer_head;
+
+		if (unlikely(consumer_head >= r->size)) {
+			r->consumer_tail = 0;
+			WRITE_ONCE(r->consumer_head, 0);
+		} else {
+			r->consumer_tail = consumer_head;
+			WRITE_ONCE(r->consumer_head, consumer_head);
+		}
+
 		/* Zero out entries in the reverse order: this way we touch the
 		 * cache line that producer might currently be reading the last;
 		 * producer won't make progress and touch other cache lines
 		 * besides the first one until we write out all entries.
 		 */
-		while (likely(head >= r->consumer_tail))
-			r->queue[head--] = NULL;
-		r->consumer_tail = consumer_head;
-	}
-	if (unlikely(consumer_head >= r->size)) {
-		consumer_head = 0;
-		r->consumer_tail = 0;
+		while (likely(--head >= tail))
+			r->queue[head] = NULL;
+
+		return;
 	}
+
 	/* matching READ_ONCE in __ptr_ring_empty for lockless tests */
 	WRITE_ONCE(r->consumer_head, consumer_head);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-26 12:29 [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable Yunsheng Lin
@ 2021-05-27  4:57 ` Jason Wang
  2021-05-27  6:07   ` Yunsheng Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2021-05-27  4:57 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer


在 2021/5/26 下午8:29, Yunsheng Lin 写道:
> Currently r->queue[] is cleared after r->consumer_head is moved
> forward, which makes the __ptr_ring_empty() checking called in
> page_pool_refill_alloc_cache() unreliable if the checking is done
> after the r->queue clearing and before the consumer_head moving
> forward.
>
> Move the r->queue[] clearing after consumer_head moving forward
> to make __ptr_ring_empty() checking more reliable.


If I understand this correctly, this can only happens if you run 
__ptr_ring_empty() in parallel with ptr_ring_discard_one().

I think those two needs to be serialized. Or did I miss anything?

Thanks


>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>   include/linux/ptr_ring.h | 26 +++++++++++++++++---------
>   1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 808f9d3..f32f052 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -261,8 +261,7 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>   	/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>   	 * to work correctly.
>   	 */
> -	int consumer_head = r->consumer_head;
> -	int head = consumer_head++;
> +	int consumer_head = r->consumer_head + 1;
>   
>   	/* Once we have processed enough entries invalidate them in
>   	 * the ring all at once so producer can reuse their space in the ring.
> @@ -271,19 +270,28 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>   	 */
>   	if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
>   		     consumer_head >= r->size)) {
> +		int tail = r->consumer_tail;
> +		int head = consumer_head;
> +
> +		if (unlikely(consumer_head >= r->size)) {
> +			r->consumer_tail = 0;
> +			WRITE_ONCE(r->consumer_head, 0);
> +		} else {
> +			r->consumer_tail = consumer_head;
> +			WRITE_ONCE(r->consumer_head, consumer_head);
> +		}
> +
>   		/* Zero out entries in the reverse order: this way we touch the
>   		 * cache line that producer might currently be reading the last;
>   		 * producer won't make progress and touch other cache lines
>   		 * besides the first one until we write out all entries.
>   		 */
> -		while (likely(head >= r->consumer_tail))
> -			r->queue[head--] = NULL;
> -		r->consumer_tail = consumer_head;
> -	}
> -	if (unlikely(consumer_head >= r->size)) {
> -		consumer_head = 0;
> -		r->consumer_tail = 0;
> +		while (likely(--head >= tail))
> +			r->queue[head] = NULL;
> +
> +		return;
>   	}
> +
>   	/* matching READ_ONCE in __ptr_ring_empty for lockless tests */
>   	WRITE_ONCE(r->consumer_head, consumer_head);
>   }


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-27  4:57 ` Jason Wang
@ 2021-05-27  6:07   ` Yunsheng Lin
  2021-05-27  6:53     ` Jason Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2021-05-27  6:07 UTC (permalink / raw)
  To: Jason Wang, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer

On 2021/5/27 12:57, Jason Wang wrote:
> 
> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>> Currently r->queue[] is cleared after r->consumer_head is moved
>> forward, which makes the __ptr_ring_empty() checking called in
>> page_pool_refill_alloc_cache() unreliable if the checking is done
>> after the r->queue clearing and before the consumer_head moving
>> forward.
>>
>> Move the r->queue[] clearing after consumer_head moving forward
>> to make __ptr_ring_empty() checking more reliable.
> 
> 
> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().

Yes.

> 
> I think those two needs to be serialized. Or did I miss anything?

As the below comment in __ptr_ring_discard_one, if the above is true, I
do not think we need to keep consumer_head valid at all times, right?


	/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
	 * to work correctly.
	 */

> 
> Thanks
> 
> 
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>   include/linux/ptr_ring.h | 26 +++++++++++++++++---------
>>   1 file changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>> index 808f9d3..f32f052 100644
>> --- a/include/linux/ptr_ring.h
>> +++ b/include/linux/ptr_ring.h
>> @@ -261,8 +261,7 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>>       /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>        * to work correctly.
>>        */
>> -    int consumer_head = r->consumer_head;
>> -    int head = consumer_head++;
>> +    int consumer_head = r->consumer_head + 1;
>>         /* Once we have processed enough entries invalidate them in
>>        * the ring all at once so producer can reuse their space in the ring.
>> @@ -271,19 +270,28 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>>        */
>>       if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
>>                consumer_head >= r->size)) {
>> +        int tail = r->consumer_tail;
>> +        int head = consumer_head;
>> +
>> +        if (unlikely(consumer_head >= r->size)) {
>> +            r->consumer_tail = 0;
>> +            WRITE_ONCE(r->consumer_head, 0);
>> +        } else {
>> +            r->consumer_tail = consumer_head;
>> +            WRITE_ONCE(r->consumer_head, consumer_head);
>> +        }
>> +
>>           /* Zero out entries in the reverse order: this way we touch the
>>            * cache line that producer might currently be reading the last;
>>            * producer won't make progress and touch other cache lines
>>            * besides the first one until we write out all entries.
>>            */
>> -        while (likely(head >= r->consumer_tail))
>> -            r->queue[head--] = NULL;
>> -        r->consumer_tail = consumer_head;
>> -    }
>> -    if (unlikely(consumer_head >= r->size)) {
>> -        consumer_head = 0;
>> -        r->consumer_tail = 0;
>> +        while (likely(--head >= tail))
>> +            r->queue[head] = NULL;
>> +
>> +        return;
>>       }
>> +
>>       /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
>>       WRITE_ONCE(r->consumer_head, consumer_head);
>>   }
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-27  6:07   ` Yunsheng Lin
@ 2021-05-27  6:53     ` Jason Wang
  2021-05-27  7:21       ` Yunsheng Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2021-05-27  6:53 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer


在 2021/5/27 下午2:07, Yunsheng Lin 写道:
> On 2021/5/27 12:57, Jason Wang wrote:
>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>> forward, which makes the __ptr_ring_empty() checking called in
>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>> after the r->queue clearing and before the consumer_head moving
>>> forward.
>>>
>>> Move the r->queue[] clearing after consumer_head moving forward
>>> to make __ptr_ring_empty() checking more reliable.
>>
>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
> Yes.
>
>> I think those two needs to be serialized. Or did I miss anything?
> As the below comment in __ptr_ring_discard_one, if the above is true, I
> do not think we need to keep consumer_head valid at all times, right?
>
>
> 	/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
> 	 * to work correctly.
> 	 */


I'm not sure I understand. But my point is that you need to synchronize 
the __ptr_ring_discard_one() and __ptr_empty() as explained in the 
comment above __ptr_ring_empty():

/*
  * Test ring empty status without taking any locks.
  *
  * NB: This is only safe to call if ring is never resized.
  *
  * However, if some other CPU consumes ring entries at the same time, 
the value
  * returned is not guaranteed to be correct.
  *
  * In this case - to avoid incorrectly detecting the ring
  * as empty - the CPU consuming the ring entries is responsible
  * for either consuming all ring entries until the ring is empty,
  * or synchronizing with some other CPU and causing it to
  * re-test __ptr_ring_empty and/or consume the ring enteries
  * after the synchronization point.
  *
  * Note: callers invoking this in a loop must use a compiler barrier,
  * for example cpu_relax().
  */

Thanks



>> Thanks
>>
>>
>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>> ---
>>>    include/linux/ptr_ring.h | 26 +++++++++++++++++---------
>>>    1 file changed, 17 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>> index 808f9d3..f32f052 100644
>>> --- a/include/linux/ptr_ring.h
>>> +++ b/include/linux/ptr_ring.h
>>> @@ -261,8 +261,7 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>>>        /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>>         * to work correctly.
>>>         */
>>> -    int consumer_head = r->consumer_head;
>>> -    int head = consumer_head++;
>>> +    int consumer_head = r->consumer_head + 1;
>>>          /* Once we have processed enough entries invalidate them in
>>>         * the ring all at once so producer can reuse their space in the ring.
>>> @@ -271,19 +270,28 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>>>         */
>>>        if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
>>>                 consumer_head >= r->size)) {
>>> +        int tail = r->consumer_tail;
>>> +        int head = consumer_head;
>>> +
>>> +        if (unlikely(consumer_head >= r->size)) {
>>> +            r->consumer_tail = 0;
>>> +            WRITE_ONCE(r->consumer_head, 0);
>>> +        } else {
>>> +            r->consumer_tail = consumer_head;
>>> +            WRITE_ONCE(r->consumer_head, consumer_head);
>>> +        }
>>> +
>>>            /* Zero out entries in the reverse order: this way we touch the
>>>             * cache line that producer might currently be reading the last;
>>>             * producer won't make progress and touch other cache lines
>>>             * besides the first one until we write out all entries.
>>>             */
>>> -        while (likely(head >= r->consumer_tail))
>>> -            r->queue[head--] = NULL;
>>> -        r->consumer_tail = consumer_head;
>>> -    }
>>> -    if (unlikely(consumer_head >= r->size)) {
>>> -        consumer_head = 0;
>>> -        r->consumer_tail = 0;
>>> +        while (likely(--head >= tail))
>>> +            r->queue[head] = NULL;
>>> +
>>> +        return;
>>>        }
>>> +
>>>        /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
>>>        WRITE_ONCE(r->consumer_head, consumer_head);
>>>    }
>>
>> .
>>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-27  6:53     ` Jason Wang
@ 2021-05-27  7:21       ` Yunsheng Lin
  2021-05-27  8:05         ` Jason Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2021-05-27  7:21 UTC (permalink / raw)
  To: Jason Wang, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer

On 2021/5/27 14:53, Jason Wang wrote:
> 
> 在 2021/5/27 下午2:07, Yunsheng Lin 写道:
>> On 2021/5/27 12:57, Jason Wang wrote:
>>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>>> forward, which makes the __ptr_ring_empty() checking called in
>>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>>> after the r->queue clearing and before the consumer_head moving
>>>> forward.
>>>>
>>>> Move the r->queue[] clearing after consumer_head moving forward
>>>> to make __ptr_ring_empty() checking more reliable.
>>>
>>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
>> Yes.
>>
>>> I think those two needs to be serialized. Or did I miss anything?
>> As the below comment in __ptr_ring_discard_one, if the above is true, I
>> do not think we need to keep consumer_head valid at all times, right?
>>
>>
>>     /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>      * to work correctly.
>>      */
> 
> 
> I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():

I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
always serialized, then it seems that the below commit is unnecessary?

406de7555424 ("ptr_ring: keep consumer_head valid at all times")

> 
> /*
>  * Test ring empty status without taking any locks.
>  *
>  * NB: This is only safe to call if ring is never resized.
>  *
>  * However, if some other CPU consumes ring entries at the same time, the value
>  * returned is not guaranteed to be correct.
>  *
>  * In this case - to avoid incorrectly detecting the ring
>  * as empty - the CPU consuming the ring entries is responsible
>  * for either consuming all ring entries until the ring is empty,
>  * or synchronizing with some other CPU and causing it to
>  * re-test __ptr_ring_empty and/or consume the ring enteries
>  * after the synchronization point.

I am not sure I understand "incorrectly detecting the ring as empty"
means, is it because of the data race described in the commit log?
Or other data race? I can not think of other data race if consuming
and __ptr_ring_empty() is serialized:)

I am agreed that __ptr_ring_empty() checking is not totally reliable
without taking r->consumer_lock, that is why I use "more reliable"
in the title:)



>  *
>  * Note: callers invoking this in a loop must use a compiler barrier,
>  * for example cpu_relax().
>  */
> 
> Thanks
> 
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-27  7:21       ` Yunsheng Lin
@ 2021-05-27  8:05         ` Jason Wang
  2021-05-27  9:03           ` Yunsheng Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2021-05-27  8:05 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer


在 2021/5/27 下午3:21, Yunsheng Lin 写道:
> On 2021/5/27 14:53, Jason Wang wrote:
>> 在 2021/5/27 下午2:07, Yunsheng Lin 写道:
>>> On 2021/5/27 12:57, Jason Wang wrote:
>>>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>>>> forward, which makes the __ptr_ring_empty() checking called in
>>>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>>>> after the r->queue clearing and before the consumer_head moving
>>>>> forward.
>>>>>
>>>>> Move the r->queue[] clearing after consumer_head moving forward
>>>>> to make __ptr_ring_empty() checking more reliable.
>>>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
>>> Yes.
>>>
>>>> I think those two needs to be serialized. Or did I miss anything?
>>> As the below comment in __ptr_ring_discard_one, if the above is true, I
>>> do not think we need to keep consumer_head valid at all times, right?
>>>
>>>
>>>      /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>>       * to work correctly.
>>>       */
>>
>> I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():
> I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
> always serialized, then it seems that the below commit is unnecessary?


Just to make sure we are at the same page. What I really meant is 
"synchronized" not "serialized". So they can be called at the same time 
but need synchronization.


>
> 406de7555424 ("ptr_ring: keep consumer_head valid at all times")


This still needed in this case.


>
>> /*
>>   * Test ring empty status without taking any locks.
>>   *
>>   * NB: This is only safe to call if ring is never resized.
>>   *
>>   * However, if some other CPU consumes ring entries at the same time, the value
>>   * returned is not guaranteed to be correct.
>>   *
>>   * In this case - to avoid incorrectly detecting the ring
>>   * as empty - the CPU consuming the ring entries is responsible
>>   * for either consuming all ring entries until the ring is empty,
>>   * or synchronizing with some other CPU and causing it to
>>   * re-test __ptr_ring_empty and/or consume the ring enteries
>>   * after the synchronization point.
> I am not sure I understand "incorrectly detecting the ring as empty"
> means, is it because of the data race described in the commit log?


It means "the ring might be empty but __ptr_ring_empty() returns false".


> Or other data race? I can not think of other data race if consuming
> and __ptr_ring_empty() is serialized:)
>
> I am agreed that __ptr_ring_empty() checking is not totally reliable
> without taking r->consumer_lock, that is why I use "more reliable"
> in the title:)


Is __ptr_ring_empty() synchronized with the consumer in your case? If 
yes, have you done some benchmark to see the difference?

Have a look at page pool, this only helps when multiple refill request 
happens in parallel which can make some of the refill return early if 
the ring has been consumed.

This is the slow-path and I'm not sure we see any difference. If one the 
request runs faster then the following request will go through the fast 
path.

If it really helps, can we do it more simpler by:


diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 808f9d3ee546..c3a72dc77337 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -264,6 +264,10 @@ static inline void __ptr_ring_discard_one(struct 
ptr_ring *r)
         int consumer_head = r->consumer_head;
         int head = consumer_head++;

+        /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
+       WRITE_ONCE(r->consumer_head,
+                   consumer_head < r->size ? consumer_head : 0);
+
         /* Once we have processed enough entries invalidate them in
          * the ring all at once so producer can reuse their space in 
the ring.
          * We also do this when we reach end of the ring - not mandatory
@@ -281,11 +285,8 @@ static inline void __ptr_ring_discard_one(struct 
ptr_ring *r)
                 r->consumer_tail = consumer_head;
         }
         if (unlikely(consumer_head >= r->size)) {
-               consumer_head = 0;
                 r->consumer_tail = 0;
         }
-       /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
-       WRITE_ONCE(r->consumer_head, consumer_head);
  }

  static inline void *__ptr_ring_consume(struct ptr_ring *r)


Thanks


>
>
>
>>   *
>>   * Note: callers invoking this in a loop must use a compiler barrier,
>>   * for example cpu_relax().
>>   */
>>
>> Thanks
>>
>>
>>


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-27  8:05         ` Jason Wang
@ 2021-05-27  9:03           ` Yunsheng Lin
  2021-05-28  1:31             ` Jason Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2021-05-27  9:03 UTC (permalink / raw)
  To: Jason Wang, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer

On 2021/5/27 16:05, Jason Wang wrote:
> 
> 在 2021/5/27 下午3:21, Yunsheng Lin 写道:
>> On 2021/5/27 14:53, Jason Wang wrote:
>>> 在 2021/5/27 下午2:07, Yunsheng Lin 写道:
>>>> On 2021/5/27 12:57, Jason Wang wrote:
>>>>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>>>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>>>>> forward, which makes the __ptr_ring_empty() checking called in
>>>>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>>>>> after the r->queue clearing and before the consumer_head moving
>>>>>> forward.
>>>>>>
>>>>>> Move the r->queue[] clearing after consumer_head moving forward
>>>>>> to make __ptr_ring_empty() checking more reliable.
>>>>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
>>>> Yes.
>>>>
>>>>> I think those two needs to be serialized. Or did I miss anything?
>>>> As the below comment in __ptr_ring_discard_one, if the above is true, I
>>>> do not think we need to keep consumer_head valid at all times, right?
>>>>
>>>>
>>>>      /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>>>       * to work correctly.
>>>>       */
>>>
>>> I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():
>> I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
>> always serialized, then it seems that the below commit is unnecessary?
> 
> 
> Just to make sure we are at the same page. What I really meant is "synchronized" not "serialized". So they can be called at the same time but need synchronization.
> 
> 
>>
>> 406de7555424 ("ptr_ring: keep consumer_head valid at all times")
> 
> 
> This still needed in this case.
> 
> 
>>
>>> /*
>>>   * Test ring empty status without taking any locks.
>>>   *
>>>   * NB: This is only safe to call if ring is never resized.
>>>   *
>>>   * However, if some other CPU consumes ring entries at the same time, the value
>>>   * returned is not guaranteed to be correct.
>>>   *
>>>   * In this case - to avoid incorrectly detecting the ring
>>>   * as empty - the CPU consuming the ring entries is responsible
>>>   * for either consuming all ring entries until the ring is empty,
>>>   * or synchronizing with some other CPU and causing it to
>>>   * re-test __ptr_ring_empty and/or consume the ring enteries
>>>   * after the synchronization point.
>> I am not sure I understand "incorrectly detecting the ring as empty"
>> means, is it because of the data race described in the commit log?
> 
> 
> It means "the ring might be empty but __ptr_ring_empty() returns false".

But the ring might be non-empty but __ptr_ring_empty() returns true
for the data race described in the commit log:)

> 
> 
>> Or other data race? I can not think of other data race if consuming
>> and __ptr_ring_empty() is serialized:)
>>
>> I am agreed that __ptr_ring_empty() checking is not totally reliable
>> without taking r->consumer_lock, that is why I use "more reliable"
>> in the title:)
> 
> 
> Is __ptr_ring_empty() synchronized with the consumer in your case? If yes, have you done some benchmark to see the difference?
> 
> Have a look at page pool, this only helps when multiple refill request happens in parallel which can make some of the refill return early if the ring has been consumed.
> 
> This is the slow-path and I'm not sure we see any difference. If one the request runs faster then the following request will go through the fast path.

Yes, I am agreed there may not be any difference.
But it is better to make it more reliable, right?

> 
> If it really helps, can we do it more simpler by:
> 
> 
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 808f9d3ee546..c3a72dc77337 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -264,6 +264,10 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>         int consumer_head = r->consumer_head;
>         int head = consumer_head++;
> 
> +        /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
> +       WRITE_ONCE(r->consumer_head,
> +                   consumer_head < r->size ? consumer_head : 0);
> +
>         /* Once we have processed enough entries invalidate them in
>          * the ring all at once so producer can reuse their space in the ring.
>          * We also do this when we reach end of the ring - not mandatory
> @@ -281,11 +285,8 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>                 r->consumer_tail = consumer_head;
>         }
>         if (unlikely(consumer_head >= r->size)) {

What I am thinking is that we can remove the above testing for
the likely case when the above checking is moved into the body
of "if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
consumer_head >= r->size))".

Or is there any specific reason why we keep the testing for likely
case?


> -               consumer_head = 0;
>                 r->consumer_tail = 0;
>         }
> -       /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
> -       WRITE_ONCE(r->consumer_head, consumer_head);
>  }
> 
>  static inline void *__ptr_ring_consume(struct ptr_ring *r)
> 
> 
> Thanks
> 
> 
>>
>>
>>
>>>   *
>>>   * Note: callers invoking this in a loop must use a compiler barrier,
>>>   * for example cpu_relax().
>>>   */
>>>
>>> Thanks
>>>
>>>
>>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-27  9:03           ` Yunsheng Lin
@ 2021-05-28  1:31             ` Jason Wang
  2021-05-28  2:26               ` Yunsheng Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2021-05-28  1:31 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer


在 2021/5/27 下午5:03, Yunsheng Lin 写道:
> On 2021/5/27 16:05, Jason Wang wrote:
>> 在 2021/5/27 下午3:21, Yunsheng Lin 写道:
>>> On 2021/5/27 14:53, Jason Wang wrote:
>>>> 在 2021/5/27 下午2:07, Yunsheng Lin 写道:
>>>>> On 2021/5/27 12:57, Jason Wang wrote:
>>>>>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>>>>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>>>>>> forward, which makes the __ptr_ring_empty() checking called in
>>>>>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>>>>>> after the r->queue clearing and before the consumer_head moving
>>>>>>> forward.
>>>>>>>
>>>>>>> Move the r->queue[] clearing after consumer_head moving forward
>>>>>>> to make __ptr_ring_empty() checking more reliable.
>>>>>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
>>>>> Yes.
>>>>>
>>>>>> I think those two needs to be serialized. Or did I miss anything?
>>>>> As the below comment in __ptr_ring_discard_one, if the above is true, I
>>>>> do not think we need to keep consumer_head valid at all times, right?
>>>>>
>>>>>
>>>>>       /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>>>>        * to work correctly.
>>>>>        */
>>>> I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():
>>> I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
>>> always serialized, then it seems that the below commit is unnecessary?
>>
>> Just to make sure we are at the same page. What I really meant is "synchronized" not "serialized". So they can be called at the same time but need synchronization.
>>
>>
>>> 406de7555424 ("ptr_ring: keep consumer_head valid at all times")
>>
>> This still needed in this case.
>>
>>
>>>> /*
>>>>    * Test ring empty status without taking any locks.
>>>>    *
>>>>    * NB: This is only safe to call if ring is never resized.
>>>>    *
>>>>    * However, if some other CPU consumes ring entries at the same time, the value
>>>>    * returned is not guaranteed to be correct.
>>>>    *
>>>>    * In this case - to avoid incorrectly detecting the ring
>>>>    * as empty - the CPU consuming the ring entries is responsible
>>>>    * for either consuming all ring entries until the ring is empty,
>>>>    * or synchronizing with some other CPU and causing it to
>>>>    * re-test __ptr_ring_empty and/or consume the ring enteries
>>>>    * after the synchronization point.
>>> I am not sure I understand "incorrectly detecting the ring as empty"
>>> means, is it because of the data race described in the commit log?
>>
>> It means "the ring might be empty but __ptr_ring_empty() returns false".
> But the ring might be non-empty but __ptr_ring_empty() returns true
> for the data race described in the commit log:)


Which commit log?


>
>>
>>> Or other data race? I can not think of other data race if consuming
>>> and __ptr_ring_empty() is serialized:)
>>>
>>> I am agreed that __ptr_ring_empty() checking is not totally reliable
>>> without taking r->consumer_lock, that is why I use "more reliable"
>>> in the title:)
>>
>> Is __ptr_ring_empty() synchronized with the consumer in your case? If yes, have you done some benchmark to see the difference?
>>
>> Have a look at page pool, this only helps when multiple refill request happens in parallel which can make some of the refill return early if the ring has been consumed.
>>
>> This is the slow-path and I'm not sure we see any difference. If one the request runs faster then the following request will go through the fast path.
> Yes, I am agreed there may not be any difference.
> But it is better to make it more reliable, right?


No, any performance optimization must be benchmark to show obvious 
difference to be accepted.

ptr_ring has been used by various subsystems so we should not risk our 
self-eves to accept theoretical optimizations.


>
>> If it really helps, can we do it more simpler by:
>>
>>
>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>> index 808f9d3ee546..c3a72dc77337 100644
>> --- a/include/linux/ptr_ring.h
>> +++ b/include/linux/ptr_ring.h
>> @@ -264,6 +264,10 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>>          int consumer_head = r->consumer_head;
>>          int head = consumer_head++;
>>
>> +        /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
>> +       WRITE_ONCE(r->consumer_head,
>> +                   consumer_head < r->size ? consumer_head : 0);
>> +
>>          /* Once we have processed enough entries invalidate them in
>>           * the ring all at once so producer can reuse their space in the ring.
>>           * We also do this when we reach end of the ring - not mandatory
>> @@ -281,11 +285,8 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>>                  r->consumer_tail = consumer_head;
>>          }
>>          if (unlikely(consumer_head >= r->size)) {
> What I am thinking is that we can remove the above testing for
> the likely case when the above checking is moved into the body
> of "if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
> consumer_head >= r->size))".
>
> Or is there any specific reason why we keep the testing for likely
> case?


No reason but any optimization must be tested to show differences before 
being accepted.

Thanks


>
>
>> -               consumer_head = 0;
>>                  r->consumer_tail = 0;
>>          }
>> -       /* matching READ_ONCE in __ptr_ring_empty for lockless tests */
>> -       WRITE_ONCE(r->consumer_head, consumer_head);
>>   }
>>
>>   static inline void *__ptr_ring_consume(struct ptr_ring *r)
>>
>>
>> Thanks
>>
>>
>>>
>>>
>>>>    *
>>>>    * Note: callers invoking this in a loop must use a compiler barrier,
>>>>    * for example cpu_relax().
>>>>    */
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>
>> .
>>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-28  1:31             ` Jason Wang
@ 2021-05-28  2:26               ` Yunsheng Lin
  2021-05-28  2:29                 ` Jason Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2021-05-28  2:26 UTC (permalink / raw)
  To: Jason Wang, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer

On 2021/5/28 9:31, Jason Wang wrote:
> 
> 在 2021/5/27 下午5:03, Yunsheng Lin 写道:
>> On 2021/5/27 16:05, Jason Wang wrote:
>>> 在 2021/5/27 下午3:21, Yunsheng Lin 写道:
>>>> On 2021/5/27 14:53, Jason Wang wrote:
>>>>> 在 2021/5/27 下午2:07, Yunsheng Lin 写道:
>>>>>> On 2021/5/27 12:57, Jason Wang wrote:
>>>>>>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>>>>>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>>>>>>> forward, which makes the __ptr_ring_empty() checking called in
>>>>>>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>>>>>>> after the r->queue clearing and before the consumer_head moving
>>>>>>>> forward.
>>>>>>>>
>>>>>>>> Move the r->queue[] clearing after consumer_head moving forward
>>>>>>>> to make __ptr_ring_empty() checking more reliable.
>>>>>>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
>>>>>> Yes.
>>>>>>
>>>>>>> I think those two needs to be serialized. Or did I miss anything?
>>>>>> As the below comment in __ptr_ring_discard_one, if the above is true, I
>>>>>> do not think we need to keep consumer_head valid at all times, right?
>>>>>>
>>>>>>
>>>>>>       /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>>>>>        * to work correctly.
>>>>>>        */
>>>>> I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():
>>>> I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
>>>> always serialized, then it seems that the below commit is unnecessary?
>>>
>>> Just to make sure we are at the same page. What I really meant is "synchronized" not "serialized". So they can be called at the same time but need synchronization.
>>>
>>>
>>>> 406de7555424 ("ptr_ring: keep consumer_head valid at all times")
>>>
>>> This still needed in this case.
>>>
>>>
>>>>> /*
>>>>>    * Test ring empty status without taking any locks.
>>>>>    *
>>>>>    * NB: This is only safe to call if ring is never resized.
>>>>>    *
>>>>>    * However, if some other CPU consumes ring entries at the same time, the value
>>>>>    * returned is not guaranteed to be correct.
>>>>>    *
>>>>>    * In this case - to avoid incorrectly detecting the ring
>>>>>    * as empty - the CPU consuming the ring entries is responsible
>>>>>    * for either consuming all ring entries until the ring is empty,
>>>>>    * or synchronizing with some other CPU and causing it to
>>>>>    * re-test __ptr_ring_empty and/or consume the ring enteries
>>>>>    * after the synchronization point.
>>>> I am not sure I understand "incorrectly detecting the ring as empty"
>>>> means, is it because of the data race described in the commit log?
>>>
>>> It means "the ring might be empty but __ptr_ring_empty() returns false".
>> But the ring might be non-empty but __ptr_ring_empty() returns true
>> for the data race described in the commit log:)
> 
> 
> Which commit log?

this commit log.
If the data race described in this commit log happens, the ring might be
non-empty, but __ptr_ring_empty() returns true.


> 
> 
>>
>>>
>>>> Or other data race? I can not think of other data race if consuming
>>>> and __ptr_ring_empty() is serialized:)
>>>>
>>>> I am agreed that __ptr_ring_empty() checking is not totally reliable
>>>> without taking r->consumer_lock, that is why I use "more reliable"
>>>> in the title:)
>>>
>>> Is __ptr_ring_empty() synchronized with the consumer in your case? If yes, have you done some benchmark to see the difference?
>>>
>>> Have a look at page pool, this only helps when multiple refill request happens in parallel which can make some of the refill return early if the ring has been consumed.
>>>
>>> This is the slow-path and I'm not sure we see any difference. If one the request runs faster then the following request will go through the fast path.
>> Yes, I am agreed there may not be any difference.
>> But it is better to make it more reliable, right?
> 
> 
> No, any performance optimization must be benchmark to show obvious difference to be accepted.
> 
> ptr_ring has been used by various subsystems so we should not risk our self-eves to accept theoretical optimizations.

As a matter of fact, I am not treating it as a performance optimization for this patch.
I treated it as improvement for the checking of __ptr_ring_empty().
But you are right that we need to ensure there is not performance regression when improving
it.

Any existing and easy-to-setup testcase to benchmark the ptr_ring performance?

> 
> 
>>
>>> If it really helps, can we do it more simpler by:
>>>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable
  2021-05-28  2:26               ` Yunsheng Lin
@ 2021-05-28  2:29                 ` Jason Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Wang @ 2021-05-28  2:29 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: will, peterz, paulmck, linux-kernel, netdev, mst, brouer


在 2021/5/28 上午10:26, Yunsheng Lin 写道:
> On 2021/5/28 9:31, Jason Wang wrote:
>> 在 2021/5/27 下午5:03, Yunsheng Lin 写道:
>>> On 2021/5/27 16:05, Jason Wang wrote:
>>>> 在 2021/5/27 下午3:21, Yunsheng Lin 写道:
>>>>> On 2021/5/27 14:53, Jason Wang wrote:
>>>>>> 在 2021/5/27 下午2:07, Yunsheng Lin 写道:
>>>>>>> On 2021/5/27 12:57, Jason Wang wrote:
>>>>>>>> 在 2021/5/26 下午8:29, Yunsheng Lin 写道:
>>>>>>>>> Currently r->queue[] is cleared after r->consumer_head is moved
>>>>>>>>> forward, which makes the __ptr_ring_empty() checking called in
>>>>>>>>> page_pool_refill_alloc_cache() unreliable if the checking is done
>>>>>>>>> after the r->queue clearing and before the consumer_head moving
>>>>>>>>> forward.
>>>>>>>>>
>>>>>>>>> Move the r->queue[] clearing after consumer_head moving forward
>>>>>>>>> to make __ptr_ring_empty() checking more reliable.
>>>>>>>> If I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
>>>>>>> Yes.
>>>>>>>
>>>>>>>> I think those two needs to be serialized. Or did I miss anything?
>>>>>>> As the below comment in __ptr_ring_discard_one, if the above is true, I
>>>>>>> do not think we need to keep consumer_head valid at all times, right?
>>>>>>>
>>>>>>>
>>>>>>>        /* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>>>>>>>         * to work correctly.
>>>>>>>         */
>>>>>> I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():
>>>>> I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
>>>>> always serialized, then it seems that the below commit is unnecessary?
>>>> Just to make sure we are at the same page. What I really meant is "synchronized" not "serialized". So they can be called at the same time but need synchronization.
>>>>
>>>>
>>>>> 406de7555424 ("ptr_ring: keep consumer_head valid at all times")
>>>> This still needed in this case.
>>>>
>>>>
>>>>>> /*
>>>>>>     * Test ring empty status without taking any locks.
>>>>>>     *
>>>>>>     * NB: This is only safe to call if ring is never resized.
>>>>>>     *
>>>>>>     * However, if some other CPU consumes ring entries at the same time, the value
>>>>>>     * returned is not guaranteed to be correct.
>>>>>>     *
>>>>>>     * In this case - to avoid incorrectly detecting the ring
>>>>>>     * as empty - the CPU consuming the ring entries is responsible
>>>>>>     * for either consuming all ring entries until the ring is empty,
>>>>>>     * or synchronizing with some other CPU and causing it to
>>>>>>     * re-test __ptr_ring_empty and/or consume the ring enteries
>>>>>>     * after the synchronization point.
>>>>> I am not sure I understand "incorrectly detecting the ring as empty"
>>>>> means, is it because of the data race described in the commit log?
>>>> It means "the ring might be empty but __ptr_ring_empty() returns false".
>>> But the ring might be non-empty but __ptr_ring_empty() returns true
>>> for the data race described in the commit log:)
>>
>> Which commit log?
> this commit log.
> If the data race described in this commit log happens, the ring might be
> non-empty, but __ptr_ring_empty() returns true.
>
>
>>
>>>>> Or other data race? I can not think of other data race if consuming
>>>>> and __ptr_ring_empty() is serialized:)
>>>>>
>>>>> I am agreed that __ptr_ring_empty() checking is not totally reliable
>>>>> without taking r->consumer_lock, that is why I use "more reliable"
>>>>> in the title:)
>>>> Is __ptr_ring_empty() synchronized with the consumer in your case? If yes, have you done some benchmark to see the difference?
>>>>
>>>> Have a look at page pool, this only helps when multiple refill request happens in parallel which can make some of the refill return early if the ring has been consumed.
>>>>
>>>> This is the slow-path and I'm not sure we see any difference. If one the request runs faster then the following request will go through the fast path.
>>> Yes, I am agreed there may not be any difference.
>>> But it is better to make it more reliable, right?
>>
>> No, any performance optimization must be benchmark to show obvious difference to be accepted.
>>
>> ptr_ring has been used by various subsystems so we should not risk our self-eves to accept theoretical optimizations.
> As a matter of fact, I am not treating it as a performance optimization for this patch.
> I treated it as improvement for the checking of __ptr_ring_empty().
> But you are right that we need to ensure there is not performance regression when improving
> it.
>
> Any existing and easy-to-setup testcase to benchmark the ptr_ring performance?


You probably can start with a simple test in:

tools/virtio/ringtest/ptr_ring.c

Thanks


>
>>
>>>> If it really helps, can we do it more simpler by:
>>>>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-05-28  2:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-26 12:29 [PATCH net-next] ptr_ring: make __ptr_ring_empty() checking more reliable Yunsheng Lin
2021-05-27  4:57 ` Jason Wang
2021-05-27  6:07   ` Yunsheng Lin
2021-05-27  6:53     ` Jason Wang
2021-05-27  7:21       ` Yunsheng Lin
2021-05-27  8:05         ` Jason Wang
2021-05-27  9:03           ` Yunsheng Lin
2021-05-28  1:31             ` Jason Wang
2021-05-28  2:26               ` Yunsheng Lin
2021-05-28  2:29                 ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.