[net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds

All of lore.kernel.org
 help / color / mirror / Atom feed

* [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
@ 2017-12-28  3:50 John Fastabend
  2018-01-02 16:52 ` David Miller
  2018-01-02 16:53 ` Michael S. Tsirkin
  0 siblings, 2 replies; 13+ messages in thread
From: John Fastabend @ 2017-12-28  3:50 UTC (permalink / raw)
  To: jakub.kicinski, davem, mst; +Cc: xiyou.wangcong, jiri, netdev

When running consumer and/or producer operations and empty checks in
parallel its possible to have the empty check run past the end of the
array. The scenario occurs when an empty check is run while
__ptr_ring_discard_one() is in progress. Specifically after the
consumer_head is incremented but before (consumer_head >= ring_size)
check is made and the consumer head is zeroe'd.

To resolve this, without having to rework how consumer/producer ops
work on the array, simply add an extra dummy slot to the end of the
array. Even if we did a rework to avoid the extra slot it looks
like the normal case checks would suffer some so best to just
allocate an extra pointer.

Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/ptr_ring.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 6866df4..13fb06a 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -447,7 +447,12 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,

 static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
 {
-	return kcalloc(size, sizeof(void *), gfp);
+	/* Allocate an extra dummy element at end of ring to avoid consumer head
+	 * or produce head access past the end of the array. Possible when
+	 * producer/consumer operations and __ptr_ring_peek operations run in
+	 * parallel.
+	 */
+	return kcalloc(size + 1, sizeof(void *), gfp);
 }

 static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2017-12-28  3:50 [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds John Fastabend
@ 2018-01-02 16:52 ` David Miller
  2018-01-02 17:01   ` Michael S. Tsirkin
  2018-01-02 16:53 ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2018-01-02 16:52 UTC (permalink / raw)
  To: john.fastabend; +Cc: jakub.kicinski, mst, xiyou.wangcong, jiri, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 27 Dec 2017 19:50:25 -0800

> When running consumer and/or producer operations and empty checks in
> parallel its possible to have the empty check run past the end of the
> array. The scenario occurs when an empty check is run while
> __ptr_ring_discard_one() is in progress. Specifically after the
> consumer_head is incremented but before (consumer_head >= ring_size)
> check is made and the consumer head is zeroe'd.
> 
> To resolve this, without having to rework how consumer/producer ops
> work on the array, simply add an extra dummy slot to the end of the
> array. Even if we did a rework to avoid the extra slot it looks
> like the normal case checks would suffer some so best to just
> allocate an extra pointer.
> 
> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Applied, thanks John.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2017-12-28  3:50 [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds John Fastabend
  2018-01-02 16:52 ` David Miller
@ 2018-01-02 16:53 ` Michael S. Tsirkin
  2018-01-02 17:01   ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-02 16:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: jakub.kicinski, davem, xiyou.wangcong, jiri, netdev

On Wed, Dec 27, 2017 at 07:50:25PM -0800, John Fastabend wrote:
> When running consumer and/or producer operations and empty checks in
> parallel its possible to have the empty check run past the end of the
> array. The scenario occurs when an empty check is run while
> __ptr_ring_discard_one() is in progress. Specifically after the
> consumer_head is incremented but before (consumer_head >= ring_size)
> check is made and the consumer head is zeroe'd.
> 
> To resolve this, without having to rework how consumer/producer ops
> work on the array, simply add an extra dummy slot to the end of the
> array. Even if we did a rework to avoid the extra slot it looks
> like the normal case checks would suffer some so best to just
> allocate an extra pointer.
> 
> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>




> ---
>  include/linux/ptr_ring.h |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 6866df4..13fb06a 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -447,7 +447,12 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>  
>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>  {
> -	return kcalloc(size, sizeof(void *), gfp);
> +	/* Allocate an extra dummy element at end of ring to avoid consumer head
> +	 * or produce head access past the end of the array. Possible when
> +	 * producer/consumer operations and __ptr_ring_peek operations run in
> +	 * parallel.
> +	 */
> +	return kcalloc(size + 1, sizeof(void *), gfp);
>  }
>  
>  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)


Well the peek will return a false negative then, won't it?

So I kind of prefer just fixing the consumer.  The first step I think
would look something like the below untested patch.  Pls take a look.  I
suspect we'll need a memory barrier too.

I wonder though: are false positives or negatives ever a problem?

Would it be a big deal to just take a lock there, and
avoid trying to support a lockless peek?


It would definitely be more straight-forward to just
remove the promise to support a lockless peek.

Thoughts?

-->

ptr_ring: keep consumer_head valid at all times

The comment near __ptr_ring_peek says: 
 
 * If ring is never resized, and if the pointer is merely 
 * tested, there's no need to take the lock - see e.g.  __ptr_ring_empty. 

but this was in fact never possible.

Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 37b4bb2..802375f 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -236,22 +236,28 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 	/* Fundamentally, what we want to do is update consumer
 	 * index and zero out the entry so producer can reuse it.
 	 * Doing it naively at each consume would be as simple as:
-	 *       r->queue[r->consumer++] = NULL;
-	 *       if (unlikely(r->consumer >= r->size))
-	 *               r->consumer = 0;
+	 *       consumer = r->consumer;
+	 *       r->queue[consumer++] = NULL;
+	 *       if (unlikely(consumer >= r->size))
+	 *               consumer = 0;
+	 *       r->consumer = consumer;
 	 * but that is suboptimal when the ring is full as producer is writing
 	 * out new entries in the same cache line.  Defer these updates until a
 	 * batch of entries has been consumed.
 	 */
-	int head = r->consumer_head++;
+	/* Note: we must keep consumer_head valid at all times for __ptr_ring_peek
+	 * to work correctly.
+	 */
+	int consumer_head = r->consumer_head;
+	int head = consumer_head++;
 
 	/* Once we have processed enough entries invalidate them in
 	 * the ring all at once so producer can reuse their space in the ring.
 	 * We also do this when we reach end of the ring - not mandatory
 	 * but helps keep the implementation simple.
 	 */
-	if (unlikely(r->consumer_head - r->consumer_tail >= r->batch ||
-		     r->consumer_head >= r->size)) {
+	if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
+		     consumer_head >= r->size)) {
 		/* Zero out entries in the reverse order: this way we touch the
 		 * cache line that producer might currently be reading the last;
 		 * producer won't make progress and touch other cache lines
@@ -259,12 +265,13 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 		 */
 		while (likely(head >= r->consumer_tail))
 			r->queue[head--] = NULL;
-		r->consumer_tail = r->consumer_head;
+		r->consumer_tail = consumer_head;
 	}
-	if (unlikely(r->consumer_head >= r->size)) {
-		r->consumer_head = 0;
+	if (unlikely(consumer_head >= r->size)) {
+		consumer_head = 0;
 		r->consumer_tail = 0;
 	}
+	r->consumer_head = consumer_head;
 }
 
 static inline void *__ptr_ring_consume(struct ptr_ring *r)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 16:52 ` David Miller
@ 2018-01-02 17:01   ` Michael S. Tsirkin
  2018-01-02 17:17     ` Michael S. Tsirkin
  2018-01-02 18:33     ` David Miller
  0 siblings, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-02 17:01 UTC (permalink / raw)
  To: David Miller; +Cc: john.fastabend, jakub.kicinski, xiyou.wangcong, jiri, netdev

On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 27 Dec 2017 19:50:25 -0800
> 
> > When running consumer and/or producer operations and empty checks in
> > parallel its possible to have the empty check run past the end of the
> > array. The scenario occurs when an empty check is run while
> > __ptr_ring_discard_one() is in progress. Specifically after the
> > consumer_head is incremented but before (consumer_head >= ring_size)
> > check is made and the consumer head is zeroe'd.
> > 
> > To resolve this, without having to rework how consumer/producer ops
> > work on the array, simply add an extra dummy slot to the end of the
> > array. Even if we did a rework to avoid the extra slot it looks
> > like the normal case checks would suffer some so best to just
> > allocate an extra pointer.
> > 
> > Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> 
> Applied, thanks John.

I think that patch is wrong. I'd rather it got reverted.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 16:53 ` Michael S. Tsirkin
@ 2018-01-02 17:01   ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-02 17:01 UTC (permalink / raw)
  To: John Fastabend; +Cc: jakub.kicinski, davem, xiyou.wangcong, jiri, netdev

On Tue, Jan 02, 2018 at 06:53:08PM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 27, 2017 at 07:50:25PM -0800, John Fastabend wrote:
> > When running consumer and/or producer operations and empty checks in
> > parallel its possible to have the empty check run past the end of the
> > array. The scenario occurs when an empty check is run while
> > __ptr_ring_discard_one() is in progress. Specifically after the
> > consumer_head is incremented but before (consumer_head >= ring_size)
> > check is made and the consumer head is zeroe'd.
> > 
> > To resolve this, without having to rework how consumer/producer ops
> > work on the array, simply add an extra dummy slot to the end of the
> > array. Even if we did a rework to avoid the extra slot it looks
> > like the normal case checks would suffer some so best to just
> > allocate an extra pointer.
> > 
> > Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> 
> 
> 
> 
> > ---
> >  include/linux/ptr_ring.h |    7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 6866df4..13fb06a 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -447,7 +447,12 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
> >  
> >  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
> >  {
> > -	return kcalloc(size, sizeof(void *), gfp);
> > +	/* Allocate an extra dummy element at end of ring to avoid consumer head
> > +	 * or produce head access past the end of the array. Possible when
> > +	 * producer/consumer operations and __ptr_ring_peek operations run in
> > +	 * parallel.
> > +	 */
> > +	return kcalloc(size + 1, sizeof(void *), gfp);
> >  }
> >  
> >  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
> 
> 
> Well the peek will return a false negative then, won't it?
> 
> So I kind of prefer just fixing the consumer.  The first step I think
> would look something like the below untested patch.  Pls take a look.  I
> suspect we'll need a memory barrier too.
> 
> I wonder though: are false positives or negatives ever a problem?
> 
> Would it be a big deal to just take a lock there, and
> avoid trying to support a lockless peek?
> 
> 
> It would definitely be more straight-forward to just
> remove the promise to support a lockless peek.
> 
> Thoughts?

In fact, the API says:

 * Callers must take consumer_lock
 * if they dereference the pointer - see e.g. PTR_RING_PEEK_CALL.
 * If ring is never resized, and if the pointer is merely
 * tested, there's no need to take the lock - see e.g.  __ptr_ring_empty.

So it looks like the API is actually misused here as callers
will dereferences the skb returned.


> -->
> 
> ptr_ring: keep consumer_head valid at all times
> 
> The comment near __ptr_ring_peek says: 
>  
>  * If ring is never resized, and if the pointer is merely 
>  * tested, there's no need to take the lock - see e.g.  __ptr_ring_empty. 
> 
> but this was in fact never possible.
> 
> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 37b4bb2..802375f 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -236,22 +236,28 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>  	/* Fundamentally, what we want to do is update consumer
>  	 * index and zero out the entry so producer can reuse it.
>  	 * Doing it naively at each consume would be as simple as:
> -	 *       r->queue[r->consumer++] = NULL;
> -	 *       if (unlikely(r->consumer >= r->size))
> -	 *               r->consumer = 0;
> +	 *       consumer = r->consumer;
> +	 *       r->queue[consumer++] = NULL;
> +	 *       if (unlikely(consumer >= r->size))
> +	 *               consumer = 0;
> +	 *       r->consumer = consumer;
>  	 * but that is suboptimal when the ring is full as producer is writing
>  	 * out new entries in the same cache line.  Defer these updates until a
>  	 * batch of entries has been consumed.
>  	 */
> -	int head = r->consumer_head++;
> +	/* Note: we must keep consumer_head valid at all times for __ptr_ring_peek
> +	 * to work correctly.
> +	 */
> +	int consumer_head = r->consumer_head;
> +	int head = consumer_head++;
>  
>  	/* Once we have processed enough entries invalidate them in
>  	 * the ring all at once so producer can reuse their space in the ring.
>  	 * We also do this when we reach end of the ring - not mandatory
>  	 * but helps keep the implementation simple.
>  	 */
> -	if (unlikely(r->consumer_head - r->consumer_tail >= r->batch ||
> -		     r->consumer_head >= r->size)) {
> +	if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
> +		     consumer_head >= r->size)) {
>  		/* Zero out entries in the reverse order: this way we touch the
>  		 * cache line that producer might currently be reading the last;
>  		 * producer won't make progress and touch other cache lines
> @@ -259,12 +265,13 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>  		 */
>  		while (likely(head >= r->consumer_tail))
>  			r->queue[head--] = NULL;
> -		r->consumer_tail = r->consumer_head;
> +		r->consumer_tail = consumer_head;
>  	}
> -	if (unlikely(r->consumer_head >= r->size)) {
> -		r->consumer_head = 0;
> +	if (unlikely(consumer_head >= r->size)) {
> +		consumer_head = 0;
>  		r->consumer_tail = 0;
>  	}
> +	r->consumer_head = consumer_head;
>  }
>  
>  static inline void *__ptr_ring_consume(struct ptr_ring *r)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 17:01   ` Michael S. Tsirkin
@ 2018-01-02 17:17     ` Michael S. Tsirkin
  2018-01-02 21:27       ` John Fastabend
  2018-01-02 18:33     ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-02 17:17 UTC (permalink / raw)
  To: David Miller; +Cc: john.fastabend, jakub.kicinski, xiyou.wangcong, jiri, netdev

On Tue, Jan 02, 2018 at 07:01:33PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
> > From: John Fastabend <john.fastabend@gmail.com>
> > Date: Wed, 27 Dec 2017 19:50:25 -0800
> > 
> > > When running consumer and/or producer operations and empty checks in
> > > parallel its possible to have the empty check run past the end of the
> > > array. The scenario occurs when an empty check is run while
> > > __ptr_ring_discard_one() is in progress. Specifically after the
> > > consumer_head is incremented but before (consumer_head >= ring_size)
> > > check is made and the consumer head is zeroe'd.
> > > 
> > > To resolve this, without having to rework how consumer/producer ops
> > > work on the array, simply add an extra dummy slot to the end of the
> > > array. Even if we did a rework to avoid the extra slot it looks
> > > like the normal case checks would suffer some so best to just
> > > allocate an extra pointer.
> > > 
> > > Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> > > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > 
> > Applied, thanks John.
> 
> I think that patch is wrong. I'd rather it got reverted.

And replaced with something like the following - stil untested, but
apparently there's some urgency to fix it so posting for review ASAP.

John, others, could you pls confirm it's not too bad performance-wise?
I'll then split it up properly and re-post.

-->

net: don't misuse ptr_ring_peek

ptr_ring_peek only claims to be safe if the result is never
dereferenced, which isn't the case for its use in sch_generic.
Add locked API variants and use the bh one here.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 6866df4..f3d96c7 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -236,6 +236,51 @@ static inline bool ptr_ring_empty_bh(struct ptr_ring *r)
 	return ret;
 }
 
+static inline void *ptr_ring_peek(struct ptr_ring *r)
+{
+	void *ret;
+
+	spin_lock(&r->consumer_lock);
+	ret = __ptr_ring_peek(r);
+	spin_unlock(&r->consumer_lock);
+
+	return ret;
+}
+
+static inline void *ptr_ring_peek_irq(struct ptr_ring *r)
+{
+	void *ret;
+
+	spin_lock_irq(&r->consumer_lock);
+	ret = __ptr_ring_peek(r);
+	spin_unlock_irq(&r->consumer_lock);
+
+	return ret;
+}
+
+static inline void *ptr_ring_peek_any(struct ptr_ring *r)
+{
+	unsigned long flags;
+	void *ret;
+
+	spin_lock_irqsave(&r->consumer_lock, flags);
+	ret = __ptr_ring_peek(r);
+	spin_unlock_irqrestore(&r->consumer_lock, flags);
+
+	return ret;
+}
+
+static inline void *ptr_ring_peek_bh(struct ptr_ring *r)
+{
+	void *ret;
+
+	spin_lock_bh(&r->consumer_lock);
+	ret = __ptr_ring_peek(r);
+	spin_unlock_bh(&r->consumer_lock);
+
+	return ret;
+}
+
 /* Must only be called after __ptr_ring_peek returned !NULL */
 static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 {
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index c7addf3..ee3625c 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -97,6 +97,26 @@ static inline bool skb_array_empty_any(struct skb_array *a)
 	return ptr_ring_empty_any(&a->ring);
 }
 
+static inline struct sk_buff *skb_array_peek(struct skb_array *a)
+{
+	return ptr_ring_peek(&a->ring);
+}
+
+static inline struct sk_buff *skb_array_peek_bh(struct skb_array *a)
+{
+	return ptr_ring_peek_bh(&a->ring);
+}
+
+static inline struct sk_buff *skb_array_peek_irq(struct skb_array *a)
+{
+	return ptr_ring_peek_irq(&a->ring);
+}
+
+static inline struct sk_buff *skb_array_peek_any(struct skb_array *a)
+{
+	return ptr_ring_peek_any(&a->ring);
+}
+
 static inline struct sk_buff *skb_array_consume(struct skb_array *a)
 {
 	return ptr_ring_consume(&a->ring);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cc069b2..9288e84 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
 	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
 		struct skb_array *q = band2list(priv, band);
 
-		skb = __skb_array_peek(q);
+		skb = skb_array_peek_bh(q);
 	}
 
 	return skb;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 17:01   ` Michael S. Tsirkin
  2018-01-02 17:17     ` Michael S. Tsirkin
@ 2018-01-02 18:33     ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2018-01-02 18:33 UTC (permalink / raw)
  To: mst; +Cc: john.fastabend, jakub.kicinski, xiyou.wangcong, jiri, netdev

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 2 Jan 2018 19:01:33 +0200

> On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Wed, 27 Dec 2017 19:50:25 -0800
>> 
>> > When running consumer and/or producer operations and empty checks in
>> > parallel its possible to have the empty check run past the end of the
>> > array. The scenario occurs when an empty check is run while
>> > __ptr_ring_discard_one() is in progress. Specifically after the
>> > consumer_head is incremented but before (consumer_head >= ring_size)
>> > check is made and the consumer head is zeroe'd.
>> > 
>> > To resolve this, without having to rework how consumer/producer ops
>> > work on the array, simply add an extra dummy slot to the end of the
>> > array. Even if we did a rework to avoid the extra slot it looks
>> > like the normal case checks would suffer some so best to just
>> > allocate an extra pointer.
>> > 
>> > Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> > Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
>> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> 
>> Applied, thanks John.
> 
> I think that patch is wrong. I'd rather it got reverted.

I agree with John's logic, the asynchronous test is always safe in
this parallel access case and John's change solves the out-of-bounds
test.

If you propose a cleaner way to handle this as a follow-on patch I'll
be happy to consider it.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 17:17     ` Michael S. Tsirkin
@ 2018-01-02 21:27       ` John Fastabend
  2018-01-02 23:12         ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2018-01-02 21:27 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: jakub.kicinski, xiyou.wangcong, jiri, netdev

On 01/02/2018 09:17 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 07:01:33PM +0200, Michael S. Tsirkin wrote:
>> On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
>>> From: John Fastabend <john.fastabend@gmail.com>
>>> Date: Wed, 27 Dec 2017 19:50:25 -0800
>>>
>>>> When running consumer and/or producer operations and empty checks in
>>>> parallel its possible to have the empty check run past the end of the
>>>> array. The scenario occurs when an empty check is run while
>>>> __ptr_ring_discard_one() is in progress. Specifically after the
>>>> consumer_head is incremented but before (consumer_head >= ring_size)
>>>> check is made and the consumer head is zeroe'd.
>>>>
>>>> To resolve this, without having to rework how consumer/producer ops
>>>> work on the array, simply add an extra dummy slot to the end of the
>>>> array. Even if we did a rework to avoid the extra slot it looks
>>>> like the normal case checks would suffer some so best to just
>>>> allocate an extra pointer.
>>>>
>>>> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
>>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>>
>>> Applied, thanks John.
>>
>> I think that patch is wrong. I'd rather it got reverted.
> 
> And replaced with something like the following - stil untested, but
> apparently there's some urgency to fix it so posting for review ASAP.
> 

So the above ptr_ring patch is meant for the dequeue() case not
the peek case. Dequeue case shown here,

static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
{
        struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
        struct sk_buff *skb = NULL;
        int band;

        for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
                struct skb_array *q = band2list(priv, band);

                if (__skb_array_empty(q))
                        continue;

                skb = skb_array_consume_bh(q);
        }
        if (likely(skb)) {
                qdisc_qstats_cpu_backlog_dec(qdisc, skb);
                qdisc_bstats_cpu_update(qdisc, skb);
                qdisc_qstats_cpu_qlen_dec(qdisc);
        }

        return skb;
}

In the dequeue case we use it purely as a hint and then do a proper
consume (with locks) if needed. A false negative in this case means
a consume is happening on another cpu due to a reset op or a
dequeue op. (an aside but I'm evaluating if we should only allow a
single dequeue'ing cpu at a time more below?) If its a reset op
that caused the false negative it is OK because we are flushing the
array anyways. If it is a dequeue op it is also OK because this core
will abort but the running core will continue to dequeue avoiding a
stall. So I believe false negatives are OK in the above function.

> John, others, could you pls confirm it's not too bad performance-wise?
> I'll then split it up properly and re-post.
> 

I haven't benchmarked it but in the dequeue case taking a lock for
every priority even when its empty seems unneeded.

> -->
> 
> net: don't misuse ptr_ring_peek
> 
> ptr_ring_peek only claims to be safe if the result is never
> dereferenced, which isn't the case for its use in sch_generic.
> Add locked API variants and use the bh one here.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 

[...]

> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>  		struct skb_array *q = band2list(priv, band);
>  
> -		skb = __skb_array_peek(q);
> +		skb = skb_array_peek_bh(q);

Ah I should have added a comment here. For now peek() is only used from
locking qdiscs. So peek and consume/produce operations will never happen
in parallel. In this case we should never hit the false negative case with
my patch or the out of bounds reference without my patch.

Doing a peek() op without qdisc lock is a bit problematic anyways. With
current code another cpu could consume the skb and free it. Either we
can ensure a single consumer runs at a time on an array (not the same as
qdisc maybe) or just avoid peek operations in this case. My current plan
was to just avoid peek() ops altogether, they seem unnecessary with the
types of qdiscs I want to be build.

Thanks,
John

>  	}
>  
>  	return skb;
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 21:27       ` John Fastabend
@ 2018-01-02 23:12         ` Michael S. Tsirkin
  2018-01-03  0:25           ` John Fastabend
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-02 23:12 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, jakub.kicinski, xiyou.wangcong, jiri, netdev

On Tue, Jan 02, 2018 at 01:27:23PM -0800, John Fastabend wrote:
> On 01/02/2018 09:17 AM, Michael S. Tsirkin wrote:
> > On Tue, Jan 02, 2018 at 07:01:33PM +0200, Michael S. Tsirkin wrote:
> >> On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
> >>> From: John Fastabend <john.fastabend@gmail.com>
> >>> Date: Wed, 27 Dec 2017 19:50:25 -0800
> >>>
> >>>> When running consumer and/or producer operations and empty checks in
> >>>> parallel its possible to have the empty check run past the end of the
> >>>> array. The scenario occurs when an empty check is run while
> >>>> __ptr_ring_discard_one() is in progress. Specifically after the
> >>>> consumer_head is incremented but before (consumer_head >= ring_size)
> >>>> check is made and the consumer head is zeroe'd.
> >>>>
> >>>> To resolve this, without having to rework how consumer/producer ops
> >>>> work on the array, simply add an extra dummy slot to the end of the
> >>>> array. Even if we did a rework to avoid the extra slot it looks
> >>>> like the normal case checks would suffer some so best to just
> >>>> allocate an extra pointer.
> >>>>
> >>>> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>>> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
> >>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> >>>
> >>> Applied, thanks John.
> >>
> >> I think that patch is wrong. I'd rather it got reverted.
> > 
> > And replaced with something like the following - stil untested, but
> > apparently there's some urgency to fix it so posting for review ASAP.
> > 
> 
> So the above ptr_ring patch is meant for the dequeue() case not
> the peek case. Dequeue case shown here,
> 
> static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
> {
>         struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>         struct sk_buff *skb = NULL;
>         int band;
> 
>         for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>                 struct skb_array *q = band2list(priv, band);
> 
>                 if (__skb_array_empty(q))
>                         continue;
> 
>                 skb = skb_array_consume_bh(q);
>         }
>         if (likely(skb)) {
>                 qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>                 qdisc_bstats_cpu_update(qdisc, skb);
>                 qdisc_qstats_cpu_qlen_dec(qdisc);
>         }
> 
>         return skb;
> }
> 
> In the dequeue case we use it purely as a hint and then do a proper
> consume (with locks) if needed.  A false negative in this case means
> a consume is happening on another cpu due to a reset op or a
> dequeue op. (an aside but I'm evaluating if we should only allow a
> single dequeue'ing cpu at a time more below?) If its a reset op
> that caused the false negative it is OK because we are flushing the
> array anyways. If it is a dequeue op it is also OK because this core
> will abort but the running core will continue to dequeue avoiding a
> stall. So I believe false negatives are OK in the above function.

I'm not 100% sure I understand. We don't always dequeue until empty.
E.g. it seems that another CPU could dequeue an skb and then stop
because e.g. it reached a byte queue limit, then this CPU gets a false
negativesand stops because of that.


More generally, what makes this usage safe?
Is there a way to formalize it at the API level?

> > John, others, could you pls confirm it's not too bad performance-wise?
> > I'll then split it up properly and re-post.
> > 
> 
> I haven't benchmarked it but in the dequeue case taking a lock for
> every priority even when its empty seems unneeded.

Well it does seem to make the code more comprehensible and more robust.

But OK -  I was looking at fixing the unlocked empty API to make sure it
actually does what it's supposed to. I posted a draft earlier in this
thread, it needs to be looked at in depth to figure out whether it can
ever give false negatives or positives, and document the results.



> > -->
> > 
> > net: don't misuse ptr_ring_peek
> > 
> > ptr_ring_peek only claims to be safe if the result is never
> > dereferenced, which isn't the case for its use in sch_generic.
> > Add locked API variants and use the bh one here.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> 
> [...]
> 
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
> >  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >  		struct skb_array *q = band2list(priv, band);
> >  
> > -		skb = __skb_array_peek(q);
> > +		skb = skb_array_peek_bh(q);
> 
> Ah I should have added a comment here. For now peek() is only used from
> locking qdiscs. So peek and consume/produce operations will never happen
> in parallel. In this case we should never hit the false negative case with
> my patch or the out of bounds reference without my patch.
> 
> Doing a peek() op without qdisc lock is a bit problematic anyways. With
> current code another cpu could consume the skb and free it. Either we
> can ensure a single consumer runs at a time on an array (not the same as
> qdisc maybe) or just avoid peek operations in this case. My current plan
> was to just avoid peek() ops altogether, they seem unnecessary with the
> types of qdiscs I want to be build.
> 
> Thanks,
> John

For the lockless qdisc, for net-next, do we need the patch above until you fix that though?

> >  	}
> >  
> >  	return skb;
> > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-02 23:12         ` Michael S. Tsirkin
@ 2018-01-03  0:25           ` John Fastabend
  2018-01-03 15:50             ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2018-01-03  0:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, jakub.kicinski, xiyou.wangcong, jiri, netdev

On 01/02/2018 03:12 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 01:27:23PM -0800, John Fastabend wrote:
>> On 01/02/2018 09:17 AM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 02, 2018 at 07:01:33PM +0200, Michael S. Tsirkin wrote:
>>>> On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
>>>>> From: John Fastabend <john.fastabend@gmail.com>
>>>>> Date: Wed, 27 Dec 2017 19:50:25 -0800
>>>>>
>>>>>> When running consumer and/or producer operations and empty checks in
>>>>>> parallel its possible to have the empty check run past the end of the
>>>>>> array. The scenario occurs when an empty check is run while
>>>>>> __ptr_ring_discard_one() is in progress. Specifically after the
>>>>>> consumer_head is incremented but before (consumer_head >= ring_size)
>>>>>> check is made and the consumer head is zeroe'd.
>>>>>>
>>>>>> To resolve this, without having to rework how consumer/producer ops
>>>>>> work on the array, simply add an extra dummy slot to the end of the
>>>>>> array. Even if we did a rework to avoid the extra slot it looks
>>>>>> like the normal case checks would suffer some so best to just
>>>>>> allocate an extra pointer.
>>>>>>
>>>>>> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>>>> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
>>>>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>>>>
>>>>> Applied, thanks John.
>>>>
>>>> I think that patch is wrong. I'd rather it got reverted.
>>>
>>> And replaced with something like the following - stil untested, but
>>> apparently there's some urgency to fix it so posting for review ASAP.
>>>
>>
>> So the above ptr_ring patch is meant for the dequeue() case not
>> the peek case. Dequeue case shown here,
>>
>> static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>> {
>>         struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>>         struct sk_buff *skb = NULL;
>>         int band;
>>
>>         for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>                 struct skb_array *q = band2list(priv, band);
>>
>>                 if (__skb_array_empty(q))
>>                         continue;
>>
>>                 skb = skb_array_consume_bh(q);
>>         }
>>         if (likely(skb)) {
>>                 qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>>                 qdisc_bstats_cpu_update(qdisc, skb);
>>                 qdisc_qstats_cpu_qlen_dec(qdisc);
>>         }
>>
>>         return skb;
>> }
>>
>> In the dequeue case we use it purely as a hint and then do a proper
>> consume (with locks) if needed.  A false negative in this case means
>> a consume is happening on another cpu due to a reset op or a
>> dequeue op. (an aside but I'm evaluating if we should only allow a
>> single dequeue'ing cpu at a time more below?) If its a reset op
>> that caused the false negative it is OK because we are flushing the
>> array anyways. If it is a dequeue op it is also OK because this core
>> will abort but the running core will continue to dequeue avoiding a
>> stall. So I believe false negatives are OK in the above function.
> 
> I'm not 100% sure I understand. We don't always dequeue until empty.
> E.g. it seems that another CPU could dequeue an skb and then stop
> because e.g. it reached a byte queue limit, then this CPU gets a false
> negativesand stops because of that.
> 

If we don't dequeue until empty, byte limits or skb limits are reached,
then netif_reschedule is called and we reschedule another qdisc_run()
in tx_action.

> 
> More generally, what makes this usage safe?
> Is there a way to formalize it at the API level?
> 

Right I think these are good questions. I think the ptr_ring API should
allow a peek operation to be used without a lock. The user has to ensure
they only use it as a hint and if its dereferenced the user needs to
ensure the object is not free'd from some other codepath while it is
being dereferenced. The existing API seems to match this.

This is how I used it in pfifo_fast expecting the above to be true. The
API allows for false negatives which _should_ be OK if the user is expecting
this. Alternatively, we could make it false positives if you want and
that would also work for me considering this case is hit very rarely.

>>> John, others, could you pls confirm it's not too bad performance-wise?
>>> I'll then split it up properly and re-post.
>>>
>>
>> I haven't benchmarked it but in the dequeue case taking a lock for
>> every priority even when its empty seems unneeded.
> 
> Well it does seem to make the code more comprehensible and more robust.
> 

Its a trade-off between performance and robustness.

> But OK -  I was looking at fixing the unlocked empty API to make sure it
> actually does what it's supposed to. I posted a draft earlier in this
> thread, it needs to be looked at in depth to figure out whether it can
> ever give false negatives or positives, and document the results.
> 
> 

I'll look at it. But I still think keeping a lockless version makes sense
for many use cases.

> 
>>> -->
>>>
>>> net: don't misuse ptr_ring_peek
>>>
>>> ptr_ring_peek only claims to be safe if the result is never
>>> dereferenced, which isn't the case for its use in sch_generic.
>>> Add locked API variants and use the bh one here.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> ---
>>>
>>
>> [...]
>>
>>> --- a/net/sched/sch_generic.c
>>> +++ b/net/sched/sch_generic.c
>>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
>>>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>>  		struct skb_array *q = band2list(priv, band);
>>>  
>>> -		skb = __skb_array_peek(q);
>>> +		skb = skb_array_peek_bh(q);
>>
>> Ah I should have added a comment here. For now peek() is only used from
>> locking qdiscs. So peek and consume/produce operations will never happen
>> in parallel. In this case we should never hit the false negative case with
>> my patch or the out of bounds reference without my patch.
>>
>> Doing a peek() op without qdisc lock is a bit problematic anyways. With
>> current code another cpu could consume the skb and free it. Either we
>> can ensure a single consumer runs at a time on an array (not the same as
>> qdisc maybe) or just avoid peek operations in this case. My current plan
>> was to just avoid peek() ops altogether, they seem unnecessary with the
>> types of qdiscs I want to be build.
>>
>> Thanks,
>> John
> 
> For the lockless qdisc, for net-next, do we need the patch above until you fix that though?
> 

No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is
applied we do not need any additional fixes in net-next. Future work will
require the above patch (the one you provided) though so its useful work.

I'll do another review of the false positive case though to be sure the
current code is OK wrt handling false positives and any potential stalls.

>>>  	}
>>>  
>>>  	return skb;
>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-03  0:25           ` John Fastabend
@ 2018-01-03 15:50             ` Michael S. Tsirkin
  2018-01-03 17:46               ` John Fastabend
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-03 15:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, jakub.kicinski, xiyou.wangcong, jiri, netdev

On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote:
> > 
> > More generally, what makes this usage safe?
> > Is there a way to formalize it at the API level?
> > 
> 
> Right I think these are good questions. I think the ptr_ring API should
> allow a peek operation to be used without a lock. The user has to ensure
> they only use it as a hint and if its dereferenced the user needs to
> ensure the object is not free'd from some other codepath while it is
> being dereferenced. The existing API seems to match this.
> 
> This is how I used it in pfifo_fast expecting the above to be true. The
> API allows for false negatives which _should_ be OK if the user is expecting
> this. Alternatively, we could make it false positives if you want and
> that would also work for me considering this case is hit very rarely.

By now I'm confused by which are positives and which are negatives :)
OK so the guarantees we want would be:

- empty can return false if ring is empty.
  caller must re-check with consumer lock taken
- if multiple threads call it, only one thread
  is guaranteed that empty will not return true
  if ring is non empty.
  after detecting that ring is not empty,
  this thread shall ....

can you help me fill in the blank please?





> >>> John, others, could you pls confirm it's not too bad performance-wise?
> >>> I'll then split it up properly and re-post.
> >>>
> >>
> >> I haven't benchmarked it but in the dequeue case taking a lock for
> >> every priority even when its empty seems unneeded.
> > 
> > Well it does seem to make the code more comprehensible and more robust.
> > 
> 
> Its a trade-off between performance and robustness.
> 
> > But OK -  I was looking at fixing the unlocked empty API to make sure it
> > actually does what it's supposed to. I posted a draft earlier in this
> > thread, it needs to be looked at in depth to figure out whether it can
> > ever give false negatives or positives, and document the results.
> > 
> > 
> 
> I'll look at it. But I still think keeping a lockless version makes sense
> for many use cases.

Fine. Just let's try to document what are the guarantees.

> > 
> >>> -->
> >>>
> >>> net: don't misuse ptr_ring_peek
> >>>
> >>> ptr_ring_peek only claims to be safe if the result is never
> >>> dereferenced, which isn't the case for its use in sch_generic.
> >>> Add locked API variants and use the bh one here.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>> ---
> >>>
> >>
> >> [...]
> >>
> >>> --- a/net/sched/sch_generic.c
> >>> +++ b/net/sched/sch_generic.c
> >>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
> >>>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >>>  		struct skb_array *q = band2list(priv, band);
> >>>  
> >>> -		skb = __skb_array_peek(q);
> >>> +		skb = skb_array_peek_bh(q);
> >>
> >> Ah I should have added a comment here. For now peek() is only used from
> >> locking qdiscs. So peek and consume/produce operations will never happen
> >> in parallel. In this case we should never hit the false negative case with
> >> my patch or the out of bounds reference without my patch.

OK so this is the part I missed. Can you add a comment please?


> >> Doing a peek() op without qdisc lock is a bit problematic anyways. With
> >> current code another cpu could consume the skb and free it. Either we
> >> can ensure a single consumer runs at a time on an array (not the same as
> >> qdisc maybe) or just avoid peek operations in this case. My current plan
> >> was to just avoid peek() ops altogether, they seem unnecessary with the
> >> types of qdiscs I want to be build.
> >>
> >> Thanks,
> >> John
> > 
> > For the lockless qdisc, for net-next, do we need the patch above until you fix that though?
> > 
> 
> No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is
> applied we do not need any additional fixes in net-next. Future work will
> require the above patch (the one you provided) though so its useful work.

The one that avoids allocating extra memory?


> I'll do another review of the false positive case though to be sure the
> current code is OK wrt handling false positives and any potential stalls.


Thanks!

> >>>  	}
> >>>  
> >>>  	return skb;
> >>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-03 15:50             ` Michael S. Tsirkin
@ 2018-01-03 17:46               ` John Fastabend
  2018-01-03 18:34                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2018-01-03 17:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, jakub.kicinski, xiyou.wangcong, jiri, netdev

On 01/03/2018 07:50 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote:
>>>
>>> More generally, what makes this usage safe?
>>> Is there a way to formalize it at the API level?
>>>
>>
>> Right I think these are good questions. I think the ptr_ring API should
>> allow a peek operation to be used without a lock. The user has to ensure
>> they only use it as a hint and if its dereferenced the user needs to
>> ensure the object is not free'd from some other codepath while it is
>> being dereferenced. The existing API seems to match this.
>>
>> This is how I used it in pfifo_fast expecting the above to be true. The
>> API allows for false negatives which _should_ be OK if the user is expecting
>> this. Alternatively, we could make it false positives if you want and
>> that would also work for me considering this case is hit very rarely.
> 
> By now I'm confused by which are positives and which are negatives :)
> OK so the guarantees we want would be:
> 
> - empty can return false if ring is empty.
>   caller must re-check with consumer lock taken

Correct.

> - if multiple threads call it, only one thread

What is "it" here, __skb_array_empty() presumably.

>   is guaranteed that empty will not return true
>   if ring is non empty.
>   after detecting that ring is not empty,
>   this thread shall ....
> 
The guarantee is even weaker.

  - if __skb_array_empty() returns true, either the
    ring is empty OR a consumer operation is in
    progress.

For pfifo_fast this is OK because some thread is making
progress at this point. Note, even if multiple threads
are calling __skb_array_empty() there is no guarantee
any of them will return false even if the ring has
elements.

> can you help me fill in the blank please?
> 

Did that help?

> 
> 
> 
> 
>>>>> John, others, could you pls confirm it's not too bad performance-wise?
>>>>> I'll then split it up properly and re-post.
>>>>>
>>>>
>>>> I haven't benchmarked it but in the dequeue case taking a lock for
>>>> every priority even when its empty seems unneeded.
>>>
>>> Well it does seem to make the code more comprehensible and more robust.
>>>
>>
>> Its a trade-off between performance and robustness.
>>
>>> But OK -  I was looking at fixing the unlocked empty API to make sure it
>>> actually does what it's supposed to. I posted a draft earlier in this
>>> thread, it needs to be looked at in depth to figure out whether it can
>>> ever give false negatives or positives, and document the results.
>>>
>>>
>>
>> I'll look at it. But I still think keeping a lockless version makes sense
>> for many use cases.
> 
> Fine. Just let's try to document what are the guarantees.
> 

Yep. Guarantees should be (on ptr_ring implementation and similar
on skb_array implementation)

  - if __ptr_ring_empty returns true, the ring may be empty
    user must use operation with consumer lock to guarantee the
    ring is empty. Note, when run with concurrent consumer operations
    it is possible to return true when ring has elements. This
    occurs when ring consumer head rolls over ring size. Also,
    producer operations may put object in the ring concurrently
    making empty check incorrect. To avoid this use locked
    ptr_ring_empty() API.

  - if __ptr_ring_empty returns false, the ring may have elements.
    Possibly, scenario is consumer operation and __ptr_ring_empty
    operation run concurrently causing false to be returned when
    ring is empty. To avoid this use locked ptr_ring_empty() API.

>>>
>>>>> -->
>>>>>
>>>>> net: don't misuse ptr_ring_peek
>>>>>
>>>>> ptr_ring_peek only claims to be safe if the result is never
>>>>> dereferenced, which isn't the case for its use in sch_generic.
>>>>> Add locked API variants and use the bh one here.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>
>>>>> ---
>>>>>
>>>>
>>>> [...]
>>>>
>>>>> --- a/net/sched/sch_generic.c
>>>>> +++ b/net/sched/sch_generic.c
>>>>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
>>>>>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>>>>  		struct skb_array *q = band2list(priv, band);
>>>>>  
>>>>> -		skb = __skb_array_peek(q);
>>>>> +		skb = skb_array_peek_bh(q);
>>>>
>>>> Ah I should have added a comment here. For now peek() is only used from
>>>> locking qdiscs. So peek and consume/produce operations will never happen
>>>> in parallel. In this case we should never hit the false negative case with
>>>> my patch or the out of bounds reference without my patch.
> 
> OK so this is the part I missed. Can you add a comment please?
> 
> 

Yes, working on a documentation patch set to address misleading and missing
comments/documentation in qdisc layer now.

>>>> Doing a peek() op without qdisc lock is a bit problematic anyways. With
>>>> current code another cpu could consume the skb and free it. Either we
>>>> can ensure a single consumer runs at a time on an array (not the same as
>>>> qdisc maybe) or just avoid peek operations in this case. My current plan
>>>> was to just avoid peek() ops altogether, they seem unnecessary with the
>>>> types of qdiscs I want to be build.
>>>>
>>>> Thanks,
>>>> John
>>>
>>> For the lockless qdisc, for net-next, do we need the patch above until you fix that though?
>>>
>>
>> No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is
>> applied we do not need any additional fixes in net-next. Future work will
>> require the above patch (the one you provided) though so its useful work.
> 
> The one that avoids allocating extra memory?
> 

Let me try summarize again.

We need the extra memory slot patch if we expose the
__ptr_ring_empty/__skb_array_empty API. Otherwise we hit the out
of bounds issues. And for qdisc layer the __skb_array_empty() API
call is useful so we should not remove it.

In addition to the qdisc layer using the __skb_array_empty() API if
we need a peek() API that works without the qdisc lock then your patch,
the one adding the locked peek API, will be needed. It is not needed
though until we have a qdisc that would use it without the lock.

In summary we keep the extra allocation to make the __skb_array_empty() API
usable. And hold off exposing the skb_array_peek()/ptr_ring_peek() until we
have a user for it. We can roll your proposed patch into any series we come
up with for a new/updated qdisc.

> 
>> I'll do another review of the false positive case though to be sure the
>> current code is OK wrt handling false positives and any potential stalls.
> 
> 
> Thanks!
> 
>>>>>  	}
>>>>>  
>>>>>  	return skb;
>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
  2018-01-03 17:46               ` John Fastabend
@ 2018-01-03 18:34                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-03 18:34 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, jakub.kicinski, xiyou.wangcong, jiri, netdev

On Wed, Jan 03, 2018 at 09:46:15AM -0800, John Fastabend wrote:
> On 01/03/2018 07:50 AM, Michael S. Tsirkin wrote:
> > On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote:
> >>>
> >>> More generally, what makes this usage safe?
> >>> Is there a way to formalize it at the API level?
> >>>
> >>
> >> Right I think these are good questions. I think the ptr_ring API should
> >> allow a peek operation to be used without a lock. The user has to ensure
> >> they only use it as a hint and if its dereferenced the user needs to
> >> ensure the object is not free'd from some other codepath while it is
> >> being dereferenced. The existing API seems to match this.
> >>
> >> This is how I used it in pfifo_fast expecting the above to be true. The
> >> API allows for false negatives which _should_ be OK if the user is expecting
> >> this. Alternatively, we could make it false positives if you want and
> >> that would also work for me considering this case is hit very rarely.
> > 
> > By now I'm confused by which are positives and which are negatives :)
> > OK so the guarantees we want would be:
> > 
> > - empty can return false if ring is empty.
> >   caller must re-check with consumer lock taken
> 
> Correct.
> 
> > - if multiple threads call it, only one thread
> 
> What is "it" here, __skb_array_empty() presumably.
> 
> >   is guaranteed that empty will not return true
> >   if ring is non empty.
> >   after detecting that ring is not empty,
> >   this thread shall ....
> > 
> The guarantee is even weaker.
> 
>   - if __skb_array_empty() returns true, either the
>     ring is empty OR a consumer operation is in
>     progress.

Well I'm not sure that's guaranteed actually, in that
in progress isn't all that well defined on multi-core
systems.

E.g. write of NULL could bypass index write and
empty will return true on another CPU even though
both completed on our CPU.

I still think our use is ok, I'm just still having
trouble formulating the exact rules.

> 
> For pfifo_fast this is OK because some thread is making
> progress at this point.
>
>
> Note, even if multiple threads
> are calling __skb_array_empty() there is no guarantee
> any of them will return false even if the ring has
> elements.
>
> > can you help me fill in the blank please?
> > 
> 
> Did that help?
> 
> > 
> > 
> > 
> > 
> >>>>> John, others, could you pls confirm it's not too bad performance-wise?
> >>>>> I'll then split it up properly and re-post.
> >>>>>
> >>>>
> >>>> I haven't benchmarked it but in the dequeue case taking a lock for
> >>>> every priority even when its empty seems unneeded.
> >>>
> >>> Well it does seem to make the code more comprehensible and more robust.
> >>>
> >>
> >> Its a trade-off between performance and robustness.
> >>
> >>> But OK -  I was looking at fixing the unlocked empty API to make sure it
> >>> actually does what it's supposed to. I posted a draft earlier in this
> >>> thread, it needs to be looked at in depth to figure out whether it can
> >>> ever give false negatives or positives, and document the results.
> >>>
> >>>
> >>
> >> I'll look at it. But I still think keeping a lockless version makes sense
> >> for many use cases.
> > 
> > Fine. Just let's try to document what are the guarantees.
> > 
> 
> Yep. Guarantees should be (on ptr_ring implementation and similar
> on skb_array implementation)
> 
>   - if __ptr_ring_empty returns true, the ring may be empty
>     user must use operation with consumer lock to guarantee the
>     ring is empty. Note, when run with concurrent consumer operations
>     it is possible to return true when ring has elements. This
>     occurs when ring consumer head rolls over ring size. Also,
>     producer operations may put object in the ring concurrently
>     making empty check incorrect. To avoid this use locked
>     ptr_ring_empty() API.
> 
>   - if __ptr_ring_empty returns false, the ring may have elements.
>     Possibly, scenario is consumer operation and __ptr_ring_empty
>     operation run concurrently causing false to be returned when
>     ring is empty. To avoid this use locked ptr_ring_empty() API.
> 
> >>>
> >>>>> -->
> >>>>>
> >>>>> net: don't misuse ptr_ring_peek
> >>>>>
> >>>>> ptr_ring_peek only claims to be safe if the result is never
> >>>>> dereferenced, which isn't the case for its use in sch_generic.
> >>>>> Add locked API variants and use the bh one here.
> >>>>>
> >>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>
> >>>>> ---
> >>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>> --- a/net/sched/sch_generic.c
> >>>>> +++ b/net/sched/sch_generic.c
> >>>>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
> >>>>>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >>>>>  		struct skb_array *q = band2list(priv, band);
> >>>>>  
> >>>>> -		skb = __skb_array_peek(q);
> >>>>> +		skb = skb_array_peek_bh(q);
> >>>>
> >>>> Ah I should have added a comment here. For now peek() is only used from
> >>>> locking qdiscs. So peek and consume/produce operations will never happen
> >>>> in parallel. In this case we should never hit the false negative case with
> >>>> my patch or the out of bounds reference without my patch.
> > 
> > OK so this is the part I missed. Can you add a comment please?
> > 
> > 
> 
> Yes, working on a documentation patch set to address misleading and missing
> comments/documentation in qdisc layer now.
> 
> >>>> Doing a peek() op without qdisc lock is a bit problematic anyways. With
> >>>> current code another cpu could consume the skb and free it. Either we
> >>>> can ensure a single consumer runs at a time on an array (not the same as
> >>>> qdisc maybe) or just avoid peek operations in this case. My current plan
> >>>> was to just avoid peek() ops altogether, they seem unnecessary with the
> >>>> types of qdiscs I want to be build.
> >>>>
> >>>> Thanks,
> >>>> John
> >>>
> >>> For the lockless qdisc, for net-next, do we need the patch above until you fix that though?
> >>>
> >>
> >> No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is
> >> applied we do not need any additional fixes in net-next. Future work will
> >> require the above patch (the one you provided) though so its useful work.
> > 
> > The one that avoids allocating extra memory?
> > 
> 
> Let me try summarize again.
> 
> We need the extra memory slot patch if we expose the
> __ptr_ring_empty/__skb_array_empty API. Otherwise we hit the out
> of bounds issues. And for qdisc layer the __skb_array_empty() API
> call is useful so we should not remove it.
> 
> In addition to the qdisc layer using the __skb_array_empty() API if
> we need a peek() API that works without the qdisc lock then your patch,
> the one adding the locked peek API, will be needed. It is not needed
> though until we have a qdisc that would use it without the lock.
> 
> In summary we keep the extra allocation to make the __skb_array_empty() API
> usable. And hold off exposing the skb_array_peek()/ptr_ring_peek() until we
> have a user for it. We can roll your proposed patch into any series we come
> up with for a new/updated qdisc.
> 
> > 
> >> I'll do another review of the false positive case though to be sure the
> >> current code is OK wrt handling false positives and any potential stalls.
> > 
> > 
> > Thanks!
> > 
> >>>>>  	}
> >>>>>  
> >>>>>  	return skb;
> >>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-01-03 18:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-28  3:50 [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds John Fastabend
2018-01-02 16:52 ` David Miller
2018-01-02 17:01   ` Michael S. Tsirkin
2018-01-02 17:17     ` Michael S. Tsirkin
2018-01-02 21:27       ` John Fastabend
2018-01-02 23:12         ` Michael S. Tsirkin
2018-01-03  0:25           ` John Fastabend
2018-01-03 15:50             ` Michael S. Tsirkin
2018-01-03 17:46               ` John Fastabend
2018-01-03 18:34                 ` Michael S. Tsirkin
2018-01-02 18:33     ` David Miller
2018-01-02 16:53 ` Michael S. Tsirkin
2018-01-02 17:01   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.